--- eip: XXXX title: Witness gas cost reform for Verkle trees author: Vitalik Buterin (@vbuterin), [TBD] discussions-to: YYYY status: Draft type: Standards Track category: Core created: 2021-ZZ-WW --- ## Simple Summary Redesign how witness gas costs work in a more principled way that covers accounts, storage slots and contract code and the witness lengths of each one after a switch to Verkle trees. ## Motivation There is a strong desire to allow **stateless verification** of blocks. A client should be able to verify the correctness of any individual block without any extra information except for a small file that any state-holding node can generate, called a **witness**, that contains the portion of the state accessed by the block along with proofs of correctness. Stateless verification has important benefits including reducing disk-space requirements, allowing semi-light "trust but verify if alerted" clients, and supporting sharding setups where clients frequently jump between shards. Stateless verification is not viable at present, but the introduction of Verkle trees will make it viable. However, for this to work we need to adjust gas costs so that the gas cost of an operation corresponds to the witness size required for that operation. Operations that require witness sizes include: 1. Account reads (mostly calls but also `EXTCODEHASH` and similar operations) 2. `SLOAD` and `SSTORE` 3. Contract code execution This EIP introduces a cleaner and simpler gas cost schedule that directly charges for accessing a subtree and accessing an element within the subtree. For (1) and (2), this EIP does not greatly change costs and is largely a simplification, as [EIP 2929](./eip-2929.md) already raised gas costs for those operations to a sufficiently high level. For (3), however, it does add substantial gas cost increases, as contract code access is not properly charged for without this EIP. ## Specification ### Parameters | Constant | Value | | - | - | | `CHUNK_SIZE` | 31 | | `WITNESS_BRANCH_COST` | 1900 | | `WITNESS_CHUNK_COST` | 200 | | `ACCOUNT_HEADER_SUBKEY` | `0x01` | | `ACCOUNT_ACCESS_MANDATORY_LEAF_KEYS` | `(0, 1, 4)` | | `BALANCE_LEAF_KEY` | `2` | | `CODEHASH_LEAF_KEY` | `3` | | `CODE_LEAF_OFFSET` | `128` | | `ACCESS_LIST_MAX_ELEMENTS` | `2**24` | | `ACCESS_LIST_DISCOUNT_NUMERATOR` | 7 | | `ACCESS_LIST_DISCOUNT_DENOMINATOR` | 8 | ### Changes When executing a transaction, maintain two sets: * `accessed_subtrees: Set[Tuple[address, int]]` * `accessed_leaves: Set[Tuple[address, int, int]]` When an **access event** of `(address, sub_key, leaf_key)` is made, perform the following checks: * If `(address, sub_key)` is not in `accessed_subtrees`, charge `WITNESS_BRANCH_COST` gas and add that tuple to `accessed_subtrees`. * If `leaf_key is not None` and `(address, sub_key, leaf_key)` is not in `accessed_leaves`, charge `WITNESS_CHUNK_COST` gas and add it to `accessed_leaves` ### Access events for accounts When an `address` is the target of a `CALL`, `CALLCODE`, `DELEGATECALL`, `SELFDESTRUCT`, `EXTCODESIZE`, or `EXTCODECOPY` opcode, process an access event of the following form for each `key` in `ACCOUNT_ACCESS_MANDATORY_LEAF_KEYS`: ```python (address, ACCOUNT_HEADER_SUBKEY, key) ``` If the call is _value-bearing_ (ie. it transfers nonzero wei), additionally process two events of the following form: ```python (caller_address, ACCOUNT_HEADER_SUBKEY, BALANCE_LEAF_KEY) (callee_address, ACCOUNT_HEADER_SUBKEY, BALANCE_LEAF_KEY) ``` If the `BALANCE` opcode is called targeting some `address`, process an access event of the form: ```python (address, ACCOUNT_HEADER_SUBKEY, BALANCE_LEAF_KEY) ``` If the `SELFDESTRUCT` opcode is called by some `caller_address` targeting some `target_ddress` (regardless of whether it's value-bearing or not), process access events of the form: ```python (caller_address, ACCOUNT_HEADER_SUBKEY, BALANCE_LEAF_KEY) (callee_address, ACCOUNT_HEADER_SUBKEY, BALANCE_LEAF_KEY) ``` If the `EXTCODEHASH` opcode is called targeting some `address`, process an access event of the form: ```python (address, ACCOUNT_HEADER_SUBKEY, CODEHASH_LEAF_KEY) ``` ### Access events for storage `SLOAD` and `SSTORE` opcodes with a given `address` and `key` process an access event of the form ```python (address, tree_key, sub_key) ``` Where `tree_key` and `sub_key` are computed as follows: ```python def get_storage_slot_tree_keys(storage_key: int): if storage_key < 64: return (0, 64 + storage_key) else: return (2**248 + storage_key // 256, storage_key % 256) ``` Note that this is the same calculation as made [in the Verkle tree storage mapping spec](https://ethereum-magicians.org/t/proposed-verkle-tree-scheme-for-ethereum-state/5805). ### Access events for code In the conditions below, "chunk `chunk_id` is accessed" is understood to mean an access event of the form ```python (address, (chunk_id + 128) // 256, (chunk_id + 128) % 256) ``` #### Conditions * At each step of EVM execution, chunk `PC // CHUNK_SIZE` (where `PC` is the current program counter) of the callee is accessed. In particular, note the following corner cases: * The destination of a `JUMP` (or positively evaluated `JUMPI`) is considered to be accessed, even if the destination is not a jumpdest or is inside pushdata * The destination of a `JUMPI` is not considered to be accessed if the jump conditional is false. * The destination of a jump is not considered to be accessed if the execution gets to the jump opcode but does not have enough gas to pay for the gas cost of executing the `JUMP` opcode (including chunk access cost if the `JUMP` is the first opcode in a not-yet-accessed chunk) * If the current step of EVM execution is a `PUSH{n}`, all chunks `(PC // CHUNK_SIZE) <= chunk_index <= ((PC + n) // CHUNK_SIZE)` of the callee are accessed. * If a nonzero-read-size `CODECOPY` or `EXTCODECOPY` read bytes `x...y` inclusive, all chunks `(x // CHUNK_SIZE) <= chunk_index <= (min(y, code_size - 1) // CHUNK_SIZE)` of the accessed contract are accessed. * Example 1: for a `CODECOPY` with start position 100, read size 50, `code_size = 200`, `x = 100` and `y = 149` * Example 2: for a `CODECOPY` with start position 600, read size 0, no chunks are accessed * Example 3: for a `CODECOPY` with start position 1500, read size 2000, `code_size = 3100`, `x = 1500` and `y = 3099` * `CODESIZE`, `EXTCODESIZE` and `EXTCODEHASH` do NOT access any chunks. ### Replacement for access lists We replace EIP 2930 access lists with an SSZ structure of the form: ```python class AccessList(Container): addresses: List[AccountAccessList, ACCESS_LIST_MAX_ELEMENTS] class AccountAccessList(Container): address: Address subtrees: List[AccessSubtree, ACCESS_LIST_MAX_ELEMENTS] class AccessSubtree(Container): subtree_key: uint256 elements: List[uint8, ACCESS_LIST_MAX_ELEMENTS] Address = Bytes20 ``` Validity conditions are: * The `address` of each element in the `access_list.addresses` list must be unique, and addresses must be lexicographically increasing * In every `account_access_list`, the `subtree_key` of each element in the `account_access_list.subtrees` list must be unique, and the values must be in increasing order * In every `access_subtree`, each element in the `elements` list must be unique and must be numerically increasing * All lists must be nonempty The access list is processed according to the following pseudocode: ```python def process_access_list(access_list): for account_access_list in access_list.addresses: for subtree in account_access_list.subtrees: for element in subtree.elements: access_event( account_access_list.address, subtree.subtree_key, element ) ``` The `access_event` function triggers an access event, but with two modifications: * Instead of charging `WITNESS_BRANCH_COST` gas for a new subtree, charge `(WITNESS_BRANCH_COST * ACCESS_LIST_DISCOUNT_NUMERATOR / ACCESS_LIST_DISCOUNT_DENOMINATOR)` * Instead of charging `WITNESS_CHUNK_COST` gas for a new chunk, charge `(WITNESS_CHUNK_COST * ACCESS_LIST_DISCOUNT_NUMERATOR / ACCESS_LIST_DISCOUNT_DENOMINATOR)` ## Rationale In the current proposed Verkle tree specification, all "account header" fields are stored in a single commitment at the bottom of the state trie. Code and storage are also grouped into subtrees, so adjacent code chunks and adjacent storage chunks will be in the same commitment. On average, it is expected that a witness access will be 200 bytes, so the proposed costs charge `1900 / 200 = 9.5` gas per byte. In the worst case, an attacker could expend `2**70` computing power to create two addresses that share 140 prefix bits (or 17 prefix bytes), which would mean accessing one account will have a size `48 * 15.5 = 744` byte witness (subtracting 1.5 bytes for shared prefixes in a ~5000 entry block witness); in that case, this proposal charges ~2.5 gas per byte. A chunk access will be 32 bytes, so for chunks this EIP charges `200 / 32 = 6.25` gas per byte. A zero-value `CALL` to a yet-untouched account would cost `1900 + 200 * 3 = 2500` gas, and an `SLOAD` to a random key would cost `1900 + 200 = 2100` gas, very similar to the status-quo gas costs of those opcodes (though on-net calls would get more expensive because of the need to charge for code chunks). Note also that this EIP is an important stepping stone to state expiry, because it defines and clearly specifies a concept of "access events", which would also be used in a state expiry system to determine which accessed old chunks to promote into the latest-epoch state. ## Backwards Compatibility This EIP will increase the cost of calling some contracts significantly. The access list extension ensures that this does not break backwards compatibility, much like the original introduction of EIP 2930 ensured that EIP 2929 does not break backwards compatibility. [Empirical analysis TBD] ## Security Considerations [TBD] ## Copyright Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).