--- eip: XXXX title: State expiry author: Vitalik Buterin (@vbuterin), [TBD] discussions-to: YYYY status: Draft type: Standards Track category: Core created: 2021-ZZ-WW --- ## Simple Summary Replace the single state tree with a list of state trees, with one tree per roughly-year-long period. State edits are stored in the tree corresponding to the current period, and trees older than the most recent two are no longer stored by clients. Transactions that use old state, which was not modified in the most recent two periods, are required to provide witnesses. ## Prerequisites: * **Verkle tree EIP**: https://notes.ethereum.org/@vbuterin/verkle_tree_eip * **Address extension**: https://ethereum-magicians.org/t/increasing-address-size-from-20-to-32-bytes/5485 and https://notes.ethereum.org/@ipsilon/address-space-extension-exploration ## Motivation This EIP assumes that Verkle trees have been implemented using the scheme in [EIP XXXX](https://notes.ethereum.org/@vbuterin/verkle_tree_eip), so the `states` object contains a Patricia tree and a Verkle tree. This EIP adds state expiry by extending the state list to a list of N Verkle trees, where clients are expected to store only the last two. Older state is still accessible, the transaction sender is required to provide witnesses to verify that state. ## Specification ### Parameters | Constant | Value | | - | - | | `PERIOD_LENGTH` | `2**21` (= 2,097,152) | | `FORK_BLKNUM` | 99,999,999 | | `PERIOD_0_VERKLE_ROOT` | TBD | `PERIOD_0_VERKLE_ROOT` is set to the root of a Verkle tree storing equivalent data to the period 0 Patricia tree (which will be immutable after the Verkle tree fork). ### Transition At the fork block height, change the `states` structure from `Tuple[PatriciaTree, VerkleTree]` to `List[VerkleTree]`, where: * `states[0] = VerkleTree(root_hash=PERIOD_0_VERKLE_ROOT)` * `states[1] = pre_states[1]` * `states[2] = EmptyVerkleTree()` `PERIOD_0_VERKLE_ROOT` can only be computed after the Verkle tree hard fork, at which point the period 0 tree is fixed. Clients do not need to store any states except the last two, so clients knowing the `PERIOD_0_VERKLE_ROOT` is sufficient. ### Address periods We introduce a new address type, as defined by [EIP XXXX](https://ethereum-magicians.org/t/increasing-address-size-from-20-to-32-bytes/5485). This address type still uses the 0x01 tag, but allows bytes `3..5` to be nonzero. Bytes `3..5` represents the _address period number_ of an address in little-endian form. That is, we define: ```python def get_address_period_number(address: Address32) -> int: assert address[0] in (0, 1) return address[3] + address[4] * 256 + address[5] * 65536 ``` We introduce a new transaction type: ```python class SignedNewTransaction(Container): payload: NewTransaction old_state_proof: VerkleProof signature: ECDSASignature class NewTransaction(Container): nonce: uint64 target_address_period: uint24 gas: uint64 priority_fee: uint64 max_gasprice: uint64 to: Address32 value: uint256 data: bytes claimed_states: List[StateClaim, 2**24] class ECDSASignature(Container): yParity: bool r: uint256 s: uint256 class StateClaim(Container): address: Address32 from_period: uint64 subtree_claims: List[SubtreeClaim, 2**24] class SubtreeClaim(Container): tree_index: uint256 sub_indices: BitVector[VERKLE_NODE_WIDTH] values: Vector[Optional[bytes32], VERKLE_NODE_WIDTH] ``` `VerkleProof` is defined as in the Verkle tree specification. The `claimed_states` structure replaces and serves the function of access lists. The origin of this transaction type is considered to be: ```python def compute_origin(tx: SignedNewTransaction) -> Address32: pubkey = ecrecover_to_pubkey(hash_tree_root(tx.payload), tx.signature) period = tx.payload.target_address_period return Address32( # If period = 0, address is old-style (version 0) (1 if period >= 1 else 0).to_bytes(3, 'little') + # Bytes 3..5 = period period.to_bytes(3, 'little') + # Bytes 6..31 = hash keccak256(pubkey)[6:] ) ``` Note that only with this transaction type is it possible to create a transaction whose origin is in a nonzero address period. **When state period >= 2, all transaction types introduced before this EIP was introduced become invalid.** We also introduce a new opcode, `CREATE3`, which allows us to create contracts with address period >= 1. `CREATE3` takes as stack arguments (from bottom-to-top in the stack order): ``` address_period salt memory_size memory_start value ``` The contract address is calculated as follows: ```python def compute_create3_address(address_period: uint24, creator: Address32, salt: bytes32, init_code: bytes): address_hash = keccak256( b'\xfe' + creator + salt + keccak256(init_code) ) return Address32( bytes([1, 0, 0]) + address_period.to_bytes(3, 'little') + address_hash[6:] ) ``` We define the **current state period** as follows: ```python def get_current_state_period(block: ExecutionBlock): if block.number < VERKLE_FORK_BLKNUM: return 0 elif block.number < FORK_BLKNUM: return 1 else: return (block.number - FORK_BLKNUM) // PERIOD_LENGTH + 2 ``` **Transactions whose origin has address period `N` are illegal in blocks with `state_period < N`. A `CREATE3` invocation that uses address period `N` throws an exception if `state_period < N`.** A `CREATE` or `CREATE2` invocation in contexts were the address period number of the `ADDRESS` (ie. callee) of the current context is nonzero throws an exception. ### State tree modification rules We build on the logic in [EIP XXXX](https://notes.ethereum.org/@vbuterin/verkle_tree_eip), where access events are clearly defined and every piece of state in an address is mapped to a particular `tree_key: bytes32` position in the tree. Note particularly that addresses with different address periods, even if they are otherwise "the same" (eg. same EOA pubkey or same initcode+salt), are stored at different `tree_key` positions. Here, we extend the logic to define _which_ tree to write to. If `get_current_state_period(block) > get_current_state_period(get_parent(block)) >= 1`, at the start of the block execution set `states.append(EmptyVerkleTree())`. This maintains the invariant that `len(states) == get_current_state_period(block) + 1`. If, at the end of a transaction execution, the value stored at a particular `tree_key` is modified, that modification is _always_ stored in the most recent state tree (ie. `states[len(states) - 1]`). If the value at some `tree_key` is _accessed_ (ie. an _access event_ occurs), the accessed value is also stored in the most recent state tree, despite the lack of a change. If the value at a `tree_key` is changed to 0, the value is not deleted from the tree; instead, 0 is stored (Verkle trees treat "0" and "empty" differently). ### State tree reading rules We read any property of an account as follows. The core logic is to take the most recent value that is nonempty (remember: an explicitly saved 0 is not empty). Values that are older than the previous state period are read from the `state_claims`, which must be included (along with proofs) in the transaction. ```python def read_recent_state(states: Sequence[Union[PatriciaTree, VerkleTree]], state_claims: Sequence[StateClaim], address: Address32, tree_index: int, sub_index: int) -> Optional[bytes32]: tree_key = get_tree_key(address, tree_index, sub_index) if get_address_period_number(address) == len(states) - 1: if tree_key in states[-1]: return states[-1][tree_key] else: return Bytes32() elif get_address_period_number(address) == len(states) - 2: if tree_key in states[-1]: return states[-1][tree_key] elif tree_key in states[-2]: return states[-2][tree_key] else: return Bytes32() else: if tree_key in states[-1]: return states[-1][tree_key] elif tree_key in states[-2]: return states[-2][tree_key] else: claimed_value = find_claimed_state( state_claims, address, tree_index, sub_index ) # Returns None if claimed state cannot be found return claimed_value ``` We define `find_claimed_state` as follows: ```python def find_claimed_state(tx: SignedNewTransaction, address: Address32, tree_index: int, sub_index: int) -> Optional[bytes32]: for claim in tx.payload.state_claims: if claim.address != address: pass for subtree_claim in claim.subtree_claims: if subtree_claim.index != tree_index: pass if subtree_claim.sub_indices[sub_index] == 0: return None return subtree_claim.values[sub_index] return None ``` ### Proof validation We validate the `state_claims` of a transaction as follows. First, a basic well-formedness check: ```python def check_state_claim_well_formedness(state_claims: Sequence[StateClaim]) -> bool: # Check (address, from_period) uniqueness and sorted order for i in range(len(state.claims) - 1): cur_claim, next_claim = state.claims[i], state.claims[i+1] assert ( (next_claim.address, next_claim.from_period) > (cur_claim.address, cur_claim.from_period) ) for claim in state_claims: claim_ahead_of_earliest = ( claim.from_period > get_address_period_number(claim.address) ) # Check subtree claim uniqueness and sorted order for i in range(len(claim.subtree_claims) - 1): assert claim.subtree_claims[i+1].index > claim.subtree_claims[i].index for subtree_claim in claim.subtree_claims: for bit, value in zip(subtree_claim.sub_indices, subtree_claim.values): # Don't claim not-None values in positions that you're saying # are not part of the proof if bit == 0: assert value is None # A None in the state only really means None if it's in the same # state period as the address's address period. Otherwise, there # could be an entry in an older state period, so this proof is # meaningless if bit == 1 and claim_ahead_of_earliest: assert value is not None ``` Now, we check the proof. First, a helper function to actually check what claims need to be made. This helper function is defined separately because we also use it to compute the intrinsic gas of a new-style transaction. ```python HEADER = 0 CODE = 1 class VerkleClaim(Container): root: bytes32 tree_key: bytes32 value: Optional[bytes32] def get_tree_claims(state_claims: Sequence[StateClaim], states: List[VerkleTree]) -> List[VerkleClaim]: current_period = len(states) - 1 claims = [] for claim in state_claims: for subtree_claim in claim.subtree_claims: for i in range(VERKLE_NODE_WIDTH): if subtree_claim.sub_indices[i] == 1: # Claim the value in the claimed state period claims.append(VerkleClaim( get_root_hash(states[claim.from_period]), get_tree_key(claim.address, subtree_claim.index, i), subtree_claim.values[i] )) for period in range(claim.from_period, current_period - 1): # Also claim emptiness in all state periods since then claims.append(VerkleClaim( get_root_hash(states[period]), get_tree_key(claim.address, subtree_claim.index, i), None )) return claims ``` Now, the proof verification function: ```python def verify_proofs(state_claims: Sequence[StateClaim], verkle_proof: VerkleProof, states: List[VerkleTree]): patricia_claims, verkle_claims = get_tree_claims(state_claims, states) assert verify_verkle_multi_proof(verkle_claims, verkle_proof) ``` `verify_verkle_multi_proof` is defined at **[TBD]**. ### New transaction intrinsic gas cost calculation ```python def concat(arrays: List[List[Any]]) -> List[Any]: return sum(arrays, []) def get_intrinsic_gas_cost(tx: SignedNewTransaction, states: List[VerkleTree]) -> int: base_cost = 21000 data_cost = 16 * len(tx.payload.data) verkle_claims = get_tree_claims( tx.payload.state_claims, states ) all_subtree_claims = concat([ claim.subtree_claims for claim in tx.payload.state_claims ]) branch_cost = WITNESS_BRANCH_COST * len(all_subtree_claims) chunk_cost = WITNESS_CHUNK_COST * sum( claim.sub_indices.count(1) for claim in all_subtree_claims ) old_verkle_proof_branches = len(set( (claim.root, claim.tree_key[:31]) for claim in verkle_claims )) old_verkle_proof_chunks = len(verkle_claims) old_verkle_proof_cost = ( WITNESS_BRANCH_COST * old_verkle_proof_branches + WITNESS_CHUNK_COST * old_verkle_proof_chunks ) return ( base_cost + data_cost + branch_cost + chunk_cost + old_verkle_proof_cost ) ``` ### Prepopulated accessed set When processing the transaction we also pre-populate `accessed_subtrees` and `accessed_leaves`: ```python def fill_accessed_positions(tx: SignedNewTransaction, states: List[VerkleTree]) -> Tuple[ Set[Tuple[address, int]], Set[Tuple[address, int, int]] ]: subtrees = set() leaves = set() for claim in tx.payload.state_claims: for subtree_claim in claim.subtree_claims: subtrees.add((claim.address, subtree_claim.tree_index)) for i, bit in enumerate(subtree_claim.subtree_keys): if bit == 1: leaves.add((claim.address, subtree_claim.tree_index, i)) return subtrees, leaves ``` ## Rationale This is an implementation of the state expiry mechanism proposed [here](https://ethresear.ch/t/resurrection-conflict-minimized-state-bounding-take-2/8739). The core idea is that there would be a state tree per state period (think: 1 state period ~= 1 year), and when a new state period begins, an empty state tree is initialized for that period and any state updates go into that tree. Full nodes in the network would only be required to store the most recent two trees, so on average they would only be storing state that was read or written in the last ~1.5 periods ~= 1.5 years. This puts a permanent cap (proportional to the gaslimit) on how much state clients would need to store. An absolute worst case theoretical max can be estimated as 2.5m blocks * 15m gas per block / 2100 gas cost of reading (and hence re-updating) a new storage slot = 17.8b objects ~= 700 GB maximum, but under normal circumstances it would be much smaller, perhaps 15-30 GB. If we want to reduce the difference between average and maximum further, we could just [add a separate gasprice for storage filling](https://ethresear.ch/t/should-storage-be-priced-separately-from-execution/9101). ![](https://i.imgur.com/a6ToQZE.png) There is also a separate notion of an _address period_; all existing accounts and contracts have address period 0. A new CREATE opcode is added that takes an address period as an argument and creates an account with that address period. Addresses with different address periods can coexist in the same trie: we implicitly mix the address period index into the trie key to avoid collisions. ### Rules for reading and writing There are two key principles: * **Only the most recent tree (ie. the tree corresponding to the current state period) can be modified**. All older trees are no longer modifiable; objects in older trees can only be modified by creating copies of them in newer trees, and these copies supersede the older copies. * **Full nodes (including block proposers) are expected to only hold the most recent two trees, so only objects in the most recent two trees can be read without a witness**. Reading older objects requires providing witnesses. Creation of _new_ objects (accounts or storage slots) is governed by the address period mechanism: addresses with address period `n` can only be modified in periods `>= n`. A new object with address period `n` can be created in periods `n` or `n+1` without providing a witness, but creating an object with address period `n` in some period `e > n+1` requires witnesses to prove that an object in that same position was not already created. Expressed in precise terms, the rules are as follows. When we refer to "a state object `(e, s)`", `e` is the state period number of the object, and `s` is the address itself (including the storage key if we are talking about a storage slot). The trees are referring to as `S_0`, `S_1`, `S_2`... * If an object `(e, s)` is modified or created during period `e`, this can be done directly with a modification to tree `S_e` * If an object `(e, s)` is modified or created during period `e+1`, this can be done directly with a modification to tree `S_(e+1)`. The object is put into tree `S_(e+1)`; tree `S_e` is not modified. * If an object `(e, s)` is modified during period `f > e+1`, and this object is already part of tree `S_f` or `S_{f-1}`, then this can be done directly with a modification to tree `S_f` * If an object `(e, s)` is first created during period `f > e+1` and was never before touched, then the sender of the transaction creating this object must provide a witness showing the object's absence in all trees `S_{e} ... S_{f-2}` * If an object `(e, s)` is modified during period `f > e+1`, and this object is not yet part of tree `S_f`, and the object was most recently part of tree `S_(e')` with `e <= e' < f-1`, then the sender of the transaction creating this object must provide a witness showing the state of the object in tree `S_(e')` and its absence in all trees `S_{e'+1} ... S_{f-2}` Note one additional implementation detail: when an object from an old period is read or written to, and its post-transaction value is zero, the object still needs to be saved in the new state tree. "Zero" and "absent" are no longer synonyms, as "zero" means there's nothing there and "absent" means "the latest version of this object might be in older trees, check those first". <center><br> <img src="https://i.imgur.com/VGeyjZ2.png" /> </center><br><br> <small>Suppose the dark-blue object was last modified in period 0, and you want to read/write it in a transaction in period 3. To prove that period 0 really was the last time the object was touched, we need to prove the dark-blue values in periods 0, 1 and 2. Full nodes still have the full period 2 state, so no witness is required. For periods 0 and 1, we do need witnesses: the light blue nodes, plus the purple nodes that can be regenerated during witness verification. After this operation, a copy of the object is saved in the period 3 state.</small>