# Gas cost increases for state access opcodes

```
---
eip: <to be assigned>
title: Gas cost increases for state access opcodes
author: Vitalik Buterin (@vbuterin), Martin Swende (@holiman)
discussions-to: <URL>
status: Draft
type: Standards Track
category: Core
created: 2020-08-22
---
```

## Simple Summary

Increase the gas cost of SLOAD to 2100, and of the CALL opcode family, BALANCE and the EXT* opcode family to 2600. Exempt (i) precompiles, and (ii) addresses and storage slots that have already been accessed in the same transaction. Additionally, reform SSTORE metering to ensure that the "de-facto storage loads" inherent in SSTORE are priced correctly.

This is done as a short-term security improvement to reduce the effectiveness of what is currently the most effective DoS strategy, reducing the theoretical max processing time of a block by ~3x. It is also a stepping stone toward [bounding stateless witness sizes](https://ethereum-magicians.org/t/protocol-changes-to-bound-witness-size/3885).

## Motivation

Generally, the main function of gas costs of opcodes is to be an estimate of the time needed to process that opcode, the goal being for the gas limit to correspond to a limit on the time needed to process a block. However, storage-accessing opcodes (SLOAD, as well as the CALL, BALANCE and EXT* opcodes) have historically been underpriced. In the 2016 Shanghai DoS attacks, once the most serious client bugs were fixed, one of the more durably successful strategies used by the attacker was to simply send transactions that access or call a large number of accounts.

Gas costs were increased to mitigate this, but recent numbers suggest they were not increased enough. Quoting [https://arxiv.org/pdf/1909.07220.pdf](https://arxiv.org/pdf/1909.07220.pdf):

> Although by itself, this issue might seem benign, EXTCODESIZE forces the client to search the contract on disk, resulting in IO heavy transactions.
> While replaying the Ethereum history on our hardware, the malicious transactions took around 20 to 80 seconds to execute, compared to a few milliseconds for the average transactions

This proposed EIP increases the costs of these opcodes by a factor of ~3, reducing the worst-case processing time to ~7-27 seconds. Improvements in database layout that involve redesigning the client to read storage directly instead of hopping through the Merkle tree would decrease this further, though these technologies may take a long time to fully roll out, and even with such technologies the IO overhead of accessing storage would remain substantial.

A secondary benefit of this EIP is that it also performs most of the work needed to make [stateless witness sizes](https://ethereum-magicians.org/t/protocol-changes-to-bound-witness-size/3885) in Ethereum acceptable. Assuming [a switch to binary tries](https://ethresear.ch/t/binary-trie-format/7621), the theoretical maximum witness size not including code size (hence "most of the work" and not "all") would decrease from `(12500000 gas limit) / (700 gas per BALANCE) * (800 witness bytes per BALANCE) = 14.3 MB` to `12500000 / 2600 * 800 = 3.85 MB`. Pricing for code access could be changed when code Merklization is implemented.

In the further future, there are similar benefits in the case of SNARK/STARK witnesses. Recent numbers from Starkware suggest that they are able to prove 10000 Rescue hashes per second on a consumer desktop; assuming 25 hashes per Merkle branch, and a block full of state accesses, at present this would imply a witness would take `12500000 / 700 * 25 / 10000 ~= 44.64` seconds to generate, but after this EIP that would reduce to `12500000 / 2600 * 25 / 10000 ~= 12.02` seconds, meaning that a single desktop computer would be able to generate witnesses on time under any conditions.
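
The witness figures above can be re-derived with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are illustrative, and the post-EIP cost is taken to be the `COLD_ACCOUNT_ACCESS_COST` of 2600 from the Parameters table, so the outputs are approximations rather than exact bounds.

```python
# Rough witness-size and proving-time estimates from the figures in the text.
GAS_LIMIT = 12_500_000
OLD_ACCESS_COST = 700            # gas per BALANCE before this EIP
NEW_ACCESS_COST = 2600           # assumed: COLD_ACCOUNT_ACCESS_COST
WITNESS_BYTES_PER_ACCESS = 800   # assumes binary tries
HASHES_PER_BRANCH = 25
HASHES_PER_SECOND = 10_000       # Rescue hashes on a consumer desktop


def max_accesses(gas_limit: int, cost: int) -> float:
    """Upper bound on state accesses in a block filled with cold accesses."""
    return gas_limit / cost


def max_witness_mb(gas_limit: int, cost: int) -> float:
    """Worst-case witness size in MB, excluding contract code."""
    return max_accesses(gas_limit, cost) * WITNESS_BYTES_PER_ACCESS / 1e6


def proving_seconds(gas_limit: int, cost: int) -> float:
    """Time to hash every Merkle branch of a worst-case witness."""
    return max_accesses(gas_limit, cost) * HASHES_PER_BRANCH / HASHES_PER_SECOND


if __name__ == "__main__":
    for label, cost in (("before", OLD_ACCESS_COST), ("after", NEW_ACCESS_COST)):
        print(label,
              round(max_witness_mb(GAS_LIMIT, cost), 2), "MB,",
              round(proving_seconds(GAS_LIMIT, cost), 2), "s")
```
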
Future gains in STARK proving could be spent on either (i) using a more expensive but robust hash function or (ii) reducing proving times further, reducing the delay and hence improving user experience of stateless clients that rely on such witnesses.

## Specification

### Parameters

| Constant | Value |
| - | - |
| `FORK_BLOCK` | TBD |
| `COLD_SLOAD_COST` | 2100 |
| `COLD_ACCOUNT_ACCESS_COST` | 2600 |
| `WARM_STORAGE_READ_COST` | 100 |

For blocks where `block.number >= FORK_BLOCK`, the following changes apply.

### Storage read changes

When executing a transaction, maintain a set `accessed_addresses: Set[Address]` and a set `accessed_storage_keys: Set[Tuple[Address, Bytes32]]` (these are transaction-context-wide sets, implemented identically to self-destructs, in particular reverting similarly to how self-destructs revert). When a transaction execution begins, `accessed_storage_keys` is initialized to empty, and `accessed_addresses` is initialized to include the `tx.sender`, `tx.to` (or the address being created if it is a contract creation transaction) and the set of all precompiles.

When an address is either the target of a (`EXTCODESIZE`, `EXTCODECOPY`, `EXTCODEHASH` or `BALANCE`) opcode or the target of a (`CALL`, `CALLCODE`, `STATICCALL`, `DELEGATECALL`) opcode, the gas costs are computed as follows:

* If the target is not in `accessed_addresses`, charge `COLD_ACCOUNT_ACCESS_COST` gas, and add the address to `accessed_addresses`.
* Otherwise, charge `WARM_STORAGE_READ_COST` gas.

In all cases, the gas cost is charged and the set is updated at the time that the opcode is being called. When a `CREATE` or `CREATE2` opcode is called, immediately (ie. before checks are done to determine whether or not the address is unclaimed) add the address being created to `accessed_addresses`, but gas costs of `CREATE` and `CREATE2` are unchanged.

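
The account access rule above can be sketched as follows. This is a minimal illustration, not client code: the `AccessTracker` class, its method names, the integer addresses, the `PRECOMPILES` range (0x01 to 0x09) and the journal-based revert mechanism are all assumptions made for the example. The journal is one possible reading of "reverting similarly to how self-destructs revert".

```python
COLD_ACCOUNT_ACCESS_COST = 2600
WARM_STORAGE_READ_COST = 100

# Assumption for illustration: precompiles live at addresses 0x01..0x09.
PRECOMPILES = frozenset(range(1, 10))


class AccessTracker:
    """Hypothetical per-transaction tracker for accessed_addresses."""

    def __init__(self, sender: int, to: int):
        # Sender, recipient (or created address) and precompiles start warm.
        self.accessed_addresses = {sender, to} | PRECOMPILES
        self._journal = []  # newly warmed addresses, in insertion order

    def charge_account_access(self, address: int) -> int:
        """Gas charged when an EXT*, BALANCE or *CALL opcode targets address."""
        if address in self.accessed_addresses:
            return WARM_STORAGE_READ_COST
        self._warm(address)
        return COLD_ACCOUNT_ACCESS_COST

    def mark_created(self, address: int) -> None:
        """CREATE/CREATE2: warm the new address; their gas cost is unchanged."""
        if address not in self.accessed_addresses:
            self._warm(address)

    def snapshot(self) -> int:
        """Take a checkpoint before entering a sub-call."""
        return len(self._journal)

    def revert_to(self, snapshot: int) -> None:
        """Undo every warming recorded since the checkpoint."""
        while len(self._journal) > snapshot:
            self.accessed_addresses.discard(self._journal.pop())

    def _warm(self, address: int) -> None:
        self.accessed_addresses.add(address)
        self._journal.append(address)
```

In this sketch a first `charge_account_access(0xCC)` returns 2600 and a second returns 100, while an address warmed inside a reverted sub-call is cold again afterwards.
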
For `SLOAD`, if the `(address, storage_key)` pair (where `address` is the address of the contract whose storage is being read) is not yet in `accessed_storage_keys`, charge `COLD_SLOAD_COST` gas and add the pair to `accessed_storage_keys`. If the pair is already in `accessed_storage_keys`, charge `WARM_STORAGE_READ_COST` gas.

### SSTORE changes

When calling `SSTORE`, check if the `(address, storage_key)` pair is in `accessed_storage_keys`. If it is not, charge an additional `COLD_SLOAD_COST` gas, and add the pair to `accessed_storage_keys`. Additionally, modify the parameters defined in [EIP 2200](https://eips.ethereum.org/EIPS/eip-2200) as follows:

| Parameter | Old value | New value |
| - | - | - |
| `SLOAD_GAS` | 800 | `= WARM_STORAGE_READ_COST` |
| `SSTORE_RESET_GAS` | 5000 | `5000 - COLD_SLOAD_COST` |

The other parameters defined in EIP 2200 are unchanged.

### SELFDESTRUCT changes

If the ETH recipient of a `SELFDESTRUCT` is not in `accessed_addresses` (regardless of whether or not the amount sent is nonzero), charge an additional `COLD_ACCOUNT_ACCESS_COST` on top of the existing gas costs, and add the ETH recipient to the set.

### Contract breakage mitigations

See the [Security Considerations](#Security-Considerations) section for details on the tradeoffs between these versions.

#### Version 1

Do nothing (ie. implement only the changes described above).

#### Version 2

Add a `POKE` precompile, with cost 4500 gas, that takes two stack arguments, `address` and `storage_slot`, and adds `address` to `accessed_addresses` and `(address, storage_slot)` to `accessed_storage_keys`.

#### Version 3

Access lists: https://hackmd.io/@HWeNw8hNRimMm2m2GH56Cw/r1EzLEvmD

## Rationale

### Opcode costs vs charging per byte of witness data

The natural alternative path to changing gas costs to reflect witness sizes is to charge per byte of witness data. However, that would take a longer time to implement, hampering the goal of providing short-term security relief.
Furthermore, following that path faithfully would lead to extremely high gas costs to transactions that touch contract code, as one would need to charge for all 24000 contract code bytes; this would be an unacceptably high burden on developers. It is better to wait for [code merklization](https://medium.com/ewasm/evm-bytecode-merklization-2a8366ab0c90) to start trying to properly account for gas costs of accessing individual chunks of code; from a short-term DoS prevention standpoint, accessing 24 kB from disk is not much more expensive than accessing 32 bytes from disk, so worrying about code size is not necessary.

### Adding the accessed_addresses / accessed_storage_keys sets

The sets of already-accessed accounts and storage slots are added to avoid needlessly charging for things that can be cached (and in all performant implementations already are cached). Additionally, it removes the current undesirable status quo where it is needlessly unaffordable to do self-calls or call precompiles, and enables contract breakage mitigations that involve pre-fetching some storage key allowing a future execution to still take the expected amount of gas.

### SSTORE gas cost change

The change to SSTORE is needed to avoid the possibility of a DoS attack that "pokes" a randomly chosen zero storage slot, changing it from 0 to 0 at a cost of 800 gas but requiring a de-facto storage load. The `SSTORE_RESET_GAS` reduction ensures that the total cost of SSTORE (which now requires paying the `COLD_SLOAD_COST`) remains unchanged. Additionally, note that applications that do `SLOAD` followed by `SSTORE` (eg. `storage_variable += x`) _would actually get cheaper_!

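
The arithmetic behind these three claims can be checked with a short sketch. It covers only two branches of the EIP 2200 rules: a no-op `SSTORE` (current value equals new value, charged `SLOAD_GAS`) and a reset of a nonzero, previously unmodified slot (charged `SSTORE_RESET_GAS`). The variable names are illustrative, and refunds and the other EIP 2200 branches are deliberately left out.

```python
COLD_SLOAD_COST = 2100
WARM_STORAGE_READ_COST = 100

# EIP 2200 parameters before and after this proposal.
OLD_SLOAD_GAS = 800
OLD_SSTORE_RESET_GAS = 5000
NEW_SLOAD_GAS = WARM_STORAGE_READ_COST                 # 100
NEW_SSTORE_RESET_GAS = 5000 - COLD_SLOAD_COST          # 2900

# 1. "Poking" a cold slot with a no-op SSTORE (0 -> 0).
old_noop_sstore = OLD_SLOAD_GAS                        # 800
new_noop_sstore = COLD_SLOAD_COST + NEW_SLOAD_GAS      # 2100 + 100

# 2. A single SSTORE resetting a cold, nonzero slot: total is unchanged.
old_cold_reset = OLD_SSTORE_RESET_GAS                  # 5000
new_cold_reset = COLD_SLOAD_COST + NEW_SSTORE_RESET_GAS  # 2100 + 2900

# 3. SLOAD followed by SSTORE on the same slot (storage_variable += x).
# The SLOAD pays the cold cost, so the SSTORE sees a warm slot.
old_read_modify_write = OLD_SLOAD_GAS + OLD_SSTORE_RESET_GAS   # 800 + 5000
new_read_modify_write = COLD_SLOAD_COST + NEW_SSTORE_RESET_GAS  # 2100 + 2900

if __name__ == "__main__":
    print("no-op SSTORE:     ", old_noop_sstore, "->", new_noop_sstore)
    print("cold reset SSTORE:", old_cold_reset, "->", new_cold_reset)
    print("SLOAD + SSTORE:   ", old_read_modify_write, "->", new_read_modify_write)
```

Under these assumptions the poke attack becomes 2200 gas instead of 800, a lone reset stays at 5000, and the read-modify-write pattern drops from 5800 to 5000.
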
### Change SSTORE accounting only minimally

The SSTORE gas costs continue to use Wei Tang's original/current/new approach, instead of being redesigned to use a dirty map, because Wei Tang's approach correctly accounts for the actual costs of changing storage, which only care about current vs final value and not intermediate values.

### How would gas consumption of average applications increase under this proposal?

#### Rough analysis from witness sizes

We can look at [Alexey Akhunov's earlier work](https://medium.com/@akhounov/data-from-the-ethereum-stateless-prototype-8c69479c8abc) for data on average-case blocks. In summary, average blocks have witness sizes of ~1000 kB, of which ~750 kB is Merkle proofs and not code. Assuming a conservative 2000 bytes per Merkle branch this implies ~375 accesses per block (SLOADs have a similar gas-increase-to-bytes ratio so there's no need to analyze them separately).

Data on [txs per day](https://etherscan.io/chart/tx) and [blocks per day](https://etherscan.io/chart/blocks) from Etherscan gives ~160 transactions per block (reference date: Jul 1), implying a large portion of those accesses are just the `tx.sender` and `tx.to`, which are excluded from gas cost increases, though likely fewer than 320 due to duplicate addresses.

Hence, this implies ~50-375 chargeable accesses per block, and each access suffers a gas cost increase of 1900; `50 * 1900 = 95000` and `375 * 1900 = 712500`, implying the gas limit would need to be raised by ~1-6% to compensate. However, this analysis may be complicated further in either direction by (i) accounts / storage keys being accessed in multiple transactions, which would appear once in the witness but twice in gas cost increases, and (ii) accounts / storage keys being accessed multiple times in the same transaction, which leads to gas cost _decreases_.

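
The headline estimate of this rough analysis can be reproduced in a few lines. The sketch assumes a 12.5 million gas limit and treats every chargeable access as an account access whose cost rises from 700 to 2600; the helper names are illustrative, and the lower bound of 50 accesses is taken directly from the text.

```python
GAS_LIMIT = 12_500_000
PROOF_BYTES_PER_BLOCK = 750_000        # ~750 kB of Merkle proofs per block
BYTES_PER_BRANCH = 2_000               # conservative size of one Merkle branch
COST_INCREASE_PER_ACCESS = 2600 - 700  # 1900 gas

# Upper bound: every branch in the witness is a chargeable access.
UPPER_ACCESSES = PROOF_BYTES_PER_BLOCK // BYTES_PER_BRANCH
# Lower bound quoted in the text, after discounting tx.sender and tx.to.
LOWER_ACCESSES = 50


def extra_gas(accesses: int) -> int:
    """Additional gas per block caused by the repricing."""
    return accesses * COST_INCREASE_PER_ACCESS


def gas_limit_increase_pct(accesses: int) -> float:
    """Extra gas as a percentage of the block gas limit."""
    return 100 * extra_gas(accesses) / GAS_LIMIT


if __name__ == "__main__":
    for n in (LOWER_ACCESSES, UPPER_ACCESSES):
        print(n, "accesses:", extra_gas(n), "gas,",
              round(gas_limit_increase_pct(n), 2), "% of the gas limit")
```
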
#### Goerli analysis

A more precise analysis can be found by scanning Goerli transactions, as done by Martin Swende here: https://github.com/holiman/gasreprice

The conclusion is that on average gas costs increase by ~2.36%. One major contributing factor to reducing gas costs is that a large number of contracts inefficiently read the same storage slot multiple times, which leads to this EIP giving a few transactions gas cost _savings_ of over 10%.

## Backwards Compatibility

These gas cost increases may potentially break contracts that depend on fixed gas costs; see the [security considerations section](#Security-Considerations) for details and arguments for why we expect the total risks to be low and how if desired they can be reduced further.

## Test Cases

Some test cases can be found here: https://gist.github.com/holiman/174548cad102096858583c6fbbb0649a

Ideally we would test the following:

* SLOAD the same storage slot {1, 2, 3} times
* CALL the same address {1, 2, 3} times
* (SLOAD | CALL) in a sub-call, then revert, then (SLOAD | CALL) the same (storage slot | address) again
* Sub-call, SLOAD, sub-call again, revert the inner sub-call, SLOAD the same storage slot
* SSTORE the same storage slot {1, 2, 3} times, using all combinations of zero/nonzero for original value and the value being set
* SSTORE then SLOAD the same storage slot
* `OP_1` then `OP_2` to the same address where `OP_1` and `OP_2` are all combinations of (\*CALL, EXT\*, SELFDESTRUCT)
* Try to CALL an address but with all possible failure modes (not enough gas, not enough ETH...), then (CALL | EXT*) that address again successfully

## Implementation

A WIP early-draft implementation for Geth can be found here: https://github.com/holiman/go-ethereum/tree/access_lists

## Security Considerations

As with any gas cost increasing EIP, there are three possible cases where it could cause applications to break:

1. Fixed gas limits to sub-calls in contracts
2. Applications relying on contract calls that consume close to the full gas limit
3. The 2300 base limit given to the callee by ETH-transferring calls

These risks have been studied before in the context of an earlier gas cost increase, EIP 1884. See [Martin Swende's earlier report](https://github.com/holiman/eip-1884-security) and [Hubert Ritzdorf's analysis](https://gist.github.com/ritzdorf/1c6bd72955391e831f8a397d3152b4e0/) focusing on (1) and (3). (2) has received less analysis, though one can argue that it is very unlikely both because applications tend to very rarely use close to the entire gas limit in a transaction, and because gas limits were very recently raised from 10 million to 12.5 million. EIP 1884 in practice [did lead to a small number of contracts breaking](https://www.coindesk.com/ethereums-istanbul-upgrade-will-break-680-smart-contracts-on-aragon) for this reason.

There are two ways to look at these risks. First, we can note that as of today developers have had years of warning; gas cost increases on storage-accessing opcodes have been [discussed for a long time](https://ethereum-magicians.org/t/protocol-changes-to-bound-witness-size/3885), with multiple statements made, including to major dapp developers, around the likelihood of such changes. EIP 1884 itself provided an important wake-up call. Hence, we can argue that risks this time will be significantly lower than with EIP 1884.

A second way to look at the risks is to explore mitigations. First of all, the existence of the `accessed_addresses` and `accessed_storage_keys` sets (present in this EIP, absent in EIP 1884) already makes some cases recoverable: in any case where a contract A needs to send funds to some address B, where that address accepts funds from any source but leaves a storage-dependent log, one can recover by first sending a separate call to B to pull it into the cache, and then call A, knowing that the execution of B triggered by A will only charge 100 gas per SLOAD.
This fact does not fix all situations, but it does reduce risks significantly.

Options (2) and (3) in "contract breakage mitigations" in the EIP attempt to further expand the usability of this pattern. Option (2), the `POKE` precompile, allows transactions that attempt to "rescue" stuck contracts by pre-poking all of the storage slots that they will access. This works even if the address only accepts transactions from the contract, and works in many other contexts with present gas limits. The only case where this will not work would be the case where a transaction call _must_ go from an EOA straight into a specific contract that then sub-calls another contract.

Option (3), in-transaction access lists, has a similar effect to `POKE` but is more general: it also works for the EOA -> contract -> contract case, and generally should work for all known cases of breakage due to gas cost increases. Option (3) is more complex, though it is arguably a stepping stone toward access lists being used for other use cases (regenesis, account abstraction, SSA all demand access lists).

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).