# Next steps in the Purge
One of the less well-known EIPs in the recent Dencun hard fork is [EIP-6780](https://eips.ethereum.org/EIPS/eip-6780), which removed most of the functionality of the `SELFDESTRUCT` opcode.
<br>
[![](https://storage.googleapis.com/ethereum-hackmd/upload_370895b4283964d0d9a9ff874a37c100.png)](https://eips.ethereum.org/EIPS/eip-6780)
<br>
This EIP is a key example of an often undervalued part of Ethereum protocol development: **the effort to simplify the protocol by removing complexity and adding _new_ security guarantees**. This is a big part of what I have labeled as ["The Purge"](https://twitter.com/VitalikButerin/status/1741190491578810445): the project of slimming down Ethereum and clearing technical debt. There will be more EIPs that have a similar spirit, and so it is worth understanding both how EIP-6780 in particular accomplishes the goal, and what other EIPs there might be in the future.
## How does EIP-6780 simplify the Ethereum protocol?
`SELFDESTRUCT` is the opcode that destroys the contract that calls it, emptying its code and storage. EIP-6780 restricts it so that this only happens if the contract was created during the same transaction. This by itself does not reduce the complexity of the _specification_. However, it does simplify _implementations_, by introducing two new **invariants**:
1. Post EIP-6780, there is a maximum number of storage slots (roughly: gas limit / 5000) that can be edited in a single block.
2. If a contract has nonempty code at the start of a transaction or block, it will have the same code at the end of that transaction or block.
Before, neither of these invariants was true:
1. A `SELFDESTRUCT` on a contract with a large number of storage slots could clear an unlimited number of storage slots within a single block. This would have made it much harder to implement [Verkle trees](https://verkle.info/), and it made Ethereum client implementations much more complicated, because they needed extra code to handle that special case efficiently.
2. A contract's code could go from nonempty to empty through `SELFDESTRUCT`, and in fact the contract could even be re-created with different code immediately after. This made it harder for transaction verification in [account abstraction](https://www.erc4337.io/) wallets to use code libraries without being vulnerable to DoS attacks.
Now, these invariants _are_ both true, making it significantly easier to build an Ethereum client and other kinds of infrastructure. A few years down the line, hopefully a future EIP can finish the job and eliminate `SELFDESTRUCT` entirely.
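To make the two invariants concrete, here is a minimal toy model of the post-EIP-6780 rule. It is a sketch, not a real EVM: `ToyState`, `Account` and `max_editable_slots` are hypothetical names, addresses are plain strings, and the 30M gas limit used in the usage note is an assumed figure for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    code: bytes = b""
    storage: dict = field(default_factory=dict)
    balance: int = 0


class ToyState:
    """Toy world state with just enough detail to show the rule."""

    def __init__(self):
        self.accounts = {}
        self.created_this_tx = set()

    def begin_transaction(self):
        # The "created in this transaction" set is reset for every transaction.
        self.created_this_tx = set()

    def create(self, address, code, balance=0):
        self.accounts[address] = Account(code=code, balance=balance)
        self.created_this_tx.add(address)

    def selfdestruct(self, address, beneficiary):
        account = self.accounts[address]
        target = self.accounts.setdefault(beneficiary, Account())
        # The balance is always forwarded to the beneficiary.
        amount, account.balance = account.balance, 0
        target.balance += amount
        # Code and storage are wiped only for same-transaction contracts.
        if address in self.created_this_tx:
            del self.accounts[address]


def max_editable_slots(block_gas_limit, gas_per_slot_edit=5000):
    """Rough upper bound on storage slots edited per block (invariant 1)."""
    return block_gas_limit // gas_per_slot_edit
```

With an assumed 30M gas limit, `max_editable_slots(30_000_000)` gives 6000. In the toy model, calling `selfdestruct` on a contract created in an earlier transaction moves its balance but leaves its code and storage untouched, which is exactly invariant 2.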
## What are some other "purges" that are happening?
* Geth has recently deleted thousands of lines of code by [dropping support for pre-merge (PoW) networks](https://twitter.com/peter_szilagyi/status/1765016675131301958).
* [EIP-7523](https://eips.ethereum.org/EIPS/eip-7523), which formally enshrines the fact that we no longer need code to handle "empty accounts" (see: [EIP-161](https://eips.ethereum.org/EIPS/eip-161), which introduced this concept as part of a fix to the [Shanghai DoS attacks](https://www.youtube.com/watch?v=nhr5nlMNvRQ)).
* The 18-day storage window for blobs in Dencun, which means that an Ethereum node only needs ~50 GB to store blob data and this amount does not increase over time
The first two significantly improve life for client developers. The third significantly improves life for node operators.
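The 18-day and ~50 GB figures can be roughly reproduced with a few lines of arithmetic. The constants below are my understanding of the Dencun parameters and are not stated in this post, so treat them as assumptions; the ~50 GB figure corresponds to every block carrying the target number of blobs.

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096  # assumed retention period
BYTES_PER_BLOB = 4096 * 32                    # 4096 field elements, 32 bytes each
TARGET_BLOBS_PER_BLOCK = 3                    # assumed Dencun target (max is 6)


def blob_retention_days():
    """Length of the blob retention window in days."""
    slots = MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS * SLOTS_PER_EPOCH
    return slots * SECONDS_PER_SLOT / 86_400


def blob_storage_bytes(blobs_per_block=TARGET_BLOBS_PER_BLOCK):
    """Steady-state blob storage if every slot carries `blobs_per_block` blobs."""
    slots = MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS * SLOTS_PER_EPOCH
    return slots * blobs_per_block * BYTES_PER_BLOB


print(f"{blob_retention_days():.1f} days")
print(f"{blob_storage_bytes() / 1e9:.1f} GB at the target blob count")
```

Because old blobs are dropped as new ones arrive, this number is a ceiling for a given blob count rather than something that grows with chain age.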
## What are some other things that might need to be purged?
### Precompiles
[Precompiles](https://www.evm.codes/precompiled?fork=cancun) are Ethereum contracts that, instead of having EVM code, have logic that must be directly implemented by clients themselves. The idea is that precompiles can be used to implement complex forms of cryptography that cannot be implemented efficiently within the EVM.
Precompiles are used very successfully today, notably to enable ZK-SNARK-based applications with the elliptic curve precompiles. However, there are other precompiles that are being used very rarely:
* `RIPEMD-160`: a hash function that was introduced to support better compatibility with Bitcoin
* `Identity`: a precompile that returns the same output as its input
* `BLAKE2`: a hash function that was introduced to support better compatibility with Zcash
* `MODEXP`: modular exponentiation with very big numbers, introduced to support RSA-based cryptography
It turns out that the demand for these precompiles is _far_ lower than was anticipated. `Identity` was used a lot because it was the easiest way to copy data, but since Dencun [the `MCOPY` opcode](https://eips.ethereum.org/EIPS/eip-5656) has superseded it. And unfortunately, **these precompiles are all a huge source of consensus bugs, and a huge source of pain for new EVM implementations**, including ZK-SNARK circuits, formal-verification-friendly implementations, etc.
There are two ways to remove these precompiles:
1. Just remove the precompile, eg. [EIP-7266](https://eips.ethereum.org/EIPS/eip-7266) which removes BLAKE2. This is easy, but breaks any applications that do still use it.
2. Replace the precompile with a chunk of EVM code that does the same thing (though inevitably at a higher gas cost), eg. [this draft EIP](https://github.com/ethereum/EIPs/pull/8366) to do this for the identity precompile. This is harder, but almost certainly does not break applications that use it (except in very rare cases where the gas cost of the new EVM code exceeds the block gas limit for some inputs).
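
The second approach can be illustrated with the identity precompile. The sketch below is plain Python, not EVM code: `identity_native` models the client-implemented precompile (its gas formula of 15 plus 3 per 32-byte word is the existing pricing as I understand it), while `identity_as_contract` stands in for ordinary code living at the same address. The replacement's gas numbers are made up purely to show "same output, higher cost".

```python
def _words(data: bytes) -> int:
    """Number of 32-byte words needed to hold `data`."""
    return (len(data) + 31) // 32


def identity_native(data: bytes) -> tuple[bytes, int]:
    """Client-implemented precompile: returns its input unchanged."""
    return data, 15 + 3 * _words(data)


# Hypothetical costs for the replacement; real numbers would depend on
# the actual EVM code that ends up deployed at the precompile address.
HYPOTHETICAL_BASE_GAS = 100
HYPOTHETICAL_GAS_PER_WORD = 6


def identity_as_contract(data: bytes) -> tuple[bytes, int]:
    """Stand-in for ordinary code that copies the input word by word."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        out += data[offset:offset + 32]
    gas = HYPOTHETICAL_BASE_GAS + HYPOTHETICAL_GAS_PER_WORD * _words(data)
    return bytes(out), gas


def same_behaviour(data: bytes) -> bool:
    """Callers see identical output; only the gas charged differs."""
    return identity_native(data)[0] == identity_as_contract(data)[0]
```

The point of the design is that a caller who only looks at return data cannot tell the two apart, so existing applications keep working as long as they forward enough gas.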
### History (EIP-4444)
Today, each Ethereum node is expected to store all historical blocks forever. It has been understood for a long time that this is a highly wasteful approach, and makes it needlessly difficult to run an Ethereum node due to the high storage requirements. With Dencun, we introduced blobs, which only need to be stored for ~18 days. With [EIP-4444](https://eips.ethereum.org/EIPS/eip-4444), Ethereum blocks will also get removed from default Ethereum nodes after some time.
One key issue to resolve is: if old history does not get stored by literally every node, what _does_ store it? Realistically, large-scale entities such as block explorers will. However, it is also possible, and not that difficult, to build p2p protocols that store and pass around that information and are better optimized for the task.
**The Ethereum blockchain is permanent, but requiring literally every node to store all of the data forever is a very "overkill" way of achieving that permanence**.
Simple peer-to-peer torrent networks for old history are one approach. Protocols that are more explicitly optimized for Ethereum use, such as the [Portal Network](https://www.ethportal.net/), are another.
Or, in meme format:
<center><br>
![](https://storage.googleapis.com/ethereum-hackmd/upload_f0139deb09828ff33f4e390b0c4ac366.png)
</center><br>
Reducing the amount of storage needed to run an Ethereum node can greatly increase the number of people who are willing to do so. EIP-4444 also reduces node sync time, which simplifies many node operators' workflows. Hence, EIP-4444 can greatly increase Ethereum's node decentralization. **Potentially, if each node stores small percentages of the history by default, we could even have roughly as many copies of each specific piece of history being stored across the network as we do today.**
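A quick way to see why this can work: the number of copies of each piece of history is (number of nodes) x (fraction each node stores). The sketch below uses a deliberately simple round-robin assignment and made-up node counts; a real protocol would assign pieces differently, so this only illustrates the arithmetic.

```python
def copies_per_chunk(num_nodes, num_chunks, chunks_per_node):
    """Count how many nodes hold each chunk under round-robin assignment.

    Node i stores chunks i, i+1, ..., i+chunks_per_node-1 (mod num_chunks).
    """
    counts = [0] * num_chunks
    for node in range(num_nodes):
        for k in range(chunks_per_node):
            counts[(node + k) % num_chunks] += 1
    return counts


# Hypothetical "today": 1,000 nodes, each storing all 100 chunks of history.
today = copies_per_chunk(num_nodes=1_000, num_chunks=100, chunks_per_node=100)

# Hypothetical future: 10x the nodes, each storing only 10% of the chunks.
future = copies_per_chunk(num_nodes=10_000, num_chunks=100, chunks_per_node=10)

print(min(today), max(today))    # 1000 1000
print(min(future), max(future))  # 1000 1000
```

In this toy setup, ten times as many nodes each storing a tenth of the history gives the same replication per chunk as before, while each individual node needs far less disk.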
### LOG reform
Quoting from [this draft EIP](https://github.com/ethereum/EIPs/pull/8368) directly:
> Logs were originally introduced to give applications a way to record information about onchain events, which decentralized applications (dapps) would be able to easily query. Using bloom filters, dapps would be able to quickly go through the history, identify the few blocks that contained logs relative to their application, and then quickly identify which individual transactions have the logs that they need.
>
> In practice, this mechanism is far too slow. Almost all dapps that access history end up doing so not through RPC calls to an Ethereum node (even a remote-hosted one), but through centralized extra-protocol services.
What can we do? We can remove bloom filters, and simplify the `LOG` opcode so that _all_ it does is create a value that gets hashed into the state. We can then build separate protocols that use ZK-SNARKs and incrementally-verifiable computation (IVC) to generate provably-correct "log trees" that represent an easily-searchable table of all logs for a given `topic`. Applications that need logs and want to be decentralized can use these separate protocols.
### Moving to SSZ
Today, much of the Ethereum block structure, including transactions and receipts, is still stored using outdated formats based on [RLP](https://ethereum.org/en/developers/docs/data-structures-and-encoding/rlp/) and Merkle Patricia trees. This makes it needlessly difficult to make applications that use that data.
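To give a feel for what working with RLP involves, here is a minimal encoder sketch for byte strings and nested lists. It follows the prefix rules as I understand them from the RLP documentation linked above; it omits decoding and integer handling, and the function names are my own.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """Encode a payload length: short form up to 55 bytes, long form beyond."""
    if length <= 55:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """RLP-encode a bytes value or a (possibly nested) list of them."""
    if isinstance(item, bytes):
        # A single byte below 0x80 is its own encoding.
        if len(item) == 1 and item[0] < 0x80:
            return item
        return _length_prefix(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError("RLP items must be bytes or lists")


print(rlp_encode(b"dog").hex())             # 83646f67
print(rlp_encode([b"cat", b"dog"]).hex())   # c88363617483646f67
```

Even in this tiny form, the encoder needs four different prefix ranges and a special case for single bytes, and a decoder has to undo all of them.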
The Ethereum consensus layer has moved to the cleaner and more efficient [SimpleSerialize (SSZ)](https://ethereum.org/en/developers/docs/data-structures-and-encoding/ssz/):
<center><br>
![](https://storage.googleapis.com/ethereum-hackmd/upload_7e6dd951a920d8e9076e353ad450b58c.png)
_Source: [https://eth2book.info/altair/part2/building_blocks/merkleization/](https://eth2book.info/altair/part2/building_blocks/merkleization/)_
</center><br>
However, we still need to complete the transition, and move the execution layer over to the same structure.
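As a rough sketch of what "the same structure" means, here is a simplified version of SSZ-style Merkleization: values are packed into 32-byte chunks, padded with zero chunks up to a power of two, and hashed pairwise with SHA256. This covers only a flat list of chunks (no lists with length mix-ins, no nested containers), and the helper names are hypothetical.

```python
import hashlib

ZERO_CHUNK = b"\x00" * 32


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def uint64_chunk(value: int) -> bytes:
    """Little-endian uint64, right-padded with zeros to one 32-byte chunk."""
    return value.to_bytes(8, "little").ljust(32, b"\x00")


def _padded(chunks: list[bytes]) -> list[bytes]:
    size = 1
    while size < len(chunks):
        size *= 2
    return list(chunks) + [ZERO_CHUNK] * (size - len(chunks))


def merkleize(chunks: list[bytes]) -> bytes:
    """Hash pairs of nodes upward until a single root remains."""
    layer = _padded(chunks)
    while len(layer) > 1:
        layer = [sha256(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


def merkle_branch(chunks: list[bytes], index: int) -> list[bytes]:
    """Sibling hashes needed to prove the chunk at `index`."""
    layer, branch = _padded(chunks), []
    while len(layer) > 1:
        branch.append(layer[index ^ 1])
        layer = [sha256(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return branch


def verify_branch(leaf: bytes, index: int, branch: list[bytes], root: bytes) -> bool:
    node = leaf
    for sibling in branch:
        node = sha256(sibling + node) if index & 1 else sha256(node + sibling)
        index //= 2
    return node == root
```

For example, a hypothetical container with five `uint64` fields becomes five chunks padded to eight, and proving any single field takes a branch of just three hashes.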
Key benefits of SSZ include:
* Much simpler and cleaner specification
* 4x shorter Merkle proofs in most cases, compared to status-quo hexary Merkle Patricia trees
* Bounded length for Merkle proofs, compared to extremely long worst-cases (eg. proving contract code or long receipt outputs)
* No need to implement complicated bit-twiddling code (which RLP requires)
* For ZK-SNARK use cases, can often reuse existing implementations that have been built around binary Merkle trees
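
The "4x shorter" figure in the list above can be sanity-checked with simple arithmetic. In a binary tree, a proof needs one 32-byte sibling per level; in a hexary tree, it needs up to 15 siblings per level but has a quarter as many levels. The sketch below makes simplifying assumptions: a full, balanced tree, and no RLP or node-encoding overhead.

```python
HASH_BYTES = 32


def tree_depth(num_leaves: int, branching: int) -> int:
    """Smallest depth such that branching**depth >= num_leaves."""
    depth = 0
    while branching ** depth < num_leaves:
        depth += 1
    return depth


def proof_bytes(num_leaves: int, branching: int) -> int:
    """Bytes of sibling hashes in a proof for one leaf."""
    siblings_per_level = branching - 1
    return tree_depth(num_leaves, branching) * siblings_per_level * HASH_BYTES


leaves = 2 ** 24
binary = proof_bytes(leaves, 2)     # 24 levels * 1 sibling  * 32 bytes = 768
hexary = proof_bytes(leaves, 16)    #  6 levels * 15 siblings * 32 bytes = 2880
print(hexary / binary)              # 3.75
```

The ratio of 15/4 = 3.75 holds at any size under these assumptions, which is where "roughly 4x" comes from.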
Today, we have three types of cryptographic data structures in Ethereum: SHA256 binary trees, SHA3 RLP hashed lists, and hexary Patricia trees. Once we complete the transition to SSZ, we'll be down to having two: SHA256 binary trees and Verkle trees. **In the longer-term future, once we get good enough at SNARKing hashes, we may well replace both SHA256 binary trees and Verkle trees with binary Merkle trees that use a SNARK-friendly hash - one cryptographic data structure for all of Ethereum**.