# Code Merkleization Practical Implementation & Analysis **Author(s)**: Hugo De la cruz (Geth CM implementation/Analysis), Ipsilon Team. **Acknowledgement**: Sina Mahmoodi for contribution to Geth/FastSSZ code merkleization implementation **TLDR:** In this work we implemented Code Merkleization in go-ethereum, including code proof generation (using FastSSZ). We have analyzed the performance (in terms of proof size) of different proof implementations on ~1M mainnet blocks. Lastly we provide direction for further work using Verkle Trees. [TOC] ## Introduction According to [EIP-2926](https://eips.ethereum.org/EIPS/eip-2926#motivation), after transitioning from hexary to binary trie, proof hashing is reduced by 3x, making contracts' bytecode the first contributor to block witness size. The EIP proposes and defines how to break and merkleize the contract's bytecode into `32` bytes chunks, this way stateless clients would only need the chunks touched during transaction execution. This document shows an analysis made in ~1M non-empty blocks from Ethereum mainnet (blocks from `8,052,259` to `9,101,941`, `1,016,695` in total). In order to get the information needed some modifications were added to the [geth](#geth-changes) to analyze transaction execution during mainnet full sync. For each of the interpreted instructions in the executed transaction it calculates the chunk corresponding to the current opcode position, the chunks are merkleized and a proof is generated for each transaction based on the touched chunks list. This allows to calculate the proof size for a given block by summing up the proof size of each transaction. As per EIP-2926, the contract code is merkleized and the resulting coderoot is stored in the account. We have implemented two merkleization approaches: an earlier version of EIP-2926 using RLP, and the one presented in this document using go-ethereum with [SSZ merkleization](https://github.com/ethereum/eth2.0-specs/blob/dev/ssz/simple-serialize.md#merkleization). The proof for the touched chunks (code proof) is encoded using the [Proof Format](#Proof-Format). Finally the proof can be serialised using various methods, including RLP and SSZ. While we planned to also use SSZ for proof serialisation, this work only implements RLP serialisation. ## Proof Format To generate proofs, the relevant tree is built in memory. We assign an index to each node going left-to-right top-to-bottom, and collect information which node is a leaf, is empty, or is an intermediate hash. Before encoding/serialisation, fastssz generates two kinds of proofs, which we named Baseline Proofs and Slim Proofs. **Baseline Proofs** is the proof format defined in [the SSZ specification as Merkle multiproofs](https://github.com/ethereum/eth2.0-specs/blob/dev/ssz/merkle-proofs.md#merkle-multiproofs). Merkle Multiproofs refer to the minimum subset of nodes needed to authenticate that a set of nodes are part of a Merkle tree. For example, in the following tree with 8 leaves, in order to prove that leaves 8, 9 and 14 are part of the tree, we need to provide the auxiliary nodes 5, 6 and 15. By providing these nodes we can calculate the tree root and authenticate the nodes are part of the Merkle tree. ```graphviz digraph G { graph [ordering="out"] empty1 [label="e1", style="invis"] empty2 [label="e2", style="invis"] empty3 [label="e3", style="invis"] empty4 [label="e4", style="invis"] empty5 [label="e5", style="invis"] empty6 [label="e6", style="invis"] one [label="1"] two [label="2"] three [label="3"] four [label="4"] five [label="5", color="red"] six [label="6", color="red"] seven [label="7"] eight [label="8", color="green"] nine [label="9", color="green"] ten [label="10"] eleven [label="11"] twelve [label="12"] thirteen [label="13"] fourteen [label="14", color="green"] fifteen [label="15", color="red"] one->two one->three two->four two->empty1 [style="invis"] two->empty1 [style="invis"] two->five three->six three->empty2 [style="invis"] three->empty2 [style="invis"] three->seven four->eight four->empty3 [style="invis"] four->empty3 [style="invis"] four->nine five->ten five->empty4 [style="invis"] five->empty4 [style="invis"] five->eleven six->twelve six->empty5 [style="invis"] six->empty5 [style="invis"] six->thirteen seven->fourteen seven->empty6 [style="invis"] seven->fifteen } ``` We defined Baseline proofs containing the following fields: - **Indices**: The indices in the trie of all elements needed in the proof (Metadata and Touched Chunks). - **Hashes**: (Referred as **Proof** in SSZ spec): The auxiliary nodes which corresponds to the intermediate hashes of the auxiliary leaves and the hashes of leaves being proved. - **Leaves**: The leaves corresponding to metadata or touched chunks (FIO+Code chunk) **Slim Proofs** (aka Optimized Merkle Proofs) are proofs based on Baseline/Merkle multiproofs, and have the optimization of not including empty/zero leaves and the resulting hashes of empty leaves. We precalculate the result of hashing zero leaves, and subsequent hashes at different levels from bottom to the top of the tree. In case a hash corresponding to a zero element is found, we store the "hash level" in the `ZeroLevels` field and put an empty record for that level in the `Hashes` field. Slim proofs contain the following fields: - **Indices**: The indices in the trie of all elements needed in the proof (Metadata and Touched Chunks) - **ZeroLevels**: The levels that are known to be the result of the hash of zero elements. - **Hashes**: The intermediate hashes of the elements needed in the proofs, in this case it contains `nil` for the hashes of "Zero elements/levels". - **Leaves**: Leaves corresponding to metadata or touched chunks (FIO+Code Chunk) For these two kinds of proofs (Baseline and Slim Proofs) we are storing the following information in a log file: - **Slim Proof Size**: Sum of the sizes of **Slim** proof fields (Indices, ZeroLevels, Hashes, Leaves) from all the executed transactions. - **Size of Slim Proofs encoded as RLP**: Sum of Slim Proof from all executed transactions encoded as RLP. - **Size of Baseline proofs encoded as RLP**: Sum of Baseline proofs from all executed transactions encoded as RLP. - **Size of Baseline proofs compressed using [Snappy compression](https://en.wikipedia.org/wiki/Snappy_(compression))**: Sum of Baseline proofs from all executed transactions compressed using snappy compression. *Note: Snappy Compressed Slim Proofs were not considered as they are only better than Snappy Compressed Baseline Proofs for small proof sizes*. Based on this log file we extracted and interpreted the information presented in the [results section](#Results). ## Geth Changes The changes were implemented in a custom [Geth](http://github.com/sina/go-ethereum) branch ([code-merkleization-ssz-stats](https://github.com/s1na/go-ethereum/tree/code-merkleization-ssz-stats)) based on commit [80abd8a7](https://github.com/ethereum/go-ethereum/tree/80abd8a72e352c3730ba5c9a9502ef1682a3ddc9). A new schema was created to represent the Code Trie: - Hash: `[]byte` - Metadata: - Version: `uint8` - CodeHash: `Hash`, size: `32` - CodeLength: `uint16` - Chunk - FIO (First Instruction Offset): `uint8` - Code: `[]byte`, size: `32` - CodeTrie - Metadata: `Metadata` - Chunks: `[]Chunk`, size: `1024` Based on this schema, [fastssz](#SSZ-Changes) can generate the needed code to merkleize the tree, as well as generate and verify proofs. A new flag `--codemerkleization` was added to geth to indicate we want to collect code merkleization data when running the node. When this flag is set, the node will start collecting information about the touched chunks for each transaction executed in a block, at the end of block execution fastssz generates a tree containing the following structure (based on the merkleization schema) for each of the contracts: ```graphviz digraph G { graph [ordering="out"] empty1 [label="e1", style="invis"] empty2 [label="e2", style="invis"] empty3 [label="e3", style="invis"] empty4 [label="e4", style="invis"] empty5 [label="e5", style="invis"] one [label="1"] two [label="2"] three [label="3"] four [label="4"] five [label="5"] six [label="6", color="blue"] seven [label="7", color="green"] eight [label="8", color="green"] nine [label="9", color="green"] ten [label="10", color="green"] eleven [label="11", color="orange"] twelve [label="...", color="white"] thirteen [label="...", color="white"] one->two one->three two->four two->empty1 [style="invis"] two->empty1 [style="invis"] two->five three->six three->empty2 [style="invis"] three->empty2 [style="invis"] three->seven four->eight four->empty3 [style="invis"] four->empty3 [style="invis"] four->nine five->ten five->empty4 [style="invis"] five->empty4 [style="invis"] five->eleven six->twelve six->empty5 [style="invis"] six->empty5 [style="invis"] six->thirteen } ``` - :green_book: Leaves 7 to 10 corresponds to metadata: - 7: Number of chunks - 8: Version - 9: Code Hash - 10: Code Length - :orange_book: 11: Always empty - :blue_book: 6: Root of a tree of depth 10 containing the Code Chunks - Current maximum code size is 24576 bytes (768 x 32-byte-chunks) - A depth 10 binary tree allows to have 1024 leaves, enough to store the maximum number of chunks (768) - Each leaf corresponds to `[FIO, Chunk]` Based on this trie a proof is generated for all the touched chunks during block execution. Information about the proof sizes in different representations as well as the total contract's code size is logged into a [file (`results.csv`)](https://github.com/hugo-dc/cm-analysis/tree/master/data) containing the following fields: - **Block Number** - **Code Size**: The sum of all executed contract's code sizes in the specific block. - **ProofSize**: The sum of the lengths of `Indices`, `ZeroLevels`, `Hashes` and `Leaves`. - **RLPSize**: Length of a **Slim** proof encoded as RLP - **UnRLPSize**: Length of a **Baseline** proof encoded as RLP - **SnappySize**: Length of a **Baseline** compressed using [Snappy compression](https://en.wikipedia.org/wiki/Snappy_(compression)) - **Indices**: Total indices in the **Slim** proof - **ZeroLevels**: Total ZeroLevels (**Slim** proof) - **Hashes**: Total Hashes in the **Slim** proof - **Leaves**: Total Leaves ## SSZ Changes Chunks are merkleized using SSZ Merkleization which was implemented as part of this project in the [FastSSZ](https://github.com/ferranbt/fastssz) library. This implementation includes: - New structures/scheme to represent a binary tree - Proof Generation - Proof Verification [Appendix A](#Appendix-A-FastSSZ-changes) shows the list of changes implemented in FastSSZ. ## Results Analyzing `1,016,695` blocks (from block `8,052,259` to `9,101,941`) shows the following results. Summing up sizes of contracts executed in blocks we get an average `383,297` bytes per block. The following table shows the average sizes of various types of proofs and sizes relative to the total size of contracts executed in blocks: | Proof Type | Avg. Size |Saving compared to verbatim bytecode| |---------------------------|-------------:|-----------------------------:| | Baseline | 405,677| -5.2% | | Slim | 279,160| 27.56% | | Snappy Compressed Baseline| 159,031| 58.66% | Overall we see Baseline proofs on average have an increase of 5% compared to verbatim bytecode, Slim proofs save about `27%`, and `58.3%` savings in Snappy compressed Baseline proofs. We have observed some exceptions (see [Appendix B](#Appendix-B-Cases-where-Proof-Size-is-bigger-than-Code-Size)) where the Slim proofs (`0.78%` of the total analyzed blocks) and the Snappy compressed Baseline proofs (`0.0098%` of the total analyzed blocks) get a bigger size than the total code size. This is a usual pattern when we find blocks executing contracts with very small code size. In this case it is more convenient to send the actual contract byte code for execution instead of the generated proofs. ## Code Merkleization and Verkle Trees There is a new [proposal](https://notes.ethereum.org/@vbuterin/verkle_tree_structure) that supersedes Code Merkleization. In it the entire state is unified into a one single *[Verkle Tree](https://dankradfeist.de/ethereum/2021/06/18/verkle-trie-for-eth1.html#introduction-to-verkle-tries)*, where the code will be stored in one or a few subtrees. It too, uses `32` byte chunks, but with the difference that the first byte corresponds to `FIO` (First Instruction Offset, or, index of the first byte which is not part of `PUSH` data), and the last `31` bytes correspond to the actual code chunk. Another major difference is that Verkle Trees are used for the code trie instead of Merkle Trees. An experiment similar to the one presented in this work could be done for Verkle Trees, replacing the FastSSZ library and using a Verkle Tree implementation. This would allow to have an estimate for Proof Sizes when using Verkle Trees, which may vary because of the differences mentioned before. Should this be done in geth, the code chunk analysis implementation can be reused. ## Work on other chunk sizes Althought this document presents the results of Code Merkleization for `32` byte chunks according to the specification in [EIP-2926](https://eips.ethereum.org/EIPS/eip-2926#motivation), different chunk sizes (24, 32, and 40) were also implemented in the mentioned [geth branch](https://github.com/s1na/go-ethereum/tree/cm/chunk-size) and in fastssz. We have encountered a hard to debug issue resulting in "bad blocks", when using smaller than 32-byte chunks. This problem was not fixed due to time limitation. Assuming it is fixed, the work presented here can be reused to analyse the chain using various chunk sizes, including 31 used by the Verkle proposal. --- ## Appendix A: FastSSZ changes The following is a list of changes made in FastSSZ in order to add the features that allows proof generation and verification: - [#26 - Custom hash function (keccak256)](https://github.com/ferranbt/fastssz/pull/26). - [#27 - Fix hashing non-64-bit uints](https://github.com/ferranbt/fastssz/pull/27). - [#28 - Add testing schema, add single branch verification for Metadata struct](https://github.com/ferranbt/fastssz/pull/28). - [#32 - Verify Multiproof](https://github.com/ferranbt/fastssz/pull/32). - [#33 - Generate Multiproof](https://github.com/ferranbt/fastssz/pull/33) - [#38 - Feature: tree-backing, multiproof generation and verification](https://github.com/ferranbt/fastssz/pull/38). - [#39 - Remove duplicate leaves from proof](https://github.com/ferranbt/fastssz/pull/39). ## Appendix B: Cases where Proof Size is bigger than Code Size ### Cases where Slim proof encoded as RLP is bigger than Code Size of executed contracts in a block In the results we also found `8,024` blocks (about the `0.78%` of total analyzed blocks) where the RLP size is bigger than the actual Code Size. On average it is bigger by `10%` in these blocks. |Increase |Blocks | %| |:---------|-------:|----------:| | 0-9% | 5537 | 69.01% | | 10-19% | 1446 | 18.02% | | 20-49% | 892 | 11.12% | | 50-99% | 110 | 1.37% | | 100-399% | 37 | 0.46% | | 400%+ | 2 | 0.025% | | **Total**|**8024**|**100.00%**| We can see the majority of these cases (`98%`), the size of proof encoded as RLP is increased between `0` and `50%`. We also see cases where the proof size is many times bigger than the contract's bytecode size, this happens when blocks are executing some contracts with very small code size: | Block | CodeSize | RLPSize | RLPSize / CodeSize | |------------:|-----------:|----------:|---------------:| | 8654146 | 45 | 414 | 920.0% | | 8904950 | 45 | 414 | 920.0% | Block [8654146](https://etherscan.io/txs?block=8654146) only contains two transactions, one transaction calling an EOA, and the other one calls a [contract](https://etherscan.io/address/0xa3296c06d9caf0c332f5718aa310831bb15a23e0#code) where the bytecode length is 45. ### Cases where Snappy compressed Baseline proof is bigger than Code Size of executed contracts in a block We see that Snappy compressed proofs is much better, in this case we only found `100` blocks (about `0.0098%` of total analyzed blocks) where the proof compressed using Snappy compression is bigger than the Code size. |Increase | Blocks | % | |----------|--------:|--------:| | 0-99% | 86 | 86.00% | | 100-399% | 12 | 12.0% | | > 400% | 2 | 2% | |**Total** | **100** | **100%**| As in the previous case, the size of snappy compressed proofs are usually big in blocks executing contracts with very short code size. Examples: | Block | CodeSize | SnappySize | SnappySize / CodeSize | |--------:|-----------:|-------------:|------------------:| | 8074331 | 723 | 1531 | 211.757% | | 8103738 | 723 | 1531 | 211.757% | | 8229250 | 723 | 1531 | 211.757% | | 8364819 | 723 | 1531 | 211.757% | | 8820145 | 723 | 1531 | 211.757% | | 8836466 | 723 | 1531 | 211.757% | | 8939806 | 723 | 1531 | 211.757% | | 8363515 | 365 | 790 | 216.438% | | 8860675 | 365 | 790 | 216.438% | | 8729492 | 540 | 1258 | 232.963% | | 8735355 | 540 | 1258 | 232.963% | | 8728364 | 536 | 1385 | 258.396% | | 8654146 | 45 | 477 | 1060% | | 8904950 | 45 | 477 | 1060% |