owned this note
owned this note
Published
Linked with GitHub
# Notes on Merkle -> Verkle Transition
see also Pari's notes https://notes.ethereum.org/@parithosh/verkle-transition
and current open questions:
https://notes.ethereum.org/@rudolf/verkle-questions
The purpose of this note is to consolidate proposed conversion methods and put some more focus on details that are underspecified, notably interactions between the conversions happening due to
- reads in post-freeze blocks
- writes in post-freeze blocks
- sweeping (i.e. automatic conversion of all "old" Merkle stuff to Verkle stuff)
We focus on conversion methods that, at least for some period of time, have multiple trees storing either Merkle or Verkle entries, with some trees having precendece over others (i.e. some variant of overlay tree).
This means that if we want to look up a value, we first look it up the higher-precendece tree. If it's there, fine.
Otherwise, look it up in a lower-precendence trees.
For simplicity, we do not distinguish the account from the storage tree.
**Note: The purpose of the Verge is to enable statelessness by having a data structure more amenable to proofs about the state. As such, the conversion Merkle to Verkle and consolidating trees are treated as separate, but related concerns.**
## Q1: How many trees do we want to end up with
Options are really
- 1 Verkle tree
- 1 Verkle tree for new nodes and 1 Merkle for old
- 1 Verkle tree for new nodes and 1 Verkle for old
## Q2: Freezing Merkle
At some point, we want to freeze the Merkle tree and make all new data go into the overlay tree.
Note that when taking a closer look at some of the proposed methods, it turns out that it might be possible to only have to freeze the set of Merkle **keys** (not the values). This allows certain tradeoffs.
Options:
- Freeze Merkle tree at transition start
- Freeze Merkle tree keys at transition start
## Q3: How to distribute preimages
Realistic options are
- Start recording new preimages prior to transition. Distribute historic preimages somehow (EIP 6873)
- Start sweeping only some time after the transition has begun. This means that the set of Merkle keys has finalized and we have a lot of time to collect, sort and distribute them.
Due the added advantage that there is much less that can happen with reorgs around the transition boundary changing the set of Merkle keys, I strongly favor the second option, with enough time to squeeze in an emergency release.
**We currently have 3 proposed interacting ways to trigger a conversion of Merkle to Verkle:**
- **Reading**: When *reading* data that was not found in the highest-priority (Verkle) tree, move it to highest-priority tree (thereby converting to Verkle)
- **Writing**: When *writing* new data, it should go into some tree, probably the highest-priority (Verkle) tree.
*Note: This is actually not neccessarily true or the best idea, depending on whether there was a prior value for that key or not*
- **Sweeping**: We may have some proposed way to convert all old Merkle data into Verkle data in key-order. This works by each (non-empty?) block converting some amount of data.
## Q4: Do we convert on read?
Most proposals ask/imply that, during the transition, any key that was read (and found in the old Merkle tree) will be converted to Verkle and added to the newest tree. If we have sweeping anyway, this is really an optimization to avoid having to lookup in two trees for keys that are frequently accessed. In theory, we could do without, so options are
- have reads trigger a conversion
- do not convert upon read-only access and rely on sweeping
## Q5: Where to write new data?
New data for keys that have not preexisted before should go into the newest Verkle tree. For keys that have preexisted, there are actually more options:
- Write all new data to newest Verkle tree. Due to the nature of being an overlay tree, this makes whatever the Merkle tree stored irrelevant
- Write data for pre-existing keys into the tree that currently holds that data, which might be the Merkle tree. Let sweeping then take care of the conversion.
*Note that this modifies the old tree, but not its keys, so this is only an option if we answered Question 2 accordingly*
## Q6: Number of nodes to convert by sweeping fixed or not
Currently the number of entries to be converted due to sweeping is informed by what IO can manage. There are multiple options to set its value, however:
- fix a number to be converted via sweeping per slot
- make the number to be converted via sweeping dependent on gas consumed / IO per slot
- make that number to be converted via sweeping dependent on conversions due to reads / writes (e.g. make their total number constant)
## Q7: What to do when sweeping encounters values that were converted by other means
When converting via sweeping, there may be values that were already converted due to reads/writes.
These reads/writes may be in this or in previous slots (may need differentiating)
This needs to be detected and accounted for. While there are many ways to do this (check Verkle tree or mark the Merkle tree), a relevant question is how this affects the number of entries to be converted. Options are:
- During sweeping, make a certain number of successful conversions
- During sweeping, iterate the boundary between converted/uncoverted by a fixed amount. This means that when values have already been converted by other means, the total number of writes to the Verkle tree is lower.
- If we don't convert on read (Q4) and write to the tree where data was (Q5), we can guarantee this never happens.
## Q8: Should we sweep during empty blocks
With some strategies, it may be possible to perform sweeping during empty blocks. Practically this means that for consensus, the block after an empty block has double the number converted due to sweeping.
## Q9: Where to write when sweeping
In principle, it would be possible for sweeping to target NOT the highest significance Verkle tree.
In such a setup, we have
* New Verkle tree.
* Old Merkle tree.
* Old Verkle tree.
Sweeping converts Old Merkle -> Old Verkle
Read/Write converts Old -> New (Note that lookups in the old trees always know which tree to target)
This ends up with two Verkle trees, one being an overla over the other, but decouples sweeping from read/write conversions. It is a mixture between the proposed state-expiry/local bulk conversion/overlay method that I did not find mentioned before.
Note that this may be done in way that reorgs don't affect sweeping (the issue is to avoid extra IO to undo and redo the same work)
So we have options
- Sweep to final (single) Verkle tree.
- Sweep to overlayed Verkle tree.
Note: I (Gotti) favor
having only 1 tree at the end (Q1, Q9)
freeze the Merkle tree (not just the keys) (Q2)
delaying the start of sweeping (Q3)
don't care about conversion from reads (Q4)
have all new stuff go to the Verkle tree (Q5)
fix the number of conversion *attempts* due to sweeping (Q6, Q7)
Not sweep during empty blocks (Q8)