# More efficient snap
After we complete a snap sync, we need to go through the entire trie (at root `X`) and generate the snapshot base layer (for root `X`).
This is problematic. Here's a log line from one of the bootnodes:
```
Jan 22 11:25:07 bootnode-aws-ap-southeast-1-001 geth INFO [01-22|10:25:07.548] Resuming state snapshot generation root="bad86c…809831" in="0a2f1a…987b1a" at="f496f8…838431" accounts=4581561 slots=18608062 storage=1.63GiB elapsed=16h29m49.102s eta=398h11m43.786s
```
It has been generating for `16h`, and estimates another `~400h` to complete the generation. It will probably go a lot faster than that, but it's safe to say it's a matter of several days.
```
Jan 24 07:24:55 bootnode-aws-ap-southeast-1-001 geth INFO [01-24|06:24:55.010] Resuming state snapshot generation root="4d056b…703e42" in="0ef08c…34bc7d" at="efb101…ee8cb2" accounts=6722662 slots=36128084 storage=2.95GiB elapsed=60h29m36.564s eta=976h5m59.624s
```
After `60` hours, it estimates a further `976` hours until completion.
## How to not solve it
There's a naive way to avoid that: storing the snapshot data upon reception. During the heal-phase, as before, we write the new values into the trie,
and also make sure to update the base layer with any new changes.
The naive way suffers from two problems:
1. If an account has been selfdestructed between the time we first received it and the time we do the trie-heal, then we won't successfully delete it
from the base layer.
2. If a storage value has been changed in the same interval, then we will successfully update the `storageRoot` of the account object, but not successfully update the slot-value in the base layer.
Can we work around these limitations?
## SELFDESTRUCTS handling
When healing, we obtain knowledge of all paths that were modified, starting from the root.
Let's say we have root `X`, which we fetch.
We then discover that we indeed already have path `0` (the leftmost 1/16th part of the trie).
We can now say, with certainty, that any keys `0x0xxxx` in our snap db are correct.
### Phase 0
What we can do is create a `healtrie`. The `healtrie` is constructed during the healing-phase, and contains everything that was healed. Any parts of the `healtrie` which already existed remain unresolved. This means that a `hash` in the `healtrie` represents a path-subset where our snap-db is correct.
Any non-hash node represents a path-subset where the snapdb was changed.
It's not a requirement to go "all the way": we can choose to abort the tracking after a certain level (for example, it might be good to stop once we reach the first shortnode, but experimentation will tell).
Note: The `healtrie` does not need to be a Merkle-Patricia trie, i.e. one based on the cryptographic hashes of the children. It can be a regular trie (prefix tree) with two types of leaves that just differentiate between "was present" (`hash`) and "had to be resolved" (`node`).
Note 2: the `healtrie` does not _have_ to include any/all leaf-data, such as actual accounts.
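As a sketch of what such a structure could look like (names and layout are illustrative only, not geth code), the `healtrie` can be a plain nibble-keyed prefix tree where a subtree is either marked resolved ("had to be healed") or left untracked ("was already present", i.e. a `hash`):

```go
package main

import "fmt"

// healNode is a node in the heal-trie: a plain prefix tree over nibble
// paths, not a Merkle-Patricia trie. (Illustrative sketch, not geth code.)
type healNode struct {
	children map[byte]*healNode // one child per nibble (0..15)
	resolved bool               // true: this subtree was fetched during healing
}

func newHealNode() *healNode {
	return &healNode{children: make(map[byte]*healNode)}
}

// markResolved records that the subtree at the given nibble path had to be
// resolved (healed); intermediate nodes are created as needed.
func (n *healNode) markResolved(path []byte) {
	cur := n
	for _, nib := range path {
		child, ok := cur.children[nib]
		if !ok {
			child = newHealNode()
			cur.children[nib] = child
		}
		cur = child
	}
	cur.resolved = true
}

// wasHealed reports whether the given path falls inside a resolved subtree.
// Falling off the tracked trie means the subtree was untouched (a "hash").
func (n *healNode) wasHealed(path []byte) bool {
	cur := n
	for _, nib := range path {
		if cur.resolved {
			return true
		}
		child, ok := cur.children[nib]
		if !ok {
			return false
		}
		cur = child
	}
	return cur.resolved
}

func main() {
	ht := newHealNode()
	ht.markResolved([]byte{0x1, 0xa}) // the subtree at path 0x1a was healed
	fmt.Println(ht.wasHealed([]byte{0x1, 0xa, 0x3})) // true: under a healed path
	fmt.Println(ht.wasHealed([]byte{0x0}))           // false: untouched subtree
}
```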
After the trie healing is done, we have a `healtrie` of paths where things were modified. We can now traverse the `healtrie` along its outer perimeter, and perform these steps:
- From the base layer, delete all items that are in the keyspaces for those paths.
- For each sub-path, iterate the trie and fill the leaves into the base layer.
### Phase 1
- Start at the root. The path `0` marks the beginning of a delete-section. Traverse to the first `hash`, at path `pN`. This segment can be deleted.
- Continue from `pN+1`. Continue until the next `hash`, at path `pM`. Delete this segment.
- Continue until the trie is done.
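As a toy illustration of the delete pass (all names are hypothetical: the flat map stands in for the snap-db, and in reality the segments would be contiguous range-deletes against the key-value store), assuming the healed path prefixes have already been extracted from the `healtrie`:

```go
package main

import (
	"fmt"
	"strings"
)

// snapDB is a toy stand-in for the snapshot base layer: a flat map keyed
// by hex paths. (Sketch only; geth keeps this in its key-value store.)
type snapDB map[string]string

// deleteHealedSegments drops every base-layer entry whose key lies under a
// healed prefix; keys under "hash" (already-present) subtrees are untouched.
func deleteHealedSegments(db snapDB, healedPrefixes []string) {
	for key := range db {
		for _, p := range healedPrefixes {
			if strings.HasPrefix(key, p) {
				delete(db, key)
				break
			}
		}
	}
}

func main() {
	db := snapDB{
		"00ab": "A", // under untouched subtree 0x0...: stays
		"1a01": "B", // under healed prefix "1a": possibly stale, deleted
		"1a02": "C", // likewise deleted
		"ff10": "D", // stays
	}
	deleteHealedSegments(db, []string{"1a"})
	fmt.Println(len(db)) // 2
}
```

Phase 2 then re-fills exactly these deleted keyspaces by iterating only the corresponding sub-paths of the healed trie.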
### Phase 2
Our original state was:
> we have a base layer which is 99.9% correct, but it has some extra elements which are not present in the trie.
Our state after the deletion is:
> We now have a base layer which is 99.9% correct, but it has some missing elements.
So filling from the trie is a matter of iterating maybe only `1%` of the trie, instead of the whole thing.
Do the same iteration as phase 1, once again, but instead of deleting, fill the snap-db from the trie.
### Phase 1 + 2
In practice, I think phases 1 and 2 can be performed more efficiently in one step (one iteration), but it's simpler to reason about them as two separate phases.
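The combined pass can be sketched as follows (hypothetical names throughout; the maps stand in for the snap-db and for a trie iterator seeked to the prefix): for each healed segment, delete the stale base-layer keys and immediately re-fill that keyspace from the healed trie.

```go
package main

import (
	"fmt"
	"strings"
)

// repairSegment performs phase 1 and phase 2 for one healed prefix in a
// single step: drop the stale base-layer keys under the prefix, then
// iterate only that sub-path of the (healed) trie and fill the leaves back.
// (Sketch only; in geth the trie side would be a trie iterator.)
func repairSegment(base, trie map[string]string, prefix string) {
	// Phase 1: delete the stale segment from the base layer.
	for key := range base {
		if strings.HasPrefix(key, prefix) {
			delete(base, key)
		}
	}
	// Phase 2: re-fill the same keyspace from the trie. Entries that no
	// longer exist in the trie (e.g. selfdestructed accounts) stay deleted.
	for key, val := range trie {
		if strings.HasPrefix(key, prefix) {
			base[key] = val
		}
	}
}

func main() {
	base := map[string]string{"1a01": "stale", "1a02": "gone", "00ff": "ok"}
	trie := map[string]string{"1a01": "fresh", "00ff": "ok"} // "1a02" was selfdestructed
	repairSegment(base, trie, "1a")
	fmt.Println(base["1a01"], len(base)) // fresh 2
}
```

Note how this handles the selfdestruct case from earlier: a key deleted between snap delivery and trie-heal simply never gets re-filled, so the delete-then-fill order is what makes the single pass safe.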
## Storage modification handling
I haven't yet thought this fully through, but I think it can be handled analogously to the case above.