# "Merry Go Round" sync
## Motivation
Here we define a sync algorithm with the following goals and properties.
### Bittorrent style swarming
The network should be healthy even in the face of a small number of full nodes serving an arbitrarily large number of syncing nodes. A syncing node which is only connected to other syncing nodes should still be highly performant.
### Limit protocol abuse
We aim for a protocol that mitigates unintended use cases, such as a stateless client attempting to dynamically fetch state data to serve an `eth_call` JSON-RPC request.
### Efficiency
Efficient bulk transmission of the trie data while minimizing transmission of intermediate tree nodes.
## Terminology
### Sync Epoch
We define a *Sync Epoch*, or more concisely an *Epoch*, to be the span of blocks during which a certain part of the state trie is actively being synced. The constant `EPOCH_SIZE` represents the number of blocks in an epoch.
We define *Epoch Boundary* to be any block which satisfies the condition `Block.number % EPOCH_SIZE == 0`.
### Prefix Ranges
We treat the main account state trie and all of the contract storage tries as a single tree with a 65 byte keyspace. The first 32 bytes of a key represent the path into the main account state trie. The 33rd byte namespaces the sub-tries; currently only the contract storage tries are supported. The last 32 bytes of a key represent the path into the contract's storage trie.
```
ACCOUNT_TRIE_PREFIX{1,32}
ACCOUNT_TRIE_KEY{32} [+ 0x00 + STORAGE_TREE_PREFIX{1,32}]
```
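As a rough illustration of this flat keyspace, a full 65 byte key for a storage slot might be assembled as follows (the helper name and the zero-padded default storage key are illustrative assumptions, not part of the spec):
```python
ACCOUNT_KEY_LEN = 32          # path into the main account state trie
STORAGE_KEY_LEN = 32          # path into a contract's storage trie
STORAGE_NAMESPACE = b"\x00"   # 33rd byte: sub-trie namespace (currently only storage)


def full_key(account_key: bytes, storage_key: bytes = bytes(STORAGE_KEY_LEN)) -> bytes:
    """Assemble a flat 65 byte key: account path + namespace byte + storage path."""
    assert len(account_key) == ACCOUNT_KEY_LEN
    assert len(storage_key) == STORAGE_KEY_LEN
    return account_key + STORAGE_NAMESPACE + storage_key
```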
We define the term *Prefix* to mean a key which has had some of its trailing bits truncated. We use prefixes to refer to contiguous ranges of the state trie. A prefix covers a leaf of the tree if the leaf's path in the tree begins with the given prefix.
TODO: once we have merklized code we will need a mechanism to differentiate between storage and merklized code.
Nodes use prefixes to communicate which sections of the tree they have. This is done by constructing the minimal set of prefixes which cover the sections of the tree they have fully synced.
TODO: examples of prefixes and what they cover
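As a rough sketch of the coverage relation, treating prefixes as whole-byte strings for simplicity even though the definition above allows bit-level truncation:
```python
from typing import List


def covers(prefix: bytes, key: bytes) -> bool:
    """A prefix covers a leaf if the leaf's 65 byte key begins with the prefix."""
    return key.startswith(prefix)


def is_covered(prefix_list: List[bytes], key: bytes) -> bool:
    """Check a key against the set of prefixes a peer has announced."""
    return any(covers(prefix, key) for prefix in prefix_list)


# The empty prefix covers every key, so [b""] means "I have the full trie",
# while an empty list [] means "I have no trie data at all".
some_key = bytes(65)
assert is_covered([b""], some_key)       # fully synced peer
assert not is_covered([], some_key)      # peer with an empty database
assert is_covered([b"\x00"], some_key)   # peer that has the subtree under 0x00
```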
### Hot Spot
The *Hot Spot* is a path in the state trie from which clients sync state. A hot spot is active for the duration of an epoch. The protocol makes no concrete rules about what state should be accessible for a given hot spot, but the general rule is that state which is too distant from the hot spot is not likely to be available.
The path for a hot spot is determined from the block hash of the boundary block in a given epoch. We then expand this out to a full 65 byte key with the latter 33 bytes set to zero.
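A minimal sketch of this derivation, assuming an illustrative `EPOCH_SIZE` value (the spec does not fix the constant):
```python
EPOCH_SIZE = 256  # illustrative value only


def hot_spot_path(block_number: int, block_hash: bytes) -> bytes:
    """Derive the 65 byte hot spot path from an epoch boundary block."""
    assert block_number % EPOCH_SIZE == 0, "hot spots are derived at epoch boundaries"
    assert len(block_hash) == 32
    # The first 32 bytes come from the boundary block hash; the remaining
    # 33 bytes (namespace byte + storage path) are zero.
    return block_hash + b"\x00" * 33
```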
## Synchronization Algorithm
Nodes continually keep each other up to date with which sections of the tree they have fully synced. A fully synced client would transmit an array containing the empty string `['']` to denote that it has a full copy of the state trie. A client with an empty database would transmit an empty list to denote that it has no data in its trie.
As clients complete new sections of the tree, they should periodically send their connected peers an updated prefix list.
At the beginning of each epoch we use the 32 byte hash of the boundary block to determine the current hot spot. The full path for the hot spot is defined as the 65 byte key such that the first 32 bytes are the block hash and the remaining 33 bytes are set to zero.
A client would then construct a proof which covers the section of the tree around the hot spot. The client then iterates through its connected peers, transmitting the chunks of the proof that each peer does not already have.
TODO: deterministic chunking so that multiple seeders seed the same data.
> TODO: This section needs more detail and thought put into the proof production. Specifically, what is the algorithm for constructing a witness for a peer given the set of prefixes they have broadcasted to indicate what parts of the trie they already have.
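While the witness construction itself is left open above, the per-peer filtering it feeds into might look roughly like the following sketch, which assumes each proof chunk is labeled with the prefix it covers (the chunking scheme is still a TODO):
```python
from typing import Iterable, List, Tuple


def chunks_to_send(
    proof_chunks: Iterable[Tuple[bytes, bytes]],  # (prefix covered, encoded chunk)
    peer_prefixes: List[bytes],                   # prefixes the peer has announced
) -> List[bytes]:
    """Skip chunks whose covered prefix falls under something the peer already has."""
    def peer_already_has(chunk_prefix: bytes) -> bool:
        return any(chunk_prefix.startswith(prefix) for prefix in peer_prefixes)

    return [chunk for prefix, chunk in proof_chunks if not peer_already_has(prefix)]
```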
### Proof Availability
## Protocol
### `Announce`: `0x00`
- Probably include forkid
- List of prefixes (with some reasonable max size)
- Maybe include a flag for whether the node is interested in proofs. This would allow a node with an incomplete state database to still act as a seeder.
### `ProofsAvailable`: `0x01`
A client serving proofs would advertise the availability of a proof as a 2-tuple of `[state_root, prefix]`.
- Probably anchor to a specific block hash (epoch)
- Probably specify a prefix for which the advertised proof will cover
### `GetProofs`: `0x02`
A client requesting a proof asks for it using a 2-tuple of `[state_root, prefix]`, which is subject to a validity check by the server that the `prefix` falls under one of the advertised prefixes from a previous `ProofsAvailable` message. The requested `prefix` may be longer than the advertised prefix to specify a more precise part of the tree.
> TODO: In order to allow omission of intermediate tree data that the requester already has, maybe this should have a 3rd parameter that indicates where in the tree the proof should begin.
- Request id
- Request proofs that were advertised by a `ProofsAvailable` message.
### `Proofs`: `0x03`
The response to a `GetProofs` request. Contains one or more complete proofs for the requested prefixes.
> TODO: Make sure we support the case where a server advertises a very broad prefix and the client requests the whole prefix, which is too large to transmit in a single chunk. The server should then construct a depth-first proof from the given prefix that covers at least one leaf. Upon receiving a partial (but provable) response to their request, the client can then re-request using a new, more precise prefix.
- Request id (from the corresponding `GetProofs` request)
- The proofs
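Taken together, the four messages above might be modeled roughly as follows. The field names, types, and use of dataclasses are assumptions for illustration; the wire encoding is not specified here.
```python
from dataclasses import dataclass
from typing import List, Tuple

Prefix = bytes     # see "Prefix Ranges" above
StateRoot = bytes  # 32 byte state root hash


@dataclass
class Announce:                               # 0x00
    fork_id: bytes                            # probable forkid field
    prefixes: List[Prefix]                    # sections of the trie this node has fully synced
    wants_proofs: bool                        # whether this node is interested in receiving proofs


@dataclass
class ProofsAvailable:                        # 0x01
    offers: List[Tuple[StateRoot, Prefix]]    # (state_root, prefix) pairs the server will serve


@dataclass
class GetProofs:                              # 0x02
    request_id: int
    requests: List[Tuple[StateRoot, Prefix]]  # must fall under previously advertised prefixes


@dataclass
class Proofs:                                 # 0x03
    request_id: int                           # echoes the corresponding GetProofs request
    proofs: List[bytes]                       # one or more complete (or partial but provable) proofs
```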
## Paths not taken
### Fully push centric protocol
Originally, proofs would just be pushed to those that need them. This proves problematic in many cases. The simplest is a node with an empty database connected to multiple nodes which have the data for the current hot spot. In a pure push-based model the syncing node would receive duplicate proofs for the same data from its connected peers.
By using a pull model, the syncing peer can distribute requests across its connected peers to balance out requests for the current hotspot across multiple nodes.
### Fully pull based protocol
The protocol *could* just have `GetProofs` and `Proofs` messages. This proved problematic because it places *implicit* requirements on those serving data. An *overloaded* full node operating in this context would have to either not respond or issue empty responses to requests for proofs. This could result in clients over-requesting data from their peers, since they have fewer guarantees about what data a given peer will respond with.
Adding the `ProofsAvailable` message allows those serving data to explicitly broadcast what parts of the state they are willing to serve to any given peer.
### Hard restrictions on state access far away from the hotspot
The protocol intentionally doesn't put any firm rules on how close to the hot spot data must be in order to be available within the current epoch. This allows the protocol to self-adjust based on the overall size of the tree. For very small chains, the entire tree could be served in a single epoch. For extremely large trees, servers can choose to heavily limit state availability to the data that is very close to the current hot spot.
In addition, this allows for clients which are very close to having synced the entire tree to finish syncing even when the hotspot is not close to the data they are missing. This would still require a node with that data to be willing to offer the data.
### No fixed cycle length
You may notice that the use of block hashes results in there being no defined cycle length for full coverage of the state. This is intentional. Knowledge of the state size is both imprecise and only available to those with a full copy. Similarly, the size of the state changes over time and across different chains. Additionally, nodes have different amounts of available bandwidth and processing power. These variables mean that we cannot define a number of epochs that will work for any chain size, nor can we use the known state size to derive the number of epochs, since different clients sync at different rates.
Using the block hash to define hot spots, and giving servers the flexibility to decide how much of the state to serve around any given hot spot, allows the protocol to scale smoothly both for different state sizes and for clients with different amounts of bandwidth and processing power.