# DHT based State Network
This note attempts to *loosely* specify a design for a DHT network that satisfies the use cases outlined in the [state network use cases forum post](https://ethresear.ch/t/state-network-use-cases/7634).
## Overview
The network is based on [Kademlia](https://en.wikipedia.org/wiki/Kademlia).
### Keyspace
The network operates on a 256-bit keyspace. Both nodes and content share this keyspace.
### Nodes
A **Node** is an individual network participant, uniquely identified by a `node_id`: a 256-bit value derived from the node's public key.
### Content
Content on the network is identified by a key `K` or its content identifier `content_id`. The content identifier is derived from the key as `content_id = hash(K)`, where `hash` could be any agreed-upon cryptographically secure hash function.
The null content, i.e. the empty bytestring, is explicitly forbidden on this network.
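As a minimal illustration, assuming SHA-256 were chosen as the agreed-upon hash function (the actual choice is TBD), deriving a `content_id` might look like:

```python
import hashlib


def content_id(key: bytes) -> int:
    """Derive the 256-bit content identifier for a content key.

    SHA-256 is a placeholder here; the network only requires some
    agreed-upon cryptographically secure hash function.
    """
    return int.from_bytes(hashlib.sha256(key).digest(), "big")
```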
### Distance Function
The distance between two points in the keyspace (node or content identifiers) is computed as the bitwise *exclusive or* (XOR) of the two values, as in Kademlia.
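A sketch of the distance function over 256-bit identifiers, which applies equally to node ids and content ids since they share the keyspace:

```python
def distance(id_a: int, id_b: int) -> int:
    """Kademlia XOR distance between two 256-bit identifiers."""
    return id_a ^ id_b
```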
### Routing Table
Each node maintains a routing table with 256 buckets, delineating the keyspace by `log2(distance)`: bucket `i` holds entries whose distance from the local node falls in the range `[2**i, 2**(i+1))`.
> TODO: expand on this in more detail or link out to an existing description (discv5?)
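A minimal sketch of the bucketing rule; the remaining routing table mechanics (bucket size, replacement policy, etc.) are deferred to the TODO above:

```python
def bucket_index(local_id: int, remote_id: int) -> int:
    """Routing table bucket for `remote_id`, i.e. floor(log2(xor distance)).

    Bucket `i` covers distances in the range [2**i, 2**(i+1)).
    """
    d = local_id ^ remote_id  # XOR distance, as defined above
    if d == 0:
        raise ValueError("a node does not appear in its own routing table")
    return d.bit_length() - 1
```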
### Advertisement Table
Each node maintains information about where content can be found on the network. These advertisements *may* be specific keys, **or** could use an ARF to advertise contiguous ranges.
Nodes **may** populate their advertisement table with any data they choose.
By *convention* nodes populate their advertisement table with advertisements for content whose `content_id` is *close* to their `node_id`. The recommended client design is to maintain three such tables (a sketch of the third follows the list):
1. content that I have
2. LRU cache of content that was recently accessed/requested
3. Size bounded store of content advertisements for content that is *near* my `node_id`, evicting the content that is furthest away when the storage limit is reached.
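A minimal sketch of the third table, with hypothetical names and structure; the only behavior taken from the convention above is the size bound and evicting the advertisement whose `content_id` is furthest from the local `node_id`:

```python
class NearbyAdvertisementTable:
    """Size-bounded store of advertisements for content near our node_id.

    When full, the advertisement whose content_id is furthest from our
    node_id (by XOR distance) is evicted first.
    """

    def __init__(self, node_id: int, max_size: int) -> None:
        self.node_id = node_id
        self.max_size = max_size
        # content_id -> set of node_ids advertising that content
        self.entries: dict[int, set[int]] = {}

    def add(self, content_id: int, advertiser: int) -> None:
        self.entries.setdefault(content_id, set()).add(advertiser)
        while len(self.entries) > self.max_size:
            furthest = max(self.entries, key=lambda cid: cid ^ self.node_id)
            del self.entries[furthest]

    def locations(self, content_id: int) -> set[int]:
        return self.entries.get(content_id, set())
```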
> TODO: Advertisements need some sort of TTL.

> TODO: Conventions around verifying that content is indeed available at the advertised location.

> TODO: Analysis of what advertisements would look like for different storage profiles. Broadcasting advertisements for large amounts of content will require a prohibitively large number of advertisement messages if the advertisement system is not optimized (such as with ARF).
## Conventions
The network operates on the following conventions.
### Finding Content
Nodes with a `node_id` that is *close* to the `content_id` for a piece of content are expected to know *where* that content can be found. Those nodes are **not** responsible for hosting the content.
### Hosting Content
The network makes no assumptions about what content is hosted by what nodes.
Conventions *may* make it more likely that content is hosted by nodes whose `node_id` is close to the content's `content_id`, but this cannot be assumed, and will often not be the case.
By *convention* nodes would likely maintain three stores of content.
1. Persistent/Pinned content that is permanently hosted by the node and is not necessarily subject to eviction or size limits.
2. LRU cache of content that was recently seen.
3. Size bounded store of content that is *near* my `node_id`, evicting the content that is furthest away when the storage limit is reached.
By convention, nodes would allocate a fixed-size memory store that they populate with advertised content which is *near* their location in the network.
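A comparable sketch of such a fixed-size store: a byte budget, populated with nearby content, evicting the content furthest from the local `node_id` first. All names and the byte-budget accounting are hypothetical.

```python
class NearContentStore:
    """Fixed-size store populated with content near our node_id.

    Eviction mirrors the advertisement table: when the byte budget is
    exceeded, the piece of content furthest from our node_id is dropped.
    """

    def __init__(self, node_id: int, max_bytes: int) -> None:
        self.node_id = node_id
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self.contents: dict[int, bytes] = {}  # content_id -> raw content

    def put(self, content_id: int, content: bytes) -> None:
        previous = self.contents.get(content_id, b"")
        self.total_bytes += len(content) - len(previous)
        self.contents[content_id] = content
        while self.total_bytes > self.max_bytes and self.contents:
            furthest = max(self.contents, key=lambda cid: cid ^ self.node_id)
            self.total_bytes -= len(self.contents.pop(furthest))

    def radius(self) -> int:
        """Distance to the furthest content currently stored; anything
        closer than this is a candidate for storage."""
        if not self.contents:
            return 2**256 - 1
        return max(cid ^ self.node_id for cid in self.contents)
```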
### Layering Semantic Meaning onto the keyspace
The keyspace of the network is intended to allow nodes to assign semantic meaning to sections of the keyspace. This has two main implications.
#### Extra-Protocol Validation
There is extra-protocol validation that may be necessary for nodes to implement. For example, if block headers are stored under the key `block-header/<header-hash>`, then a node on the network should *reject* storing content that does not validate against the `<header-hash>` that is implied by the key.
The network itself makes no assumptions about what `key/content` pairs are valid, however, for most, if not all of the content we intend to store, the key will contain additional information that would allow the content to be validated.
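A sketch of how a client might wire up such extra-protocol validation, using the `block-header/<header-hash>` example above. The prefix-to-validator registry and the assumption that the hash appears as raw bytes in the key are both illustrative, and `keccak` is assumed to come from a helper library such as `eth_utils`:

```python
from typing import Callable, Dict

from eth_utils import keccak  # assumed Keccak-256 helper


def validate_block_header(key: bytes, content: bytes) -> bool:
    # For the `block-header/<header-hash>` example: the content must hash
    # to the hash embedded in the key (assumed here to be raw bytes).
    expected = key.split(b"/", 1)[1]
    return keccak(content) == expected


# Hypothetical registry mapping recognized key prefixes to their validators.
VALIDATORS: Dict[bytes, Callable[[bytes, bytes], bool]] = {
    b"block-header/": validate_block_header,
}


def validate(key: bytes, content: bytes) -> bool:
    for prefix, validator in VALIDATORS.items():
        if key.startswith(prefix):
            return validator(key, content)
    # The network itself imposes no validity rules on unrecognized keys.
    return True
```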
#### Arbitrary keys
A client may only recognize certain ranges of the keyspace. The network **must** be built under the assumption that some nodes may refuse to support keys outside of supported ranges.
This **might** mean that we need to give nodes the ability to announce the keyspace they support.
## Networking
The details of the networking are TBD. Leading options include:
- UDP
- libp2p
### Messages
Nodes on the network operate using the following messages.
#### `Ping`
- Fields
- `request_id`: unsigned 16-bit integer
Sent by nodes with the expectation of a `Pong` message being sent in reply.
Used to measure latency or to check if a node is still online.
#### `Pong`
- Fields:
- `request_id`: unsigned 16-bit integer (from the `Ping` message)
Sent by nodes in response to a `Ping` message.
#### `FindNodes`
- Fields
- `request_id`: unsigned 16-bit integer
- `node_id`: 256-bit integer
- `distance`: 8-bit integer
Sent by nodes to request information about other nodes on the network.
Used to populate and maintain the routing table as well as finding nodes on the network that are near a specific location in the network.
> Note: we may need to use the raw key `K` instead of the `node_id` if we choose to use ARF to advertise whole sections of the keyspace.
#### `FoundNodes`
- Fields
- `request_id`: unsigned 16-bit integer (from the `FindNodes` message)
- `nodes`: Array of 256-bit integers
> Note that in a UDP based network, this message either needs a maximum number of nodes that can be returned **or** needs to be capable of sending multiple responses with the payload split up across those responses.
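A sketch of the second option, splitting a single logical `FoundNodes` reply across multiple packets. The packet structure and the per-packet node limit are placeholders for whatever wire format is eventually chosen:

```python
from typing import Iterator, List

# Hypothetical budget: how many 256-bit node ids fit in one UDP payload
# once message framing is accounted for; the real limit depends on the wire format.
MAX_NODES_PER_PACKET = 35


def found_nodes_packets(request_id: int, nodes: List[int]) -> Iterator[dict]:
    """Split a FoundNodes reply into as many packets as needed.

    Every packet carries the request_id and the total packet count so the
    requester knows when it has received the complete reply.
    """
    chunks = [
        nodes[i:i + MAX_NODES_PER_PACKET]
        for i in range(0, len(nodes), MAX_NODES_PER_PACKET)
    ] or [[]]
    for chunk in chunks:
        yield {"request_id": request_id, "total": len(chunks), "nodes": chunk}
```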
#### `Advertise`
- Fields
- TBD: Array of keys or maybe ARF
Sent by nodes to advertise content that they serve. Specifics about how we broadcast advertisements are TBD, but they should take the following into account.
- Ability to advertise whole sections of the keyspace:
- I have block headers 1-1000.
- Ability to advertise a single piece of content:
- I have a specific block header.
Content likely needs to be advertised by `key` so that nodes can choose how they wish to handle it based on the semantic meaning implied by the key.
#### `Locate`
- Fields
- `request_id`: unsigned 16-bit integer
- `key`: The key of the content to be located.
Sent by nodes to request the node ids of nodes that are known to host the content indicated by the provided `key`.
#### `Locations`
- Fields
- `request_id`: unsigned 16-bit integer (from the `Locate` message)
- `nodes`: Array of node IDs that are known to have the content.
Sent in response to a `Locate` request.
#### `GetContent`
- Fields
- `request_id`: unsigned 16-bit integer
- `key`: The key of the content to be retrieved.
Sent by nodes to request a piece of content.
#### `ContentData`
- Fields
- `request_id` : unsigned 16-bit integer (from the `GetContent` message)
- `data`: raw bytes of the content.
Sent in response to a `GetContent` request. Contains the raw bytes of the content.
> Note: If using `UDP` this message will need to be spread across multiple responses to accommodate content sizes that exceed the UDP maximum packet size.
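Putting the messages together, a hypothetical lookup might proceed as below. The `network` object and its methods are placeholders for the eventual transport and are not part of this spec; SHA-256 stands in for the still-TBD content hash function:

```python
import hashlib


def retrieve(network, key: bytes) -> bytes:
    """Hypothetical end-to-end lookup composed from the messages above."""
    content_id = int.from_bytes(hashlib.sha256(key).digest(), "big")

    # FindNodes / FoundNodes: walk the DHT towards nodes close to content_id,
    # since those nodes are expected to know *where* the content lives.
    for node in network.find_closest_nodes(content_id):
        # Locate / Locations: ask for advertised hosts of this key.
        for host in network.locate(node, key):
            # GetContent / ContentData: fetch the content and apply whatever
            # extra-protocol validation the key's semantics imply.
            content = network.get_content(host, key)
            if content is not None and network.validate(key, content):
                return content
    raise LookupError(f"content not found for key {key!r}")
```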
## The `eth` keyspace
As mentioned above, the keyspace of the network carries semantic meaning. The following keyspaces are defined to support the use cases outlined [here](https://ethresear.ch/t/state-network-use-cases/7634).
TBD: For each of these keyspaces, gather data on the current total size of mainnet history as well as the projected growth rate. Use this to determine the total number of keys that need to be stored in the network and the rate at which new keys will be added. This data will help inform the amount of data each node in the network would need to be responsible for in order for the network to be "healthy".
### Common Prefix
- `<CP>` (short for common-prefix)
All keys will begin with a common prefix to establish a namespace for the 1.x network.
The exact common prefix is still TBD but is likely to contain some or all of the following (a sketch of key construction follows the list):
- `eth1`: To denote the "Ethereum" use case and specifically the 1.x use case.
- `version`: To allow "upgrades" to the key semantics used by the network.
- network ID: To differentiate between data from different networks.
- fork ID: To differentiate between data from different chains or forks (???)
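A sketch of assembling keys under such a prefix. The layout here (`eth1`, a version, a network id) is only an example drawn from the components listed above; the actual composition, including whether a fork ID is present, is TBD:

```python
def make_key(*parts: bytes, version: int = 0, network_id: int = 1) -> bytes:
    """Assemble a key under a hypothetical common prefix.

    The prefix components are illustrative only; the real composition is TBD.
    """
    prefix = b"/".join((b"eth1", b"v%d" % version, b"net%d" % network_id))
    return b"/".join((prefix,) + parts)


# For example, the header key from the next section might be assembled as:
# make_key(b"block", block_hash, b"header")
```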
### Headers
- key: `<CP>/block/<hash>/header`
- content: RLP encoded block header
Content can be verified by computing the expected block hash from the content and checking it against the hash from the key.
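A minimal verification sketch, assuming the `<hash>` component of the key is available as raw bytes and that a Keccak-256 helper (here `eth_utils.keccak`) is used:

```python
from eth_utils import keccak  # Keccak-256 helper


def verify_header(hash_from_key: bytes, rlp_encoded_header: bytes) -> bool:
    # The block hash is keccak256 of the RLP encoded header, so the content
    # can be checked directly against the <hash> component of the key.
    return keccak(rlp_encoded_header) == hash_from_key
```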
### Canonical Header Index
- key: `<CP>/canonical-header/<number>`
- content: TBD
- header hash
- inclusion proof
Content cannot currently be verified. Verification would require a protocol change to add a header accumulator to the block header.
### Block Transactions
- Bulk:
- key:`<CP>/block/<hash>/transactions`
- content: RLP encoded ordered array of the transactions from the block.
- Single
- key: `<CP>/block/<hash>/transaction/<index>`
- content: RLP encoded single transaction plus an inclusion proof against the corresponding header's `transactions_root` field.
- TBD: format of inclusion proof.
> Note: The "single" transaction key is not strictly necessary. We should evaluate whether it is justified by the agreed upon use cases.
### Block Uncles
Same as "Block Transactions" but with `/uncles/` in the key.
### Receipts
Same as "Block Transactions" but with `/receipts/` in the key.
### Canonical Transaction Index
- key: `<CP>/canonical-transaction/<hash>`
- content: TBD
- block hash and transaction index
- ??proof??
> TBD: How would the content be verified and what optimizations can we make to reduce the number of requests needed to fully verify the response data.
### Witnesses
TBD
### Accounts and State
TBD
At this time accounts and contract state are not specified. Further research is needed to understand how this data can be addressed, retrieved, and kept up-to-date efficiently.
Problems that need to be solved here are:
- assuming all of the data is present in the network, how does it stay up-to-date
- what format does the data take when stored in the network
- Doing it `GetNodeData` style with trie hashes is bad for the same reasons that `GetNodeData` is bad: it enforces the naive database layout and works against flat database layouts.
- Storing leaves individually means a very large number of small pieces of data. This would stress the advertisement system as well as require a huge total number of round trips to retrieve large parts of the data.
- Storing the data in chunks is viable for accounts since the data is uniformly distributed. It's unclear how this can be done for contract storage since those tries are not balanced.
- Each piece of data needs an accompanying proof. We need to look at use cases to better understand this requirement.
#### Account Data
Usage profile:
- JSON-RPC use cases:
- `eth_getBalance`, `eth_getTransactionCount`, `eth_getCode` all exhibit granular data access patterns. Clients serving these requests would be hitting the network with requests for very small pieces of data.
- Access a single account.
- No bulk access.
- Patching up state database
- SNAP sync has a phase at the end where it patches up the state database using `GetNodeData`. Need to look into what this usage profile looks like.
- We cannot address the data using trie node hashes like `GetNodeData` so we need to look at how requests would be shaped (trie path?)
- Beam Sync on-demand stateless block execution
- Will be difficult since beam sync is very sensitive to latency.
- Might be improved by being able to request the exact data we want rather than having to step into the trie like is done with `GetNodeData`.
- Achieving viable efficiency might require having access lists available in order to parallelize the requests up front.
## Terminology
### Adaptive Range Filters (ARF)
Abbreviated to ARF
See: https://ethresear.ch/t/sharding-state-with-discovery-and-adaptive-range-filter-ads/7573
A data structure similar to bloom filters but for checking whether a specific *range* is included. Potentially useful for a more efficient advertisement mechanism for nodes that serve entire ranges of the keyspace.
### Skip Graph (SG)
- See:
- https://en.wikipedia.org/wiki/Skip_graph
- https://ethresear.ch/t/explorations-into-using-dht-skipgraph-for-chain-and-state-data-retrieval/7337
A distributed data structure which allows for range queries over the keyspace in a DHT. Potentially useful for efficient retrieval of entire ranges of the keyspace.