DHT based State Network

This note attempts to loosely specify a design for a DHT network that satisfies the use cases outlined in the state network use cases forum post.

Overview

The network is based on Kademlia.

Keyspace

The network operates on a 256-bit keyspace. Both nodes and content share this keyspace.

Nodes

A Node is an individual network participant. Each node is uniquely identified by a node_id, a 256-bit value derived from its public key.

Content

Content on the network is identified by a key K or its content identifier content_id. The content identifier is derived from the key as content_id = hash(K), where hash may be any agreed-upon cryptographically secure hash function.

The null content, i.e. the empty bytestring, is explicitly forbidden on this network.
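
As a sketch, the derivation might look as follows, using SHA-256 only as a stand-in for the not-yet-agreed-upon hash function:

```python
import hashlib


def derive_content_id(key: bytes) -> int:
    """Derive the 256-bit content_id from a key.

    SHA-256 is a placeholder; the spec only requires an agreed-upon
    cryptographically secure hash function.
    """
    return int.from_bytes(hashlib.sha256(key).digest(), "big")
```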

Distance Function

The distance function for computing the distance between two nodes is the bitwise exclusive or (XOR) operation.
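
A minimal sketch of the metric, treating identifiers as 256-bit integers:

```python
def distance(a: int, b: int) -> int:
    """XOR distance between two 256-bit identifiers (node_id or content_id)."""
    return a ^ b
```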

Routing Table

Each node maintains a routing table with 256 buckets, delineating the keyspace by log2(distance).

TODO: expand on this in more detail or link out to an existing description (discv5?)
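
As an illustration of the intended bucketing (assuming the XOR metric above), a peer's bucket index can be computed from the bit length of its distance:

```python
def bucket_index(local_node_id: int, other_node_id: int) -> int:
    """Map a peer into one of the 256 buckets by floor(log2(distance)).

    Bucket i holds peers whose XOR distance d satisfies 2**i <= d < 2**(i+1).
    """
    d = local_node_id ^ other_node_id
    if d == 0:
        raise ValueError("a node does not occupy a bucket for itself")
    return d.bit_length() - 1
```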

Advertisement Table

Each node maintains information about where content can be found on the network. These advertisements may reference specific keys, or may use an ARF to advertise contiguous ranges of the keyspace.

Nodes may populate their advertisement table with any data they choose.

By convention, nodes populate their advertisement table with advertisements for content whose content_id is close to their node_id. The recommended client design is to maintain three of these tables (the third is sketched after the list below):

  1. content that I have
  2. LRU cache of content that was recently accessed/requested
  3. Size bounded store of content advertisements for content that is near my node_id, evicting the content that is furthest away when the storage limit is reached.
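
A minimal sketch of the third table, assuming advertisements are tracked as a mapping from content_id to the set of advertising node_ids (names are illustrative, not part of the spec):

```python
class NearbyAdvertisementStore:
    """Size-bounded advertisement table that keeps the entries whose content_id
    is nearest to the local node_id and evicts the furthest entry when full."""

    def __init__(self, local_node_id: int, max_size: int) -> None:
        self.local_node_id = local_node_id
        self.max_size = max_size
        self.entries: dict[int, set[int]] = {}  # content_id -> advertising node_ids

    def add(self, content_id: int, node_id: int) -> None:
        self.entries.setdefault(content_id, set()).add(node_id)
        if len(self.entries) > self.max_size:
            # Evict the advertisement whose content_id is furthest from us.
            furthest = max(self.entries, key=lambda cid: cid ^ self.local_node_id)
            del self.entries[furthest]
```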

TODO: Advertisements need some sort of TTL.
TODO: Conventions around verifying content is indeed available at the advertised location.
TODO: Analysis of what advertisements would look like for different storage profiles. Broadcasting these advertisements for large amounts of content will require a prohibitively large number of advertisement messages if the advertisement system is not optimized (such as with ARF).

Conventions

The network operates on the following conventions.

Finding Content

Nodes with a node_id that is close to the content_id for a piece of content are expected to know where that content can be found. Those nodes are not responsible for hosting the content.
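
Put together with the messages defined under Messages below, a content lookup might proceed roughly as follows; the Client interface is a hypothetical wrapper around the wire messages, not something this note specifies:

```python
import hashlib
from typing import Optional, Protocol


class Client(Protocol):
    # Hypothetical high-level interface over FindNodes/Locate/GetContent.
    def find_nodes_near(self, target: int) -> list[int]: ...
    def locate(self, node_id: int, key: bytes) -> list[int]: ...
    def get_content(self, node_id: int, key: bytes) -> Optional[bytes]: ...


def lookup(client: Client, key: bytes) -> Optional[bytes]:
    """Walk toward the content_id, ask nearby nodes where the content lives,
    then fetch it from one of the advertised hosts.  SHA-256 stands in for
    the unspecified content hash function."""
    content_id = int.from_bytes(hashlib.sha256(key).digest(), "big")
    for near_node in client.find_nodes_near(content_id):  # nodes close to content_id
        for host in client.locate(near_node, key):         # Locate -> Locations
            content = client.get_content(host, key)        # GetContent -> ContentData
            if content is not None:
                return content
    return None
```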

Hosting Content

The network makes no assumptions about what content is hosted by what nodes.

Conventions may make it more likely that content is hosted by nodes whose node_id is close to the content's content_id, but this cannot be assumed and will often not be the case.

By convention, nodes would likely maintain three stores of content:

  1. Persistent/Pinned content that is permanently hosted by the node and is not necessarily subject to eviction or size limits.
  2. LRU cache of content that was recently seen.
  3. Size bounded store of content that is near my node_id, evicting the content that is furthest away when the storage limit is reached.

By convention, nodes would allocate a fixed-size memory store that they populate with advertised content which is near their location in the network; one possible acceptance policy is sketched below.
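
The following sketch shows one such policy, offered as an assumption rather than a requirement: accept content while there is free space, and once full only accept content that is closer to the local node than the furthest entry currently held (which would then be evicted).

```python
def should_store(local_node_id: int, content_id: int,
                 near_store: dict[int, bytes], max_size: int) -> bool:
    """Decide whether advertised content belongs in the fixed-size "near me" store."""
    if len(near_store) < max_size:
        return True
    furthest = max(near_store, key=lambda cid: cid ^ local_node_id)
    return (content_id ^ local_node_id) < (furthest ^ local_node_id)
```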

Layering Semantic Meaning onto the keyspace

The keyspace of the network is intended to allow nodes to assign semantic meaning to sections of the keyspace. This has two main implications.

Extra-Protocol Validation

There is extra-protocol validation that nodes may need to implement. For example, if block headers are stored under the key block-header/<header-hash>, then a node on the network should refuse to store content that does not validate against the <header-hash> implied by the key.

The network itself makes no assumptions about which key/content pairs are valid; however, for most, if not all, of the content we intend to store, the key will contain additional information that allows the content to be validated.

Arbitrary keys

A client may only recognize certain ranges of the keyspace. The network must be built under the assumption that some nodes may refuse to support keys outside of supported ranges.

This might mean that we need to give nodes the ability to announce the keyspace they support.

Networking

The details of the networking are TBD. Leading options include:

  • UDP
  • libp2p

Messages

Nodes on the network operate using the following messages.

Ping

  • Fields
    • request_id: unsigned 16-bit integer

Sent by nodes with the expectation of a Pong message being sent in reply.

Used to measure latency or to check if a node is still online.

Pong

  • Fields:
    • request_id: unsigned 16-bit integer (from the Ping message)

Sent by nodes in response to a Ping message.

FindNodes

  • Fields
    • request_id: unsigned 16-bit integer
    • node_id: 256-bit integer
    • distance: 8-bit integer

Sent by nodes to request information about other nodes on the network.

Used to populate and maintain the routing table, as well as to find nodes near a specific location in the network.

Note: we may need to use the raw key K instead of the node_id if we choose to use ARF to advertise whole sections of the keyspace.

FoundNodes

  • Fields
    • request_id: unsigned 16-bit integer (from the FindNodes message)
    • nodes: Array of 256-bit integers

Note that in a UDP based network, this message either needs a maximum number of nodes that can be returned or needs to be capable of sending multiple responses with the payload split up across those responses.
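
One plausible handler for FindNodes, mirroring discv5's distance-based FINDNODE semantics (an assumption, since the exact meaning of the node_id/distance pair is not pinned down here), with a cap on the number of returned nodes to keep FoundNodes inside a single packet:

```python
MAX_FOUND_NODES = 16  # assumed cap; the real limit depends on the chosen transport


def handle_find_nodes(routing_table: list[list[int]], local_node_id: int,
                      distance: int) -> list[int]:
    """Return node_ids from the bucket at the requested log2 distance.

    routing_table is assumed to be 256 buckets of node_ids, indexed by
    floor(log2(xor distance)) from the local node.
    """
    if distance == 0:
        return [local_node_id]  # distance 0 conventionally refers to ourselves
    return routing_table[distance - 1][:MAX_FOUND_NODES]
```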

Advertise

  • Fields
    • TBD: Array of keys or maybe ARF

Sent by nodes to advertise content that they serve. Specifics about how we broadcast advertisements are TBD, but they should take the following into account.

  • Ability to advertise whole sections of the keyspace:
    • I have block headers 1-1000.
  • Ability to advertise a single piece of content:
    • I have a specific block header.

Content likely needs to be advertised by key so that nodes can choose how they wish to handle it based on the semantic meaning implied by the key.
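
A sketch of how a receiving node might process an advertisement, assuming keys are UTF-8 strings, the advertisement table maps content_id to advertising node_ids, and the node ignores keyspace ranges it does not support (the prefix list is purely illustrative):

```python
import hashlib

SUPPORTED_PREFIXES = ("eth1/",)  # hypothetical; see "Arbitrary keys" above


def handle_advertise(ad_table: dict[int, set[int]], sender_id: int,
                     keys: list[bytes]) -> None:
    """Record advertised keys against the advertising node."""
    for key in keys:
        if not key.decode().startswith(SUPPORTED_PREFIXES):
            continue  # ignore keys outside the ranges this node supports
        content_id = int.from_bytes(hashlib.sha256(key).digest(), "big")
        ad_table.setdefault(content_id, set()).add(sender_id)
```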

Locate

  • Fields
    • request_id: unsigned 16-bit integer
    • key: The key of the content to be located.

Sent by nodes to request the node ids of nodes that are known to host the content indicated by the provided key.

Locations

  • Fields
    • request_id: unsigned 16-bit integer (from the Locate message)
    • nodes: Array of node IDs that are known to have the content.

Sent in response to a Locate request.

GetContent

  • Fields
    • request_id: unsigned 16-bit integer
    • key: The key of the content to be retrieved.

Sent by nodes to request a piece of content.

ContentData

  • Fields
    • request_id : unsigned 16-bit integer (from the GetContent message)
    • data: raw bytes of the content.

Sent in response to a GetContent request. Contains the raw bytes of the content.

Note: If using UDP, this message will need to be spread across multiple responses to accommodate content sizes that exceed the maximum UDP packet size.
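
A minimal chunking sketch, assuming some per-packet payload budget and that each chunk would additionally carry the request_id plus its position so the receiver can reassemble the content:

```python
MAX_PAYLOAD = 1024  # assumed per-packet budget; the real value depends on the transport


def chunk_content(data: bytes, max_payload: int = MAX_PAYLOAD) -> list[bytes]:
    """Split a ContentData payload into packet-sized chunks for reassembly."""
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]
```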

The eth keyspace

As mentioned above, the keyspace of the network carries semantic meaning. The following keyspaces are defined to support the use cases outlined here.

TBD: For each of these keyspaces, gather data on current total size for current mainnet history as well as projected growth rate. Use this to determine total number of keys that need to be stored in the network and the rate new keys will be added. This data will help inform the amount of data each node in the network would need to be responsible for in order for the network to be “healthy”.

Common Prefix

  • <CP> (short for common-prefix)

All keys will begin with a common prefix to establish a namespace for the 1.x network.

The exact common prefix is still TBD but is likely to contain some or all of the following (an illustrative assembly is sketched after the list):

  • eth1: To denote the “Ethereum” use case and specifically the 1.x use case.
  • version: To allow “upgrades” to the key semantics used by the network.
  • network ID: To differentiate between data from different networks.
  • fork ID: To differentiate between data from different chains or forks (???)
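
For illustration only (the components and their encoding are still TBD), the prefix might be assembled and combined with a content key like so:

```python
def make_common_prefix(use_case: str, version: int, network_id: int, fork_id: str) -> str:
    """Join the (still TBD) common-prefix components into a key namespace."""
    return f"{use_case}/{version}/{network_id}/{fork_id}"


def header_key(cp: str, block_hash: str) -> str:
    """Build a header key of the form <CP>/block/<hash>/header (defined below)."""
    return f"{cp}/block/{block_hash}/header"
```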

Headers

  • key: <CP>/block/<hash>/header
  • content: RLP encoded block header

Content can be verified by computing the expected block hash from the content and checking it against the hash from the key.
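
A sketch of that check, assuming the <hash> key component is hex encoded and using keccak-256 from eth_utils:

```python
from eth_utils import keccak  # keccak-256, the hash used for Ethereum block hashes


def verify_header_content(key: str, content: bytes) -> bool:
    """Check that the RLP-encoded header bytes hash to the <hash> embedded in the key.

    Assumes keys of the form <CP>/block/<hash>/header with <hash> hex encoded.
    """
    expected = key.split("/")[-2].removeprefix("0x")
    return keccak(content).hex() == expected
```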

Canonical Header Index

  • key: <CP>/canonical-header/<number>
  • content: TBD
    • header hash
    • inclusion proof

Content cannot currently be verified. Verification would require a protocol change to add a header accumulator to the block header.

Block Transactions

  • Bulk:
    • key: <CP>/block/<hash>/transactions
    • content: RLP encoded ordered array of the transactions from the block.
  • Single
    • key: <CP>/block/<hash>/transaction/<index>
    • content: RLP encoded single transaction plus inclusion proof against corresponding header transaction_root field.
      • TBD: format of inclusion proof.

Note: The “single” transaction key is not strictly necessary. We should evaluate whether it is justified by the agreed upon use cases.

Block Uncles

Same as “Block Transactions” but with /uncles/ in the key.

Receipts

Same as “Block Transactions” but with /receipts/ in the key.

Canonical Transaction Index

  • key: <CP>/canonical-transaction/<hash>
  • content: TBD
    • block hash and transaction index
    • ??proof??

TBD: How would the content be verified and what optimizations can we make to reduce the number of requests needed to fully verify the response data.

Witnesses

TBD

Accounts and State

TBD

At this time accounts and contract state are not specified. Further research is needed to understand how this data can be addressed, retrieved, and kept up-to-date efficiently.

Problems that need to be solved here are:

  • assuming all of the data is present in the network, how does it stay up-to-date
  • what format does the data take when stored in the network
    • Doing it GetNodeData-style with trie node hashes is bad for the same reasons that GetNodeData is bad: it enforces the naive database layout and works against flat database layouts.
    • Storing leaves individually means a very large number of small pieces of data. This would stress the advertisement system and require a huge total number of round trips to retrieve large parts of the data.
    • Storing the data in chunks is viable for accounts since the data is uniformly distributed. It’s unclear how this can be done for contract storage since those tries are not balanced.
    • Each piece of data needs an accompanying proof. We need to look at use cases to better understand this requirement.

Account Data

Usage profile:

  • JSON-RPC use cases:
    • eth_getBalance, eth_getTransactionCount, eth_getCode all exhibit behaviors of granular data access. Clients serving these requests would be hitting the network with requests for very small pieces of data.
      • Access a single account.
      • No bulk access.
  • Patching up state database
    • SNAP sync has a phase at the end where it patches up the state database using GetNodeData. Need to look into what this usage profile looks like.
    • We cannot address the data using trie node hashes like GetNodeData so we need to look at how requests would be shaped (trie path?)
  • Beam Sync on-demand stateless block execution
    • Will be difficult since beam sync is very sensitive to latency.
    • Might be improved by being able to request the exact data we want rather than having to step into the trie as is done with GetNodeData.
    • Achieving viable efficiency might require having access lists available in order to parallelize the requests up front.

Terminology

Adaptive Range Filters (ARF)

Abbreviated to ARF

See: https://ethresear.ch/t/sharding-state-with-discovery-and-adaptive-range-filter-ads/7573

A data structure similar to bloom filters but for checking whether a specific range is included. Potentially useful for a more efficient advertisement mechanism for nodes that serve entire ranges of the keyspace.

Skip Graph (SG)

A distributed data structure which allows for range queries over the keyspace in a DHT. Potentially useful for efficient retrieval of entire ranges of the keyspace.
