# (OLD) Alexandria: Chain History over Discovery V5.1 DHT

Welcome to the Alexandria specification!

> Note that this is a work in progress and is subject to incompatible changes without notice.

Alexandria is an overlay network on top of the Discovery V5.1 network which facilitates distributed storage and retrieval of historical chain data for the Ethereum blockchain.

The design of the system loosely follows the *"standard"* Kademlia DHT structure, with each piece of content being associated with a *key* and each *key* being mapped to a specific *node-id* in the network.

Nodes on the network perform the following functions:

* storing content and serving requests for that content.
* maintaining an index of where content can be found on the network.

## Overview

### Overlay Network

The overlay network is a Kademlia routing table as documented in the [Discovery Wire Protocol](https://github.com/ethereum/devp2p/blob/4ba94c8a2ab5781d754d56ba840abfe38f502342/discv5/discv5-theory.md). The overlay network routing table is maintained separately from the core Discovery V5 routing table and is composed only of nodes that support the Alexandria protocol.

> TODO: How does one populate this routing table? ENR records, or do we implement a separate `FIND_NODES/FOUND_NODES` set of messages within the sub-protocol?

### Content and Keys

Each piece of content `C` is associated with a key `K`. Each key is a sequence of bytes. Each key is mapped to a 32 byte *contentID* via the cryptographic hash function `hash(K)`. The *contentID* and *nodeID* share the same keyspace and are effectively interchangeable; however, this document will attempt to use *contentID* when referring to content, and *nodeID* when referring to network nodes.

> TODO: `hash(...)` needs to be defined. Probably `sha256` for simplicity.

### Distance Function

> TODO: same as routing table distance function

### Finding Content

Each node in the network is uniquely identified by its *nodeID*. Each node on the network is responsible for maintaining an index of which nodes on the network are known to have the data for content that maps to a *contentID* in *close* proximity to its own.

To find a piece of content `C` with *contentID* `CID`, a node performs an iterative lookup to find the nodes in the network that are closest to `CID`. These nodes can then be sent a `LOCATE` request with the desired *contentID*. The queried nodes will inspect their index and return the *nodeID* of any nodes who have advertised that they have the desired content.

Note that content can be stored *anywhere* on the network, and it is only the knowledge of where content is located that is tied to the content's *contentID*.
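The two TODOs above leave `hash(...)` and the distance function open. As a minimal sketch, assuming `sha256` for the content ID mapping (as the TODO suggests) and the XOR metric used by the Discovery v5 routing table:

```python
import hashlib


def content_id(key: bytes) -> bytes:
    # Assumption per the TODO above: sha256 maps an arbitrary key to a 32 byte contentID.
    return hashlib.sha256(key).digest()


def distance(a: bytes, b: bytes) -> int:
    # XOR metric over the shared contentID/nodeID keyspace, as used by the
    # Discovery v5 routing table (assumed per the distance function TODO above).
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def log_distance(a: bytes, b: bytes) -> int:
    # Bucket index for routing table placement: position of the highest differing bit.
    return distance(a, b).bit_length()
```

With this metric, `LOCATE` lookups and routing table placement can treat contentIDs and nodeIDs interchangeably, as described above.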
### Advertisements by the numbers

The current Ethereum mainnet has approximately 10,000 nodes. We can establish this as a rough estimate of the lower bound of projected network participation. As an arbitrary upper bound we'll assume the potential for 100,000 nodes.

Depending on how you look at the data that needs to be stored, there are a bit over 2 billion items that need to be indexed in the DHT: 1 billion transactions, 1 billion receipts, and 10 million headers and blocks. This means that each node will *on average* need to make available between 20,000 and 200,000 items. We'll use the number 100,000 for simplicity. Each of these items is individually addressable, meaning that a node coming online will need to disseminate 100,000 advertisements around the network.

For this to be efficient, we'll need it to be able to happen relatively quickly, so let's put an upper bound of 10 minutes on the "bootstrapping" process where a node comes online with an existing "store" of data. That means it has 600 seconds to push 100,000 advertisements, requiring a rate of roughly 166 advertisements per second. Each advertisement will contain a key which is likely to be between 64 and 128 bytes long. In terms of total data this means only about 6-100 MB, depending on how big the advertisement payload is. These numbers seem attainable, but there are likely efficiency gains that can be made.

### Replication factor

Using the gossip numbers from above, let us examine what the expected replication factor would be for the network. As of September 2020, the storage of an Ethereum node looks roughly as follows.

```
+-----------------+--------------------+------------+
|    DATABASE     |      CATEGORY      |    SIZE    |
+-----------------+--------------------+------------+
| Key-Value store | Headers            | 50.49 MiB  |
| Key-Value store | Bodies             | 3.41 GiB   |
| Key-Value store | Receipts           | 4.08 GiB   |
| Key-Value store | Difficulties       | 6.17 MiB   |
| Key-Value store | Block number->hash | 5.06 MiB   |
| Key-Value store | Block hash->number | 421.48 MiB |
| Key-Value store | Transaction index  | 27.54 GiB  |
| Key-Value store | Bloombit index     | 1.64 GiB   |
| Key-Value store | Trie nodes         | 80.33 GiB  |
| Key-Value store | Trie preimages     | 1.27 GiB   |
| Key-Value store | Account snapshot   | 0.00 B     |
| Key-Value store | Storage snapshot   | 0.00 B     |
| Key-Value store | Clique snapshots   | 0.00 B     |
| Key-Value store | Singleton metadata | 151.00 B   |
| Ancient store   | Headers            | 4.52 GiB   |
| Ancient store   | Bodies             | 98.20 GiB  |
| Ancient store   | Receipts           | 44.57 GiB  |
| Ancient store   | Difficulties       | 166.26 MiB |
| Ancient store   | Block number->hash | 387.27 MiB |
| Light client    | CHT trie nodes     | 1.21 GiB   |
| Light client    | Bloom trie nodes   | 1.99 GiB   |
+-----------------+--------------------+------------+
|               TOTAL                  | 269.77 GiB |
+-----------------+--------------------+------------+
```

Ignoring the nuances of this data, we can place a rough upper bound of 500 GB of unique data that needs to be stored for the near-term future. Assuming the network has between 10,000 and 100,000 nodes, and each node on average contributes 1 GB of storage to the network, we can estimate the network's total capacity to be between 10 TB and 100 TB. From this we can derive a replication factor somewhere between 20 and 200. This means that a naive advertisement scheme, in which each advertisement must be sent directly to each node that stores it, would require 20x-200x more messages.

#### Gossip Theory

Each advertisement relates to a specific `content_id` which, in turn, maps to exactly one node which is *closest*, and thus ultimately responsible. One desirable approach would be for nodes that receive an ad to relay it onwards towards nodes that are closer to the `content_id`. Naively implemented, this scheme poses an amplification vector, since a single message with a bogus ad would trigger the receiving node to relay it to whatever nodes it knows that are closer to the `content_id`. For this reason, we would need a mechanism to combat this type of malicious amplification.

Ideally, we would want a system that has the following properties:

- advertisements are only sent to nodes that are willing/interested in storing them.
- only *valid* advertisements are sent
  - not forged
  - ideally, validate that the advertiser actually has the necessary data

A node on the network would allocate some amount of storage `S` towards ads. Each ad corresponds to a specific `content_id`. We can define the *radius* of their stored ads as the maximum distance of any ad from their local `node_id`. As a node encounters more content and their storage becomes full, they will begin evicting the ads that are furthest from their `node_id` as they encounter new ads which are closer and thus more interesting to them.

If nodes published their current ad radius, then other nodes can know whether an ad is interesting. By maintaining up-to-date information about the ad radius for the nodes in your routing table, we can then ensure that ads are only sent to nodes that would be *interested* in them. Currently the `ADVERTISE` message doesn't have a defined response. Under this scheme, the response could contain the ad radius so that the sender then knows whether subsequent advertisements should be sent.

For an advertisement to be valid we want to ensure that the node *serving* the content actually has the data and that the ad was not forged. We can use digital signatures to prevent forgeries at the expense of roughly 65 bytes per advertisement. As for ensuring someone has the data they claim to have... **if** we had a merkle root of the data, then it would be relatively easy to do some sort of proof of custody by requesting a random chunk of the data. The problem here seems to be that we typically won't have a merkle root, but rather a boring `keccak(data)` which doesn't allow us to verify the data until we have it in its entirety. See below in the "Transmitting Data" section on using SSZ trie hashing which would facilitate data audits.

### Gossip style PUSH Advertisements

In order to avoid requiring nodes to send each advertisement to 20-200x nodes due to replication, we want a scheme where each advertisement need only be sent once, and it will be automatically replicated in the appropriate region of the network. Let us explore a potential algorithm for doing this.

Given a piece of content with content ID `C`, we expect between 20-200x nodes near `C` to be storage locations. We'll suppose that each node maintains a Kademlia routing table and that, for each node in their routing table, they also maintain the node's advertised storage radius `R`. Our gossip system should aim for two properties.

- A: The node in the network that is **closest** to `C` will find out about the content.
- B: The majority of nodes in close proximity to `C` will find out about the content.

To accomplish objective A, a node wishing to advertise `C` will perform an iterative lookup for the nodes closest to `C` and send the advertisement to the `n` closest live nodes (probably `n=3` for some redundancy). Upon receiving an advertisement, a node will do the following.

1. *if* `C` is not of interest to them, they will ignore it.
2. otherwise, they will verify the advertisement, store it, and forward it on to the node in their routing table that is closest to `C`.

The above ensures we satisfy objective A. To satisfy objective B we need the advertisement to be forwarded to the majority of nodes `N`, each with advertisement radius `R`, such that `distance(N, C) < R`. In order to do this efficiently we would probably want to spread the load across as many nodes as possible. This is typically done by doing something like computing the set `[n1, n2, ..., n]` of nodes in the routing table that satisfy the broadcast condition and then randomly selecting a logarithmic or square-root number of those peers and only broadcasting to them. This mechanism ensures that objective B is met, propagating each advertisement to the broader area of the network where its content ID resides.
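A rough sketch of the radius-based interest check and farthest-first eviction described above, reusing the `distance` helper from the earlier sketch; the fixed capacity and the "interested in everything until full" behavior are illustrative choices, not normative:

```python
class AdStore:
    """Illustrative per-node store of advertisements keyed by content_id."""

    def __init__(self, node_id: bytes, capacity: int):
        self.node_id = node_id
        self.capacity = capacity
        self.ads = {}  # content_id -> advertisement payload

    def radius(self) -> int:
        # R: the largest distance from our node_id to any stored ad's content_id.
        # While storage is not yet full we treat everything as interesting.
        if len(self.ads) < self.capacity:
            return 2**256 - 1
        return max(distance(self.node_id, cid) for cid in self.ads)

    def is_interested(self, content_id: bytes) -> bool:
        # Step 1 of the receive algorithm above: ignore ads outside our radius.
        return distance(self.node_id, content_id) <= self.radius()

    def add(self, content_id: bytes, advertisement: bytes) -> None:
        if not self.is_interested(content_id):
            return
        self.ads[content_id] = advertisement
        if len(self.ads) > self.capacity:
            # Evict the ad farthest from our node_id; R shrinks as closer ads arrive.
            farthest = max(self.ads, key=lambda cid: distance(self.node_id, cid))
            del self.ads[farthest]
```

Publishing `radius()` (for example in the `ADVERTISE` response discussed above) is what lets senders skip peers that would ignore the ad anyway.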
### What each node tracks

#### Advertisement Radius: `R`

Each node should track `R`, which is the largest distance from the node's `node_id` to the `content_id` for any of its stored advertisements. This is a value that needs to be semi-regularly updated. We could potentially transmit this in something like a `Pong` message.

#### Verified Merkle Roots

Assuming advertisements include merkle roots, nodes should retain at minimum an ephemeral mapping of `Key -> root` for all merkle roots that have been fully validated.

#### Which Advertisements have been sent

To avoid re-broadcasting an advertisement to the same peer multiple times, a recent cache of which advertisements have been sent to which nodes should be kept. A simple LRU cache would probably be fine.

### Transmitting Data

We need to make some decisions on how content payloads will be transferred in the DHT. We can expect a rough upper limit of 1 KB of transfer per datagram. A brief survey of recent blocks suggests block sizes around 30 KB, meaning it will take roughly 30 messages to transmit a single block body.

A naive approach would be to request the data with one message, and then wait for all of the response messages to come in, after which you can re-assemble the content. Presumably clients would also then validate that the content is well formed, such as reconstructing the transaction and uncle tries and checking them against the header.

One of the simplest ways that this approach fails is if a single message is lost. Without the ability to request specific "chunks" of the data, the full data would need to be re-requested. This also means that you must get *all* of the data from a single source even though there may be multiple sources available that you could pull data from concurrently.

* property a: ability to request specific pieces of the data
* property b: ability to retrieve data concurrently from multiple sources

Another problem with this approach is that the recipient cannot validate the data until they have received the full payload, meaning they may be holding onto many bad chunks before they are able to determine they are junk.

* property c: ability to validate data at a per-message granularity

Putting these together, we would want a scheme with roughly the following shape.

1. Content would be identified by something like a merkle root.
2. Content requests can be targeted at specific chunks from the merkle tree.

With these two things, we should be able to have all of the desired properties. The problem is that most of our content isn't identified by a merkle root. Maybe we can address this in our advertisement system.

Each advertisement will contain a `key` for some specific piece of content. The key carries semantic meaning which allows each key to be tied to a validation scheme. For example, the header key would contain the canonical hash of that header. Verification of the data would involve checking that `keccak(rlp(header)) == expected_hash`.
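As a concrete illustration of that header example, a minimal validation sketch, assuming the retrieved payload is the raw `rlp(header)` bytes and using pycryptodome for keccak-256 (any keccak-256 implementation works):

```python
from Crypto.Hash import keccak  # pycryptodome; any keccak-256 implementation works


def keccak256(data: bytes) -> bytes:
    return keccak.new(digest_bits=256, data=data).digest()


def validate_header_payload(expected_hash: bytes, payload: bytes) -> bool:
    # `payload` is the RLP encoded header exactly as retrieved from the network.
    # It is valid if and only if it hashes to the canonical hash carried in the key.
    return keccak256(payload) == expected_hash
```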
Other data, however, such as single transactions or a block of transactions, requires more specialized validation: either re-constructing the `transactions_root` and verifying that it matches the corresponding field in the header, or potentially, for single transactions, verifying that an accompanying proof is valid against the `Header.transactions_root`.

We could add a merkle root to each advertisement. Verification of this root would be possible by fetching the underlying data. Since the hash is a merkle root, it would also allow fetching individual pieces of the trie, each of which would be verifiable individually against the merkle root. Upon receiving two advertisements for the same `content_id` but with different merkle roots, it would be straightforward to figure out which one (if any) was the correct one, and purge all of the remaining ones.

Under this scheme the content retrieval algorithm would be roughly this.

- Gather up advertisements.
- Group advertisements by merkle root.
- Starting with the most popular root, fetch the data. If the complete data payload does not pass validation, move on to the next most popular root and repeat.
- This process will eventually either succeed at the final validation phase, or will discredit all of the advertisements.

From the perspective of someone storing advertisements, once a merkle root has been verified it would not need to be re-verified. A standard trigger for verification could simply be any time that two advertisements are found for the same `content_id` but with mismatched merkle roots. Even in an adversarial situation, a client would eventually settle into an equilibrium where all of their advertisements have been validated (and thus they will be able to reject any advertisements for that `content_id` which don't match the already known merkle root).

This all builds to an Advertisement model that looks roughly like:

```
advertisement := key || valid_until || merkle_root || signature

key         := full key for the content
valid_until := timestamp when this advertisement should be discarded
merkle_root := something like an SSZ merkle hash
signature   := signature over key || valid_until || merkle_root
```

The signature will be 65 bytes. The merkle root will be 32 bytes. The key will tend to be less than 100 bytes. More research TBD on the exact mechanism for retrieval.

### Content Keys and Semantic Meaning

While the keyspace is unbounded and allows for *any* arbitrary byte strings as keys, the protocol dictates how the keyspace is used, and thus restricts the allowed keys as well as embedding semantic meaning into each key.

All keys begin with a common prefix `<prefix>`: `eth/v1`

> TODO: differentiation between different chains/networks/etc in the prefix. `network-id`, `fork-id`, `chain-id`?

The following keyspaces are defined for the different pieces of content.

#### Headers

`<prefix>/header/<hash>`

> `rlp(header)`

This path simply returns the RLP encoded header. Validation involves checking that the returned data hashes to the `<hash>` from the key.

`<prefix>/header-accumulator/<epoch-number>`

> EPOCH_ACCUMULATOR

WORK-IN-PROGRESS: Use the Eth2 header accumulator pattern. We can probably use a fixed size SSZ list here and use the built-in SSZ merklization and hashing. We should develop this in conjunction with an EIP to modify the block header to include this hash. These would be used for inclusion proofs to allow clients to discard the header chain after validating it.
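A small sketch of how keys in this header keyspace might be built and mapped to the region of the network responsible for them, assuming the `eth/v1` prefix above; hex-encoding the hash inside the key is an assumption, since the exact byte layout of keys is not yet pinned down:

```python
PREFIX = b"eth/v1"  # common key prefix defined above


def header_key(block_hash: bytes) -> bytes:
    # `<prefix>/header/<hash>`, with the hash hex-encoded (encoding is an assumption).
    return b"/".join([PREFIX, b"header", block_hash.hex().encode()])


def header_accumulator_key(epoch_number: int) -> bytes:
    # `<prefix>/header-accumulator/<epoch-number>`
    return b"/".join([PREFIX, b"header-accumulator", str(epoch_number).encode()])


# The content ID of a key (see Content and Keys) determines which region of the
# network its advertisements gossip toward:
#   cid = content_id(header_key(some_block_hash))
```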
#### Block Uncles

`<prefix>/block/<hash>/uncles`

> `rlp([uncle_0, uncle_1, ..., uncle_n])`

`<prefix>/block/<hash>/uncles/<idx>`

> `rlp(uncle)` (proof?)

#### Transactions

`<prefix>/block/<hash>/transactions`

> `rlp([txn_0, txn_1, ..., txn_n])`

`<prefix>/block/<hash>/transactions/<idx>`

> `rlp(txn)` (proof?)

#### Receipts

`<prefix>/block/<hash>/receipts`

> `rlp([receipt_0, receipt_1, ..., receipt_n])`

`<prefix>/block/<hash>/receipts/<idx>`

> `rlp(receipt)` (proof?)

#### Canonical Indices

`<prefix>/canonical/header/<number>`

> `<hash>`

`<prefix>/canonical/transaction/<hash>`

> `<hash>/<idx>`

## Messages

All messages are transported over the `TALKREQ/TALKRESP` messages provided by the base Discovery v5.1 wire protocol. Messages are encoded as:

```
payload    = request-id || message-id || message
request-id = 32 bit big endian
message-id = 8 bit big endian
message    = encoded message bytes
```

All messages use the protocol identifier `TODO`.

### FIND_NODES | message-id == 1

TODO: Same as the base protocol but in SSZ.

### FOUND_NODES | message-id == 2

TODO: Same as the base protocol but in SSZ.

### ADVERTISE | message-id == 3

TODO: List of keys that the node is advertising it provides.

> TODO: Should this be signed to prevent counterfeits? Since node-ids are cheap, does counterfeit prevention get us anywhere?

> TODO: Should this have a response message???

### LOCATE | message-id == 4

TODO: Query the node's advertisements for any nodes they know about which serve a given key (or keys). Expects a FOUND_NODES in response.

### RETRIEVE | message-id == 5

TODO: Request a piece of content by its key.

### DATA | message-id == 6

TODO: Multi-part response to a RETRIEVE.

> TODO: Griefing vectors, merklization and chunking of content.

## Network Data Requirements

TODO: Work out the following.

- Total size of all data stored in the network, broken down by each type of data.
- Rate of data growth, both in GB and in number of items.
- Total number of entries for each piece of data to be stored.
- Table/Chart showing number of nodes and individual node storage requirements across the following three axes:
  - Number of network nodes
  - Replication factor
- Basic Performance Projections
  - Number of advertisements a standard node will have to make when coming online.
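As a rough starting point for the sizing work above, a back-of-envelope sketch using the assumptions stated earlier in this document (about 2 billion indexed items, roughly 500 GB of unique data, 1 GB contributed per node, and a 10 minute bootstrap window); the output is a planning estimate, not a normative requirement:

```python
# Back-of-envelope sizing using the figures from the "Advertisements by the
# numbers" and "Replication factor" sections above.
TOTAL_ITEMS = 2_000_000_000     # ~1B transactions + ~1B receipts + headers/blocks
UNIQUE_DATA_GB = 500            # rough upper bound on unique data
PER_NODE_STORAGE_GB = 1         # assumed average contribution per node
BOOTSTRAP_SECONDS = 10 * 60     # target bootstrap window

for node_count in (10_000, 50_000, 100_000):
    items_per_node = TOTAL_ITEMS // node_count
    replication_factor = (node_count * PER_NODE_STORAGE_GB) / UNIQUE_DATA_GB
    ads_per_second = items_per_node / BOOTSTRAP_SECONDS
    print(
        f"{node_count:>7} nodes: {items_per_node:>7} items/node, "
        f"replication ~{replication_factor:.0f}x, "
        f"~{ads_per_second:.0f} ads/sec to bootstrap"
    )
```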
