# DAS building blocks
Informal discussion of the tools we might utilize in DAS constructions.
See [DAS requirements](https://notes.ethereum.org/@djrtwo/das-requirements) for context.
[toc]
## Tools at our disposal
Here are the currently known peer-to-peer building blocks for implementing data availability sampling (DAS):
### Peered topic-based gossip
Generally performed via libp2p gossipsub. Utilizes existing peers, communicates over TCP, allows for mesh "overlays" to send data of particular topics.
Gossips data to D peers (thus a D amplification factor). Utilizes IHAVE/IWANT (announcements and requests) to help with redundancy and attack resilience.
Note: Peered gossip and req/resp seem better equipped for scoring than DHTs. The peered approach involves fewer peers, each selected and negotiated, and so should be more resistant to attacks, whereas a DHT involves the entire table, with far more potential entities and less consistent comms.
#### push data vs announce and pull
Gossipsub is not very tunable wrt push vs announce, but tuning this strategy for certain types of DA dissemination might be a fruitful path to investigate.
### Peered req/resp
Generally performed via libp2p. Utilizes existing peer connections to request particular data, communicates over TCP. Limited to existing peers (20 to 50 in the normal case) or to finding new ones and opening up new relationships.
### Discv5 udp queries
Similar to libp2p req/resp but utilizes a direct UDP query to some node in the DHT. Does not require a prior connection/relationship, but we might consider an "extended" set of peers that we go to and prioritize in the DHT via some sort of scoring (this would help with reliability and remove the ability for an attacker to quickly turn over the DHT with sybils).
Open questions
* Can you abuse particular nodes (can scoring help)?
* How much can attackers of various sizes disrupt the DHT query mechanism?
* Can you meaningfully use peer scoring in this type of transient comms?
### Discv5 structured information passing
Nascent idea. Use the DHT structure/proximity metrics to disseminate data in a structured way (e.g. via a tree).
Open questions
* guarantees wrt delivery (under normal and attack scenarios)
* min and max data requirements (e.g. can you abuse "honest" nodes by giving them a ton of data to then split and disseminate)
### Nodes vs Validators
For most networking mechanisms, we don't want to tie a validator identity to a physical node identity to avoid anonymity issues as well as network/validator setup rigidity (e.g. a validator can run many nodes, or dynamically swap them, etc), but we can consider breaking this to create more structured and/or privileged networking mechanisms.
### More central/incentivized DA provider
This could be an in-protocol role (e.g. a superset of validating via an election) or kicked out to the nebulous "highly incentivized builder".
This is a very nascent idea with a yet to be analyzed set of incentives and motives. Worth pondering a bit more.
## Requirements/tools discussion
### 1. disseminate rows/columns
Peered topic-based gossip is likely the correct tool for this.
It is worth running gossipsub experiments to better quantify the *load* (gossip multiplier and message overhead) and the *limitations* of this approach (e.g. how many topics a network of size X can handle, how quickly nodes can swap topics, etc).
Side note, large data consumers would use row/column gossip to get large amounts of data (e.g. L1 block explorers, infura, etc), but users of the data will likely get the data (and proofs against L1) from alternative data dissemination networks (e.g. a roll-up specific p2p network to get the recent roll-up block and pointers/proofs into the L1 data-TXs). Point being -- L1 p2p network does not need to generally figure out how "end users" get application specific chunks of data-blobs.
### 2. disseminate samples
We generally assume that we will require "honest" nodes to support some amount of sample querying on an altruistic basis. This is not a far departure from the existing assumption that honest nodes will serve historic blocks and/or state sync queries. Just instead of an honest node being expected to serve all block-data historic queries, it is expected to serve a subset of blob-data queries on a fixed range (e.g. 1 month).
Note, there may be a difference between the data you are expected to serve for requests and the data you randomly want to request. For the former, you want stability so you can get the data and others know to query you for it. For the latter, you want the set to change randomly each slot (or at least very frequently). This means that you can't necessarily naively combine dissemination of the samples with querying of samples.
We thus need to get data to nodes without requiring them to touch disproportionately large amounts of the data. To do this, we need a mapping between something public about the node (e.g. node-id or some sort of advertisement (e.g. ENR, topic adv, etc)) or need the node to be able to query efficiently for the data.
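One way to picture such a mapping is a deterministic function from a public node-id to the chunks of samples that node is expected to serve. The sketch below is only an illustration under assumptions that are not part of this note: the name `custody_chunks`, the use of SHA-256, and the counter-based derivation are all hypothetical.

```python
import hashlib


def custody_chunks(node_id: bytes, num_chunks: int, count: int) -> list[int]:
    """Derive `count` distinct chunk indices from a public node-id.

    Hypothetical scheme: hash (node_id || counter) and reduce modulo
    `num_chunks`, skipping duplicates. Anyone who knows the node-id can
    recompute the same list, so peers know whom to ask for a chunk.
    """
    if not 0 <= count <= num_chunks:
        raise ValueError("count must be between 0 and num_chunks")
    chunks: list[int] = []
    counter = 0
    while len(chunks) < count:
        digest = hashlib.sha256(node_id + counter.to_bytes(8, "little")).digest()
        index = int.from_bytes(digest[:8], "little") % num_chunks
        if index not in chunks:
            chunks.append(index)
        counter += 1
    return chunks
```

Because the result depends only on the node-id, it is stable across slots, which suits the "data you are expected to serve" side; the samples a node *requests* would still be chosen randomly and separately.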
#### gossip
Gossip will likely break down for this type of data dissemination due to the granularity of the data various nodes want to receive -- a small number (dozens or ~100) out of a total on the order of hundreds of thousands of samples. This leads to gossip overlay meshes that are potentially very small -- decreasing their resilience to attack and making them harder to find efficiently.
Size of topic meshes *might* not be as bad as we sometimes discuss. If there are 4000 nodes on the network and 200k samples per block, then samples could be grouped into chunks and disseminated as sets onto subnets, where your peer-id defines which subnet/chunk you should be listening to. If, for example, you create chunks of 100 samples, then you'd have 2000 subnets in total. If nodes download just one chunk per slot, then you'd have ~2 nodes per subnet. This seems too few. If you instead put 1000 samples in each chunk, you have 200 subnets, which puts 20 nodes on each subnet. That may be beginning to be in the "safe" range, but it is unclear.
The main problem here is that you are now downloading 1000 samples per block as an honest node when you only need to sample 70 samples. Any asymmetry here breaks down the scalability of sharding.
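The arithmetic above can be reproduced with a small helper, which makes it easy to try other chunk sizes. This is a back-of-the-envelope sketch: `subnet_stats` and its parameter names are made up for illustration, and it assumes nodes are spread evenly across subnets.

```python
def subnet_stats(num_nodes, samples_per_block, chunk_size,
                 chunks_per_node=1, samples_needed=70):
    """Rough subnet sizing, assuming nodes are spread evenly over subnets."""
    num_subnets = -(-samples_per_block // chunk_size)  # ceiling division
    nodes_per_subnet = num_nodes * chunks_per_node / num_subnets
    downloaded = chunk_size * chunks_per_node
    return {
        "num_subnets": num_subnets,
        "nodes_per_subnet": nodes_per_subnet,
        # how many times more samples a node downloads than it needs
        "download_overhead": downloaded / samples_needed,
    }


for chunk_size in (100, 1000):
    print(chunk_size, subnet_stats(4000, 200_000, chunk_size))
```

With the numbers from the text, 100-sample chunks give 2000 subnets with 2 nodes each, while 1000-sample chunks give 200 subnets with 20 nodes each, at the cost of downloading roughly 14x the 70 samples a node actually needs.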
Naively, finding peers for a particular sample topic might be slow, but if combined with some sort of peer-id determinism or ENR advert, it might be tractable.
Nonetheless, needing thousands of topics makes gossip overlays a less than obvious choice.
#### structured dht information passing
The ideas in this domain look something like a structured gossip utilizing a dht node-table and a distance metric for data. The basic idea here is to pass in large amounts of data to some nodes and have them split it up to pass to additional peers "near" the data. The data continues to be passed and split in a tree-like manner.
Open questions
* guarantees wrt delivery
* min and max data requirements (e.g. can you abuse "honest" nodes by giving them a ton of data to then split and disseminate)
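A single splitting step of this tree-like passing might look like the sketch below. It assumes, purely for illustration, that sample ids live in the same keyspace as node ids and that "near" means the XOR distance used by Kademlia-style tables; `split_by_proximity` is a hypothetical helper, not an existing discv5 call.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two ids in the same keyspace."""
    return a ^ b


def split_by_proximity(sample_ids, peer_ids):
    """Assign each sample to the known peer closest to it by XOR distance.

    Returns {peer_id: [sample_id, ...]}. Each peer would then repeat the
    same step with its own table, which gives the tree-like fan-out.
    """
    buckets = {peer: [] for peer in peer_ids}
    for sample in sample_ids:
        closest = min(peer_ids, key=lambda peer: xor_distance(peer, sample))
        buckets[closest].append(sample)
    return buckets


# Toy 4-bit keyspace: two peers, four samples.
print(split_by_proximity([1, 2, 9, 12], [0b0000, 0b1000]))
```

Even this toy version shows where the open questions bite: nothing bounds how many samples land in one bucket, and a peer that drops its bucket silently cuts off a whole subtree.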
#### Hybrid
Leverage nodes that have received large subsets of the data via row/column dissemination: they split that data and pass it on to peers or DHT participants.
For example, utilize all validators and members of row/column topics to give peers with certain properties (peer-id, ENR adv, etc) samples of data via req/resp. Or utilize such members to seed DHT structured data passing.
Another idea is to utilize more granular topics than rows/columns but less granular than samples to have better mesh properties than full sample granularity, then to pass information directly to peers in req/resp to fill in the gaps.
Any sort of hybrid needs to be analyzed and experimented on via network sizes and various assumptions to understand load and delivery guarantees.
### 3. Support sample queries
Nodes must be able to query random samples for each block. This *can* be (but doesn't have to be) divided into "live" vs "historic" sampling. In the event that "historic" sampling is safe and fast enough, you can also use the same mechanism for "live" sampling.
A node must be able to choose a different random set of points to sample each slot. This makes more persistent network structures (such as gossip overlay meshes) more difficult to use because of the high churn it would impose on such structures. Nonetheless, gossip should at least be considered as an option for "live" sampling.
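As a minimal sketch of the per-slot selection, a node could draw a fresh set of distinct sample indices from an OS-backed random source each slot. The function name `pick_samples` is hypothetical, and the defaults (70 samples per slot) simply reuse the figure discussed earlier in this note.

```python
import secrets

# OS-backed randomness, so the choice is not predictable from earlier slots.
_rng = secrets.SystemRandom()


def pick_samples(samples_per_block: int, samples_per_slot: int = 70) -> list[int]:
    """Pick a fresh set of distinct sample indices for one slot."""
    return _rng.sample(range(samples_per_block), samples_per_slot)


# A node would call this once per slot and then query for each index.
slot_samples = pick_samples(200_000)
```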
DHTs do seem like a natural mechanism for sampling due to the ability to quickly search and the low overhead of UDP (in discv5), but data dissemination is still an open problem, as is resilience to attacks. DHTs do not have BFT properties, and portions of the table can be trivially attacked through the generation of additional sybil'd records. The attackability here can wreck liveness of the chain. Additionally, disseminating data to a DHT might take longer than other methods.
Another option to consider is req/resp to existing peers. If we can assume that data is disseminated well to all/many nodes, then we might be able to use req/resp to existing peers (e.g. 50 of them) as a primary mechanism for queries and just fall back to the DHT. This might allow for better utilization of peer-scoring in the primary mechanism (due to having a known and limited set of peers/connections) but the TCP/libp2p overhead is high.
#### Hybrid
Utilizing gossip or req/resp as the primary mechanism and the DHT as a fallback might help improve resilience while also relying on potentially "faster" mechanisms for live sampling.
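The primary/fallback idea can be sketched as a simple query loop. Everything here is a placeholder: the fetchers are plain callables standing in for req/resp to a peer or a DHT lookup, and `query_sample` is not an existing libp2p or discv5 API.

```python
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes a sample id and returns the sample bytes, or None on a miss.
Fetch = Callable[[int], Optional[bytes]]


def query_sample(sample_id: int,
                 peer_fetchers: Iterable[Fetch],
                 dht_fetch: Fetch) -> Tuple[Optional[bytes], str]:
    """Try existing peers first, then fall back to a DHT lookup."""
    for fetch in peer_fetchers:
        data = fetch(sample_id)
        if data is not None:
            return data, "peer"
    data = dht_fetch(sample_id)
    if data is not None:
        return data, "dht"
    return None, "missing"


# Toy usage: one peer holds sample 7, the DHT holds everything else.
peers = [lambda sid: None, lambda sid: b"from-peer" if sid == 7 else None]
dht = lambda sid: b"from-dht"
print(query_sample(7, peers, dht))
print(query_sample(8, peers, dht))
```

A real version would query peers concurrently with timeouts and feed misses into peer scoring; the sequential loop is only meant to show the ordering of the two mechanisms.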
### 4. Identify and reconstruct missing data
There has been very little investigation here. Other requirements at least have some known paths even if there might be latency/attack trade-offs.
We have two primary signals that there is missing data:
1. Node failing to get some sample(s)
2. Validator failing to get their row/column
Additionally, we have a signal for deciding whether this is a localized issue or not (in the normal case, where validators are not mounting a DA attack): *the block being attested to and built upon*. If a node suspects a block contains unavailable data but it has very few attestations and no valid children, then there is low signal that a significant number of others see the block as available. But, if such a block has high attestation weight and valid children, that is a signal that there might be localized unavailable samples (thus a need to reconstruct) or that the node might be poorly connected to its sampling mechanism.
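This heuristic could be written down roughly as follows. The thresholds and labels are invented for illustration only, and `attestation_weight` is assumed to be the fraction (0 to 1) of expected attestation weight the block has received.

```python
def classify_missing_data(attestation_weight: float,
                          has_valid_children: bool,
                          low: float = 0.1,
                          high: float = 0.5) -> str:
    """Guess whether missing samples are a local or a network-wide issue.

    `low` and `high` are hypothetical thresholds on the attested fraction.
    """
    if attestation_weight >= high and has_valid_children:
        # Others appear to see the data: reconstruct, or check connectivity.
        return "likely-local"
    if attestation_weight <= low and not has_valid_children:
        # Little evidence that anyone sees the block as available.
        return "likely-unavailable"
    return "inconclusive"


print(classify_missing_data(0.9, True))
print(classify_missing_data(0.02, False))
```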
One thing to consider is that in a (block, slot) construction, such blocks would just be orphaned if the unavailable data cannot be widely reconstructed in 4s (or longer if a back-off scheme is employed).