# Relay based data availability sampling
* All relays will act as sample providers
* They are already strongly incentivised to provide samples for their own blocks, as they want them to be deemed available
* As relays are run as public services, it is natural to add data availability sampling for all blobs to them
* Some relays can run a full 2d polynomial interpolation algorithm, which would guarantee that the data can be reconstructed even if only 1/4 of the samples are available
Let's say we have $n$ samples from a [2d Danksharding construction](https://notes.ethereum.org/ReasmW86SuKqC2FaX83T1g). Each node queries $s$ of these samples and considers the block available if at least $q$ of them are available (example: $q=s=75$).
Each relay runs an RPC service which provides four endpoints:
* `samplesAvailable`:
    * Inputs *blockHeight*, *blockHash*, *sampleIds: List[sampleId]*
    * Returns a bitfield that indicates which of these samples are available from this service
* `getSamples`:
    * Inputs *blockHeight*, *blockHash*, *sampleIds: List[sampleId]*
    * Returns the samples
* `sendSamples`:
    * Inputs *List[sample]*
    * Allows any node to upload a list of samples for the current block
* `sendBlockAndSamples`:
    * Inputs *block*, *List[sample]*
    * Allows other relays to upload a list of samples to this relay. The block must be a valid signed block for the current height. The sender may include all samples or only a subset.
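
As a rough illustration of the first three endpoints, here is a minimal in-memory sketch. Everything in it is an assumption made for the example: the class name `SampleStore`, the snake_case method names, the use of integers as sample IDs, and the explicit `height`/`block_hash` arguments on `send_samples` (the note only says samples are uploaded "for the current block"). Networking, proof verification and `sendBlockAndSamples` are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

BlockKey = Tuple[int, bytes]  # (blockHeight, blockHash)


@dataclass
class SampleStore:
    """Hypothetical in-memory backing store for a relay's sample endpoint."""

    samples: Dict[BlockKey, Dict[int, bytes]] = field(default_factory=dict)

    def samples_available(self, height: int, block_hash: bytes,
                          sample_ids: List[int]) -> int:
        """Return a bitfield: bit i is set if sample_ids[i] is held."""
        held = self.samples.get((height, block_hash), {})
        bits = 0
        for i, sid in enumerate(sample_ids):
            if sid in held:
                bits |= 1 << i
        return bits

    def get_samples(self, height: int, block_hash: bytes,
                    sample_ids: List[int]) -> Dict[int, bytes]:
        """Return the requested samples that this store actually holds."""
        held = self.samples.get((height, block_hash), {})
        return {sid: held[sid] for sid in sample_ids if sid in held}

    def send_samples(self, height: int, block_hash: bytes,
                     new_samples: Dict[int, bytes]) -> None:
        """Accept uploaded samples (a real relay would verify them first)."""
        self.samples.setdefault((height, block_hash), {}).update(new_samples)
```

Encoding the reply as one bit per requested ID keeps the `samplesAvailable` response small, which matters because every node sends this query to several endpoints for every block.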
The block header is extended with four fields:
* `IPv4` address of relay sample endpoint
* `IPv6` address of relay sample endpoint
* `DNS` of relay sample endpoint
* `port` of relay sample endpoint
This sample endpoint is called the block-provided endpoint.
## Sample endpoint
Each node maintains a list of reliable sample endpoints. These are entered by the user (clients can ship defaults) and are continuously scored by the node. For each block, the node queries the 5-10 endpoints with the highest reputation, plus the block-provided endpoint.
* Sample endpoints are scored according to the samples they provide. For each sample, a reasonable weight is $1/r$, where $r$ is the number of sample endpoints that have advertised that sample through `samplesAvailable`.
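
The endpoint selection described above could look roughly like the following sketch. The function name `select_endpoints`, the use of plain strings as endpoint identifiers and the default `k = 8` (picked from the 5-10 range) are all assumptions for illustration.

```python
from typing import Dict, List


def select_endpoints(reputation: Dict[str, float], block_provided: str,
                     k: int = 8) -> List[str]:
    """Pick the k best-scored endpoints, then add the block-provided one."""
    ranked = sorted(reputation, key=lambda name: reputation[name], reverse=True)
    chosen = ranked[:k]
    # The block-provided endpoint is always queried, whatever its reputation.
    if block_provided not in chosen:
        chosen.append(block_provided)
    return chosen
```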
## Sampling
Upon receipt of a block, each node selects $s$ random samples to query. It then queries its current list of sample endpoints, plus the block-provided endpoint, using the `samplesAvailable` endpoint for all its $s$ samples.
Once it has replies from some or all endpoints, it starts downloading the actual samples using the `getSamples` endpoint. Whenever a sample is received that was not advertised through `samplesAvailable` by another endpoint, it is sent to that endpoint via `sendSamples`.
If this process does not find enough samples, it is repeated for the unavailable samples until either
* enough samples (at least $q$) are found to be available, or
* another block has a heavier weight in the fork choice rule, at which point sampling switches to that block.
(In other words, sampling is continuously active for the latest ancestor of the current fork choice head that hasn't been determined as available yet.)
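
The sampling procedure above can be sketched as follows. This is a simplified, single-threaded illustration and not a reference implementation: `InMemoryEndpoint` is a hypothetical stand-in for a remote endpoint (block height and hash are omitted), and `max_rounds` is an assumed stand-in for the fork-choice cut-off, which is not modelled here.

```python
import random
from typing import Dict, List


class InMemoryEndpoint:
    """Hypothetical stand-in for a remote sample endpoint (no networking)."""

    def __init__(self, held: Dict[int, bytes]):
        self.held = dict(held)

    def samples_available(self, sample_ids: List[int]) -> List[bool]:
        return [sid in self.held for sid in sample_ids]

    def get_samples(self, sample_ids: List[int]) -> Dict[int, bytes]:
        return {sid: self.held[sid] for sid in sample_ids if sid in self.held}

    def send_samples(self, samples: Dict[int, bytes]) -> None:
        self.held.update(samples)


def choose_sample_ids(n: int, s: int, rng: random.Random) -> List[int]:
    """Select s distinct sample IDs out of n."""
    return rng.sample(range(n), s)


def sampling_round(endpoints: List[InMemoryEndpoint],
                   wanted: List[int]) -> Dict[int, bytes]:
    # 1. Ask every endpoint which of the wanted samples it advertises.
    advertised = [
        {sid for sid, ok in zip(wanted, ep.samples_available(wanted)) if ok}
        for ep in endpoints
    ]
    # 2. Download each sample from the first endpoint that advertised it.
    found: Dict[int, bytes] = {}
    for ep, ads in zip(endpoints, advertised):
        missing = sorted(sid for sid in ads if sid not in found)
        if missing:
            found.update(ep.get_samples(missing))
    # 3. Forward received samples to endpoints that did not advertise them.
    for ep, ads in zip(endpoints, advertised):
        gap = {sid: data for sid, data in found.items() if sid not in ads}
        if gap:
            ep.send_samples(gap)
    return found


def is_available(endpoints: List[InMemoryEndpoint], wanted: List[int],
                 q: int, max_rounds: int = 3) -> bool:
    """Repeat rounds for the still-missing samples until q are found."""
    found: Dict[int, bytes] = {}
    for _ in range(max_rounds):
        remaining = [sid for sid in wanted if sid not in found]
        found.update(sampling_round(endpoints, remaining))
        if len(found) >= q:
            return True
    return False
```

The sketch takes the random generator as a parameter so that it can be seeded in tests; a real node would presumably want sample choices that an adversary cannot predict.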
## Initial sample distribution
When sending a block, a relay immediately streams the block and the samples to the other sample endpoints it knows about, and also makes them available at its own sample endpoint (the one it advertises in its block).
* A relay has an incentive to make its samples available to as many sample endpoints as possible. While it may be tempting to make its own endpoint look more reliable by initially providing samples only to it, this would increase the risk of the block being deemed unavailable, which is not worth it.
* In particular, quickly distributing to many sample providers mitigates the danger of a DoS attack.
## Sample endpoint scoring
* Sample endpoints are scored according to the samples they provide.
    * Each sample that has been provided through `getSamples` earns a positive score of $1/r$, where $r$ is the number of sample endpoints that have advertised that sample through `samplesAvailable`.
    * Each sample that was not advertised through `samplesAvailable` but was available through other sample endpoints incurs a penalty of $-r$.
    * Any sample that is advertised through `samplesAvailable` but that the sample endpoint then fails to provide (after a number of repeat queries) is penalized with $-10000$ reputation.
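
To make the three rules concrete, here is one possible way to compute the per-block score change for every endpoint. The function name `score_round` and its three input maps (`advertised`, `provided`, `failed`, each keyed by a hypothetical endpoint name) are assumptions for this sketch; how these deltas are accumulated into a long-term reputation is not specified in the note and is left out.

```python
from collections import Counter
from typing import Dict, Set

FAILED_DELIVERY_PENALTY = -10000.0  # advertised but never delivered


def score_round(advertised: Dict[str, Set[int]],
                provided: Dict[str, Set[int]],
                failed: Dict[str, Set[int]]) -> Dict[str, float]:
    """Return the reputation change of each endpoint for one block.

    advertised: samples each endpoint claimed via samplesAvailable
    provided:   samples each endpoint actually returned via getSamples
    failed:     advertised samples it did not return after repeat queries
    """
    # r[sid] = number of endpoints that advertised sample sid
    r = Counter(sid for ads in advertised.values() for sid in ads)
    deltas: Dict[str, float] = {}
    for name, ads in advertised.items():
        delta = 0.0
        # Reward of 1/r for each advertised sample that was delivered.
        for sid in provided.get(name, set()) & ads:
            delta += 1.0 / r[sid]
        # Heavy penalty for each advertised sample that was not delivered.
        for _sid in failed.get(name, set()):
            delta += FAILED_DELIVERY_PENALTY
        # Penalty of r for each sample others advertised but this one did not.
        for sid, count in r.items():
            if sid not in ads:
                delta -= count
        deltas[name] = delta
    return deltas
```

For example, with `advertised = {"a": {1, 2}, "b": {2, 3}}`, `provided = {"a": {1, 2}}` and `failed = {"b": {3}}`, endpoint `a` earns $1 + 1/2$ for its two deliveries and loses $1$ for not advertising sample 3, while `b` loses $10000$ for the failed delivery and $1$ for not advertising sample 1.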