# Gossip data sampling idea

For context, the write-up by Vitalik describing the problem: [Data availability sampling in practice](https://notes.ethereum.org/@vbuterin/r1v8VCULP)

This write-up is a variant that expands on approach 2, by @protolambda. This is all experimental, there may be holes/weaknesses, feedback welcome.

TLDR of context:

- Shard proposers publish block data somewhere, optionally split somehow
- Listeners want to make sure the data is available
- Ignore the attestation data bit(s) and availability proofs for now.

General idea is:

- Request any tiny chunk
- Enough random distinct tiny chunks can reconstruct the original data, like error correction
- The correctness of these pieces can be shown individually
- Request `k` out of the `2N + 1` error-correction chunks, `N` being the count matching the original input.
- Random sampling + network effect -> enough to trust the data is available with `k` requests

## Desiderata

- Low latency, sub-slot time preferably
- Stable; the current attestation subnets are at the brink of requiring too much subnet movement
- Random choices for sampling, less predictable is better
- No new network stack components

## General approaches

- DHT based: slow, stable, pull-based
- Gossip based: fast, difficult, pubsub-based (push, but gossip, not direct)
- RPC streams: slow, peering / fan-out complexity, inventing a new system

## The idea

Improve upon approach 2 of the original write-up, chosen because:

- Gossip is fast
- Gossip can be stable (the current approach 2 is not so much)
- Random sampling is possible (try and exploit the gossip approach here)
- Gossipsub is already widely integrated in Eth2

Although the other approaches / options have better sampling, this approach seems more viable, and we can try to improve the sampling still.

### How

Like approach 2, the chunks are mapped to gossip subnets, and reach the validators. Different now: try to move work from subscriber to publisher.
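As background on why a small `k` can be enough: with a 2x erasure-coding extension, an attacker who wants the data to be unrecoverable must withhold more than half of all chunks, so each uniform random sample fails with probability at least 1/2. A minimal numeric sketch (the function name and parameters are illustrative, not part of any spec):

```python
def p_fooled(k: int, withheld_fraction: float = 0.5) -> float:
    """Chance that k independent uniform samples all hit available chunks,
    even though `withheld_fraction` of the chunks are withheld."""
    return (1.0 - withheld_fraction) ** k

# with k = 20 samples, a single validator catches an attacker withholding
# half the chunks with probability 1 - (1/2)^20
print(p_fooled(20))
```

This ignores the network effect: the honest validators as a group sample far more than `k` chunks between them.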
Additionally, we try shuffling the mapping between chunk index and subnet index for each round. This doesn't add much to randomness, but it is a start.

To repeatedly and quickly get `k` random samples, you can now stay on a random set of `k` subnets. Each subnet is processing a new random chunk index (or subset of all indices) each round.

A proposer needs to put the chunks on each subnet, but this is a one-off task that can be improved with hierarchical nets. Another way would be to do a fan-out step: to distribute data to all chunk subnets (`M`), first distribute it to all connected peers, which then put it on their chunk subnets. The chunks can be content-addressed like attestations, so duplicates don't hurt. Gossipsub already has similar fanout functionality (push to all peers, even if outside of the joined topic mesh).

Note that compared to approach 2, *this shifts most of the work to proposers*. Which is good, since the publishing task should be more flexible than the subscriber work, and there are many more subscribers than publishers that need to run.

The larger the subnet count, the better the sampling would be. Each subnet is what counts towards the random sample taken by validators. The chunks are simply split as `total_chunk_count / subnet_count` per subnet.

To avoid DoS and validate chunks in gossip, either:

- the proof material for validity of chunks is made globally available
- the proof material is added to the chunk gossip messages, if small enough

#### Weaknesses

Now the weaknesses in sampling relate to correlation and predictability:

Correlation: by staying on net `i`, you get the same random series of chunk indices as others on `i`.

Predictability: by staying on net `i`, with some proposer knowing your presence in advance, they can try to only publish on the subnets you are on, and omit the rest.

The add-on of shuffling the `subnet <> chunk_index` mapping really only contributes to detaching certain chunk indices from being stuck to the same subnet forever.
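For concreteness, the per-round `subnet <> chunk_index` shuffle could look like this. A hypothetical sketch: the parameter values and the use of sha256 over a round seed (e.g. a shard block hash) are assumptions, nothing here is decided.

```python
import hashlib
import random

SUBNET_COUNT = 64    # assumed parameter
CHUNK_COUNT = 512    # assumed; a multiple of SUBNET_COUNT, so subnets stay balanced

def chunk_to_subnet(round_seed: bytes, chunk_index: int) -> int:
    """Map a chunk index to a subnet index, re-shuffled every round.
    All nodes derive the same permutation from the shared round seed."""
    rng = random.Random(hashlib.sha256(round_seed).digest())
    permutation = list(range(CHUNK_COUNT))
    rng.shuffle(permutation)
    # without the shuffle, chunk i would be stuck on subnet i % SUBNET_COUNT forever
    return permutation[chunk_index] % SUBNET_COUNT
```

Since the permutation is uniform over a multiple of the subnet count, every subnet carries exactly `CHUNK_COUNT // SUBNET_COUNT` chunk indices per round.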
I think there is some marginal value to this; not every subnet may be the same. Not every node has the exact same sequence anyway (overlapping, but different start and end), and always validating the same chunk indices (and the same shard) seems worse.

#### Mitigating the weaknesses

As a group, the honest validators should already be safe: they each participate in their own selection of random subnets, and omitting some chunks from a validator would mean not publishing them on the corresponding subnets at all; otherwise the subnet would propagate the missing chunks anyway. So the concern really is tricking individual validators, and getting them to vote for blocks with missing chunks.

To mitigate this, validators can still join some subnets randomly, but just part of the time. By joining a subnet randomly (with local randomness, not predictable to an attacker) there is a greater chance to get on a network with a missing chunk. The error-correction redundancy lowers the number of subnets that are necessary to trust the random sampling. Still open for more ideas on how to increase the sampling here.

#### Some familiarity, with twists

Lots of ideas similar to the attestation nets, but used differently:

**Existing:** Being subscribed to a few random attestation subnets (a.k.a. the attestation subnet backbone)
**Here:** The default: subscribing to `k` subnets, easily able to do the work as a listener. Note: should be more stable

**Existing:** Rotating backbone subnets randomly on some longer interval
**Here:** Useful to increase security with little effort, being more resistant against missing chunks

**Existing:** Joining unknown subnets on shorter notice, for simple attestation work
**Here:** Some extra randomness; good to join some subnet randomly on shorter notice, to make predictability harder

**Existing:** Attestation subnet bits in the ENR and metadata, to share where you are.
**Here:** It literally doesn't matter who you peer with for random sampling, as long as it is random, and new peers are able to join. It is enough to just share "I'm on random subnets" in the metadata, maybe with a counter of how many subnets the peer is on, TBD. Or maybe everyone just shares the details of a few subnets they are subscribed to longer-term. Leaking just a few of many shouldn't matter, but can really help bootstrap new subscribers with their own random picks. This could be as small as a few bytes describing a few subnet indices.

**Existing:** The aggregate-and-proof subnet is useful for a gated global network.

**Existing:** Like a DHT, put some content in some random place to retrieve it from.
**Here:** Content is not hashed to decide on a location, but it is distributed (randomly or not) between all subnets. Another seed for randomness may work better, but after that step, the gossip messages are content-addressed.

Parameter similarity:

- `RANDOM_SUBNETS_PER_VALIDATOR` - `k`
- `EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION` - the slow rotation (incl. random rotation lengths)
- `ATTESTATION_SUBNET_COUNT` - amount of chunk subnets
- validator shuffling - chunk shuffling

### To be decided

The interesting bits to decide:

- Shuffling add-on; needs randomness to put chunks on subnets. The hash of the shard block may be good enough. Still need better general sampling if the subnet count is low.
- Parameters:
  - Amount of chunk subnets (could be more than with attestations, if it's random work anyway)
  - Data availability: `N` chunks, `k` samples, exact details about chunk size (do we think of them as ranges, or as the 32-byte pieces?)
  - Rotation: epochs for slow rotation of the `k` subnets, and slots for some `q` random subnets getting rotated more quickly (both applied with random variance).
- Initial discovery of `k` random subnets.
- A small constant `q` for the amount of subnet indices to intentionally leak, to bootstrap others who are joining.
- Approach to publishing messages
- Formatting / contents of the proof of chunks, to validate gossip messages (and avoid DoS)

And the more boring details to decide:

- Encoding details of messages
- Topic naming

And then testing the idea, probably starting off with chunkifying the input and publishing the chunks to many subnets. The subscriber side should be relatively simple.

### Add-ons

And add-ons briefly discussed on the call but not described as much:

- Sentry nodes that can reconstruct the missing data, and publish the reconstructed data
- For the more powerful nodes, an option to listen in on the full shard data

Both could be really useful to fill gaps in whatever subnet is missing chunks. Error correction has a direct use-case here, and topics are a clear go-to for nodes that have the full data and can publish it as necessary.
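A minimal sketch of that chunkify-and-publish starting point, assuming a hypothetical `publish(topic, message)` gossipsub callback; the chunk size, topic naming, and message encoding are illustrative stand-ins for the open decisions above:

```python
SUBNET_COUNT = 64   # assumed parameter
CHUNK_SIZE = 512    # assumed; "ranges or 32-byte pieces" is still open

def chunkify(block_data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split the shard block data into fixed-size chunks (the last may be short)."""
    return [block_data[i:i + chunk_size] for i in range(0, len(block_data), chunk_size)]

def publish_chunks(block_data: bytes, publish) -> None:
    """Spread the chunks over the chunk subnets: a plain
    total_chunk_count / subnet_count split, no shuffling yet."""
    chunks = chunkify(block_data)
    chunks_per_subnet = max(1, -(-len(chunks) // SUBNET_COUNT))  # ceil division
    for index, chunk in enumerate(chunks):
        subnet = index // chunks_per_subnet
        # prefix the chunk index so each gossip message is self-describing
        publish(f"shard_chunk_{subnet}", index.to_bytes(4, "little") + chunk)
```

A subscriber staying on subnet `s` would then validate incoming `shard_chunk_{s}` messages against the proof material before counting them as samples.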
