# A note on Ethereum 2.0 attestation aggregation strategies ###### tags: `eth2` `v0.10.0` [TOC] :::info By @hwwhww 20200115 Special thanks to @djrtwo for the review. ::: This document describes the requirements of Ethereum 2.0 attestation aggregation strategies and introduces the current "naive" aggregation strategy. # 1. Background ## 1.1. Consensus layer ### 1.1.1 Committee A `committee` is an array of `ValidatorIndex`s that are assigned to validate the block at a specific `slot`. - For each *epoch*, all the validators are [shuffled and grouped](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#compute_committee) into different committees. There may be multiple committees at one slot, but we set the maximum committees count per slot [**`MAX_COMMITTEES_PER_SLOT`** committees for each slot](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#get_committee_count_at_slot) to `64` now. - We can use `Slot` and `CommitteeIndex` to identify a specific committee: - `Slot`: the slot number - `CommitteeIndex`: the index of the committee at the given slot :::info In **phase 1**, the validators of a certain committee have to attest a certain shard block. ::: ### 1.1.2. Attestation The validators of the committees of a certain slot have to **attest** the block of the slot. The message type of their "vote" is called [`Attestation`](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#attestation): ```python class Attestation(Container): aggregation_bits: Bitlist[MAX_VALIDATORS_PER_COMMITTEE] data: AttestationData signature: BLSSignature ``` - `aggregation_bits`: a bitfield mapped to the specific committee. The bits represented which validators participated the attestation. - `data`: the [AttestationData](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#attestationdata), including `slot: Slot` and `committee_index: CommitteeIndex` to point out the committee set and the vote content `beacon_block_root: Checkpoint`, `source: Checkpoint`, and `target: Checkpoint`. - `signature`: The 96 bytes *aggregate* (see below) [BLS12-381 signature](https://hackmd.io/@benjaminion/bls12-381). ### 1.1.3. Aggregation BLS12-381 signatures are **aggregatable**. For example: 1. With the same signing `message`: - The validator $V_1$ with privkey $Privkey_1$ and pubkey $Pubkey_1$ signs `message` resulting in a 96-byte $Sig_1$ output - The validator $V_2$ with privkey $Privkey_2$ and pubkey $Pubkey_2$ signs `message` resulting in a 96-byte $Sig_2$ output We can use the [BLS cryptography library](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#bls-signatures) to get the aggregate $Sig_a$ from `Aggregate([Sig1, Sig2])`. The aggregate result can be verified with `AggregateVerify([(Pubkey_1, message), (Pubkey_2, message)], Sig_a)` Or in this case, because the same `message` has been signed for each signature, the result can be verified with `FastAggregateVerify([Pubkey_1, Pubkey_2], message, Sig_a)`. 2. We can also aggregate more signatures `Aggregate([Sig1, Sig2, ... SigN])`. 3. We can also aggregate two aggregate signatures. For example, we can get aggregate signature from `Aggregate(Aggregate([Sig1, Sig2]), Aggregate([Sig3, Sig4]))`. :::warning However, if the aggregate is **overlapping**, e.g., `Aggregate(Aggregate([Sig1, Sig2]), Aggregate([Sig2, Sig3]))`, we are not able to verify within eth2 consensus because the bitfield we use to track participation cannot represent the overlapping participants. Note that the verify functions can intrinsically handle this case by counting multiple pubkeys for overlaps, but our choice of data structure does not allow for double counting. ::: ## 1.2. Networking layer The messages are broadcasted with libp2p gossipsub protocol. - The beacon nodes subscribe to some [`topics`](https://github.com/ethereum/eth2.0-specs/blob/v0.10.0/specs/phase0/p2p-interface.md#topics-and-messages). Nodes can publish the message on the topic, and the subscribers will get the message. We frequently refer to gossipsub topics as "subnets" and can be thought of as overlays on the base p2p network. Learn more about the basics [here](https://docs.libp2p.io/concepts/publish-subscribe/)! - There are topics that **every beacon node** is expected to listen to. We call them the ["global" topics](https://github.com/ethereum/eth2.0-specs/blob/v0.10.0/specs/phase0/p2p-interface.md#global-topics). ### 1.2.1. Attestation subnets The idea of "shards" will be introduced in phase 1. In phase 0, we go-ahead to introduce the concept of "subnets". In the networking spec, we set a constant `ATTESTATION_SUBNET_COUNT` as the total number of attestation subnets. `ATTESTATION_SUBNET_COUNT` is set to `64` now. We use `subnet_id` as the identifier of the subnet. Each `subnet_id` is set to `index % ATTESTATION_SUBNET_COUNT`, where `index` is the `CommitteeIndex` of the given committee. :::info In **phase 1**, these subnets will be mapped to the shards. ::: ## 1.3. Aggregation strategy requirements The requirements of an aggregation strategy are: - **Non-overlapping aggregate**: Since we use bitfield to record the participants, each individual signature can only be included in an aggregate once. - **Validator privacy**: Do not require an explicit coupling of validator (consensus) ID to node ID for some affordance of privacy and flexibility. (Note: this does not guarantee privacy, just keeps some avenues and techniques open. Information is still leaked via messages sent and pubsub topic subscriptions.) - **Efficiency**: Minimize network traffic, especially on the global channel. # 2. The "naive" aggregation strategy mechanism This basic aggregation strategy, designed by Danny Ryan, Vitalik Buterin, and co, is what we are planning to use in phase 0 launch. ## 2.1. Self-elect We use [two helper functions](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md#aggregation-selection) to describe the aggregator selection: ```python def get_slot_signature(state: BeaconState, slot: Slot, privkey: int) -> BLSSignature: domain = get_domain(state, DOMAIN_BEACON_ATTESTER, compute_epoch_at_slot(slot)) signing_root = compute_signing_root(slot, domain) return bls.Sign(privkey, signing_root) ``` ```python def is_aggregator(state: BeaconState, slot: Slot, index: CommitteeIndex, slot_signature: BLSSignature) -> bool: committee = get_beacon_committee(state, slot, index) modulo = max(1, len(committee) // TARGET_AGGREGATORS_PER_COMMITTEE) return bytes_to_int(hash(slot_signature)[0:8]) % modulo == 0 ``` The validators of the committee at `slot` with committee index `index` generate their `slot_signature`s by calling `get_slot_signature(state, slot, privkey) -> BLSSignature` function. Then they call `is_aggregator(state, slot, index, slot_signature)` function with the given `slot_signature` to check if they are one of the **aggregators**. `is_aggregator` function has some features: - **Self-elect only**: only the aggregators themselves can reveal their `slot_signature`. - **Everyone can verify**: everyone who received the `slot_signature` can verify it with `is_aggregator` function. - **Probabilistic selection**: `TARGET_AGGREGATORS_PER_COMMITTEE` is the target aggregator per committee and is set to `16` now. ## 2.2. Attesting Each attester [creates and broadcast an individaul (unaggregated) attestion](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md#attesting) to [`committee_index{subnet_id}_beacon_attestation` subnet](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#attestation-subnets). The honest attesters broadcast their attestation when `SECONDS_PER_SLOT / 3` seconds after the start of `slot` or when once received a block from the expected proposer (whichever comes first). ## 2.3. Aggregate The selected aggregators [construct an aggregate attestation]( https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md#construct-aggregate) with the unaggregated attestations that they received earlier that have the _same `AttestationData` as their own_. And then, they create and broadcast [`attestation_and_proof: `AggregateAndProof](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md#aggregateandproof) message to the global channel `beacon_aggregate_and_proof`. The honest validator broadcasts this message at `SECONDS_PER_SLOT * 2 / 3` seconds after the start of `slot`. ```python class AggregateAndProof(Container): aggregator_index: ValidatorIndex aggregate: Attestation selection_proof: BLSSignature ``` - `aggregator_index`: the `ValidatorIndex` of the aggregator - `aggregate`: the aggregate attestation - `selection_proof`: the signature generated by `get_slot_signature` function. ## 2.4. Verification [Here](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#global-topics) is the detailed `beacon_aggregate_and_proof` topic verification procedure. Note that the message is only valid with the `selection_proof` of a selected aggregator. # 3. Analysis of naive aggregation strategy mechanism ## 3.1. Load balancing ### 3.1.1. Constants and assumption - $S$ subnets - $A$ aggregators per committee, targeting at `TARGET_AGGREGATORS_PER_COMMITTEE` aggregators. - $V$ active validators. - $V_c$ validators in the commitee, with `MAX_VALIDATORS_PER_COMMITTEE` limitation. - $B_s$ beacon nodes that are subscribing a subnet. - $B_g$ beacon nodes that are subscribing the global channel. - `Attestation`: - Fields: - `aggregation_bits`: $Vc\ //\ 8$ - `data`: $8+8+32+40+40 = 128$ - `signature`: $96$ - Total size: $224 + Vc\ //\ 8$ - Worst case: $480$ bytes ### 3.1.2. Gossipsub assumption - Each beacon node has $P$ peers - It takes $H_s$ hops for one (individual or aggregate) message to be fully broadcasted over the subnet with $B_s$ beacon nodes and target $P$ peers. - It takes $H_g$ hops for one aggregate message to be fully broadcast over the global channel with $B_g$ beacon nodes and target $P$ peers. ### 3.1.3. Per committee attestation messages overhead The total overhead of per committee attestation could be roughly represented as: $$ \begin{split}subnet\ messages\ overhead &= individual\ message\ size * len(attesters) * len(nodes) * len(propagation\ hops) * len(subnets) \\ &= (224 + V_c\ //\ 8\ bytes) * (V_c\ validators\ in\ committee) * (B_s\ nodes) * (H_s\ hops) * (S\ subnets) \end{split} $$ $$ \begin{split}global\ messages\ overhead &= aggregate\ message\ size * len(aggregators) * len(nodes) * len(propagation\ hops) \\ &= (224 + V_c\ //\ 8\ bytes) * (A\ aggregators) * (B_g\ nodes) * (H_g\ hops) \end{split} $$ Some possible directions that we can optimize it: 1. Decrease $B_g$: In practice one beacon node will likely serve many validators 2. Decrease hops: optimize the networking protocol 3. Set a reasonable expected aggregator count $A$: it's currently set to $16$, but perhaps we should increase or decrease this number. ## 3.2. Beacon block proposer strategy The beacon block proposer cannot further aggregate aggregates that are overlapping. Because we don't assign an aggregator to only aggregate a sub-committee, it's highly likely that the aggregators will create different attestations that includes the same attesters. The beacon proposer _can_, however, include multiple aggregates that have overlaps on-chain but with no additional reward for repeat inclusions. Each block can be packed with up to `MAX_ATTESTATIONS` (128 in current spec) attestations. To maximize rewards, the proposer would go through all aggregates with any yet-to-be-included validators, try to find the maximum set of attestations. (Also see [maximum disjoint set problem](https://en.wikipedia.org/wiki/Maximum_disjoint_set)) ## 3.3. Probabilistic selection Since the `is_aggregator` function is probabilistic, it's *possible* that no attester is selected as an aggregator of a non-empty committee. That said, [with `TARGET_AGGREGATORS_PER_COMMITTEE := 16` and `len(committee) := 128`, the probability of no-aggregator is only about `3.78E-08`](https://docs.google.com/spreadsheets/d/1C7pBqEWJgzk3_jesLkqJoDTnjZOODnGTOJUrxUMdxMA/edit#gid=0). Even with a larger committee, the no-aggregator is still very low: ![](https://storage.googleapis.com/ethereum-hackmd/upload_73bae7e4131771f42722780c805b1aef.png) We expect `~TARGET_AGGREGATORS_PER_COMMITTEE` aggregators will be selected, and we only need one honest responsible/well-connected aggregator. ## 3.4. Privacy Before the aggregator reveals its `slot_signature`, only the aggregator itself knows it is one of the aggregators. The rationale is that if the aggregator selection is predictable, it may be an attack vector. ## 3.5. Forge aggregate attack If the attester reveal (i) their own unaggregated attestation at [Step 2.2](#22-Attesting) and (ii) the `aggregate_and_proof` at [Step 2.3](#23-Aggregate), a rational validator can re-create and re-broadcast a valid `aggregate_and_proof` by setting `aggregate_and_proof.aggregate` to any other aggregate that includes the aggregator's attestation. To avoid the forge attack, the aggregator can choose not to broadcast their unaggregated attestation $Attestation_1$ at [Step 2.1.2](#212-Attesting). However, then the *other* aggregators will not include $Attestation_1$ to their aggregate, guaranteeing some amount of imperfect aggregates and increasing the number of attestations a beacon proposer would need to include to reach 100% committee participation. In the normal case, we expect aggregators to broadcast their attestation for increased chance of inclusion in other aggregates and thus on-chain. We also expect non-aggregators to see their attestation included with like-aggregates and thus have no need to forge. In the case that a validator sees an aggregate of like-attestation_data without their own included, a rebroadcast with the additional inclusion imposes a marginal increase in bandwidth over the global subnet, but this is not necessarily a net negative for the network. The main concern here would be avenues in which a single validator (or cartel) might be able to amplify the expected number of valid aggregates broadcast on the global subnet. Another idea to plug this hole is to have the aggregator sign over the proof+aggregate to prevent all forms of this type of forge, but decided not to introduce the additional signature/verification overhead before further analysis. # 4. Other strategies Eth2 teams are still researching other potential strategies. :::info To auditors: these strategies are out of the scope of auditing as they will almost certainly not be deployed in Phase 0. ::: ## 4.1. Local aggregation The simplest strategy. Every attester broadcasts their unaggregated attestation to the global channel and creates aggregates locally. ## 4.2. Handel: Practical Multi-Signature Aggregation for Large Byzantine Committees [^first] Handel allows the aggregation thousands of signatures in just under a second, even in a Byzantine context, by organizing the connections in levels. With the regular deployment, the participants in Handel map to their public IP addresses. To improve privacy, some other deployments include using linkable ring signatures, over DHTs, and Tor, .etc, are in research. ## 4.3. Heuristically partitioned attestation aggregation [^second] This strategy converts the bitfield to a binary tree. The attester can find their place in the tree easily, and each subtree can be heuristically aggregated. ## 4.4. Comparision | | Local aggregation | Naive strategy | Handel | Heuristically partitioned strategy | | - | -------- | -------- | ------ | ----- | | Efficiency | Very low | Low | High | Medium, depends on the number of participants and partitions | | Validator privacy | Very high | High | Low | Medium | | Security assumption | None | Probabilistic and one-sixteenth honest validators | Honest majority | Honest majority | # References [^first]: [Handel: Practical Multi-Signature Aggregation for Large Byzantine Committees](https://arxiv.org/abs/1906.05132) by Olivier Bégassat, Blazej Kolad, Nicolas Gailly, Nicolas Liochon [^second]: [Heuristically Partitioned Attestation Aggregation ](https://notes.ethereum.org/@protolambda/Hkjl_L6_H) by Protolambda