# A note on Ethereum 2.0 attestation aggregation strategies

###### tags: `eth2` `v0.10.0`

[TOC]

:::info
By @hwwhww 20200115

Special thanks to @djrtwo for the review.
:::

This document describes the requirements of Ethereum 2.0 attestation aggregation strategies and introduces the current "naive" aggregation strategy.

# 1. Background

## 1.1. Consensus layer

### 1.1.1. Committee

A committee is an array of `ValidatorIndex` values that are assigned to validate the block at a specific slot.

- For each *epoch*, all the validators are [shuffled and grouped](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#compute_committee) into different committees. There may be multiple committees at one slot, but the number of committees per slot is capped at [`MAX_COMMITTEES_PER_SLOT`](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#get_committee_count_at_slot), which is currently set to 64.
- We can use `Slot` and `CommitteeIndex` to identify a specific committee:
    - `Slot`: the slot number
    - `CommitteeIndex`: the index of the committee at the given slot

:::info
In **phase 1**, the validators of a certain committee have to attest to a certain shard block.
:::

### 1.1.2. Attestation

The validators of the committees of a certain slot have to **attest** to the block of the slot. The message type of their "vote" is called [Attestation](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#attestation):

```python
class Attestation(Container):
    aggregation_bits: Bitlist[MAX_VALIDATORS_PER_COMMITTEE]
    data: AttestationData
    signature: BLSSignature
```

- `aggregation_bits`: a bitfield mapped to the specific committee. The bits represent which validators participated in the attestation.
- `data`: the [AttestationData](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#attestationdata), including `slot: Slot` and `index: CommitteeIndex` to point out the committee, and the vote content `beacon_block_root: Root`, `source: Checkpoint`, and `target: Checkpoint`.
- `signature`: the 96-byte *aggregate* (see below) [BLS12-381 signature](https://hackmd.io/@benjaminion/bls12-381).

### 1.1.3. Aggregation

BLS12-381 signatures are **aggregatable**. For example:

1. With the same signing message:
    - The validator $V_1$ with privkey $Privkey_1$ and pubkey $Pubkey_1$ signs `message`, resulting in a 96-byte $Sig_1$ output
    - The validator $V_2$ with privkey $Privkey_2$ and pubkey $Pubkey_2$ signs `message`, resulting in a 96-byte $Sig_2$ output

    We can use the [BLS cryptography library](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#bls-signatures) to get the aggregate $Sig_a$ from `Aggregate([Sig1, Sig2])`.

    The aggregate result can be verified with `AggregateVerify([(Pubkey_1, message), (Pubkey_2, message)], Sig_a)`.

    In this case, because every signature is over the same message, the result can also be verified with `FastAggregateVerify([Pubkey_1, Pubkey_2], message, Sig_a)`.
2. We can also aggregate more signatures: `Aggregate([Sig1, Sig2, ... SigN])`.
3. We can also aggregate two aggregate signatures. For example, we can get an aggregate signature from `Aggregate([Aggregate([Sig1, Sig2]), Aggregate([Sig3, Sig4])])`.

:::warning
However, if the aggregates are **overlapping**, e.g., `Aggregate([Aggregate([Sig1, Sig2]), Aggregate([Sig2, Sig3])])`, we are not able to verify the result within eth2 consensus because the bitfield we use to track participation cannot represent the overlapping participants. Note that the verify functions can intrinsically handle this case by counting multiple pubkeys for overlaps, but our choice of data structure does not allow for double counting.
:::

## 1.2. Networking layer

Messages are broadcast using the libp2p gossipsub protocol.

- The beacon nodes subscribe to some [topics](https://github.com/ethereum/eth2.0-specs/blob/v0.10.0/specs/phase0/p2p-interface.md#topics-and-messages). Nodes can publish messages on a topic, and its subscribers will receive them. We frequently refer to gossipsub topics as "subnets", which can be thought of as overlays on the base p2p network. Learn more about the basics [here](https://docs.libp2p.io/concepts/publish-subscribe/)!
- There are topics that **every beacon node** is expected to listen to. We call them the ["global" topics](https://github.com/ethereum/eth2.0-specs/blob/v0.10.0/specs/phase0/p2p-interface.md#global-topics).

### 1.2.1. Attestation subnets

The idea of "shards" will be introduced in phase 1. In phase 0, we go ahead and introduce the concept of "subnets".

In the networking spec, we set a constant `ATTESTATION_SUBNET_COUNT` as the total number of attestation subnets. `ATTESTATION_SUBNET_COUNT` is currently set to 64.

We use `subnet_id` as the identifier of the subnet. Each `subnet_id` is set to `index % ATTESTATION_SUBNET_COUNT`, where `index` is the `CommitteeIndex` of the given committee.

:::info
In **phase 1**, these subnets will be mapped to the shards.
:::

## 1.3. Aggregation strategy requirements

The requirements of an aggregation strategy are:

- **Non-overlapping aggregate**: Since we use a bitfield to record the participants, each individual signature can only be included in an aggregate once.
- **Validator privacy**: Do not require an explicit coupling of validator (consensus) ID to node ID, for some affordance of privacy and flexibility. (Note: this does not guarantee privacy, it just keeps some avenues and techniques open. Information is still leaked via messages sent and pubsub topic subscriptions.)
- **Efficiency**: Minimize network traffic, especially on the global channel.

# 2. The "naive" aggregation strategy mechanism

This basic aggregation strategy, designed by Danny Ryan, Vitalik Buterin, and co, is what we are planning to use for the phase 0 launch.

## 2.1. Self-elect

We use [two helper functions](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md#aggregation-selection) to describe the aggregator selection:

```python
def get_slot_signature(state: BeaconState, slot: Slot, privkey: int) -> BLSSignature:
    domain = get_domain(state, DOMAIN_BEACON_ATTESTER, compute_epoch_at_slot(slot))
    signing_root = compute_signing_root(slot, domain)
    return bls.Sign(privkey, signing_root)
```

```python
def is_aggregator(state: BeaconState, slot: Slot, index: CommitteeIndex, slot_signature: BLSSignature) -> bool:
    committee = get_beacon_committee(state, slot, index)
    modulo = max(1, len(committee) // TARGET_AGGREGATORS_PER_COMMITTEE)
    return bytes_to_int(hash(slot_signature)[0:8]) % modulo == 0
```

The validators of the committee at `slot` with committee index `index` generate their `slot_signature`s by calling the `get_slot_signature(state, slot, privkey) -> BLSSignature` function. Then they call the `is_aggregator(state, slot, index, slot_signature)` function with the given `slot_signature` to check if they are one of the **aggregators**.

The `is_aggregator` function has some features:

- **Self-elect only**: only the aggregators themselves can reveal their `slot_signature`.
- **Everyone can verify**: everyone who has received the `slot_signature` can verify it with the `is_aggregator` function.
- **Probabilistic selection**: each committee member is selected with probability `1 / modulo`. `TARGET_AGGREGATORS_PER_COMMITTEE` is the target number of aggregators per committee and is currently set to 16.

## 2.2. Attesting

Each attester [creates and broadcasts an individual (unaggregated) attestation](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md#attesting) to the [`committee_index{subnet_id}_beacon_attestation` subnet](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#attestation-subnets).
The honest attesters broadcast their attestation either `SECONDS_PER_SLOT / 3` seconds after the start of the slot or as soon as they receive a block from the expected proposer (whichever comes first).

## 2.3. Aggregate

The selected aggregators [construct an aggregate attestation](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md#construct-aggregate) with the unaggregated attestations that they received earlier that have the _same `AttestationData` as their own_.

They then create and broadcast an [`aggregate_and_proof: AggregateAndProof`](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md#aggregateandproof) message to the global channel `beacon_aggregate_and_proof`. The honest validator broadcasts this message `SECONDS_PER_SLOT * 2 / 3` seconds after the start of the slot.

```python
class AggregateAndProof(Container):
    aggregator_index: ValidatorIndex
    aggregate: Attestation
    selection_proof: BLSSignature
```

- `aggregator_index`: the `ValidatorIndex` of the aggregator
- `aggregate`: the aggregate attestation
- `selection_proof`: the signature generated by the `get_slot_signature` function.

## 2.4. Verification

[Here](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#global-topics) is the detailed `beacon_aggregate_and_proof` topic verification procedure. Note that the message is only valid with the `selection_proof` of a selected aggregator.

# 3. Analysis of naive aggregation strategy mechanism

## 3.1. Load balancing

### 3.1.1. Constants and assumptions

- $S$ subnets
- $A$ aggregators per committee, targeting `TARGET_AGGREGATORS_PER_COMMITTEE` aggregators.
- $V$ active validators.
- $V_c$ validators in the committee, limited by `MAX_VALIDATORS_PER_COMMITTEE`.
- $B_s$ beacon nodes that are subscribed to a subnet.
- $B_g$ beacon nodes that are subscribed to the global channel.
- Attestation:
    - Fields (sizes in bytes):
        - `aggregation_bits`: $V_c\ //\ 8$
        - `data`: $8+8+32+40+40 = 128$
        - `signature`: $96$
    - Total size: $224 + V_c\ //\ 8$ bytes
    - Worst case: $480$ bytes (with $V_c$ = `MAX_VALIDATORS_PER_COMMITTEE` = 2048)

### 3.1.2. Gossipsub assumptions

- Each beacon node has $P$ peers
- It takes $H_s$ hops for one (individual or aggregate) message to be fully broadcast over the subnet with $B_s$ beacon nodes and target $P$ peers.
- It takes $H_g$ hops for one aggregate message to be fully broadcast over the global channel with $B_g$ beacon nodes and target $P$ peers.

### 3.1.3. Per committee attestation messages overhead

The total overhead of per committee attestation could be roughly represented as:

$$\begin{split}subnet\ messages\ overhead &= individual\ message\ size * len(attesters) * len(nodes) * len(propagation\ hops) * len(subnets) \\ &= (224 + V_c\ //\ 8\ bytes) * (V_c\ validators\ in\ committee) * (B_s\ nodes) * (H_s\ hops) * (S\ subnets) \end{split}$$

$$\begin{split}global\ messages\ overhead &= aggregate\ message\ size * len(aggregators) * len(nodes) * len(propagation\ hops) \\ &= (224 + V_c\ //\ 8\ bytes) * (A\ aggregators) * (B_g\ nodes) * (H_g\ hops) \end{split}$$

Some possible directions for optimization:

1. Decrease $B_g$: in practice, one beacon node will likely serve many validators
2. Decrease hops: optimize the networking protocol
3. Set a reasonable expected aggregator count $A$: it's currently set to $16$, but perhaps we should increase or decrease this number.

## 3.2. Beacon block proposer strategy

The beacon block proposer cannot further aggregate aggregates that are overlapping. Because we don't assign an aggregator to only aggregate a sub-committee, it's highly likely that the aggregators will create different attestations that include the same attesters. The beacon proposer _can_, however, include multiple aggregates that have overlaps on-chain, but with no additional reward for repeat inclusions.

Each block can be packed with up to `MAX_ATTESTATIONS` (128 in current spec) attestations.
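
Because overlapping aggregates cannot be merged and a block holds at most `MAX_ATTESTATIONS` attestations, the proposer has to choose which aggregates to include. The following is only a minimal sketch of one possible heuristic, a greedy pass that repeatedly picks the aggregate covering the most not-yet-included validators. The helper name `greedy_pack` and the modelling of each aggregate as a set of validator indices are assumptions made for illustration; they do not come from the spec, and a greedy pass is not guaranteed to find the optimal selection.

```python
from typing import FrozenSet, List, Set

MAX_ATTESTATIONS = 128  # per-block limit quoted in the text


def greedy_pack(
    aggregates: List[FrozenSet[int]],
    limit: int = MAX_ATTESTATIONS,
) -> List[FrozenSet[int]]:
    """Hypothetical helper: choose up to `limit` aggregates greedily.

    Each aggregate is modelled as the set of validator indices whose
    bits are set in its aggregation_bits (an illustrative assumption).
    """
    covered: Set[int] = set()
    chosen: List[FrozenSet[int]] = []
    remaining = list(aggregates)
    while remaining and len(chosen) < limit:
        # Take the aggregate that adds the most validators not yet covered.
        best = max(remaining, key=lambda agg: len(agg - covered))
        if not best - covered:
            break  # nothing left adds a new validator, so no extra reward
        chosen.append(best)
        covered |= best
        remaining.remove(best)
    return chosen


example = [frozenset({1, 2, 3}), frozenset({2, 3}), frozenset({3, 4})]
picked = greedy_pack(example)
```

In this toy input, the aggregate `{2, 3}` is skipped because every validator in it is already covered by `{1, 2, 3}`.
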
To maximize rewards, the proposer would go through all aggregates with any yet-to-be-included validators and try to find the maximum set of attestations. (Also see the [maximum disjoint set problem](https://en.wikipedia.org/wiki/Maximum_disjoint_set).)

## 3.3. Probabilistic selection

Since the `is_aggregator` function is probabilistic, it's *possible* that no attester is selected as an aggregator of a non-empty committee.

That said, [with `TARGET_AGGREGATORS_PER_COMMITTEE := 16` and `len(committee) := 128`, the probability of no-aggregator is only about 3.78E-08](https://docs.google.com/spreadsheets/d/1C7pBqEWJgzk3_jesLkqJoDTnjZOODnGTOJUrxUMdxMA/edit#gid=0): here `modulo` is $128\ //\ 16 = 8$, so each member is selected with probability $1/8$ and the probability that none is selected is $(7/8)^{128}$. Even with a larger committee, the no-aggregator probability is still very low:

![](https://storage.googleapis.com/ethereum-hackmd/upload_73bae7e4131771f42722780c805b1aef.png)

We expect ~`TARGET_AGGREGATORS_PER_COMMITTEE` aggregators to be selected, and we only need one honest, responsible, well-connected aggregator.

## 3.4. Privacy

Before the aggregator reveals its `slot_signature`, only the aggregator itself knows that it is one of the aggregators. The rationale is that if the aggregator selection were predictable, it could be an attack vector.

## 3.5. Forge aggregate attack

If an aggregator reveals (i) its own unaggregated attestation at [Step 2.2](#22-Attesting) and (ii) the `aggregate_and_proof` at [Step 2.3](#23-Aggregate), a rational validator can re-create and re-broadcast a valid `aggregate_and_proof` by setting `aggregate_and_proof.aggregate` to any other aggregate that includes the aggregator's attestation.

To avoid the forge attack, the aggregator can choose not to broadcast its unaggregated attestation $Attestation_1$ at [Step 2.2](#22-Attesting). However, the *other* aggregators will then not include $Attestation_1$ in their aggregates, guaranteeing some amount of imperfect aggregates and increasing the number of attestations a beacon proposer would need to include to reach 100% committee participation.
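
To make the attack described above concrete, here is a toy model of why the forged message passes. It is a sketch under strong simplifications, not the spec's verification procedure: signatures are replaced by plain strings, `ToyAggregate`, `ToyAggregateAndProof`, `toy_is_valid`, and the `slot_proofs` lookup are all hypothetical names, and the check on the aggregate's own signature is omitted because the attacker reuses a genuinely valid aggregate.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ToyAggregate:
    data: str                     # stand-in for AttestationData
    participants: FrozenSet[int]  # stand-in for aggregation_bits


@dataclass(frozen=True)
class ToyAggregateAndProof:
    aggregator_index: int
    aggregate: ToyAggregate
    selection_proof: str          # stand-in for the slot signature


def toy_is_valid(msg: ToyAggregateAndProof, slot_proofs: Dict[int, str]) -> bool:
    """Toy check: the proof must belong to the claimed aggregator, and the
    aggregator must be a participant of the aggregate. Nothing here ties
    the proof to this particular aggregate."""
    proof_ok = slot_proofs.get(msg.aggregator_index) == msg.selection_proof
    included = msg.aggregator_index in msg.aggregate.participants
    return proof_ok and included


# Validator 7 is a selected aggregator; its proof is public once broadcast.
slot_proofs = {7: "proof-of-7"}
honest = ToyAggregateAndProof(7, ToyAggregate("data", frozenset({1, 7})), "proof-of-7")

# Anyone can pair the revealed proof with another aggregate that contains 7.
other = ToyAggregate("data", frozenset({2, 3, 7}))
forged = ToyAggregateAndProof(7, other, honest.selection_proof)
```

Both `honest` and `forged` pass `toy_is_valid`, since the selection proof only covers the slot and not the aggregate it travels with.
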
In the normal case, we expect aggregators to broadcast their attestation for an increased chance of inclusion in other aggregates and thus on-chain. We also expect non-aggregators to see their attestation included with like-aggregates and thus have no need to forge. In the case that a validator sees an aggregate of like-`attestation_data` without their own included, a rebroadcast with the additional inclusion imposes a marginal increase in bandwidth over the global subnet, but this is not necessarily a net negative for the network. The main concern here would be avenues in which a single validator (or cartel) might be able to amplify the expected number of valid aggregates broadcast on the global subnet.

Another idea to plug this hole is to have the aggregator sign over the proof+aggregate to prevent all forms of this type of forge, but we decided not to introduce the additional signature/verification overhead before further analysis.

# 4. Other strategies

Eth2 teams are still researching other potential strategies.

:::info
To auditors: these strategies are out of the scope of auditing as they will almost certainly not be deployed in Phase 0.
:::

## 4.1. Local aggregation

The simplest strategy. Every attester broadcasts their unaggregated attestation to the global channel and creates aggregates locally.

## 4.2. Handel: Practical Multi-Signature Aggregation for Large Byzantine Committees [^first]

Handel allows the aggregation of thousands of signatures in just under a second, even in a Byzantine context, by organizing the connections in levels.

With the regular deployment, the participants in Handel map to their public IP addresses. To improve privacy, other deployments, such as ones using linkable ring signatures, DHTs, or Tor, are being researched.

## 4.3. Heuristically partitioned attestation aggregation [^second]

This strategy converts the bitfield to a binary tree.
The attester can find their place in the tree easily, and each subtree can be heuristically aggregated.

## 4.4. Comparison

| | Local aggregation | Naive strategy | Handel | Heuristically partitioned strategy |
| - | -------- | -------- | ------ | ----- |
| Efficiency | Very low | Low | High | Medium, depends on the number of participants and partitions |
| Validator privacy | Very high | High | Low | Medium |
| Security assumption | None | Probabilistic and one-sixteenth honest validators | Honest majority | Honest majority |

# References

[^first]: [Handel: Practical Multi-Signature Aggregation for Large Byzantine Committees](https://arxiv.org/abs/1906.05132) by Olivier Bégassat, Blazej Kolad, Nicolas Gailly, Nicolas Liochon

[^second]: [Heuristically Partitioned Attestation Aggregation](https://notes.ethereum.org/@protolambda/Hkjl_L6_H) by Protolambda