# NetClusters: Speedrunning SSF via networking-layer consolidation
*Authors: Francesco <@fradamt>, George <@asn>*
## Motivation
Implementing [Single Slot Finality (SSF)](https://ethereum.org/fil/roadmap/single-slot-finality/) is hard due to [the difficulty](https://ethresear.ch/t/signature-merging-for-large-scale-consensus/17386) of aggregating signatures from every validator in one go. The [MaxEB proposal](https://ethresear.ch/t/increase-the-max-effective-balance-a-modest-proposal/15801) *could* drastically reduce the number of validators by allowing staking pools to consolidate up to 64 validators into one. MaxEB also has other benefits, like giving auto-compounding to solo stakers. Nonetheless, it is a more complicated feature than what we are going to be proposing, involving consensus-layer changes and administrative overhead for stakers. This might well be acceptable given the importance of Single Slot Finality and the other benefits of MaxEB, but we think it is worth exploring alternative paths, for example in case other features are prioritized over MaxEB.
In this post, we propose a lighter version of consolidation solely for the purposes of signature aggregation. Our proposal leaves validators unchanged, but allows staking pools to define *clusters* of validators for the purposes of signature aggregation.
## Protocol Overview
Our proposal works in two phases:
**Phase 1: Cluster formation and registration**
Each staking pool can optionally send a `ClusterFormation` message to specify which validators are contained in its cluster, and register it on the Beacon Chain. Staking pools are encouraged to register a cluster per node, containing all validators run on that node. A cluster can later be disbanded through another operation.
At this point, there is essentially no meaningful change for stakers, other than being able to register clusters on the Beacon Chain, while from the protocol's perspective the only change is managing cluster registrations and records. After *enough* clusters have been defined, we can flip the switch and move to Phase 2. The amount of time between the two phases is indeterminate and can be decided with social consensus. The idea is that we would commit to only switching to Phase 2 when we can do so alongside a transition to an SSF protocol, as this would mean that registering a cluster never imposes any additional slashing risk, for reasons we are going to discuss later.
**Phase 2: Cluster-based signature aggregation**
The current signature aggregation protocol switches to using clusters instead of validators. Essentially each cluster (i.e. a staking pool) participates as a single entity in the signature aggregation protocol.
From the protocol's perspective, the additional change is using the mapping from clusters to validators to "interpret" attestations, which now come with a short bitfield *over the clusters*. This allows bitfields to be much shorter (length equal to the number of clusters instead of the number of validators), drastically reducing both the bandwidth needed for aggregation and the verification time, because the aggregate pubkey for each cluster can be pre-computed. From a staking pool's perspective, the only change would be that a beacon node whose validators are registered as a cluster would pre-aggregate their attestations, bundling them together in a single attestation for the whole cluster.
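As a rough illustration of the saving (the cluster count below is hypothetical, not a protocol constant), a bitfield over clusters is far smaller than one over validators:

```python
# Sketch of the bandwidth saving from cluster-level bitfields.
# NUM_CLUSTERS is an illustrative post-consolidation figure, not a spec value.
NUM_VALIDATORS = 1_000_000
NUM_CLUSTERS = 20_000

def bitfield_bytes(n_bits: int) -> int:
    """Size in bytes of a bitfield with one bit per participant."""
    return (n_bits + 7) // 8

validator_bitfield = bitfield_bytes(NUM_VALIDATORS)  # 125_000 bytes
cluster_bitfield = bitfield_bytes(NUM_CLUSTERS)      # 2_500 bytes
print(validator_bitfield // cluster_bitfield)        # 50x reduction
```

With these (made-up) numbers the bitfield shrinks by 50x, on top of the verification saving from pre-computed per-cluster aggregate pubkeys.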
## Detailed protocol changes
In the following section we delve into the specific spec modifications needed to accommodate this proposal. Additionally, the *discussion section* goes over considerations about the slashing risks and strategies for managing the overall validator set.
For now, let's go over the spec changes:
### Phase 1: Cluster formation and registration
We add to each `Validator` object the field `cluster_index: ClusterIndex`, a `uint64` initially set to `FAR_CLUSTER_INDEX (2**64 - 1)`, and we add to the state a `cluster_registry: List[Cluster, MAX_CLUSTERS]`, where `Cluster` is this object:
```python
class Cluster(Container):
    members: List[ValidatorIndex, MAX_VALIDATORS_PER_CLUSTER]
    formation_epoch: Epoch
    disbanding_epoch: Epoch
```
We also add operations for forming and disbanding a cluster.
```python
class ClusterFormation(Container):
    members: List[ValidatorIndex, MAX_VALIDATORS_PER_CLUSTER]

class SignedClusterFormation(Container):
    message: ClusterFormation
    signature: BLSSignature

class ClusterDisbanding(Container):
    cluster_index: ClusterIndex

class SignedClusterDisbanding(Container):
    message: ClusterDisbanding
    signature: BLSSignature
```
#### Operation processing
These operations are individually processed as part of `process_operations`, in the state transition of the block in which they are included.
Verifying `SignedClusterFormation` involves verifying that `signature` is the aggregated signature of all `members` over the `message`, and that `cluster_index` is set to `FAR_CLUSTER_INDEX` for all `members`. The `cluster_index` of all `members` is then set to the first available index in `state.cluster_registry`, and a `Cluster` is added to the registry at that index. The first available index might be somewhere in the existing list, if there are clusters for which `disbanding_epoch <= epoch`, i.e., already disbanded clusters. In that case, the first such cluster is overwritten. Regardless, the `formation_epoch` of the new `Cluster` is set to `compute_activation_exit_epoch(epoch)`. The `disbanding_epoch` is set to `FAR_FUTURE_EPOCH = 2**64 - 1`.
Verifying `SignedClusterDisbanding` involves verifying that `signature` is the aggregated signature of all `members` of the cluster with the given `cluster_index`, that `cluster.formation_epoch <= epoch` and that `cluster.disbanding_epoch == FAR_FUTURE_EPOCH`. The `disbanding_epoch` is now set to `compute_activation_exit_epoch(epoch)`.
For more details, see *Appendix A* for the precise spec changes required. The Discussion section also contains a paragraph on how these operations can be rate-limited.
### Phase 2: Cluster-based signature aggregation
#### Network-layer aggregation
Once we move to Phase 2, we can reuse the existing subnet-based aggregation infrastructure (or even do without subnets altogether, if the number of clusters is sufficiently small), but organize it around the set of clusters instead of the validator set. It would work as follows:
- We move `committee_index` from `AttestationData` to the `Attestation` itself, as in [EIP-7549](https://eips.ethereum.org/EIPS/eip-7549). This way, it is not part of the message which is signed when attesting, allowing attestations from different committees to be aggregated.
```python
class Attestation(Container):
    aggregation_bits: Bitlist[MAX_VALIDATORS_PER_COMMITTEE]
    committee_index: CommitteeIndex
    data: AttestationData
    signature: BLSSignature
```
- Other than that, we keep the `Attestation` format as is, but we change the interpretation of the `aggregation_bits` in an `Attestation` to refer to the clusters, i.e., the $i^{th}$ bit being set would mean that the $i^{th}$ cluster has contributed to the aggregate signature with all of its members. Validators that are not part of a cluster are treated as singleton clusters.
- We keep the 64 attestation subnets, but now assign attestation subnet membership by clusters (including singleton clusters). In other words, we do a (possibly weighted) shuffle on the clusters. Within a subnet, we `REJECT` attestations from clusters that are not assigned to it according to the shuffle.
- Within a subnet, the `aggregation_bits` of an `Attestation` object refer to the clusters assigned to that subnet, just like today they refer to the `beacon_committee` that is assigned to it, instead of the whole validator set. `AggregateAndProof` messages sent in the global topic also use subnet-specific bitfields, interpreted using the `committee_index`. This is so that the length of a bitfield is the size of its committee, instead of the whole set of clusters.
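The subnet assignment above can be sketched as follows. This is a toy stand-in: it uses Python's seeded `random.shuffle` in place of the protocol's shuffle, and deals clusters round-robin into subnets as one possible (unweighted) choice:

```python
import random

ATTESTATION_SUBNET_COUNT = 64

def assign_clusters_to_subnets(cluster_indices: list[int],
                               epoch_seed: int) -> dict[int, list[int]]:
    """Toy stand-in for the protocol shuffle: deterministically shuffle the
    clusters (seeded per epoch) and deal them round-robin into the 64 subnets."""
    rng = random.Random(epoch_seed)
    shuffled = cluster_indices[:]
    rng.shuffle(shuffled)
    subnets: dict[int, list[int]] = {i: [] for i in range(ATTESTATION_SUBNET_COUNT)}
    for position, cluster_index in enumerate(shuffled):
        subnets[position % ATTESTATION_SUBNET_COUNT].append(cluster_index)
    return subnets

# Every cluster (including singletons) lands in exactly one subnet;
# attestations from a cluster on any other subnet would be REJECTed.
subnets = assign_clusters_to_subnets(list(range(1000)), epoch_seed=42)
```

Since the assignment is a deterministic function of the epoch seed, every node can locally recompute which clusters belong to which subnet, just as with today's committee shuffle.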
#### Attestation processing on-chain: Translating from clusters to validators
At this point, we have specified how to use clusters to transform the current attestation aggregation system into a more efficient one, essentially just having it operate based on clusters and not on validator indices. We could stop here and have the proposer transform the bitfields based on clusters into bitfields based on validator indices, before putting everything on chain. This way, the Beacon Chain would not need to interact with clusters at all, other than to keep a record of them. In particular, it could keep processing attestations as it currently does. On the other hand, this means putting very large bitfields on chain, e.g., with 1M validators and 2x redundancy (as today, where we can include 128 attestations from 64 subnets) we would have 256 KB of bitfields in every block.
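The figure cited above follows from straightforward arithmetic, sketched here with the numbers from the text:

```python
# Size of on-chain validator-index bitfields under 2x redundancy.
NUM_VALIDATORS = 1_000_000
REDUNDANCY = 2  # up to 128 attestations over 64 subnets, as in the text

total_bitfield_bits = NUM_VALIDATORS * REDUNDANCY
total_bitfield_kb = total_bitfield_bits / 8 / 1000
print(total_bitfield_kb)  # 250.0 KB of bitfields per block, roughly the 256 KB cited
```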
To avoid this, we need the Beacon Chain to be able to do the transformation from a bitfield over clusters to attesting indices by itself. For now let's assume that the proposer further aggregates the attestations from different subnets, by aggregating the signatures and merging the bitfields (into a single one over all clusters). Aggregating the signatures is possible since `CommitteeIndex` has been taken out of `AttestationData`. Then, all we need to change in the Beacon Chain is the `get_attesting_indices` function, which determines which validator indices contributed to an attestation. We need it to do the same thing, but given a bitlist over the clusters, instead of over a committee.
```python
def get_attesting_indices(state: BeaconState,
                          data: AttestationData,
                          bits: Bitlist[VALIDATOR_REGISTRY_LIMIT]) -> Set[ValidatorIndex]:
    """
    Return the set of validator indices that participated in the attestation.
    The bitfield is interpreted with the first part corresponding to active clusters at data.slot
    and the latter part corresponding to individual validators not in any active cluster at that slot.
    """
    data_epoch = compute_epoch_at_slot(data.slot)
    attesting_indices = set()
    # Determine active clusters for data_epoch
    active_cluster_indices = get_active_cluster_indices(state, data_epoch)
    num_active_clusters = len(active_cluster_indices)
    # Determine indices of validators not in a cluster for data_epoch
    isolated_validator_indices = get_active_isolated_validator_indices(state, data_epoch)
    assert num_active_clusters + len(isolated_validator_indices) == len(bits)
    # Process bits for active clusters
    for i, cluster_index in enumerate(active_cluster_indices):
        if bits[i]:
            # Add all active validators from the cluster
            active_cluster_members = [m for m in state.cluster_registry[cluster_index].members
                                      if is_active_validator(state.validators[m], data_epoch)]
            attesting_indices.update(active_cluster_members)
    # Process bits for validators not in a cluster
    start_isolated_index = num_active_clusters
    for i, validator_index in enumerate(isolated_validator_indices):
        if bits[start_isolated_index + i]:
            attesting_indices.add(validator_index)
    return attesting_indices


def get_active_cluster_indices(state: BeaconState, epoch: Epoch) -> Sequence[ClusterIndex]:
    return [ClusterIndex(i) for i, cluster in enumerate(state.cluster_registry)
            if is_active_cluster(cluster, epoch)]


def get_active_isolated_validator_indices(state: BeaconState, epoch: Epoch) -> Sequence[ValidatorIndex]:
    """
    Return the sequence of indices of active validators not in an active cluster at ``epoch``.
    """
    return [ValidatorIndex(i) for i, v in enumerate(state.validators)
            if is_active_validator(v, epoch) and
            (v.cluster_index == FAR_CLUSTER_INDEX or
             not is_active_cluster(state.cluster_registry[v.cluster_index], epoch))]


def is_active_cluster(cluster: Cluster, epoch: Epoch) -> bool:
    """
    Check if a cluster is active at a given ``epoch``.
    """
    return cluster.formation_epoch <= epoch < cluster.disbanding_epoch
```
We might want to avoid the final aggregation step done by the proposer, in order to still be able to include many (smaller) attestations per slot, and to more easily allow older attestations to be included using the extra space. From a space-saving perspective, further aggregation does little to reduce the overall size, since at this point the total size is dominated by the bitfields rather than by the signatures. On the other hand, keeping the bitfields over the subnet-specific clusters requires the Beacon Chain to still be able to compute the shuffle over clusters, just like it is able to compute committees today. We demonstrate the changes required for this technique in *Appendix B*.
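To see why the bitfields dominate, compare the 64 per-subnet BLS signatures (96 bytes each) against the total bitfield bits, which must all be kept even after merging. The cluster count below is purely hypothetical (registered clusters plus singleton validators):

```python
# Illustrative comparison of signature bytes vs bitfield bytes per block.
BLS_SIGNATURE_BYTES = 96
SUBNET_COUNT = 64
NUM_CLUSTERS = 500_000  # hypothetical: clusters plus singleton validators

clusters_per_subnet = NUM_CLUSTERS // SUBNET_COUNT
total_bitfield_bytes = SUBNET_COUNT * ((clusters_per_subnet + 7) // 8)  # ~62.5 KB
total_signature_bytes = SUBNET_COUNT * BLS_SIGNATURE_BYTES              # ~6 KB

# Aggregating the 64 subnet attestations into one saves 63 signatures,
# but every bitfield bit must be kept, so the saving is ~6 KB out of ~68 KB.
```

Under these (assumed) numbers, skipping the final aggregation costs roughly 10% of the attestation data in a block, while preserving the flexibility of including many smaller attestations.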
## Discussion
### Slashing risk
The slashing risk for pools is necessarily higher once we move to SSF, because all validators are expected to vote in every slot, which makes a circuit breaker that turns off validator operation upon detection of a slashing event less effective. In other words, today a bad setup is likely to cause at most some fraction of 1/32 of the pool validators to be slashed, whereas the same bad setup in a SSF protocol would cause that same fraction of *all* pool validators to be slashed. This is just an unavoidable consequence of SSF, since the goal is precisely to get full economic security at once.
Contrary to MaxEB, this proposal allows this additional slashing risk to be in effect *only after we have successfully transitioned to SSF* (i.e. Phase 2). Moreover, it does not entail any extra slashing risk if pools only register clusters made up of validators run on the same node, because all those validators would anyway sign over the same data, and being in the cluster only involves a pre-aggregation step in the node. In other words, a pool would have the baseline slashing risk of today's protocol in Phase 1, and the baseline slashing risk of an SSF protocol in Phase 2, and no more.
### Capping the cluster set
To move from *Phase 1* to *Phase 2* we need to make sure that our signature aggregation protocol can handle the number of clusters. However, it's possible that in a post-phase-2 future, the cluster set grows to the point that signature aggregation becomes infeasible. To avoid this, we need to cap the number of clusters. We leave this mechanism to a future proposal. Note that this same problem needs to be solved regardless of how exactly we get to Single Slot Finality, e.g., if the validator set size sufficiently shrinks after MaxEB and we move to SSF, we would still need to enforce a cap on it.
### Rate-limiting cluster operations
To rate-limit `ClusterFormation` and `ClusterDisbanding` in a simple way, we set bounds like `MAX_CLUSTER_FORMATION_PER_SLOT = 16` and `MAX_CLUSTER_DISBANDING_PER_SLOT = 16`. To still preserve fairness of access to the operations, we rate-limit their initiation to 16 validators with contiguous indices, which rotate every slot. For example, in the first slot after this is introduced, only validators with indices $[0, 15]$ can initiate one of these operations, then $[16, 31]$, and so on. By this we mean that at least one of the `members`, and thus of the signers, must be one of the allowed validators. Note that, for either operation, a single validator can only match once: once a validator is involved in a `ClusterFormation`, it cannot be involved in another one in the same block, because its `cluster_index` is now set to something other than `FAR_CLUSTER_INDEX`. Similarly, a `ClusterDisbanding` sets the `disbanding_epoch`, which prevents another `ClusterDisbanding` involving the same cluster. One way to implement the rotating set of allowed initiators is to just add a field `next_validator_index_for_cluster_operations: ValidatorIndex` to the state, to track where the allowed initiator indices begin in the next slot.
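A minimal sketch of the rotating window, assuming a simple modular wrap-around at the end of the validator set (the helper names are ours, not spec functions):

```python
VALIDATOR_COUNT = 1_000_000  # illustrative, not a protocol constant
WINDOW_SIZE = 16             # mirrors MAX_CLUSTER_FORMATION_PER_SLOT

def allowed_initiators(next_index: int) -> range:
    """Contiguous validator indices allowed to initiate a cluster
    operation this slot (wrap-around omitted for simplicity)."""
    return range(next_index, next_index + WINDOW_SIZE)

def advance_window(next_index: int) -> int:
    """Rotate the window forward by 16 each slot, wrapping at the
    validator count; tracked on-chain via
    next_validator_index_for_cluster_operations."""
    return (next_index + WINDOW_SIZE) % VALIDATOR_COUNT

def is_allowed_formation(members: list[int], next_index: int) -> bool:
    """A ClusterFormation is admissible only if at least one member
    falls in the current window."""
    window = allowed_initiators(next_index)
    return any(m in window for m in members)
```

For instance, with the window at index 0, a cluster containing validator 3 can be formed, while one containing only validators 100 and 200 must wait for the window to rotate around to them.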
## Appendix
### Appendix A: Cluster operations spec changes
Here is the spec code required to process cluster operations:
```python
def process_cluster_formation(state: BeaconState,
                              signed_cluster_formation: SignedClusterFormation) -> None:
    """
    Process a cluster creation operation.
    """
    cluster_formation = signed_cluster_formation.message
    members = cluster_formation.members
    assert len(members) <= MAX_VALIDATORS_PER_CLUSTER
    allowed_initiator_indices = range(state.next_validator_index_for_cluster_operations,
                                      state.next_validator_index_for_cluster_operations + 16)
    # Check that at least one of the members is an allowed initiator
    assert any(validator_index in allowed_initiator_indices
               for validator_index in members)
    # Verify the aggregated signature
    pubkeys = [state.validators[i].pubkey for i in members]
    domain = get_domain(state, DOMAIN_CLUSTER_FORMATION)
    signing_root = compute_signing_root(cluster_formation, domain)
    assert bls.FastAggregateVerify(pubkeys,
                                   signing_root,
                                   signed_cluster_formation.signature)
    # Find the first available cluster index
    available_cluster_index = find_first_available_cluster_index(state)
    # Set the cluster_index for each validator in the cluster
    for validator_index in members:
        validator = state.validators[validator_index]
        # Validator must not already be part of a cluster
        assert validator.cluster_index == FAR_CLUSTER_INDEX
        validator.cluster_index = available_cluster_index
    add_cluster_to_registry(state, members, available_cluster_index)


def add_cluster_to_registry(state: BeaconState,
                            members: List[ValidatorIndex, MAX_VALIDATORS_PER_CLUSTER],
                            available_cluster_index: ClusterIndex) -> None:
    epoch = compute_epoch_at_slot(state.slot)
    cluster = Cluster(members=members,
                      formation_epoch=compute_activation_exit_epoch(epoch),
                      disbanding_epoch=FAR_FUTURE_EPOCH)
    if available_cluster_index < len(state.cluster_registry):
        # Overwrite the first already-disbanded cluster
        state.cluster_registry[available_cluster_index] = cluster
    else:
        state.cluster_registry.append(cluster)


def find_first_available_cluster_index(state: BeaconState) -> ClusterIndex:
    """
    Find the first available cluster index: the index of the first
    already-disbanded cluster, or else the end of the registry.
    """
    epoch = compute_epoch_at_slot(state.slot)
    for index, cluster in enumerate(state.cluster_registry):
        if cluster.disbanding_epoch <= epoch:
            return ClusterIndex(index)
    return ClusterIndex(len(state.cluster_registry))


def process_cluster_disbanding(state: BeaconState,
                               signed_cluster_disbanding: SignedClusterDisbanding) -> None:
    """
    Process a single cluster removal operation.
    """
    epoch = compute_epoch_at_slot(state.slot)
    cluster_index = signed_cluster_disbanding.message.cluster_index
    cluster = state.cluster_registry[cluster_index]
    assert cluster.formation_epoch <= epoch
    assert cluster.disbanding_epoch == FAR_FUTURE_EPOCH
    members = cluster.members
    allowed_initiator_indices = range(state.next_validator_index_for_cluster_operations,
                                      state.next_validator_index_for_cluster_operations + 16)
    # Check that at least one of the members is an allowed initiator
    assert any(validator_index in allowed_initiator_indices
               for validator_index in members)
    # Verify the aggregated signature
    pubkeys = [state.validators[i].pubkey for i in members]
    domain = get_domain(state, DOMAIN_CLUSTER_DISBANDING)
    signing_root = compute_signing_root(signed_cluster_disbanding.message, domain)
    assert bls.FastAggregateVerify(pubkeys,
                                   signing_root,
                                   signed_cluster_disbanding.signature)
    cluster.disbanding_epoch = compute_activation_exit_epoch(epoch)
```
### Appendix B: Attestation processing without aggregation
In this appendix we describe an alternative attestation processing scheme where the proposer does not aggregate the clusters' signatures. Instead, the attestations with bitfields over clusters are put on-chain, and attestation processing involves computing `ClusterCommittees` (akin to today's beacon committees), so that `get_attesting_indices` can work with such cluster-specific bitfields.
```python
class ClusterCommittee(Container):
    cluster_indices: List[ClusterIndex, MAX_CLUSTERS]
    isolated_validator_indices: List[ValidatorIndex, VALIDATOR_REGISTRY_LIMIT]


def get_cluster_committee(state: BeaconState, epoch: Epoch, index: CommitteeIndex) -> ClusterCommittee:
    """
    Return the cluster committee at ``epoch`` for ``index``.
    """
    committees_per_slot = get_committee_count_per_slot(state, epoch)
    active_cluster_indices = get_active_cluster_indices(state, epoch)
    isolated_validator_indices = get_active_isolated_validator_indices(state, epoch)
    merged_indices = active_cluster_indices + isolated_validator_indices
    committee_indices = compute_committee(
        indices=[i for i in range(len(merged_indices))],
        seed=get_seed(state, epoch, DOMAIN_BEACON_ATTESTER),
        index=index,
        count=committees_per_slot,
    )
    num_active_clusters = len(active_cluster_indices)
    cluster_indices = [merged_indices[i] for i in committee_indices if i < num_active_clusters]
    isolated_validator_indices = [merged_indices[i] for i in committee_indices if i >= num_active_clusters]
    return ClusterCommittee(
        cluster_indices=cluster_indices,
        isolated_validator_indices=isolated_validator_indices,
    )


def get_committee_count_per_slot(state: BeaconState, epoch: Epoch) -> uint64:
    """
    Return the number of committees per slot for the given ``epoch``.
    """
    active_cluster_indices = get_active_cluster_indices(state, epoch)
    isolated_validator_indices = get_active_isolated_validator_indices(state, epoch)
    num_all_active_clusters = uint64(len(active_cluster_indices) + len(isolated_validator_indices))
    return max(uint64(1), min(
        MAX_COMMITTEES_PER_SLOT,
        num_all_active_clusters // TARGET_COMMITTEE_SIZE))
```
With this, we can modify `get_attesting_indices` to take an attestation whose bitlist is over a cluster committee. To do so, we pass to it the `committee_index` from the `Attestation` object as well (since it's not in `AttestationData` anymore).
```python
def get_attesting_indices(state: BeaconState,
                          data: AttestationData,
                          bits: Bitlist[VALIDATOR_REGISTRY_LIMIT],
                          committee_index: CommitteeIndex) -> Set[ValidatorIndex]:
    data_epoch = compute_epoch_at_slot(data.slot)
    cluster_committee = get_cluster_committee(state, data_epoch, committee_index)
    cluster_indices = cluster_committee.cluster_indices
    isolated_validator_indices = cluster_committee.isolated_validator_indices
    assert len(cluster_indices) + len(isolated_validator_indices) == len(bits)
    attesting_indices = set()
    # Process bits for clusters
    for i, cluster_index in enumerate(cluster_indices):
        if bits[i]:
            # Add all active validators from the cluster
            active_cluster_members = [m for m in state.cluster_registry[cluster_index].members
                                      if is_active_validator(state.validators[m], data_epoch)]
            attesting_indices.update(active_cluster_members)
    # Process bits for validators not in a cluster
    start_isolated_index = len(cluster_indices)
    for i, validator_index in enumerate(isolated_validator_indices):
        if bits[start_isolated_index + i]:
            attesting_indices.add(validator_index)
    return attesting_indices
```