Weak Subjectivity in Eth2.0

# Weak Subjectivity in Eth2.0

<center> - Aditya Asgaonkar, Ethereum Research </center>

**Thanks to Carl Beekhuizen & Danny Ryan for discussion.**

This document is aimed at Eth2.0 client teams, to help them understand weak subjectivity periods and their implications.

**Edit**: Fixes in `get_latest_weak_subjectivity_checkpoint_epoch()`. Changed `state.finalized_checkpoint.epoch//weak_subjectivity_mod` to `state.finalized_checkpoint.epoch - (state.finalized_checkpoint.epoch % weak_subjectivity_mod)`. Thanks to Meredith B.

## Contents

[toc]

## Weak Subjectivity?

For background on weak subjectivity, I recommend Vitalik's post from 2014: https://blog.ethereum.org/2014/11/25/proof-stake-learned-love-weak-subjectivity/

## Weak Subjectivity Checkpoint

When a blockchain network is started, the entire network agrees on a specific genesis block. Any new nodes entering the network after genesis time have to download the genesis block to start validating the chain. There is an implicit assumption that this genesis block is "irreversible", "finalized", "un-orphanable", or whatever else you'd like to call a block that is always found in the canonical chain. All future blocks in the chain are validated on the basis of the ground truth of the genesis block.

A weak subjectivity checkpoint is similar in concept to a genesis block, except that it is at a non-genesis position in the chain. It is a block that the entire network agrees on as always being part of the canonical chain. Note that this is quite different from the concept of a "finalized" block -- if a node sees two conflicting finalized blocks, then the network has experienced consensus failure and the node cannot identify a canonical fork. On the other hand, **if a node sees a block conflicting with a weak subjectivity checkpoint, then it immediately rejects that block**. As far as the fork choice of nodes is concerned, the latest weak subjectivity checkpoint is the new genesis block of the network.
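The rejection rule above can be sketched as an ancestry check. This is only an illustration, not code from any client: the `Block` and `WeakSubjectivityCheckpoint` types, the `blocks` lookup table, and the function name are all hypothetical, and blocks older than the checkpoint are deliberately left undecided here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Block:
    # Hypothetical minimal block record, for illustration only.
    root: str
    parent_root: Optional[str]
    slot: int


@dataclass(frozen=True)
class WeakSubjectivityCheckpoint:
    # Hypothetical checkpoint record: the agreed-upon block root and its slot.
    root: str
    slot: int


def conflicts_with_checkpoint(
    blocks: Dict[str, Block], block: Block, ws: WeakSubjectivityCheckpoint
) -> bool:
    """Return True if `block` sits at or after the checkpoint slot but does
    not descend from (or equal) the checkpoint block."""
    if block.slot < ws.slot:
        # Blocks before the checkpoint are not judged by this sketch.
        return False
    current: Optional[Block] = block
    # Walk back through ancestors until we reach the checkpoint slot or earlier.
    while current is not None and current.slot > ws.slot:
        current = blocks.get(current.parent_root) if current.parent_root else None
    # An unknown ancestor, or a different block at/before the checkpoint slot,
    # means this chain does not pass through the checkpoint.
    return current is None or current.root != ws.root


# Usage: two forks split after block "a"; the checkpoint is block "b".
chain = [
    Block("g", None, 0),
    Block("a", "g", 1),
    Block("b", "a", 2),
    Block("c", "b", 3),
    Block("x", "a", 2),
    Block("y", "x", 3),
]
blocks = {b.root: b for b in chain}
ws = WeakSubjectivityCheckpoint(root="b", slot=2)
rejected = [b.root for b in chain if conflicts_with_checkpoint(blocks, b, ws)]
```

Treating an unknown ancestor as a conflict is a conservative choice made for this sketch: a block whose ancestry cannot be traced back to the checkpoint is not accepted.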
## Weak Subjectivity Period

The weak subjectivity period is the number of recent epochs within which there must be a weak subjectivity checkpoint, so that an attacker who takes control of the validator set at the beginning of the period is slashed at least a threshold amount if a conflicting finalized checkpoint is produced.

Another concept we will require is **Safety Decay `D`**, which can be defined as the loss in the 1/3rd consensus safety margin of the Casper FFG mechanism because of the changing validator set. The new safety margin that the mechanism can tolerate becomes `1/3 - D`.

### Calculating the Weak Subjectivity Period [Quick Version]

We calculate the longest safe weak subjectivity period by analyzing the following attack:

**An attacker takes control of the validator set at checkpoint `C0`. The attacker then finalizes conflicting checkpoints `C1` and `C2` for some future epoch. How far in the future must this epoch be, at the earliest, so that the attacker is slashed less than `1/3 - D` fraction of the total validator set?**

![](https://i.imgur.com/2W98xn9.png)

In the optimal strategy for the attacker, in `C1`'s fork (and similarly in `C2`'s fork):
- everyone who activated between `C0` and `C1` has voted for `C1`
- everyone who exited between `C0` and `C2` has voted for `C1`

So, the number of validators slashed will be `|V0|/3 - (A + E)`, where:
- `|V0|` is the number of validators at `C0`
- `A` is the max. number of activations between `C0` and the epoch of `C1` & `C2`
- `E` is the max. number of exits between `C0` and the epoch of `C1` & `C2`

**Note**: If the number of epochs is not too large, `A+E` is indeed maximized when `A` and `E` are the max. possible activations/exits respectively.
We want to satisfy the condition:

`|V0|/3 - (A + E) < (1/3 - D) * |V0|`

Because the [churn limit depends on the number of validators](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#get_validator_churn_limit), we get two cases:
- For `|V0| ≥ MIN_PER_EPOCH_CHURN_LIMIT * CHURN_LIMIT_QUOTIENT`:
    - `A = E = (C1.epoch - C0.epoch) * |V0|/CHURN_LIMIT_QUOTIENT`
    - The condition becomes:
        - `(C1.epoch - C0.epoch) > D * CHURN_LIMIT_QUOTIENT / 2`
- For `|V0| < MIN_PER_EPOCH_CHURN_LIMIT * CHURN_LIMIT_QUOTIENT`:
    - `A = E = (C1.epoch - C0.epoch) * MIN_PER_EPOCH_CHURN_LIMIT`
    - The condition becomes:
        - `(C1.epoch - C0.epoch) > D * |V0| / (2 * MIN_PER_EPOCH_CHURN_LIMIT)`

Calculating the weak subjectivity period using these equations gives us the following table:

![](https://i.imgur.com/0VTKdy4.png)

**Note 1**: *This calculation ignores things such as `MIN_VALIDATOR_WITHDRAWABILITY_DELAY = 256 epochs (~27 hours)`, so all the values here are off by that amount. This only means that the maximum safe weak subjectivity periods in the table are tighter than actually necessary, and we can add 27 hours to each entry in the "Duration" column in this table. The maximum safe weak subjectivity period is always at least 27 hours.*

## Implications for Client Teams

For new clients to correctly sync to the canonical head of the beacon chain, they must use the latest weak subjectivity checkpoint as the starting point of their sync. Since the latest weak subjectivity checkpoint must be within the maximum safe weak subjectivity period, two things become important:
1. The method of distributing weak subjectivity checkpoint states
2. The process of updating the latest weak subjectivity checkpoint state

### Distributing Weak Subjectivity Checkpoint States

These are the feasible ways to distribute the weak subjectivity checkpoint states (in order of increasing implementation complexity):

1. **Embed state in the client codebase**
    - Clients embed a recent checkpoint state into their source & binary distributions
    - Pros:
        - As secure as distributing the client source/binary itself
        - This is the most direct way of distributing checkpoint states
        - Incredibly low implementation overhead, minimal changes to existing client code
    - Cons:
        - Bloats the size of the client source/binary
        - Complicates the process of creating nightly builds
2. **Pass state as a parameter to the client process**
    - Users manually download recent checkpoint states from reliable internet sources and pass them as a parameter to their client process
    - Pros:
        - Relatively low implementation overhead, few lines of code change
    - Cons:
        - Exposure to user error, since the user has to find a reliable source for the checkpoint state
3. **Have bootnodes provide state via special endpoints**
    - Bootnodes provide special endpoints which serve a recent checkpoint state
    - Security can be increased by verifying against a hash embedded in the client source/binary
    - Cons:
        - Low decentralization factor, bootnodes become responsible for more than just peer lists
        - Introduces new vectors to DoS bootnodes
4. **Serve state over p2p**
    - Introduce a new p2p "weak subjectivity checkpoint state" endpoint
    - Security can be increased by verifying against a hash embedded in the client source/binary
    - Pros:
        - Most decentralized approach to providing these states
    - Cons:
        - High implementation complexity, requires a new p2p endpoint to be defined
        - Introduces new DoS vectors
        - Causes degraded performance and/or storage overheads

**Client teams are strongly recommended to implement (2. Pass state as a parameter to the client process) in addition to one or more of the other options.** Which other option to implement requires more discussion, and should be thoroughly discussed between the client teams & the research team before moving forward.
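Wherever a checkpoint state comes from outside the client (options 2, 3 and 4), the "verify against a hash" step could look roughly like the sketch below. It is a simplification under stated assumptions: it hashes the raw serialized bytes with SHA-256, whereas a real client would more likely compare the state's SSZ `hash_tree_root`. The function name and parameters are hypothetical.

```python
import hashlib


def verify_checkpoint_state(state_bytes: bytes, trusted_hash_hex: str) -> bool:
    """Return True if the SHA-256 digest of `state_bytes` matches the trusted hash.

    Assumption: the trusted hash is published as a hex string, optionally
    prefixed with "0x", and covers the raw serialized state bytes.
    """
    expected = trusted_hash_hex.strip().lower()
    if expected.startswith("0x"):
        expected = expected[2:]
    actual = hashlib.sha256(state_bytes).hexdigest()
    return actual == expected


# Usage: the client refuses to start from a state whose hash does not match.
downloaded_state = b"serialized checkpoint state (placeholder bytes)"
published_hash = "0x" + hashlib.sha256(downloaded_state).hexdigest()
ok = verify_checkpoint_state(downloaded_state, published_hash)
tampered_ok = verify_checkpoint_state(downloaded_state + b"!", published_hash)
```

The check is only as trustworthy as the channel that delivered the hash, which is why the hash should come from a source the user already trusts, such as the client's own source/binary.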
### Updating Weak Subjectivity Checkpoint States

Irrespective of which method for distributing weak subjectivity checkpoint states is agreed upon, each client team must maintain a record of the latest such checkpoint state, and provide (at the very least) a hash of the state on their GitHub/website. Regardless of how the subjective checkpoint state is otherwise distributed, users should always be able to verify the state against the hash on their client's GitHub/website, and deploy their new client knowing that it's operating on the correct beacon chain.

It is also expected that block explorers provide (at the very least) the hash of a recent subjective checkpoint state, as a public service to increase security of the network.

The recommended safety decay is 10%, and the associated weak subjectivity period can be derived from the number of validators at that point in time using the calculations in this document (use the included table for quick lookup). In practice, the subjective state should be updated with enough margin to spare for misc. SNAFUs.

In the initial phase of the network, when there is an abysmally low number of validators, each provider needs to update their weak subjectivity checkpoint state nightly. As the validator count increases, we can regularly increase the weak subjectivity period in multiples of 256 epochs (~27 hours). When the network reaches ~250K validators, this period will stabilize at around 2 weeks.

To simplify matters for providers, it should be easy to identify which epochs can be made weak subjectivity checkpoints. We shall use the following method:

```python
def get_latest_weak_subjectivity_checkpoint_epoch(state, safety_decay=0.1):
    # Returns the epoch of the latest weak subjectivity checkpoint for the given
    # `state` and `safety_decay`. The default `safety_decay` used should be 10% (= 0.1)

    # The calculations in this document do not account for the withdrawability delay.
    # We should factor that in by adding MIN_VALIDATOR_WITHDRAWABILITY_DELAY to the
    # calculated subjectivity period.
    weak_subjectivity_mod = MIN_VALIDATOR_WITHDRAWABILITY_DELAY
    val_count = len(get_active_validator_indices(state, get_current_epoch(state)))
    # `safety_decay` is a float, so int() keeps `weak_subjectivity_mod` an integer
    # number of epochs (a multiple of 256).
    if val_count >= MIN_PER_EPOCH_CHURN_LIMIT * CHURN_LIMIT_QUOTIENT:
        weak_subjectivity_mod += 256 * int((safety_decay * CHURN_LIMIT_QUOTIENT / 2) // 256)
    else:
        # This means val_count < MIN_PER_EPOCH_CHURN_LIMIT * CHURN_LIMIT_QUOTIENT
        weak_subjectivity_mod += 256 * int((safety_decay * val_count / (2 * MIN_PER_EPOCH_CHURN_LIMIT)) // 256)
    return state.finalized_checkpoint.epoch - (state.finalized_checkpoint.epoch % weak_subjectivity_mod)
```

This is what the values look like for `safety_decay = 0.1`:

| `val_count` | `weak_subjectivity_mod` | Duration in days |
|:----------- | -----------------------:| ----------------:|
| 8192        |                     256 |              1.1 |
| 16384       |                     256 |              1.1 |
| 32768       |                     512 |              2.3 |
| 65536       |                    1024 |              4.6 |
| 131072      |                    1792 |              8.0 |
| 262144      |                    3328 |             14.8 |

---

## Calculating the Weak Subjectivity Period [Complete Version]

This section presents [Vitalik's ethresear.ch post on this topic](https://ethresear.ch/t/weak-subjectivity-under-the-exit-queue-model/5187) in more detail.

To answer the question of the longest weak subjectivity period that maintains consensus safety with a decay of `D`, we ask:

**The current latest finalized checkpoint is `C1`. An attacker takes control of all the validators from a past checkpoint `C0` and produces a finalized block `C2` that conflicts with `C1`. What is the most recent such `C0` so that the attacker experiences a slashing of less than `1/3 - D` fraction of validators?**

![](https://i.imgur.com/BZLkPDv.png)

After calculating the answer (See **Appendix Part A** and **Appendix Part B**), we get the constraint:

`2 * (C1.epoch - C0.epoch) * max(MIN_PER_EPOCH_CHURN_LIMIT, |V1|/CHURN_LIMIT_QUOTIENT) ≥ D * |V1|`

The final result becomes (with `|V1|` as the validator set size at `C1`):
- For `|V1| ≤ MIN_PER_EPOCH_CHURN_LIMIT * CHURN_LIMIT_QUOTIENT`: Max Safe Weak Sub. Period = `D * |V1| / (2 * MIN_PER_EPOCH_CHURN_LIMIT)`
- For `|V1| > MIN_PER_EPOCH_CHURN_LIMIT * CHURN_LIMIT_QUOTIENT`: Max Safe Weak Sub. Period = `D * CHURN_LIMIT_QUOTIENT / 2`

This comes out to be the same as the numbers in the earlier calculation in the [Quick version](#Calculating-the-Weak-Subjectivity-Period-Quick-Version).

---

## Appendix

### Part A: Computing the minimum slashable validator set

The main assumption here is that validator activations and exits are rate-limited on a per-epoch basis.

Notation:
- `V1`, `V2` are the validator sets at checkpoints `C1`, `C2`
- `Q1`, `Q2` are the validators who voted for checkpoints `C1`, `C2`
- `A1`, `A2` are the validators that activated between `C0` & `C1`, and `C0` & `C2` respectively
- `E1`, `E2` are the validators that exited between `C0` & `C1`, and `C0` & `C2` respectively

We want to solve for the condition that the attacker is slashed less than `1/3 - D` fraction of validators, i.e., `|Q1 ∩ Q2| ≤ (1/3 - D) * |V1|`.

The attacker has to satisfy `|Q1| ≥ 2/3 * |V1|` and `|Q2| ≥ 2/3 * |V2|` while keeping `|Q1 ∩ Q2| ≤ (1/3 - D) * |V1|`, in the least number of epochs possible.

Some observations about the optimal strategy for the attacker:
- Exiting the same validator from `V0` in both `V1` and `V2` is not optimal since it wastes one spot in the validator exit and doesn't contribute to the objective. So, `E1 ∩ E2 = {}`.
- Similarly, activating the same validator in both `V1` and `V2` is not optimal. So, `A1 ∩ A2 = {}`.
- Since those who activated in `V1` have not activated in `V2` and those who exited in `V2` have not exited from `V1`, the set difference `V1 - V2 = A1 ∪ E2` and `V2 - V1 = A2 ∪ E1`.
- All validators in `V1 - V2` are in `Q1` and vice-versa, since that is the largest that `Q1` can be without increasing `Q1 ∩ Q2` and vice-versa.

This is what we have till now:

![](https://i.imgur.com/6xxLZQw.png)

![](https://i.imgur.com/WbbB3rm.png)

Now for the magic trick from Vitalik's post:
- Let `I = V1 ∩ V2`.
- `I = V1 - (V1 - V2)`, so `|I| = |V1| - A1 - E2`. Similarly, `|I| = |V2| - A2 - E1`. Using a linear combination of both equations we get: `|I| = 1/3 * (|V1| - A1 - E2) + 2/3 * (|V2| - A2 - E1)`
- `|Q1 ∩ I| = |Q1| - |V1 - V2| ≥ 2/3 * |V1| - A1 - E2`, and `|Q2 ∩ I| = |Q2| - |V2 - V1| ≥ 2/3 * |V2| - A2 - E1`
- `|Q1 ∩ Q2| = |Q1 ∩ I| + |Q2 ∩ I| - |I|`

After simplifying we get:

`|Q1 ∩ Q2| ≥ 1/3 * |V1| - 1/3 * (A2 + E1) - 2/3 * (A1 + E2)`

### Part B: Searching for the most recent suitable attack block

Since the attacker has an upper limit for how much it can be slashed, and making `|Q1 ∩ Q2|` smaller takes a larger number of epochs, we calculate for:

`1/3 * |V1| - 1/3 * (A2 + E1) - 2/3 * (A1 + E2) ≤ (1/3 - D) * |V1|`

We also know that the maximum height of the attacker's finalized checkpoint `C2` is the same as the height of our latest finalized checkpoint `C1`, since blocks are limited by the `SECONDS_PER_SLOT` parameter. In the optimal strategy, the attacker will use the maximum height for `C2` since that gives the greatest leeway to change the validator set and reduce `|Q1 ∩ Q2|`.

We know `V1` and `D`, so we just have to find the most recent `C0` where we can satisfy this condition for some `C2` at the same height as `C1`:

`(2/3 * A1 + 1/3 * E1) + (1/3 * A2 + 2/3 * E2) ≥ D * |V1|`

To solve this search problem, we need to plug in the Eth2.0 activation and exit rate limits.
In practice, validators who request activation/exit are actually added to/removed from the active validator set after a delay. For simplicity, we will assume that activations and exits are instantaneous. This is acceptable because it will only lead to a more recent `C0` than can be allowed with activation/exit delays, leading to us calculating a more conservative weak subjectivity period.

The relevant rate-limiting function is [this](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#get_validator_churn_limit):

```python
def get_validator_churn_limit(state: BeaconState) -> uint64:
    """
    Return the validator churn limit for the current epoch.
    """
    active_validator_indices = get_active_validator_indices(state, get_current_epoch(state))
    return max(MIN_PER_EPOCH_CHURN_LIMIT, len(active_validator_indices) // CHURN_LIMIT_QUOTIENT)
```

Now, for each candidate `C0` in `C1`'s chain and in the descending order of height, we check if:

`(2/3 * A1 + 1/3 * E1) + (1/3 * A2 + 2/3 * E2) ≥ D * |V1|`

- `|V1|`, `D` are known constants
- `A1`, `E1` are known from the chain
- `A2`, `E2` need to be computed using the size of the validator set at `C0` and the validator churn limits

**Claim 1:** `(1/3 * A2 + 2/3 * E2)` is maximized when the attacker makes the max. allowable activations & exits between `C0` and `C2`, and that for this case:
- `A2 = E2 = (C1.epoch - C0.epoch) * max(MIN_PER_EPOCH_CHURN_LIMIT, |V1|/CHURN_LIMIT_QUOTIENT)`

**Claim 2:** To account for the worst case, the max. possible value for `(2/3 * A1 + 1/3 * E1)` has to be calculated. This value is maximized when there have been max. allowable activations & exits between `C0` and `C1`, and that for this case:
- `A1 = E1 = (C1.epoch - C0.epoch) * max(MIN_PER_EPOCH_CHURN_LIMIT, |V1|/CHURN_LIMIT_QUOTIENT)`

(**Intuition for my claims**: The other viable strategy would have been to first do only max allowable activations for a few epochs to raise the churn limit, and then do max allowable activations + exits with the raised churn limit. For `(1/3 * A2 + 2/3 * E2)` from this strategy to be larger than the earlier one of just max allowable activations + exits every epoch, the number of epochs needed would be very large.)

From these two claims, we simplify the search condition to:

`2 * (C1.epoch - C0.epoch) * max(MIN_PER_EPOCH_CHURN_LIMIT, |V1|/CHURN_LIMIT_QUOTIENT) ≥ D * |V1|`

The final result becomes:
- For `|V1| ≤ MIN_PER_EPOCH_CHURN_LIMIT * CHURN_LIMIT_QUOTIENT`: Max Safe Weak Sub. Period = `D * |V1| / (2 * MIN_PER_EPOCH_CHURN_LIMIT)`
- For `|V1| > MIN_PER_EPOCH_CHURN_LIMIT * CHURN_LIMIT_QUOTIENT`: Max Safe Weak Sub. Period = `D * CHURN_LIMIT_QUOTIENT / 2`
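
The two cases of the final result can be evaluated with a small standalone helper. The sketch below is illustrative rather than part of any spec: the function names are hypothetical, the constants are assumed to be the Phase 0 mainnet configuration values, the result is rounded down to whole epochs, and (like the derivation above) it ignores the withdrawability delay.

```python
from fractions import Fraction

# Assumed Phase 0 mainnet configuration values.
MIN_PER_EPOCH_CHURN_LIMIT = 4
CHURN_LIMIT_QUOTIENT = 2**16
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12


def max_safe_ws_period(val_count: int, safety_decay: Fraction = Fraction(1, 10)) -> int:
    """Max safe weak subjectivity period in whole epochs for `val_count`
    validators, following the two cases of the final result above."""
    if val_count > MIN_PER_EPOCH_CHURN_LIMIT * CHURN_LIMIT_QUOTIENT:
        # Churn limit scales with the validator set: D * CHURN_LIMIT_QUOTIENT / 2
        period = safety_decay * CHURN_LIMIT_QUOTIENT / 2
    else:
        # Churn limit is pinned at the minimum: D * |V1| / (2 * MIN_PER_EPOCH_CHURN_LIMIT)
        period = safety_decay * val_count / (2 * MIN_PER_EPOCH_CHURN_LIMIT)
    return int(period)  # exact rational arithmetic, rounded down


def epochs_to_days(epochs: int) -> float:
    """Convert a number of epochs to days using the assumed slot timing."""
    return epochs * SLOTS_PER_EPOCH * SECONDS_PER_SLOT / 86400


# Usage: print the period for a few validator set sizes.
for count in (8192, 32768, 131072, 262144, 524288):
    epochs = max_safe_ws_period(count)
    print(f"{count:>7} validators: {epochs:>5} epochs (~{epochs_to_days(epochs):.1f} days)")
```

`Fraction` is used for the safety decay so that the division is exact before rounding down; a float such as `0.1` would also work for these inputs but can introduce rounding error near whole-epoch boundaries.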