RANDAOtage—committee corruption from RANDAO bias and validator outages
**TLDR**—We show how an attacker with 1/3 voting power can plausibly bias RANDAO randomness to corrupt a shard committee and create invalid or unavailable crosslinks. The attack assumes 1/3+ε of validators suffer a randomly-occurring outage (e.g. a popular validator client forks off, or a large staking pool goes offline) or are temporarily forced offline by the attacker.
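```python
import math

def eq_k_probability(n, k, p):
    # P(X = k) for X ~ Binomial(n, p)
    return p**k * (1 - p)**(n - k) * math.comb(n, k)

def ge_k_probability(n, k, p):
    # P(X >= k) for X ~ Binomial(n, p)
    return sum(eq_k_probability(n, i, p) for i in range(k, n + 1))

# Protocol parameters
MAX_COMMITTEES_PER_SLOT = 64
SLOTS_PER_EPOCH = 32
TARGET_COMMITTEE_SIZE = 128
# Attacker's share of voting power (1/3, as stated in the TLDR)
ATTACKER_CONTROL = 1 / 3
CORRUPT_COMMITTEE_THRESHOLD = int(math.ceil(2 / 3 * TARGET_COMMITTEE_SIZE))
```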
Notice that if 1/3+ε of the validators are temporarily offline, an attacker with 1/3 voting power can use the fork choice rule to monopolise beacon block production during that time. Indeed, during the temporary outage, the attacker's LMD GHOST weight (1/3) is greater than the weight of the rest of the online network (1/3-ε).
Notice also that the expected number of naturally occurring corrupt committees in a given epoch (assuming no [fancy RANDAO bias](https://ethresear.ch/t/rng-exploitability-analysis-assuming-pure-randao-based-main-chain/1825)) is `ge_k_probability(TARGET_COMMITTEE_SIZE, CORRUPT_COMMITTEE_THRESHOLD, ATTACKER_CONTROL) * SLOTS_PER_EPOCH * MAX_COMMITTEES_PER_SLOT` which is roughly `2**-36.36`. This means that it suffices for the attacker to grind over `2**37` RANDAO sampling combinations to corrupt more than one committee in expectation.
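For readers who want to reproduce the figure above, here is a minimal self-contained sketch. It assumes `ATTACKER_CONTROL = 1 / 3` (the attacker's voting power from the TLDR) and treats committee membership as independent draws, which is the same binomial approximation the formula uses. The names `p_corrupt`, `expected_corrupt`, and `GRINDING_BITS` are illustrative only.

```python
import math

def ge_k_probability(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

MAX_COMMITTEES_PER_SLOT = 64
SLOTS_PER_EPOCH = 32
TARGET_COMMITTEE_SIZE = 128
ATTACKER_CONTROL = 1 / 3  # assumption: attacker's stake fraction, per the TLDR
CORRUPT_COMMITTEE_THRESHOLD = math.ceil(2 / 3 * TARGET_COMMITTEE_SIZE)

# Probability that one sampled committee is at least 2/3 attacker-controlled
p_corrupt = ge_k_probability(
    TARGET_COMMITTEE_SIZE, CORRUPT_COMMITTEE_THRESHOLD, ATTACKER_CONTROL
)

# Expected corrupt committees in one epoch without any grinding
committees_per_epoch = SLOTS_PER_EPOCH * MAX_COMMITTEES_PER_SLOT
expected_corrupt = p_corrupt * committees_per_epoch

# Expected corrupt committees if the attacker can pick among 2**37 samplings
GRINDING_BITS = 37
expected_after_grinding = expected_corrupt * 2**GRINDING_BITS

print(f"log2(expected corrupt per epoch) = {math.log2(expected_corrupt):.2f}")
print(f"expected corrupt after 2**{GRINDING_BITS} samplings = {expected_after_grinding:.2f}")
```

The second printed value exceeding 1 is what "more than one committee in expectation" refers to.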
We now consider two types of validator outages:
1) **25 min random outage**—Suppose that 1/3+ε of the validators stop attesting for 25 minutes (125 slots) in a randomly occurring outage. (For example, a popular client has a consensus bug and forks off, or a large staking pool goes offline.) There is a `ge_k_probability(125, 37, 1/3)` (roughly 83%) probability that the attacker controls at least 37 of the 125 slots in the corresponding RANDAO sampling window.
2) **15 min timed outage**—Suppose that the attacker can force a timed outage of 1/3+ε of the network (see forced outage feasibility below) for 15 minutes. The attacker opportunistically waits for a 15-minute (i.e. 75-slot) RANDAO window in which he controls at least 37 slots. The probability of such a window occurring naturally is `ge_k_probability(75, 37, 1/3)` (roughly `2**-8.38`). In expectation the attacker has to wait less than 2 days (`2**8.38 epochs * 6.4 min / 60 min / 24 h`) for such a 15-minute window to occur.
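The two window probabilities can be checked with the sketch below. It derives the slot time from the 6.4-minute epoch used in the wait-time formula (32 slots per epoch gives 12 seconds per slot, so 25 minutes is 125 slots and 15 minutes is 75 slots). Like that formula, it counts one trial per epoch, which is a simplification. `EPOCH_MINUTES`, `slots_in`, and `REQUIRED_SLOTS` are names introduced here for illustration.

```python
import math

def ge_k_probability(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

SLOTS_PER_EPOCH = 32
EPOCH_MINUTES = 6.4  # taken from the wait-time formula in the text
SECONDS_PER_SLOT = EPOCH_MINUTES * 60 / SLOTS_PER_EPOCH

def slots_in(minutes):
    """Number of slots in a window of the given length."""
    return round(minutes * 60 / SECONDS_PER_SLOT)

ATTACKER_CONTROL = 1 / 3
REQUIRED_SLOTS = 37  # slots the attacker wants, i.e. 2**37 RANDAO combinations

random_window = slots_in(25)  # random outage
timed_window = slots_in(15)   # timed outage

p_random = ge_k_probability(random_window, REQUIRED_SLOTS, ATTACKER_CONTROL)
p_timed = ge_k_probability(timed_window, REQUIRED_SLOTS, ATTACKER_CONTROL)

# Expected wait for a favourable 15-minute window, one trial per epoch (simplification)
expected_wait_days = (1 / p_timed) * EPOCH_MINUTES / 60 / 24

print(f"{random_window}-slot window: P = {p_random:.2%}")
print(f"{timed_window}-slot window: log2(P) = {math.log2(p_timed):.2f}")
print(f"expected wait = {expected_wait_days:.2f} days")
```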
In either type of outage (natural or forced) the attacker can monopolise beacon block production and selectively reveal his own blocks to bias the randomness such that at least one committee is corrupted in expectation. Both attacks can be simulated in advance and are guaranteed to succeed given an appropriately long outage.
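To illustrate why controlling k slots gives roughly `2**k` candidate outcomes, here is a toy model of selective revealing. It assumes the mix is updated by XORing in the hash of each revealed value and is left unchanged when a slot is skipped. The reveals are placeholder hashes rather than real signatures, and `is_favourable` is a hypothetical stand-in for "this mix yields a corrupt committee". None of this is the actual beacon chain code.

```python
import hashlib
from itertools import product

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def candidate_mixes(initial_mix, reveals):
    """Map each propose/skip pattern over the controlled slots to its final mix."""
    mixes = {}
    for choice in product((False, True), repeat=len(reveals)):
        mix = initial_mix
        for propose, reveal in zip(choice, reveals):
            if propose:  # proposing a block mixes in the reveal
                mix = xor(mix, h(reveal))
        mixes[choice] = mix
    return mixes

K = 10  # toy number of attacker-controlled slots (the post uses 37)
initial_mix = h(b"toy initial mix")
reveals = [h(b"toy reveal %d" % i) for i in range(K)]  # placeholders, not signatures
mixes = candidate_mixes(initial_mix, reveals)

def is_favourable(mix: bytes, bits: int = 8) -> bool:
    """Hypothetical predicate that holds for about 1 in 2**bits mixes."""
    return int.from_bytes(h(mix)[:4], "big") % 2**bits == 0

favourable = [choice for choice, mix in mixes.items() if is_favourable(mix)]
print(f"{len(mixes)} candidate mixes, {len(favourable)} favourable")
```

The attacker only needs one favourable pattern, and since everything is computed locally the search can be done before committing to any block.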
**Forced outage feasibility**
We highlight a few ways in which 1/3+ε of the validators can plausibly be taken down by an attacker for 15 minutes:
* **client 0-days**
* **crash bug**—Consider a popular client with a 0-day crash bug. Stakers using this client can be taken down for an extended period of time. Automatic restarts do not help since the bug can be repeatedly triggered by the attacker. 15 minutes is insufficient time for the client developers to push a fix.
* **fork bug**—Similar to the crash bug except that a consensus bug is exploited which forks off a popular client away from the canonical chain.
* **DoS bug**—Similar to the crash bug except that the client is forced to grind to a halt (as opposed to outright crashing).
* **home staking DoS**—Home internet connections are not designed to withstand network DoS attacks. As such, amateur home stakers are susceptible to being taken down by having their home connection DoSed. Personal fallback internet connections (e.g. a 4G mobile connection) are also not designed to withstand network DoS attacks.
* **large pool takedown**—Centralisation around a large staking pool could allow an attacker to target just one entity to force a large validator outage.
Any combination of the above attacks can be simultaneously mounted to reach the required 1/3+ε threshold. Notice that if an attacker can take down more validators (say, 1/2+ε) then an attacker with just 1/4 of the voting power can monopolise beacon block production and similarly bias RANDAO.
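The stake thresholds quoted here follow from a simple weight comparison, sketched below. It reduces LMD GHOST to "the attacker wins if his weight exceeds the weight of the remaining online honest validators", which is a simplifying assumption. `can_monopolise`, `min_attacker_stake`, and the value of `EPSILON` are illustrative.

```python
from fractions import Fraction

def can_monopolise(attacker, offline):
    """True if the attacker outweighs the honest validators still online."""
    honest_online = 1 - attacker - offline
    return attacker > honest_online

def min_attacker_stake(offline):
    """The attacker needs strictly more than this fraction of the total stake."""
    return (1 - offline) / 2

EPSILON = Fraction(1, 1000)  # arbitrary small value for illustration

# 1/3 attacker with 1/3+ε offline: 1/3 > 1/3-ε
print(can_monopolise(Fraction(1, 3), Fraction(1, 3) + EPSILON))
# 1/4 attacker with 1/2+ε offline: 1/4 > 1/4-ε
print(can_monopolise(Fraction(1, 4), Fraction(1, 2) + EPSILON))
# 1/4 attacker with only 1/3+ε offline is not enough
print(can_monopolise(Fraction(1, 4), Fraction(1, 3) + EPSILON))
```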
Increasing `TARGET_COMMITTEE_SIZE` to 256 is sufficient to remove this attack. Indeed, the probability that a single committee of 256 is corrupt is `ge_k_probability(256, math.ceil(2/3 * 256), 1/3)` (roughly `2**-90`), so an attacker would have to grind through on the order of `2**90` committee combinations, which is not computationally feasible during the relatively small outage attack windows.
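A quick side-by-side of the two committee sizes, under the same binomial approximation and a 1/3 attacker, can be sketched as follows. The dictionary `log2_p_corrupt` is an illustrative name.

```python
import math

def ge_k_probability(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

ATTACKER_CONTROL = 1 / 3  # assumption: attacker's stake fraction, per the TLDR

# log2 of the probability that a single committee is at least 2/3 attacker-controlled
log2_p_corrupt = {}
for size in (128, 256):
    threshold = math.ceil(2 / 3 * size)
    log2_p_corrupt[size] = math.log2(ge_k_probability(size, threshold, ATTACKER_CONTROL))
    print(f"size {size}: threshold {threshold}, log2(P) = {log2_p_corrupt[size]:.2f}")
```

Doubling the committee size roughly doubles the exponent, which puts the grinding cost far beyond the `2**37` combinations available from a 37-slot window.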