# Preventing Eth2 Validator Failure

<center>Aditya Asgaonkar, Carl Beekhuizen<br>Ethereum Research</center>

---

[toc]

## Modes of Failure

- **Beacon Node (BN)**:
    - Liveness:
        - Suggesting an attestation/block proposal on a non-canonical chain
        - Going offline
- **Validator Client (VC)**:
    - Safety:
        - Key Safety: Compromise of the validator key
        - Slashing Safety: Producing slashable attestations/blocks
    - Liveness:
        - Going offline

## Preventing Failure

Different types of failure are prevented in different ways:

- Liveness: Prevent through redundancy in the associated components.
- Safety:
    - Key Safety: Secret-share the validator key across separate instances.
    - Slashing Safety: For Phase 0, no-slash logic at the VC instances is enough. [Things get complicated for Phase 1+](https://github.com/ethereum/eth2.0-specs/issues/1969), and may warrant fundamental changes to the BN/VC architecture. This is out of scope for this document.

Preventing liveness and safety failures simultaneously requires a Byzantine Agreement protocol (or something stronger) so that the redundant/secret-shared instances agree on which attestations/blocks to produce. It is very important to identify which types of failure must be tolerated before building a suitable protocol.

## Proposals for Secret-Shared Validator (SSV) protocols

### Type 1

**Objective:** Protect against VC safety, liveness, and key-safety failures.

**Protocol:** The setup is a single BN instance and multiple secret-shared validator clients (SSVCs).

1. The BN sends `msg` to all SSVCs.
2. Each SSVC signs `msg` with its key share if `msg` passes the no-slashing check.
3. The BN receives the signature shares from the SSVCs, recovers the threshold signature, and gossips it on the p2p network.

**Notes:**

- Suitable for running SSVCs on low-powered devices.
- BN failure is catastrophic for liveness.
- Key safety comes "for free" from the threshold key mechanism.
- Byzantine VCs have no effect on safety, so only crash faults need to be considered. ∴ the threshold can be set arbitrarily to achieve the desired safety-liveness tradeoff.
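The Type 1 flow can be sketched as a toy Python model. This is a sketch under stated assumptions, not a real SSV implementation: it swaps BLS threshold signatures for Shamir secret sharing over a prime field, where a "signature" is `key * H(msg) mod P` (linear like BLS, but insecure), it uses a trusted dealer to split the key, and it implements the no-slashing check with the Phase 0 double-vote and surround-vote conditions on attestation source/target epochs. The names `AttestationData`, `SSVC`, `split_key`, `recover`, and `bn_round` are hypothetical.

```python
from __future__ import annotations

import hashlib
import random
from dataclasses import dataclass

# Toy prime field (assumption). Real SSVs would use BLS12-381 threshold signatures;
# here a "signature" is key * H(msg) mod P, which is linear like BLS but NOT secure.
P = 2**127 - 1


@dataclass(frozen=True)
class AttestationData:
    source_epoch: int
    target_epoch: int
    block_root: str


def is_slashable_pair(a: AttestationData, b: AttestationData) -> bool:
    """Phase 0 double-vote or surround-vote condition, checked in both directions."""
    double_vote = a != b and a.target_epoch == b.target_epoch
    a_surrounds_b = a.source_epoch < b.source_epoch and b.target_epoch < a.target_epoch
    b_surrounds_a = b.source_epoch < a.source_epoch and a.target_epoch < b.target_epoch
    return double_vote or a_surrounds_b or b_surrounds_a


def hash_to_field(data: AttestationData) -> int:
    digest = hashlib.sha256(repr(data).encode()).digest()
    return int.from_bytes(digest, "big") % P


def split_key(secret: int, threshold: int, n: int, rng: random.Random) -> dict[int, int]:
    """Trusted-dealer Shamir sharing: any `threshold` of the n shares determine `secret`."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(threshold - 1)]
    return {i: sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P
            for i in range(1, n + 1)}


def recover(shares: dict[int, int]) -> int:
    """Lagrange-interpolate the shares at x = 0 to obtain the full value."""
    total = 0
    for i, s_i in shares.items():
        num, den = 1, 1
        for j in shares:
            if j != i:
                num = num * -j % P
                den = den * (i - j) % P
        total = (total + s_i * num * pow(den, -1, P)) % P
    return total


class SSVC:
    def __init__(self, index: int, key_share: int):
        self.index = index
        self.key_share = key_share
        self.signed: list[AttestationData] = []  # this SSVC's local no-slash DB

    def sign(self, data: AttestationData) -> int | None:
        """Step 2: sign only if `data` is not slashable against anything signed before."""
        if any(is_slashable_pair(data, prev) for prev in self.signed):
            return None
        self.signed.append(data)
        return self.key_share * hash_to_field(data) % P


def bn_round(ssvcs: list[SSVC], data: AttestationData, threshold: int) -> int | None:
    """Steps 1 and 3: send `data` to every SSVC, then recover from `threshold` shares."""
    shares = {}
    for vc in ssvcs:
        share = vc.sign(data)
        if share is not None:
            shares[vc.index] = share
    if len(shares) < threshold:
        return None  # too few shares: a liveness failure, never a slashable signature
    return recover(dict(list(shares.items())[:threshold]))  # p2p gossip is omitted


if __name__ == "__main__":
    rng = random.Random(0)
    secret = rng.randrange(P)
    ssvcs = [SSVC(i, s) for i, s in split_key(secret, threshold=3, n=4, rng=rng).items()]
    honest = AttestationData(source_epoch=0, target_epoch=1, block_root="0xaa")
    print(bn_round(ssvcs, honest, 3) == secret * hash_to_field(honest) % P)  # True
    print(bn_round(ssvcs[1:], honest, 3) is not None)  # True: one crashed SSVC is tolerated
    double = AttestationData(source_epoch=0, target_epoch=1, block_root="0xbb")
    print(bn_round(ssvcs, double, 3))  # None: every SSVC refuses the double vote
```

With the example 3-of-4 threshold, one crashed SSVC is tolerated, while three SSVCs' no-slash DBs would have to be corrupted before a slashable signature could be recovered. Raising or lowering the threshold shifts that balance, which is the safety-liveness tradeoff in the notes.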
### Type 2

**Objective:** Protect against all types of BN and VC failures.

**Protocol:** The setup is multiple BNs and multiple SSVCs, which all run the following protocol:

1. Agreement among the BNs:
    - The BNs run a reliable-broadcast BFT protocol with leader change, similar to [this one](https://notes.ethereum.org/@adiasg/ssv-rbb#Protocol-Specification).
2. BNs suggest `msg` to the SSVCs:
    - Every BN broadcasts the agreed-upon `msg` to all SSVCs.
3. SSVC signature:
    - If the same "good" `msg` (e.g., one that passes the no-slashing check) is received from $2\cdot f_{BN} + 1$ BNs, the SSVC signs `msg` with its key share and broadcasts the signature share to all BNs.
    - Otherwise, the SSVC sends `LEADER_CHANGE` to all BNs.
4. Signature aggregation/leader change in the BNs:
    - If $2\cdot f_{SSVC} + 1$ signature shares on `msg` are received, the BN aggregates them into the full signature and gossips it on the p2p network.
    - If $2\cdot f_{SSVC} + 1$ `LEADER_CHANGE` messages are received, the BN changes the leader in its local view and restarts the protocol.

Here, $f_{BN}$ and $f_{SSVC}$ are the numbers of faulty BNs and SSVCs the protocol is designed to tolerate.

**Notes:**

- The numbers of BNs and VCs depend on the fault-tolerance requirements and need not be equal.
- Running BN and VC instances on different machines reduces the risk from hardware failure.
- Having the BNs peer with each other on the p2p network keeps the chain views of correct nodes aligned, up to the bounds of network latency.

## Random thoughts

- Swiss cheese security model:
    - Run one BN/VC from each client implementation.
- Adding 2 network-level no-slash devices provides the same slashing safety as 4 SSVCs:
    - In both cases, 3 no-slash DBs need to be corrupted to produce slashable messages.
    - The network-level devices do, however, come with a liveness compromise.
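The no-slash-device comparison can be checked with simple quorum arithmetic. This sketch rests on our own reading, not on anything stated in the note: the 4 SSVCs are assumed to use a 3-of-4 threshold (what $2\cdot f_{SSVC} + 1$ gives for $f_{SSVC} = 1$), and the two network-level no-slash devices are assumed to sit in series behind a single VC, so every message must pass all three no-slash DBs. `SigningPath` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SigningPath:
    name: str
    no_slash_dbs: int      # independent no-slash DBs that can approve a message
    approvals_needed: int  # approvals required before a message reaches the network

    def corruptions_to_slash(self) -> int:
        # A slashable message only goes out once `approvals_needed` DBs approve it,
        # and honest DBs refuse, so an attacker must corrupt that many of them.
        return self.approvals_needed

    def crashes_tolerated(self) -> int:
        # Liveness holds while enough DBs stay online to supply the approvals.
        return self.no_slash_dbs - self.approvals_needed


# Hypothetical configurations, per the assumptions stated above.
SETUPS = [
    SigningPath("single VC", no_slash_dbs=1, approvals_needed=1),
    SigningPath("4 SSVCs, 3-of-4 threshold", no_slash_dbs=4, approvals_needed=3),
    SigningPath("VC + 2 no-slash devices in series", no_slash_dbs=3, approvals_needed=3),
]

if __name__ == "__main__":
    for s in SETUPS:
        print(f"{s.name}: corrupt {s.corruptions_to_slash()} DB(s) to slash, "
              f"survives {s.crashes_tolerated()} crashed component(s)")
```

Under these assumptions both setups need 3 corrupted no-slash DBs, but the series chain stops producing messages if any one of its 3 components goes offline, whereas the 3-of-4 SSVC setup keeps signing with one SSVC down. That difference is the liveness compromise.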