# Notes on allowing re-activations of exited validators
It would be nice if we could allow validators who have exited (and even those who are withdrawable) to re-activate. This would significantly reduce the risk to stakers in this early period: validators would know that if they unexpectedly cannot stake for some time, they can exit and still be free to resume staking once they resolve their issue.
However, there are technical challenges in doing this. This document attempts to cover some of these challenges.
### Who can re-activate and slashability
Currently, `is_slashable_validator` is defined as follows:
```python
def is_slashable_validator(validator: Validator, epoch: Epoch) -> bool:
    return (
        (not validator.slashed) and
        (validator.activation_epoch <= epoch < validator.withdrawable_epoch)
    )
```
If a validator is in the withdrawable state, it is "safe", and cannot be slashed anymore. However, this leads to one issue: if we want to preserve this invariant, it should not be possible for the signing key to re-activate the validator, because then an attacker who has stolen a validator's signing key could first re-activate the validator and then slash them.
Hence, we must do one of the following:
1. Require re-activations to happen with the withdrawal key and not the signing key
2. Weaken the invariant around withdrawable validators being non-slashable.
(1) is IMO a weak choice because (i) it encourages people to take their withdrawal keys out of cold storage, and (ii) it does not work well with future forms of withdrawal credentials other than BLS keys. For (2), we could just make slashing always possible, or we could weaken the conditions partially, by making "withdrawable" and "withdrawn" into separate statuses, with only the "withdrawn" status conferring unslashability. In the long run, a "withdrawn" status will need to exist regardless (likely implemented by the validator object being removed or replaced with a stub and a claimable record being added to some Merkle tree), so this does not add "extra" complexity.
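As a rough sketch of the partial weakening in (2), the slashability check could stop looking at `withdrawable_epoch` and consult a separate "withdrawn" status instead. The `withdrawn` field and the function name `is_slashable_with_withdrawn_status` are hypothetical and exist only for illustration; the simplified `Validator` class carries just the fields the check needs.

```python
from dataclasses import dataclass


@dataclass
class Validator:
    slashed: bool
    activation_epoch: int
    withdrawable_epoch: int
    withdrawn: bool = False  # hypothetical status, not a field in the current spec


def is_slashable_with_withdrawn_status(validator: Validator, epoch: int) -> bool:
    # Assumption: only the "withdrawn" status confers unslashability, so a
    # validator that is merely withdrawable can still be slashed.
    return (
        (not validator.slashed)
        and validator.activation_epoch <= epoch
        and (not validator.withdrawn)
    )
```

Under this sketch, a validator past its `withdrawable_epoch` stays slashable until the withdrawal has actually been processed.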
If we just want to allow [sortition](https://github.com/ethereum/eth2.0-specs/issues/2137), then because dormant validators never enter the withdrawable state, the solution is much simpler: we can just remove the `validator.activation_epoch <= epoch` criterion.
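A minimal sketch of that simpler change, using a stripped-down `Validator` class and a hypothetical function name (`is_slashable_for_sortition`) to keep it distinct from the current spec function:

```python
from dataclasses import dataclass


@dataclass
class Validator:
    slashed: bool
    activation_epoch: int
    withdrawable_epoch: int


def is_slashable_for_sortition(validator: Validator, epoch: int) -> bool:
    # Same as is_slashable_validator, minus the
    # `validator.activation_epoch <= epoch` lower bound.
    return (not validator.slashed) and epoch < validator.withdrawable_epoch
```

With the lower bound gone, a dormant validator whose `activation_epoch` lies in the future is still slashable, while a validator that has reached `withdrawable_epoch` is not.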
### Validator statuses and epochs
One major meta-problem in the spec is that the spec in many places relies on phase change epochs (`activation_eligibility_epoch`, `activation_epoch`, `exit_epoch`, `withdrawable_epoch`) to determine its current state. This logic becomes problematic when re-activation is possible, as the same phase change can happen multiple times. A major case of this worth explicit attention is the use of `get_beacon_committee` and other methods that call into `get_active_validator_indices`, potentially using historical epochs.
#### Current functions that use validator statuses
* `is_active_validator(index, epoch)`: called directly only with the current or previous epoch, except when used in `get_active_validator_indices`
* `get_active_validator_indices`: used for old epochs in `get_committee_count_per_slot` and `get_beacon_committee`
* `get_committee_count_per_slot`: used in `get_beacon_committee`, and in `process_attestation`, though in the latter case it can only be used with the current and previous epochs.
* `get_beacon_committee`: used in validating attestations, where it is only used for current and previous epochs (attester _slashings_ fortunately already require both attestations to be in the "indexed" format, removing the need for any deep lookback of committees at all)
**Summary**: no issues.
**Recommendation**: formalize the invariants by adding an `assert` to validator status accessor methods that clarifies that they only work for the current and previous epoch, and verify that the spec never breaks this invariant.
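One possible shape for such an assertion is sketched below. The `BeaconState` and `Validator` classes are cut down to the fields needed here, and the helper bodies are simplified stand-ins for the spec versions. Placing the check inside `get_active_validator_indices` is my assumption about where it would go, not a settled choice.

```python
from dataclasses import dataclass
from typing import List

SLOTS_PER_EPOCH = 32
GENESIS_EPOCH = 0


@dataclass
class Validator:
    activation_epoch: int
    exit_epoch: int


@dataclass
class BeaconState:
    slot: int
    validators: List[Validator]


def get_current_epoch(state: BeaconState) -> int:
    return state.slot // SLOTS_PER_EPOCH


def get_previous_epoch(state: BeaconState) -> int:
    current = get_current_epoch(state)
    return GENESIS_EPOCH if current == GENESIS_EPOCH else current - 1


def is_active_validator(validator: Validator, epoch: int) -> bool:
    return validator.activation_epoch <= epoch < validator.exit_epoch


def get_active_validator_indices(state: BeaconState, epoch: int) -> List[int]:
    # Proposed invariant: status accessors are only meaningful for the
    # current and previous epoch once re-activation is possible.
    assert epoch in (get_previous_epoch(state), get_current_epoch(state))
    return [i for i, v in enumerate(state.validators) if is_active_validator(v, epoch)]
```

Any call with an older epoch then fails loudly instead of silently returning a committee computed from overwritten phase change epochs.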
#### Future functions that use validator statuses
The main concern here is that in the future, we plan to establish committees that are predictable far in advance, particularly (i) sync committees (aka light client committees) and (ii) potentially shard proposer committees. Both last for about 256 epochs (~27 hours, 1/8 eek) and are predictable 256 epochs in advance.
The easiest and most flexibility-preserving way to solve this is to simply save the committees in the state. Note that we already do this for light client committees; we could do this for shard proposer committees too.
Another option is to add to the validator class an "active in the last period" bit (or a bitfield for the last `k` periods) and use this for historical committee calculation.
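The bitfield variant might look roughly like this. The names `K`, `activity_bits`, `record_period` and `was_active` are all hypothetical, as is the choice of storing the history as an integer with bit 0 for the most recent period.

```python
from dataclasses import dataclass

K = 8  # hypothetical number of periods of history to keep


@dataclass
class Validator:
    # Bit i is set if the validator was active i periods ago
    # (bit 0 is the most recently recorded period).
    activity_bits: int = 0


def record_period(validator: Validator, active_now: bool) -> None:
    # Shift the history by one period, drop the oldest bit, record the newest.
    mask = (1 << K) - 1
    validator.activity_bits = ((validator.activity_bits << 1) | int(active_now)) & mask


def was_active(validator: Validator, periods_ago: int) -> bool:
    assert 0 <= periods_ago < K
    return bool((validator.activity_bits >> periods_ago) & 1)
```

Historical committee calculation would then read `was_active(validator, periods_ago)` rather than comparing against `activation_epoch` and `exit_epoch`, at a cost of `K` bits per validator.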
An additional future concern is that `get_committee_count_per_slot` is used for computing shard deltas with `get_start_shard`. However, all uses of this in the (now deprecated) phase 1 draft spec are for the current and previous epoch only.
### Concerns that motivated removing re-activation functionality in 2017
Re-activation was originally an intended feature in very old versions of Casper FFG, but it was in fact _removed_ in 2017. This is my attempt to document the reasons why, and why they do not apply here.
In 2017, the goal was to implement Casper FFG [inside of a smart contract](https://github.com/ethereum/casper/blob/master/casper/contracts/simple_casper.v.py), and smart contracts come with important limitations; particularly, there is no easy way to run code independent of transactions (votes) coming in.
The challenge was programmatically determining whether or not a validator was active. Activation and exiting were delayed for safety reasons (as they are now), making an `active` flag impractical, as there would be no guarantee there would be a transaction that could follow through on executing that state transition after the safety delay. Hence, some epoch-based system was required. Additionally, a system for determining if a validator is active during the current or previous epoch was required.
At the time, it was thought that this functionality was also needed for historical epochs to implement slashing, but this turns out to be false: it's ok to slash validators for messages made while they are inactive, as they are not supposed to be signing messages anyway! Determining activity for historical epochs if an arbitrary number of exits and reactivations were allowed would require O(N) storage per validator, which was considered unrealistic.
Additionally, there were complexities around the fact that reactivation would make even determining activity in the _current or previous_ epoch a challenge, though in retrospect this was easily solvable: if a validator reactivated, set `activation_epoch > exit_epoch` and have the activity-checking code detect that special case and treat the validator as being _inactive_ during the interval and active outside it.
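A minimal sketch of that special case, with a simplified `Validator` class and a hypothetical function name. As in the rest of this note, it is only meant to be queried with the current or previous epoch, since it remembers just the latest exit and re-activation.

```python
from dataclasses import dataclass


@dataclass
class Validator:
    activation_epoch: int
    exit_epoch: int


def is_active_with_reactivation(validator: Validator, epoch: int) -> bool:
    if validator.activation_epoch > validator.exit_epoch:
        # Re-activated validator: inactive during [exit_epoch, activation_epoch),
        # active outside that interval.
        return not (validator.exit_epoch <= epoch < validator.activation_epoch)
    # Ordinary case: active during [activation_epoch, exit_epoch).
    return validator.activation_epoch <= epoch < validator.exit_epoch
```

For example, a validator with `exit_epoch = 5` and `activation_epoch = 10` is treated as inactive at epoch 7 and active again from epoch 10 onward.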
Hence, it could be argued that the reasons for disallowing reactivation were only a result of insufficient imagination even then!
One argument was stronger then: that allowing frequent exits and re-entries would weaken the consensus by making validator sets more disjoint. However, in the 2020 beacon chain spec we have stronger in-protocol tools (activation and exit queues) for dealing with this.