Quick and easy post-merge proof of custody proposals

# Quick and easy post-merge proof of custody proposals To reduce the risk that everyone will point to Infura instead of using a local Ethereum application-layer client, we can use a [proof of custody](https://ethresear.ch/t/a-0-001-bit-proof-of-custody/7409) mechanism. Block producers and attesters would be required to make some calculation `check_poc_eligibility(beacon_chain_data, application_state, validator_secret) -> bool`, and if the output is `False` (this would be targeted to be true about 0.1% of the time), they cannot produce or attest to that block. The validator would have to reveal their `validator_secret` at some point in the future (eg. after 1 year), and if they produced or attested to a block when they should not have, they can be slashed. If they reveal their secret far earlier than expected, they can also be penalized. This forces them to hold the `application_state` locally, as there is too much data to get by API, and it is risky to give the secret to a third party. This document describes some concrete proposals for how this can be done. There is one unique challenge with targeting the application state as a data set: the state is a {key: value} store, and almost all keys are empty. There is no reliable way to return "the 1724th key" or "the lexicographically next key after position X". The bulk of the complexity of these proposals involve workarounds for this ## Proposal 1: History-based access ### The source data We get around this by using the following trick: _we use the block history itself as a source of addresses that are probably non-empty_. Define `get_addresses(history: List[ApplicationBlock]) -> List[address]` as follows: ```python def get_addresses(history: List[ApplicationBlock]) -> List[address]: o = [] for block in reversed(history): for tx in block: o.extend([tx.sender, tx.to]) return o ``` ### The PoC algorithm Let `basic_poc(data: bytes, secret: bytes32, inv_failure_prob: int) -> bool` be a simple proof of custody function that operates on small pieces of data, with probability `1/inv_failure_prob` of returning `False`. A simple example is `int.from_bytes(sha256(data + secret), 'little') * inv_failure_prob >= 2**256`, but more sophisticated options that eg. enable MPC calculation also exist. Define `check_poc_eligibility` as follows. `beacon_chain_data.history` is defined to be the set of blocks included in the application state only. ```python INV_GLOBAL_FAILURE_PROB = 1000 ESTIMATED_TXS_PER_BLOCK = 200 MERGE_SLOT = 123456789 # TBD def check_poc_eligibility(beacon_chain_data, application_state, validator_secret) -> bool: addresses = get_addresses(beacon_chain_data.history) data = [] current_slot = beacon_chain_data.state.slot for address in addresses: data.append(hash( get_nonce(application_state, address).to_bytes(32, 'little') + get_balance(application_state, address).to_bytes(32, 'little') + get_code_hash(application_state, address) + current_slot.to_bytes(32, 'little') + hash_tree_root(beacon_chain_data.state) )) # We estimate the total size of `addresses` instead of checking it # explicitly to make verification easier inv_failure_prob_per_datum = ( INV_GLOBAL_FAILURE_PROB * (current_slot - MERGE_SLOT) * ESTIMATED_TXS_PER_BLOCK ) return all( basic_poc(datum, validator_secret, inv_failure_prob_per_datum) for datum in data ) ``` Essentially, we apply the PoC function to the contents of every address in history, and reject if _any_ of the PoC executions rejects. ## Proposal 2: Receipts We can use the receipts of the most recent block, or N blocks, as part of the PoC data. For example, we could do: ```python data = [] for block in list(reversed(beacon_chain_data.history))[:10]: for receipt in block.receipts: data.append(hash( receipt.gasUsed.to_bytes(32, 'little') + receipt.logsBloom + b''.join(log.data + b''.join(log.topics) for log in receipt.logs) )) ``` This increases the amount of per-block data that a non-application-block-validating node would have to download from some third party to be able to compute the PoC. ## Proposal 3: Adversarial proposer-selected PoC data The proposer at slot N could select 256 addresses, with the default algorithm being to choose a random 256 addresses with `len(code) >= 12000`. The code of each of these addresses would be added into the PoC data for the proposer and attesters at slot N+1. The delay ensures that the proposer does not risk non-inclusion of _their own_ block if they choose a challenging PoC; instead, any risk is passed on to the proposer a slot in the future, at which point slot N should be well-secured by the hundreds of attesters of that slot itself. This proposal adds 5 kB of overhead per block to the proposer for the 256 addresses, and adds 3-6 MB of overhead per block to non-application-block-validating nodes as they would have to download the code from a third party. However, note that the code is only a small portion of total state and does not change except from SELFDESTRUCT (which is itself likely to be neutered or removed soon), and so the data download amount may quickly go down due to caching. We could extend the scheme by adding not just the code, but also the first 100 storage slots (or some other subset of the storage tree that is commonly filled in practice) into the PoC data. This further increases the data subject to PoC, and furthermore storage is a subset of data that is more frequently edited. ### Fraud proofs Fraud proofs in this design are very simple. You just need: 1. Proof that a proposal or an attestation was made 2. A single application state branch for some address showing the contents 3. A single transaction tree branch showing that that address was in the history ### Revealing the secret A `VoluntaryExit` now starts a 99-eek clock on withdrawal instead of a 1/8 eek clock. At any point after the `VoluntaryExit` (waiting at least a few hours is recommended), the validator can provide another message that reveals their secret. This decreases the withdrawal clock to 1/8 eek (or 4 eeks if the validator was slashed).