# On Demand State Availability
This document outlines the problem of providing a mechanism for fetching arbitrary data from the Ethereum state.
## Solution Requirements
A solution to this problem would have the following properties.
- A "node" can retrieve arbitrary data from the Ethereum state
- Retrieval is fast enough to speculatively execute a "normal" transaction against a recent state root.
- It must be possible to pin requests to a single "recent" state root.
- The solution should scale natively and ideally should allow low-resource nodes to easily contribute back to the network.
### Things we **DO NOT** have to do
- We do not have to support stateless mining.
- We do not have to support syncing the full state.
## The Ethereum State
We refer to the "Ethereum State" as:
- The collection of account information (`balance/nonce/code_hash/state_root`) that makes up the global `state_root`.
- The collection of individual contract storage tries referenced by the account level `state_root`.
- The collection of all contract bytecodes. These are not explicitly part of the state, as they are only referenced by the `code_hash` of each account.
This data is randomly read and written. Some accounts/contracts are updated with high frequency. Others may be "cold", rarely or never being updated or touched.
### Account Trie Data
The account trie is naturally balanced, since account addresses are evenly distributed across the address space and it is computationally prohibitive to "mine" enough addresses in a single region to produce a meaningful imbalance.
### Contract Trie Data
The contract tries can be modeled as either separate tries or as "dangling" under each account in the global state trie.
The contract tries vary widely in size, from empty up to more than a gigabyte.
The contract tries are not balanced.
The `SELFDESTRUCT` mechanism can cause full contract tries to be wiped in a single block, potentially "purging" large amounts of information from the network in a single action.
### Contract Bytecode
Contract bytecode is a stream of bytes referenced by each account's `code_hash`.
## Open Questions and Research Topics
Each piece of state must be addressable, meaning it needs a key by which it can be requested from the network.
#### Trie Node Hash
We can address data by the hash of its trie node, similar to how `GetNodeData` works.
- Good Stuff
    - No need for proofs, since the response is validated against the requested node hash.
    - No need to specify a `state_root`.
    - Natively de-duplicates across multiple state roots.
- Bad Stuff
    - The scheme is not conducive to the most efficient modern ways of storing the trie (flat database layout).
    - Node hashes provide zero context about the data (leaf vs. intermediate, account vs. contract storage).
    - Inherently inefficient, since it stores intermediate trie nodes which are not strictly necessary.
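
The first two "good" properties can be illustrated with a minimal sketch of a hash-addressed store. Everything here is hypothetical: `NodeStore` and `validate_response` are illustrative names, and `hashlib.sha3_256` is only a stand-in, since Ethereum uses Keccak-256, which is not in the Python standard library.

```python
import hashlib
from typing import Dict, Optional


def node_hash(encoded_node: bytes) -> bytes:
    # Assumption: sha3_256 stands in for Keccak-256 to keep the sketch runnable.
    return hashlib.sha3_256(encoded_node).digest()


class NodeStore:
    """Hypothetical key-value store addressed by trie node hash."""

    def __init__(self) -> None:
        self._nodes: Dict[bytes, bytes] = {}

    def put(self, encoded_node: bytes) -> bytes:
        key = node_hash(encoded_node)
        # A node shared by many state roots always maps to the same key,
        # so it is stored only once.
        self._nodes[key] = encoded_node
        return key

    def get(self, key: bytes) -> Optional[bytes]:
        return self._nodes.get(key)

    def __len__(self) -> int:
        return len(self._nodes)


def validate_response(requested_hash: bytes, response: bytes) -> bool:
    # No proof and no state root are needed: re-hashing the response is enough.
    return node_hash(response) == requested_hash


store = NodeStore()
key_first = store.put(b"encoded trie node")
key_again = store.put(b"encoded trie node")  # same node seen under another state root
```

Note that the key carries no information about what the node is, which is the lack of context listed under "Bad Stuff".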
#### By Trie Path and State Root
We can use `(state_root, address)` to reference an account and `(state_root, address, slot)` to reference contract storage.
- Good Stuff
    - Seems to be conducive to a flat database layout and in general does not dictate the architecture of the database.
    - Provides contextual information about the data.
- Bad Stuff
    - Unclear how de-duplication might work. An account which has not changed in a few blocks will have the same value across multiple state roots. Ideally we do not want to store multiple entries for the account.
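
As a rough sketch of this addressing scheme and of the de-duplication concern, the keys could be modeled as tuples. The type names, the byte values, and the `count_duplicates` helper are all hypothetical and only illustrate the problem; they do not propose a solution.

```python
from typing import Dict, NamedTuple


class AccountKey(NamedTuple):
    state_root: bytes
    address: bytes


class StorageKey(NamedTuple):
    state_root: bytes
    address: bytes
    slot: int


def count_duplicates(db: Dict[AccountKey, bytes]) -> int:
    """Count entries whose (address, value) is already stored under another state root."""
    seen = set()
    duplicates = 0
    for key, value in db.items():
        marker = (key.address, value)
        if marker in seen:
            duplicates += 1
        seen.add(marker)
    return duplicates


# Hypothetical data: one account that did not change between two state roots.
ROOT_A = b"\xaa" * 32
ROOT_B = b"\xbb" * 32
ADDRESS = b"\x00" * 19 + b"\x01"
ACCOUNT = b"nonce=1,balance=5"

db = {
    AccountKey(ROOT_A, ADDRESS): ACCOUNT,
    AccountKey(ROOT_B, ADDRESS): ACCOUNT,
}
duplicates = count_duplicates(db)
```

With a naive mapping like this, every new state root adds another copy of each unchanged account.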
### Provable Data
Data retrieved from and stored in the network needs to be "provable" to anyone with access to the header chain. That means account data will need something like a Merkle proof against a state root, and contract storage will need an additional proof against the account-level `state_root`.
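
The two-level structure can be sketched with a simplified binary Merkle tree. This is an illustration under loud assumptions: Ethereum actually uses a hexary Merkle Patricia Trie with RLP encoding and Keccak-256, whereas this sketch uses a binary tree, `sha3_256` as a stand-in hash, and a made-up account encoding whose last 32 bytes are the account-level root (called `storage_root` in the code).

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    # Assumption: stand-in for Keccak-256.
    return hashlib.sha3_256(data).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """All levels of a binary Merkle tree; len(leaves) must be a power of two."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes from the leaf at `index` up to (not including) the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof


def verify(root: bytes, leaf: bytes, index: int, proof: List[bytes]) -> bool:
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


def verify_slot(state_root, account_leaf, account_index, account_proof,
                slot_value, slot_index, storage_proof) -> bool:
    # Hypothetical account encoding: the last 32 bytes are the storage root.
    storage_root = account_leaf[-32:]
    return (verify(state_root, account_leaf, account_index, account_proof)
            and verify(storage_root, slot_value, slot_index, storage_proof))


# Build a tiny contract storage tree, then an account tree that commits to it.
slots = [b"slot0", b"slot1", b"slot2", b"slot3"]
storage_levels = build_levels(slots)
storage_root = storage_levels[-1][0]
accounts = [b"acct0" + bytes(32), b"acct1" + storage_root,
            b"acct2" + bytes(32), b"acct3" + bytes(32)]
account_levels = build_levels(accounts)
state_root = account_levels[-1][0]

account_proof = make_proof(account_levels, 1)
storage_proof = make_proof(storage_levels, 2)
ok = verify_slot(state_root, accounts[1], 1, account_proof,
                 slots[2], 2, storage_proof)
```

A verifier holding only `state_root` from a header can check a storage value by chaining the two proofs, which is the "additional proof" described above.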
### Updating Data
The Ethereum state changes with every block, roughly every 15 seconds. These changes need to be available quickly to consumers of the network.
If the data is addressed and stored by node hash, all new trie nodes will need to be piped into the network at each block.
If the data is addressed differently, it may make sense to instead have nodes update the data in-place, though it still may make sense to pipe new proofs into the network.
Either way, the solution will need to deal with the constantly changing nature of the data.
> TODO: How many trie nodes, accounts, and contract storage slots are touched on average per block under current mainnet conditions?
### Dividing the Data
If we want to use something like a DHT, we need a way to divide the data up evenly among the nodes. The account data is naturally balanced and easy to shard. The contract storage, being inherently unbalanced, is problematic.
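
A small sketch of the concern, assuming a Kademlia-style DHT where content is assigned to the node with the smallest XOR distance. The node IDs, the key derivation, and the "naive" storage keying are all hypothetical, and `sha3_256` again stands in for Keccak-256.

```python
import hashlib
from collections import Counter
from typing import List


def content_id(data: bytes) -> int:
    # Assumption: stand-in hash used to place keys in the DHT keyspace.
    return int.from_bytes(hashlib.sha3_256(data).digest(), "big")


def closest_node(node_ids: List[int], cid: int) -> int:
    # Kademlia-style assignment: the node with the smallest XOR distance.
    return min(node_ids, key=lambda node_id: node_id ^ cid)


NODE_IDS = [content_id(b"node-%d" % i) for i in range(8)]

# Accounts: each account has its own key, so the load spreads over the nodes.
account_load = Counter(
    closest_node(NODE_IDS, content_id(b"account-%d" % i)) for i in range(1000)
)

# Naive contract storage: every slot is keyed by the contract address alone,
# so one large contract lands entirely on a single node.
CONTRACT = b"large-contract"
naive_storage_load = Counter(
    closest_node(NODE_IDS, content_id(CONTRACT)) for _slot in range(1000)
)
```

How evenly the account load spreads depends on how the node IDs are distributed, but no single contract can concentrate it. Under the naive keying, the storage load tracks the size of each contract.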
### Efficiency of storing contiguous blocks of data
Assuming the data that nodes store includes proofs, it is much more efficient for a node to store a contiguous block of addresses or storage slots, since their individual proofs will overlap to a large degree.
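
The overlap can be made concrete by counting the sibling hashes needed to prove a set of leaves together. This sketch assumes a perfect binary Merkle tree rather than Ethereum's hexary Merkle Patricia Trie, and `multiproof_size` is a hypothetical helper; the exact numbers differ for the real trie, but the effect is the same.

```python
from typing import Iterable


def multiproof_size(leaf_indices: Iterable[int], depth: int) -> int:
    """Number of sibling hashes needed to prove all given leaves at once."""
    known = set(leaf_indices)
    needed = 0
    for _ in range(depth):
        siblings = {index ^ 1 for index in known}
        # Siblings that are themselves derivable from known leaves are free.
        needed += len(siblings - known)
        known = {index // 2 for index in known}
    return needed


DEPTH = 20  # a tree with 2**20 leaves

contiguous = range(16)                       # 16 neighbouring leaves
scattered = [i * 2 ** 16 for i in range(16)]  # 16 leaves spread across the tree

contiguous_cost = multiproof_size(contiguous, DEPTH)
scattered_cost = multiproof_size(scattered, DEPTH)
separate_cost = 16 * DEPTH  # 16 independent proofs with no sharing
```

In this model the contiguous block needs 16 sibling hashes, the scattered set needs 256, and 16 fully independent proofs need 320.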