# Notes on Header Accumulator and the Portal Network
An accurate view of the header chain is essential for secure portal client operation. All validation of data from the portal network links back to the header chain.
- Requires validation of a proof against `Header.state_root`
- Requires validation of a transaction against `Header.transactions_root`
- Requires validation of a receipt against `Header.receipts_root`
The Header chain is >6GB and grows linearly. This exceeds the storage allowances and thus we need a network design that does not require clients to store the full chain.
A reasonable expectation is that clients will store *recent* headers, but *forget* headers beyond some recent time horizon. Maybe as few as 256 recent headers.
This poses a problem for retrieval of a header at a specific block number. At any height, there can be many valid block headers, but only one that is canonical. The non-canonical ones are *uncles* (or maybe were never included in the chain at all). The only way to validate that a header is part of the canonical chain would be to trace backwards using `Header.parent_hash` from the HEAD of the chain to the block number in question. This is a problem because it is an `O(N)` operation and proving the canonicall-ness of a header sufficiently far in the history becomes infeasible to do within a reasonable amount of time.
The proposed solution for this is to use a double batched merkle accumulator: https://ethresear.ch/t/double-batched-merkle-log-accumulator/571
Assuming a client has an accurate and recent view of this accumulator, any arbitrary header from history can be proven to be part of the canonical chain by providing a merkle proof which proves the header was included in the accumulator.
This gives us a solid solution for proving canonicallness of historical headers, but shifts the problem to acquisition and maintenance of the "header accumulator".
## Build it from scratch
The *trustless* way to do this is to build the entire accumulator from scratch. We start by assuming the client has a block header that it considers to be the head of the chain. The client then recursively uses `Header.parent_hash` to fetch the parent header all the way back to the genesis block of the chain. Once they have constructed a valid sequence of headers from the genesis block to their head block, they can then construct the accumulator from those headers.
Keeping the accumulator up-to-date involves following the head of the chain as it progresses, and updating the accumulator accordingly.
This process can be parallelized, by speculatively fetching a "skeleton" of the header chain, such as fetching every 1000th header, and then fetching the segments between these headers in parallel.
This approach is *easy* to implement, but imposes UX problems that make it non-viable for most portal client use cases. Even with the skeleton approach, it is prohibitively expensive with respect to computation, bandwidth, and storage for a resource constrained device to do this.
## Paths towards viable options
All of the options that I (piper) have imagined for acquiring an accurate and up-to-date view of the accumulator that are viable for portal client requirements involve some level of trust or uncertainty, but I believe they are viable.
We'll start with these base assumptions:
- The network allows us to request headers by their hash
- The network allows us to ask someone for the *hash* of their accumulator at recent heights.
- The network allows us to ask someone for a copy of their accumulator at recent heights.
One approach to acquiring an accurate view of the accumulator would be to simply ask a random set of nodes for their accumulator and acquire the one which is most popular.
If we assume an honest majority of nodes this approach works and is quite simple.
This approach can be griefed by malicious peers returning *junk* data.
### Hueristics for Validation
Assuming we acquire a recent view of the accumulator and want to try and validate that it is accurate, we should be able to accomplish this with some heuristics.
If we acquire the accumulator from "Alice" and then request a random historical header from "Bob" and validate that proof against the accumulator we have the following possibilities.
- Proof is invalid:
- A: Bob is malicious and gave us an invalid proof
- B: Alice is malicious and gave us an invalid accumulator
- Proof is valid
- C: Bob and Alice are both malicious and both lying to us
- D: The accumulator is valid.
#### Testing "A" vs "B"
We can test `A` vs `B` by requesting the same header+proof from more peers.
If all of the peers we request the proof from return a proof that is invalid, then we can with high probability assume that we are dealing with option `B` and should discard our accumulator and try another peer.
Otherwise we can assume we are dealing with option `A`
This test is relatively simple to execute.
#### Testing "C" vs "D"
We can test `C` vs `D` by performing N random samplings from the historical headers. The amount of work necessary to pass off a fraudulent accumulator grows exponentially as more and more samples from history are validated.
At each sample you would pick a block number `N` and request the header at that height. If no proof can be retrieved from the network you assume the accumulator is invalid. If the proof can be retrieved and is valid, **and** the returned header is also valid, then we know one of two things.
- E: The accumulator is correct
- F: The accumulator is incorrect **but** an attacker has generated a valid header at the requested height
The cost of executing situation `F` can be measured, since each header has a proof-of-work seal, and that seal can be validated. This means that each time we perform this check by requesting a random header+proof and validating them, we increase the amount of work that an attacker must do to *trick* us into accepting an invalid accumulator.
By requesting headers at random, we increase the cost of performing this attack because the attacker will have to actually generate a valid proof-of-work chain of headers at the requested height.
After performing a sufficient number of these random validations we can probabalistically assume that we are operating in case `E` and that the accumulator is indeed valid.
#### Additional security
Clients can also do any of the following to further increase their confidence.
- Validate headers against centralized providers.
- When checking a random header to validate an accumulator the returned header can actually be validated against centralized sources like Infura or Etherscan.
- Validate headers againt known checkpoints.
- Clients can include specific checkpoints for either the accumulator or hard coded header hashes at certain heights.