# Data availability via extended execution payloads
*Update: Changed the construction to make it friendly to distributed reconstruction*
Previous data shard construction: Add $n=64$ data shards to the beacon chain. Each slot, $n$ proposers independently propose their shard blobs, which are then confirmed by committees. Once a shard blob is confirmed (this can take several slots), it can be referenced by the execution chain.
Here is an alternative proposal: Instead of adding data shards with independent proposers, increase the allowed execution block size from 1 MB to 32 MB, with the restriction that the first 1 MB has to be sufficient to execute the full block; the rest of the block is only for publishing data that can be referenced inside the block using its commitment. Then KZG-commit to ("Kateize") the whole block. This directly integrates the data shards into the execution payloads.
The first (up to) 1 MB of the block contains transactions. We add a new transaction type that takes an additional argument -- `kzg_calldata_commitment`. This is like calldata, but the smart contract only receives it in the form of a KZG commitment to the polynomial and has no way of accessing the actual calldata (unless it is separately provided).
For each transaction with KZGCalldata, a proof is added to the block showing that the calldata is included in the extended part of the block.
**Advantages:**
* Tighter coupling with the current Eth1 chain -- rollup data is immediately visible to manager contracts
* All data is confirmed together with the rollup calls
* Do not need to wait until shard blocks are confirmed
* Do not need to worry about duplicate shard data
* Easier to reason about for developers
* Rollups do not need to implement awkward "shard scanning logic"
* There is no need for a separate PBS for shards
* This will reduce the cost of market entry for builders, as they will not have to build up two different systems
* It will also mean that MEV from interactions between data and execution can be exploited by the builder risk-free, which means less builder centralization
* There is no need for a separate shard transaction payment system
* The cost of KZGCalldata can be directly charged to the transaction that is using them
* Increased bribery resistance as data can be confirmed by larger committees (1/32 of validator set)
* Execution block is immediately part of data availability set
* Sets the stage for data availability + fraud-proof protected light-client execution as soon as Verkle tries are ready
* "Eth1 as a rollup"
* Enables even lighter validators+nodes that don't need to run a full Eth1 node
* Synchronous calls from L2 to L1 become possible with ZKRollups
* No need for applications to commit to shards that they want to use
* Total block size can be governed in a very similar way to block gas limit
* The current shard blob market is inefficient
* It claims 64 independent proposers; in practice, however, this is impossible, as it would lead to lots of duplicates being confirmed, which in turn leads to centralization of the market
**Disadvantages**
* Increased requirements for builders:
* Compute KZG proofs for 32 MB of data in ~1s. This requires a 32-64 core CPU. I think a capable machine can be built for <$5,000; a proof of concept is yet to be implemented
* Larger minimum bandwidth requirements: the builder needs to be able to send 64 MB of data within the builder deadline and make sure it is distributed in the P2P network. This probably requires at least 2.5 GBit/s upstream
* Neither of these is an unreasonable requirement for builders, and other entry requirements arguably already set a higher bar
* More power to the builder as they serve execution + data availability
* But with PBS we need other censorship resistance methods anyway. With this system they can be treated as a separate concern
## Suggested Implementation
* New constants
```python
MAX_SHARDS = 2**6 # 64
SHARD_SIZE_FIELD_ELEMENTS = 2**14 # 2**14*31 = 507904 bytes
MAX_BLOCK_SIZE = SHARD_SIZE_FIELD_ELEMENTS * MAX_SHARDS # 2**20 field elements = 32,505,856 bytes
TARGET_BLOCK_SIZE = MAX_BLOCK_SIZE // 2 # EIP1559 for data gas
```
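As a quick sanity check (a sketch, not part of the spec; it only restates the arithmetic above), the sizes are counted in field elements, with 31 usable bytes per field element:
```python
# Sanity check of the constants above (illustrative, not part of the spec)
BYTES_PER_FIELD_ELEMENT = 31  # usable bytes per BLS12-381 field element

MAX_SHARDS = 2**6
SHARD_SIZE_FIELD_ELEMENTS = 2**14

assert SHARD_SIZE_FIELD_ELEMENTS * BYTES_PER_FIELD_ELEMENT == 507904        # bytes per shard
assert SHARD_SIZE_FIELD_ELEMENTS * MAX_SHARDS == 2**20                      # field elements per block
assert SHARD_SIZE_FIELD_ELEMENTS * MAX_SHARDS * BYTES_PER_FIELD_ELEMENT == 32_505_856  # ~32 MB
```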
* The `ExecutionPayload` is changed so that the block is committed to by a `KZGCommitment` over up to `2**20` field elements
```python
class ExecutionPayload(Container):
    # Execution block header fields
    parent_hash: Hash32
    fee_recipient: ExecutionAddress  # 'beneficiary' in the yellow paper
    state_root: Bytes32
    receipt_root: Bytes32  # 'receipts root' in the yellow paper
    logs_bloom: ByteVector[BYTES_PER_LOGS_BLOOM]
    random: Bytes32  # 'difficulty' in the yellow paper
    block_number: uint64  # 'number' in the yellow paper
    gas_limit: uint64
    gas_used: uint64
    timestamp: uint64
    extra_data: ByteList[MAX_EXTRA_DATA_BYTES]
    base_fee_per_gas: uint256
    # NEW payload fields using KZG commitments
    block_commitment: KZGCommitment
    block_execution_length: uint64
    block_length: uint64
    block_degree_proof: KZGCommitment
    block_kzgcalldata_proofs: List[KZGCalldataProof, MAX_KZGCALLDATA_PROOFS]
    # Sharded data for data availability sampling and reconstruction
    sharded_commitments: List[KZGCommitment, 2 * MAX_SHARDS]
    sharded_commitments_proof: ShardedCommitmentProof
    # NEW fields for data gas market
    data_gas_limit: uint64
    data_gas_used: uint64
    data_base_fee_per_gas: uint256


class KZGCalldataProof(Container):
    data_start: uint64
    shifted_calldata_commitment: KZGCommitment
    kzg_proof: KZGCommitment
    # Prove polynomial shift
    y: uint256
    kzg_proof1: KZGCommitment
    kzg_proof2: KZGCommitment


class ShardedCommitmentProof(Container):
    # Aggregate degree proof for all sharded_commitments
    degree_proof: KZGCommitment
    # Prove polynomial shift
    y: uint256
    z: List[uint256, MAX_SHARDS]
    kzg_proof1: KZGCommitment
    kzg_proof2: List[KZGCommitment, MAX_SHARDS]
```
* The block data up to `block_execution_length` encodes a `List[Transaction, MAX_TRANSACTIONS_PER_PAYLOAD]`, packed at 31 bytes per field element (see the sketch below)
* The remainder of the data does not have any validity conditions except for the ones imposed by the KZGCalldataProofs
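A minimal sketch of this packing, assuming 31-byte chunks interpreted as little-endian integers (the helper name and the endianness/padding conventions are illustrative assumptions, not spec):
```python
# Illustrative packing of block bytes into BLS12-381 field elements, 31 bytes each,
# so that every chunk is guaranteed to be below the field modulus.
BLS_MODULUS = 52435875175126190479447740508185965837690552500527637822603658699938581184513

def bytes_to_field_elements(data: bytes) -> list[int]:
    elements = []
    for i in range(0, len(data), 31):
        chunk = data[i:i + 31]
        # Pad the final chunk with zero bytes; 31 bytes always fit below the modulus
        element = int.from_bytes(chunk.ljust(31, b"\x00"), "little")
        assert element < BLS_MODULUS
        elements.append(element)
    return elements
```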
* Add a new transaction type 3, with the following payload specification:
```python
@dataclass
class TransactionKZGCalldataPayload:
    chain_id: int = 0
    signer_nonce: int = 0
    max_priority_fee_per_gas: int = 0
    max_fee_per_gas: int = 0
    # New data gas
    max_priority_fee_per_data_gas: int = 0
    max_fee_per_data_gas: int = 0
    gas_limit: int = 0
    destination: int = 0
    amount: int = 0
    payload: bytes = bytes()
    # KZG Calldata (defaults added so no non-default field follows a default one)
    kzg_calldata_commitment: KZGCommitment = KZGCommitment()
    kzg_calldata_length: uint64 = 0
    access_list: List[Tuple[int, List[int]]] = field(default_factory=list)
    signature_y_parity: bool = False
    signature_r: int = 0
    signature_s: int = 0
```
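For illustration, a sender might derive the two KZG calldata fields from raw calldata roughly as follows. This is only a sketch: `bytes_to_field_elements` is the packing helper sketched earlier, `kzg_commit` is a hypothetical commitment function, and the assumption that `kzg_calldata_length` counts field elements is mine, not the spec's.
```python
# Hypothetical construction of the KZG calldata fields for a type-3 transaction
def make_kzg_calldata_fields(calldata: bytes):
    # Pack the calldata at 31 bytes per field element (same packing as the block body)
    field_elements = bytes_to_field_elements(calldata)  # hypothetical helper
    # Commit to the polynomial whose evaluations start at omega^0 (see below)
    commitment = kzg_commit(field_elements)             # hypothetical helper
    # Length assumed to be in field elements, matching its use in the vanishing polynomial Z(X)
    return commitment, len(field_elements)
```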
* For each transaction of type 3, there has to be a KZG proof that the KZG Calldata was correctly included. This is included in the class `KZGCalldataProof`. It works as follows:
* `shifted_calldata_commitment` is the commitment to the data, but shifted to the position where it appears in the `block_commitment`, i.e. it starts at $\omega^\mathrm{data\_start}$
* `kzg_calldata_commitment` (part of the transaction data) is the commitment to the data so that it starts at the first position, i.e. $\omega^0$
* The proof consists of a KZG proof and a shift proof. The KZG proof is verified by
* $e(\mathrm{block\_commitment} - \mathrm{shifted\_calldata\_commitment}, [1]_2) = e(\mathrm{kzg\_proof}, [Z(s)]_2)$, where $Z(X)$ is the polynomial that is zero on $\omega^\mathrm{data\_start}, \ldots, \omega^{\mathrm{data\_start} + \mathrm{kzg\_calldata\_length} - 1}$
* To verify the shift proof, compute $x = \mathrm{hash}(\mathrm{kzg\_calldata\_commitment}, \mathrm{shifted\_calldata\_commitment})$
* Then verify the KZG proofs
* $e(\mathrm{kzg\_calldata\_commitment} - [y]_1, [1]_2) = e(\mathrm{kzg\_proof1}, [s - x]_2)$
* $e(\mathrm{shifted\_calldata\_commitment} - [y]_1, [1]_2) = e(\mathrm{kzg\_proof2}, [s - \omega^\mathrm{data\_start}x]_2)$
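The checks above, written out as a sketch in spec-style Python. All helpers (`pairing`, `g1_sub`, `hash_to_field`, the `setup` object, `OMEGA`, ...) are placeholders for whatever BLS12-381 library and trusted-setup representation are ultimately used; `pairing(a, b)` stands for $e(a, b)$ with $a$ in $G_1$ and $b$ in $G_2$.
```python
# Illustrative verifier for a KZGCalldataProof (all helpers are placeholders)
def verify_kzg_calldata_proof(block_commitment, tx, proof, setup):
    start, length = proof.data_start, tx.kzg_calldata_length

    # 1) KZG proof: block_commitment and shifted_calldata_commitment agree on
    #    omega^start, ..., omega^(start + length - 1); Z(X) vanishes on those points
    z_of_s_g2 = vanishing_polynomial_commitment_g2(start, length, setup)
    assert pairing(g1_sub(block_commitment, proof.shifted_calldata_commitment),
                   g2_generator()) == pairing(proof.kzg_proof, z_of_s_g2)

    # 2) Shift proof: kzg_calldata_commitment opens to y at x,
    #    shifted_calldata_commitment opens to y at omega^start * x
    x = hash_to_field(tx.kzg_calldata_commitment, proof.shifted_calldata_commitment)
    assert pairing(g1_sub(tx.kzg_calldata_commitment, g1_mul(g1_generator(), proof.y)),
                   g2_generator()) == pairing(proof.kzg_proof1,
                                              g2_sub(setup.s_g2, g2_mul(g2_generator(), x)))
    shifted_x = (pow(OMEGA, start, BLS_MODULUS) * x) % BLS_MODULUS
    assert pairing(g1_sub(proof.shifted_calldata_commitment, g1_mul(g1_generator(), proof.y)),
                   g2_generator()) == pairing(proof.kzg_proof2,
                                              g2_sub(setup.s_g2, g2_mul(g2_generator(), shifted_x)))
```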
* Add a new opcode to the EVM, `KZGCALLDATA`, that pushes the `kzg_calldata_commitment` and `kzg_calldata_length` to the stack (or memory?)
* Verify that the shard commitments are correctly encoded using `ShardedCommitmentProof`
* For each commitment, there is a degree proof, proving that the degree of the commitment is `< SHARD_SIZE_FIELD_ELEMENTS`, checked via the pairing equation:
* $e(\mathrm{degree\_proof}, [1]_2) = e(\mathrm{sharded\_commitment}, [s^{\mathrm{SETUP\_MAX\_DEGREE} - \mathrm{SHARD\_SIZE\_FIELD\_ELEMENTS}}]_2)$
* Then the correctness of the commitments is checked via Fiat-Shamir:
* $x = \mathrm{hash}(\mathrm{block\_commitment}, \mathrm{sharded\_commitments})$
* $e(\mathrm{block\_commitment} - [y]_1, [1]_2) = e(\mathrm{kzg\_proof1}, [s - x]_2)$
* $\sum_{i = 0}^{\mathrm{MAX\_SHARDS} - 1} z_i = y$
* For $i < \mathrm{MAX\_SHARDS}$: $e(\mathrm{sharded\_commitments}_i - [z_i]_1, [1]_2) = e(\mathrm{kzg\_proof2}_i, [s - \omega^{i \cdot \mathrm{SHARD\_SIZE\_FIELD\_ELEMENTS}} \cdot x]_2)$
* (this can probably be optimized and compressed using KZG multiproof schemes)
* Verify that the $\mathrm{sharded\_commitments}_i$, $i \geq \mathrm{MAX\_SHARDS}$, are correct using Lagrange interpolation
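The degree and split checks above as a sketch, with the same placeholder helpers as before (the Lagrange-interpolation check for the extended commitments is omitted):
```python
# Illustrative verifiers for the ShardedCommitmentProof checks (placeholder helpers as above)
def verify_shard_degree(commitment, degree_proof, setup):
    # degree(commitment) < SHARD_SIZE_FIELD_ELEMENTS
    shift = setup.SETUP_MAX_DEGREE - SHARD_SIZE_FIELD_ELEMENTS
    return pairing(degree_proof, g2_generator()) == pairing(commitment, setup.s_pow_g2(shift))

def verify_shard_split(payload, setup):
    # Fiat-Shamir check that the first MAX_SHARDS sharded commitments split block_commitment
    proof = payload.sharded_commitments_proof
    commitments = payload.sharded_commitments
    x = hash_to_field(payload.block_commitment, commitments)

    # block_commitment opens to y at x
    assert pairing(g1_sub(payload.block_commitment, g1_mul(g1_generator(), proof.y)),
                   g2_generator()) == pairing(proof.kzg_proof1,
                                              g2_sub(setup.s_g2, g2_mul(g2_generator(), x)))
    # the shard openings z_i sum to y
    assert sum(proof.z) % BLS_MODULUS == proof.y
    # each sharded commitment opens to z_i at omega^(i * SHARD_SIZE_FIELD_ELEMENTS) * x
    for i in range(MAX_SHARDS):
        point = (pow(OMEGA, i * SHARD_SIZE_FIELD_ELEMENTS, BLS_MODULUS) * x) % BLS_MODULUS
        assert pairing(g1_sub(commitments[i], g1_mul(g1_generator(), proof.z[i])),
                       g2_generator()) == pairing(proof.kzg_proof2[i],
                                                  g2_sub(setup.s_g2, g2_mul(g2_generator(), point)))
```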
* Add `BLS12_381` opcodes (necessary for any reasonable sharding implementation)
## Sampling
* We have 64 sharded commitments for the original data, which are extended to 128 using polynomial extension
* Each sharded commitment is of degree $2^{14}$ and is extended to $2^{15}$ field elements for sampling
* Let's assume a sample size of $2^{4} = 16$ field elements (512 bytes). Then we are sampling a grid of 128 rows and 2048 columns, for a total of $128 \cdot 2048 = 2^{18} = 262144$ samples per block. Any 50% of these samples are sufficient to reconstruct the whole block. Additionally, any row or column can be reconstructed from 50% of its samples, which allows efficient recovery by small actors like at-home validators even in the event that a block is intentionally poorly seeded
* Each validator is assigned to 32 random columns, which they also have to custody in their next attestation (meaning that the attestation will custody a total of $32 \cdot 32 \cdot 128$ samples). This has the nice side effect that each block gets bribery resistance of 1/32 of the full validator set (bribing less than 1/2 of the validators attesting to one block is useless)
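The sampling arithmetic above, as a quick sanity check (not part of the spec):
```python
# Sanity check of the sampling numbers above (illustrative)
ROWS = 2 * 64                    # 64 original + 64 extended sharded commitments
FIELD_ELEMENTS_PER_ROW = 2**15   # 2**14 per shard, extended 2x
SAMPLE_SIZE = 2**4               # field elements per sample
COLUMNS = FIELD_ELEMENTS_PER_ROW // SAMPLE_SIZE

assert (ROWS, COLUMNS) == (128, 2048)
assert ROWS * COLUMNS == 2**18            # 262,144 samples per block
samples_per_attestation = 32 * 32 * 128   # 131,072, as stated above for 32 custodied columns
```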
## Block Builder Costs
Block builders get two new tasks. One is computing the KZG witnesses for the samples, and the other one is seeding the samples in the P2P network.
* Let's start with the second one. Given that we have about 128 MB of samples, that we want to distribute each sample to at least 5 peers for good dispersion, and that all of this should happen in ~1s, the builder needs a total upstream of 640 MB/s or 5 GBit/s. This is certainly unlikely for an at-home internet connection, but easy enough in a data center
* If people really want to run a builder at home, a service can distribute the samples after they have been created, reducing the builder's own upstream requirement to 32 MB/s or 256 MBit/s, which is possible in some regions
* However, the requirement that builders run in datacenters does not seem particularly limiting (they will most likely do that anyway)
* Computing the KZG proofs for all samples is going to take ~200-320 seconds using FK20, according to my estimates (the uncertainty comes from the precise complexity of the MSM algorithm, which is unknown). This work can be distributed across many machines, and in a datacenter this computing power is easily available.
* It would be interesting to see whether this can be done much more cheaply on a GPU, so that at-home block building becomes viable
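The bandwidth estimate above as a back-of-the-envelope calculation (a sketch; the ~1s deadline and the fan-out of 5 peers per sample are the assumptions stated above):
```python
# Back-of-the-envelope builder seeding bandwidth (assumptions as stated above)
SAMPLE_DATA_MB = 128   # ~128 MB of samples after the 2x extension
FANOUT = 5             # each sample sent to at least 5 peers
DEADLINE_S = 1.0       # target seeding time

direct_seeding_mb_s = SAMPLE_DATA_MB * FANOUT / DEADLINE_S   # 640 MB/s
direct_seeding_gbit_s = direct_seeding_mb_s * 8 / 1000       # ~5 GBit/s

# With a distribution service re-seeding the samples, the builder's own upstream
# drops to the figures quoted above: 32 MB/s, i.e. 256 MBit/s
```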