# Payload Chunking
Multiple similar but distinct approaches are being considered for payload chunking. The goal of this post is to explain the differences, describe the trade-offs, and analyze how they interact with other EIPs.
## Motivation
While there are many designs for payload chunking, they share some common properties:
- Instead of sending the entire block over the network, we split it into smaller chunks and send them separately
- Ideally, we would also like to be able to execute chunks as they arrive, even if they arrive out of order (aka chunk execution)
In a simplified design, this lets us go from this model of block propagation and execution (each unit is a block):

to something like this (each unit is a chunk):

> Chunks could arrive in any order. Natural order provided for simplicity.
We can distinguish two improvements that this brings us:
**Parallel Propagation**
Being able to propagate each chunk as it arrives makes the entire block propagation faster. Smaller chunks and a bigger network (more hops to reach all peers) increase the effectiveness of chunking (see the analysis below).
There are other ideas that also give us this improvement (e.g. [Block-in-Blobs EIP-8142](https://eips.ethereum.org/EIPS/eip-8142), [RLNC](https://ethresear.ch/t/faster-block-blob-propagation-in-ethereum/21370)), but they usually don't provide the next improvement.
**Parallel Download and Execution**
Executing chunks independently reduces the total time spent between receiving the first bits of a block's data and verifying the block.
It's worth highlighting that this gives us a speedup of at most a factor of 2 in the best case (i.e. when download time = execution time). In other cases, the speedup is smaller because download or execution alone takes more than 50% of the total time.
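As a quick sanity check, the 2× bound can be seen with a toy model (illustrative numbers, not measurements):

```python
def speedup(download_s: float, execute_s: float) -> float:
    """Best-case speedup from overlapping download and execution.

    Without chunking the two phases are sequential; with many small
    chunks the slower phase dominates and the faster one is hidden
    behind it (an idealized pipeline, ignoring per-chunk overhead).
    """
    sequential = download_s + execute_s
    pipelined = max(download_s, execute_s)
    return sequential / pipelined

# Speedup peaks at 2x when the phases are balanced:
print(speedup(1.0, 1.0))  # 2.0
print(speedup(1.5, 0.5))  # ~1.33: download alone is 75% of the total
```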
## The protocol changes
Assuming inclusion of [ePBS](https://eips.ethereum.org/EIPS/eip-7732) and [BAL](https://eips.ethereum.org/EIPS/eip-7928) proposals (scheduled for Glamsterdam), we can make some assumptions and consider some concrete payload chunking implementations.
Firstly, we observe that transactions and the BAL represent the majority of the data that is propagated over the network. The simplest design for payload chunking would be to remove them from `ExecutionPayloadEnvelope` (see the [ePBS EIP](https://eips.ethereum.org/EIPS/eip-7732)). The `ExecutionPayloadEnvelope` would instead contain a commitment to the chunks, which are propagated separately. Each chunk would contain only part of the transaction/BAL data.
This opens some design questions.
### Chunkifying Block Access List
The BAL is on average smaller in size than the transactions, but in the worst case it can be bigger (e.g. a small number of transactions that read from / write to many storage slots). Chunkifying the BAL seems like a logical approach, but there are some things to consider.
The BAL is what allows us to execute chunks independently even if they arrive out of order. If we want to preserve that property, we have to split it into Chunk Access Lists (aka CALs). We would have to design CALs in one of the following ways:
#### Dependent CALs
We can require the CALs of all previous chunks when executing the current chunk. This causes some fields to be duplicated across CALs (in comparison to the BAL):
- addresses and storage slots that are modified across multiple chunks
- storage slots that are read in earlier and modified in later chunks
These CALs should propagate separately from transactions. This allows clients to start pre-fetching data and executing other (later) chunks even before the corresponding transactions arrive.
#### Independent CALs
The other approach is to make CALs independent, i.e. each CAL would contain all the data required to execute the corresponding chunk.
This increases data duplication even further (in comparison to previous approach):
- all storage slots that are accessed in earlier chunks
- if they were modified, we have to include the latest modification as well
This design is not compatible with non-semantic chunking (see below). But we could send each CAL together with its chunk's transactions, which simplifies the design.
Making CALs dependent is more pragmatic, as the data duplication is significantly smaller in the worst case. The downside is that early CALs are needed in order to execute any of the later chunks (i.e. _CAL_0_ is required in order to execute any chunk, causing delays if it is received late).
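The duplication rules for dependent CALs can be sketched with hypothetical data shapes (the tuple-based access list below is an assumption for illustration, not the EIP-7928 encoding):

```python
from collections import defaultdict

# Hypothetical per-chunk access lists: chunk index -> list of
# (address, slot, kind) where kind is "read" or "write".
chunks = {
    0: [("0xA", 1, "write"), ("0xB", 7, "read")],
    1: [("0xA", 1, "write"), ("0xB", 7, "write")],
}

def duplicated_keys(chunks):
    """Keys that appear in more than one CAL under the dependent scheme:
    slots modified in multiple chunks, or read in an earlier chunk and
    modified in a later one."""
    writes = defaultdict(list)  # key -> chunk indices that write it
    reads = defaultdict(list)   # key -> chunk indices that read it
    for i, accesses in chunks.items():
        for addr, slot, kind in accesses:
            (writes if kind == "write" else reads)[(addr, slot)].append(i)
    dup = set()
    for key, ws in writes.items():
        if len(ws) > 1 or any(r < min(ws) for r in reads.get(key, [])):
            dup.add(key)
    return dup

# ("0xA", 1) is written in both chunks; ("0xB", 7) is read in chunk 0
# and written in chunk 1 -- both get duplicated across CALs.
print(sorted(duplicated_keys(chunks)))  # [('0xA', 1), ('0xB', 7)]
```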
### Semantic vs Non-semantic chunking
When deciding how to chunk transactions, two approaches naturally emerge.
#### Non-semantic chunking
By non-semantic chunking we mean the process where all transactions are encoded into one byte array, which is then split into evenly sized chunks. This ensures that each chunk propagates through the network in a similar time.
However, if we want to be able to execute chunks as they arrive, we would have to propagate some extra information about the encoding (e.g. in which chunk and at what index each transaction starts). This complicates the design (some transactions would span multiple chunks, and there are fewer options for BAL chunking), and the extra data can be non-negligible as we scale the gas limit up and reduce the intrinsic tx gas cost ([EIP-2780](https://eips.ethereum.org/EIPS/eip-2780)).
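A minimal sketch of this scheme (the `chunkify` helper and its return shape are assumptions for illustration, not a spec):

```python
def chunkify(txs: list, chunk_size: int):
    """Concatenate encoded transactions and split into fixed-size chunks.

    Also returns (chunk_index, offset) for each transaction's start --
    the extra metadata needed if chunks should be executable on arrival.
    """
    blob, starts, pos = b"", [], 0
    for tx in txs:
        starts.append((pos // chunk_size, pos % chunk_size))
        blob += tx
        pos += len(tx)
    chunks = [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]
    return chunks, starts

# Three toy "transactions" of 5, 5, and 2 bytes, split into 4-byte chunks;
# the second and third transactions start mid-chunk.
chunks, starts = chunkify([b"\x01" * 5, b"\x02" * 5, b"\x03" * 2], chunk_size=4)
print(len(chunks))  # 3
print(starts)       # [(0, 0), (1, 1), (2, 2)]
```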
This approach is very similar to, and compatible with, other proposals. For example, [Block-in-Blobs](https://eips.ethereum.org/EIPS/eip-8142) (aka BiB) is essentially doing the same thing. The downside of BiB is that, since we can't enable DAS (data availability sampling) until we switch to zkEVM, every node has to download all chunks/blobs. If we use the same encoding as for regular blobs, we double the size of the data that needs to be propagated, without many benefits. Builders would also have to calculate KZG commitments and proofs during critical slot time. A more pragmatic approach might be to use non-semantic chunking until zkEVM is ready.
#### Semantic chunking
One of the downsides of non-semantic chunking is that we have no guarantee that execution time would be similar across chunks (e.g. one chunk can contain all the execution-heavy transactions).
To create a more uniform distribution of execution complexity across chunks, we can split the transactions and create "mini-blocks" in the following way:
- each chunk has at least one transaction
- transactions are not split across chunks
- used gas in a chunk is less than 16.8M (set based on [EIP-7825](https://eips.ethereum.org/EIPS/eip-7825), can be adjusted)
- we can also ensure that average chunk is not too small
Since transaction size affects used gas, we are also limiting and balancing each chunk's byte size, making propagation somewhat uniformly distributed as well.
A more natural division of transactions into chunks makes chunk execution simpler in comparison to non-semantic chunking. These chunks can be considered mini-blocks, with their own chunk header (with chunk-specific fields, e.g. *chunk_gas_used*, *pre_chunk_tx_count*...), but without some of the complexity that blocks have (no need to calculate a state root, send attestations, etc.).
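The packing rules above can be sketched as a greedy algorithm (a simplification; `semantic_chunks` and its inputs are hypothetical, and a chunk holding a single max-gas transaction is allowed to hit the limit exactly):

```python
CHUNK_GAS_LIMIT = 16_800_000  # based on the EIP-7825 tx gas cap; adjustable

def semantic_chunks(tx_gas: list) -> list:
    """Greedily pack transactions (given as gas used) into chunks so that
    each chunk has at least one transaction, no transaction is split
    across chunks, and chunk gas stays at most CHUNK_GAS_LIMIT."""
    chunks, current, used = [], [], 0
    for gas in tx_gas:
        if current and used + gas > CHUNK_GAS_LIMIT:
            chunks.append(current)
            current, used = [], 0
        current.append(gas)
        used += gas
    if current:
        chunks.append(current)
    return chunks

# Hypothetical gas usages; a 16.8M transaction gets its own chunk.
print(semantic_chunks([10_000_000, 5_000_000, 16_800_000, 2_000_000]))
# [[10000000, 5000000], [16800000], [2000000]]
```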
See [EIP-8101](https://ethereum-magicians.org/t/eip-8101-payload-chunking/27085) for a detailed spec of this design, and [here](https://notes.ethereum.org/@miloss/eip8101_prototype) for a proof-of-concept implementation.
### Related approaches from other ecosystems
Other ecosystems have introduced similar improvements.
Solana splits data into "shreds" and uses the [Turbine](https://docs.anza.xyz/consensus/turbine-block-propagation) protocol to propagate them faster. While the data splitting is analogous to non-semantic chunking, Turbine also innovated on the [networking layer](https://solana.com/news/turbine---solana-s-block-propagation-protocol-solves-the-scalability-trilemma) to improve propagation speed even further.
Base created [Flashblocks](https://blog.base.dev/flashblocks-deep-dive). Instead of propagating blocks every 2 seconds, Base produces and propagates sub-blocks every 200 ms. These sub-blocks are later combined into a single block. This approach is similar to semantic chunking, but it allows builders to create sub-blocks before finalizing the block. This type of "streaming semantic chunking" wouldn't be possible on Ethereum mainnet with the current design of ePBS. Instead, ePBS would have to switch to the *Slot Auction* model, which has [well-documented trade-offs](https://efdn.notion.site/Arguments-in-Favor-and-Against-Slot-Auctions-in-ePBS-c7acde3ff21b4a22a3d41ac4cf4c75d6).
## Scaling L1
The main benefit of payload chunking is that it allows us to scale L1. Shorter propagation and execution times allow us to increase block size (i.e. _GasLimit_), reduce slot time, or some combination of the two.
These improvements become less relevant once we have zkEVM and BiB, so we have to analyze the potential scaling factor and consider whether it's worth doing (the exact timeline of zkEVM plays a role in this discussion, but it is left out of this post).
### Parallel download and execution
As observed earlier, independent chunk execution allows us to scale by at most a factor of 2. That is a best-case estimate, which is far from realistic.
If we take a deeper look and consider the worst-case scenario, non-semantic chunking provides almost no guaranteed improvement in this regard. The worst case would be a block in which most of the execution happens in the one chunk that clients happen to receive last. In that case, the diagram from the start would look something like this:

At first glance, semantic chunking should solve this problem (since it limits the gas used per chunk), but its effectiveness depends on parameters. The BAL already allows transactions to be executed in parallel. So if a block is full of transactions that use the maximum gas (16.8M) and are execution heavy, we don't actually scale that much.

In the diagram above, we have a block with only 4 transactions that together use the entire block's _GasLimit_. They already execute in parallel because of the BAL, so semantic chunking doesn't speed up execution.
It becomes obvious that in the worst case, the scaling factor is less than 2. It's also hard to estimate how much less, as it depends on many parameters (_GasLimit_, the number of cores used for parallel execution, etc.).
With zkEVM on the way, chunk execution becomes even less valuable in the future.
### Parallel propagation
Parallel propagation guarantees scaling, regardless of the approach.
Without chunking, if propagating an entire block over one hop takes $t$ seconds, then a peer that receives the block after $h$ hops will finish receiving it at time $T_h=t h$. If the block is split into $N$ chunks, the peer will instead finish receiving it at $T^{\prime}_h=(h-1)\frac{t}{N}+t=t\frac{h-1+N}{N}$. The scaling factor in that case is $T_h/T^{\prime}_h=\frac{hN}{h-1+N}$.
If we assume that $h=6$ hops are needed to reach the majority of peers (estimated based on the current size of the network), and big blocks are split into $N=10$ chunks, we get a scaling factor of 4.
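The formula above can be checked numerically (a small sketch; `scaling_factor` is a hypothetical helper):

```python
def scaling_factor(h: int, n: int) -> float:
    """Propagation speedup from splitting a block into n chunks.

    Without chunking a peer h hops away finishes at T = t*h; with
    chunking at T' = (h-1)*t/n + t, so the ratio is h*n / (h - 1 + n).
    """
    return h * n / (h - 1 + n)

print(scaling_factor(h=6, n=10))   # 4.0
print(scaling_factor(h=6, n=100))  # ~5.71: returns diminish toward h
```

Note that as $N \to \infty$ the factor approaches $h$, which is why adding ever more chunks yields diminishing returns.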
More chunks imply better scaling, but the relationship is not linear. We should also not forget that each chunk carries some extra data with it (inclusion proof, signature) and requires some extra processing time (validation). This overhead can become significant if there are too many very small chunks.
If we use non-semantic chunking, chunk size is a very flexible parameter that we can set based on benchmarks and other parameters (_GasLimit_, calldata pricing, etc.). We can also consider compatibility with BiB and other proposals.
## Conclusions
When considering whether and which approach is worth adding to the protocol, we have to consider potential benefits, complexity, and interaction with other planned improvements.
All explored designs provide parallel propagation, with similar effectiveness. When considering other planned improvements (e.g. BiB), non-semantic chunking emerges as the most flexible and compatible design.
Individual chunk execution, as explained above, doesn't provide a big or guaranteed scaling improvement, but it significantly increases complexity. If we drop this requirement, we also don't have to split the BAL into CALs, which simplifies the design even further.
This leads to the conclusion that if we want to do any kind of payload chunking, it makes the most sense to do non-semantic chunking, without support for chunk execution. The BAL and transactions would be split into chunks and propagated separately. All chunks would have to arrive before we can execute the block, which brings us to this model:

This design is simple, and compatible with [BiB](https://eips.ethereum.org/EIPS/eip-8142) and other considered improvements.
We could also decide to skip payload chunking and go directly with BiB.