# Mainnet 2MB data experiment
**Goal**: test sustained distribution of roughly max-size (~2MB) blocks on mainnet
**Method**: abuse CALLDATA in a series of "spam" transactions to create roughly max-sized blocks under the current 30M gas limit. Ideally these blocks are sequential, testing a build-up in load. Additionally, the CALLDATA should be generally random, and thus incompressible.
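As a rough sizing check (a sketch assuming EIP-2028 calldata pricing of 16 gas per non-zero byte; random bytes are almost all non-zero):

```python
# Back-of-the-envelope: how much random CALLDATA fits in a 30M gas block?
GAS_LIMIT = 30_000_000
GAS_PER_NONZERO_BYTE = 16   # EIP-2028 calldata pricing
TX_BASE_GAS = 21_000        # intrinsic cost per transaction

# Ignoring per-tx overhead, the calldata ceiling per block:
print(GAS_LIMIT // GAS_PER_NONZERO_BYTE)   # 1_875_000 bytes ~= 1.8 MB

# With 128 KB of random calldata per transaction:
tx_gas = TX_BASE_GAS + 128 * 1024 * GAS_PER_NONZERO_BYTE   # ~2.12M gas
print(GAS_LIMIT // tx_gas)                 # ~14 such txs fill a block
```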
### Data collection
* Sentry nodes
    * Initial receiving time for each test block
    * Initial receiving time for each associated individual attestation (mapped to validator), noting any missing attestations (requires subscribing to all subnets)
    * Initial receiving time for each aggregate
    * Import time for blocks
    * Bandwidth usage changes
    * Log `forkChoiceUpdated` message request/response times on the Engine API port (a proxy sketch follows this list)
        * Tells us the time it takes for the CL to issue a call and for the EL to fetch the block and validate it
    * Attestation propagation time
* Chain data
    * Blocks making it into the canonical chain
    * Proposal delays & subsequent late proposals/blocks
    * Attestations making it into the canonical chain
        * correctness (head, source, target)
        * inclusion delay
    * Validator Sync Committee performance
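For the `forkChoiceUpdated` timings, one client-agnostic option is a thin pass-through proxy on the Engine API port: point the CL's execution endpoint at the proxy and let it forward everything (including the JWT `Authorization` header) to the real EL. A minimal sketch, assuming the EL listens on the default port 8551; addresses and ports are placeholders:

```python
import json
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

EL_URL = "http://localhost:8551"  # real Engine API endpoint (placeholder)

class EngineProxy(BaseHTTPRequestHandler):
    """Forwards Engine API calls and logs forkChoiceUpdated round-trip times."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        method = json.loads(body).get("method", "")
        # Forward the request verbatim, JWT Authorization header included.
        req = urllib.request.Request(EL_URL, data=body, headers=dict(self.headers))
        start = time.monotonic()
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        elapsed_ms = (time.monotonic() - start) * 1000
        if method.startswith("engine_forkchoiceUpdated"):
            print(f"{time.time():.3f} {method} {elapsed_ms:.1f}ms")
        # JSON-RPC errors travel in the body, so a blanket 200 is fine here.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

HTTPServer(("127.0.0.1", 8552), EngineProxy).serve_forever()
```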
**Re sentry nodes**:
* A Prysm node can be configured to output the requisite data. We should check whether at least one other client can produce the same data.
* Ideally run (or use other existing nodes) across many regions and many bandwidth constraints (5, 10, 25, 100 Mbps and 1 Gbps)
* If possible, compare full-attnet nodes against nodes subscribed to only 1 to 2 attnets (in case this drastically changes mesh connectedness)
* [List of latency-related metrics for various CL clients](https://hackmd.io/@dapplion/rJWMXd98j)
### Analysis
For a given block size at a non-arbitrary sustained rate (e.g. 5 to 10 slots in a row), how much do our key indicators -- both p2p arrival times and on-chain data -- begin to degrade? This requires pulling *baseline* data as well as data from during the experiment.
#### Chaindata analysis
Chaindata is a valuable proxy for how efficiently/effectively messages are being delivered by their expected times. The questions below should be answerable from the chain data; a sketch of pulling one such indicator follows the list. We have not predefined a particular level of degradation that we would be willing to accept -- ideally none, but this experiment will help inform conversations and debates about block size.
* Does orphan rate increase?
* Does attestation inclusion delay increase?
    * Random validators or consistent?
* Does head correctness decrease?
    * Random validators or consistent?
* Does attestation packing efficiency degrade?
* Sync committee performance degradation (random or consistent)?
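As an illustration, inclusion delay (and missed/orphaned slots) can be pulled from canonical chain data via the standard Beacon API. A minimal sketch, with the beacon node URL and slot range as placeholders:

```python
import requests

BEACON = "http://localhost:5052"  # placeholder beacon node URL

def inclusion_delays(slot: int) -> list[int]:
    """Inclusion delay of every attestation packed into the block at `slot`."""
    resp = requests.get(f"{BEACON}/eth/v2/beacon/blocks/{slot}")
    resp.raise_for_status()  # 404 => no canonical block at this slot
    block = resp.json()["data"]["message"]
    return [int(block["slot"]) - int(att["data"]["slot"])
            for att in block["body"]["attestations"]]

# Compare an experiment window against a same-length baseline window.
for slot in range(6_000_000, 6_000_010):  # placeholder slot range
    try:
        delays = inclusion_delays(slot)
        print(slot, sum(delays) / len(delays))
    except requests.HTTPError:
        print(slot, "missed/orphaned")
```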
#### Sentry node data analysis
Success is generally defined as some degradation being acceptable so long as messages still generally arrive within the expected slot sub-boundaries. The general idea is to pick an EIP-4844 value at or below a value shown to be successful in this mainnet experiment.
Consistent vs random degradation across validator indices helps indicate whether an issue is network-wide or particular to the sender's hardware/setup.
Differences in view between sentry nodes help indicate asymmetries in network delivery across geographies and resource levels (useful for understanding the impact of message delivery on *user* nodes as well as validators). A comparison sketch follows the list below.
* Blocks:
    * What is the difference in first block receive time between sentry nodes? Does it begin to exceed 3s into the slot?
* Attestations:
    * What is the aggregate difference in first attestation receive time? Does it begin to exceed 7s (broadcast time + 3s) into the slot?
    * Are there consistent validators with degraded performance, or is it more random?
* Aggregates:
    * What is the aggregate difference in aggregate receive times? Does it begin to exceed 11s into the slot?
    * Are inclusion problems consistent to particular validators, or random?
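A sketch of the cross-sentry comparison, assuming each sentry exports `(slot, arrival_unix_ts)` rows for first block arrivals to a CSV; the node names and file layout are hypothetical, and the 3s threshold follows the question above:

```python
import pandas as pd

GENESIS = 1_606_824_023  # mainnet beacon chain genesis timestamp
SECONDS_PER_SLOT = 12
SENTRIES = ["sentry-eu-1gbps", "sentry-us-10mbps"]  # hypothetical names

frames = []
for name in SENTRIES:
    df = pd.read_csv(f"{name}.csv")  # columns: slot, arrival_unix_ts
    # Seconds past the start of the slot at which the block first arrived.
    df["into_slot"] = df["arrival_unix_ts"] - (GENESIS + df["slot"] * SECONDS_PER_SLOT)
    frames.append(df.set_index("slot")["into_slot"].rename(name))

merged = pd.concat(frames, axis=1)
merged["spread"] = merged[SENTRIES].max(axis=1) - merged[SENTRIES].min(axis=1)
print(merged.describe())
# Slots where any sentry first saw the block later than 3s into the slot:
print(merged[merged[SENTRIES].max(axis=1) > 3.0])
```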
### Open Questions
#### Data collection
* Where do we send the data? How do we best visualize it across all client types and node operator setups?
    * EF DevOps to spin up a common Prometheus instance that everyone can write metrics to?
#### Transactions
* How best to "fill" blocks? 30M-gas transactions? Multiple "big" transactions? Is there a cap on transaction size for inclusion and/or gossip? How best to submit transactions? Public mempool? mev-boost? Special nonce ordering?
    * For v1 we will want to ramp up block size anyway, so using larger and larger sets of 128 KB transactions is probably sufficient. These can be submitted either via the public mempool or mev-boost (see the sketch below).
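A sketch of one such spam transaction submitted through the public mempool with web3.py (v6-style API assumed; the RPC URL, key, chain id, and fee values are placeholders):

```python
import os
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))  # placeholder RPC
acct = w3.eth.account.from_key("0x" + "11" * 32)       # placeholder key

calldata = os.urandom(128 * 1024)  # random bytes => effectively incompressible
tx = {
    "type": 2,
    "chainId": 1,
    "to": acct.address,            # self-send; only the calldata matters
    "value": 0,
    "data": calldata,
    "nonce": w3.eth.get_transaction_count(acct.address),
    # 16 gas per non-zero byte (EIP-2028); treating all bytes as non-zero
    # slightly overestimates, which is safe for a gas limit.
    "gas": 21_000 + len(calldata) * 16,
    "maxFeePerGas": w3.to_wei(50, "gwei"),
    "maxPriorityFeePerGas": w3.to_wei(2, "gwei"),
}
signed = acct.sign_transaction(tx)
print(w3.eth.send_raw_transaction(signed.rawTransaction).hex())
```

One caveat relevant to the gossip-cap question above: geth's default txpool rejects transactions whose full encoded size exceeds roughly 128 KiB, so the calldata payload likely needs to sit slightly below 128 KB per transaction.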
**Similar Experiment**: Starkware ran a [similar experiment](https://ethereum-magicians.org/t/eip-2028-transaction-data-gas-cost-reduction/3280/35) in the context of EIP-2028 to assess the effect of blocks carrying large amounts of CALLDATA on the network before lowering the CALLDATA gas price. This is an attempt to run a similar experiment with much more data gathering.