# [DRAFT] Optimistic relaying v2

*by [mike](https://twitter.com/mikeneuder), [niclas](https://twitter.com/alphamonad), & [justin](https://twitter.com/drakefjustin) -date-.*

*Many thanks to [Joe Plaza](https://twitter.com/jpla7a) and the [rsync builder team](https://twitter.com/rsyncbuilder) for their relentless support of our experiments and for helping collect further v2 optimistic relaying data. Additional thanks to [Ankit Chiplunkar](https://twitter.com/ankitchiplunkar), [Chris Hager](https://twitter.com/metachris), and [Jon Charbonneau](https://twitter.com/jon_charb) for continued v2 discussions.*

*Purpose – Present the optimistic relay v2 design, status, and initial results.*

#### Accompanying artifacts

| **title** | **description** |
| --------- | --------------- |
| [*v1 implementation*](https://github.com/flashbots/mev-boost-relay/pull/380) | the PR that implements v1 optimistic relaying |
| [*v2 implementation*](https://github.com/flashbots/mev-boost-relay/pull/466) | the PR that implements v2 optimistic relaying |
| [*roadmap*](https://github.com/michaelneuder/optimistic-relay-documentation/blob/main/towards-epbs.md) | a roadmap presenting the future plans for optimistic relaying in the context of enshrined PBS |
| [*optimistic relays and where to find them*](https://frontier.tech/optimistic-relays-and-where-to-find-them) | analysis of v1 optimistic relaying and further details on the roadmap |
| [*why enshrine proposer-builder separation?*](https://ethresear.ch/t/why-enshrine-proposer-builder-separation-a-viable-path-to-epbs/15710) | document outlining the state of ePBS research and the optimistic relay roadmap |

## Block submissions

The goal of optimistic relaying is to reduce the latency of block submissions. The v2 design continues in this direction by removing the block payload download from the fast path. The diagrams below illustrate the set of tasks performed by the relays for each block submission.
### Non-optimistic

Non-optimistic submissions do \~nearly\~ everything on the fast path (meaning that path isn't actually very fast). The `handleSubmitNewBlock` function [implements](https://github.com/ultrasoundmoney/mev-boost-relay/blob/8f0b43cc5cb37d6247026c4e6999f3b7e41d397b/services/api/service.go#L1494) the following.

<img src="https://storage.googleapis.com/ethereum-hackmd/upload_929719ebe18ecf6133ef4c58fd533098.png" width=50%>

We break the profile of the fast path into four distinct sections: (i) decode, (ii) prechecks, (iii) simulation, (iv) redis update. The table below shows the percentiles, in $\mu s$, of the durations of these four stages across 250k builder submissions.

<img src="https://storage.googleapis.com/ethereum-hackmd/upload_652623a11e4e75de24c16f27bd9211a2.png" width=90%>

Notice that the vast majority of the latency is attributable to the simulation and the decode. These are the bottlenecks that v1 and v2 optimistic relaying address, respectively.

### v1

v1 submissions take the simulation of the block and move it to the slow path. This is done by running `processOptimisticBlock` in a [separate goroutine](https://github.com/ultrasoundmoney/mev-boost-relay/blob/8f0b43cc5cb37d6247026c4e6999f3b7e41d397b/services/api/service.go#L1856).

<img src="https://storage.googleapis.com/ethereum-hackmd/upload_4dab347251a23e632fd8a49275023706.png" width=50%>

The table below shows the percentiles, in $\mu s$, of the durations of these four stages across 1.3 million builder submissions.

<img src="https://storage.googleapis.com/ethereum-hackmd/upload_7d03b9a40c53e599681421abfbaa1de9.png" width=90%>

We see that the decode stage is now the bottleneck for a majority of submissions. This is what we aim to tackle with v2 optimistic relaying.

### v2

v2 submissions take the full payload decode and move it to the slow path.
This is done in `handleSubmitNewBlockV2`, which [implements](https://github.com/ultrasoundmoney/mev-boost-relay/blob/8f0b43cc5cb37d6247026c4e6999f3b7e41d397b/services/api/service.go#L1980) the following.

<img src="https://storage.googleapis.com/ethereum-hackmd/upload_e28c3a612fb9255d671dd94f8a67f228.png" width=50%>

With the help of the rsync team, we have collected some initial v2 submission data. The histogram below shows the distribution of v2 decode durations, in $\mu s$, when using the SSZ header-only parsing for the bids.

<img src="https://storage.googleapis.com/ethereum-hackmd/upload_67ada6224c4e7244478c63b967cdd294.png" width=70%>

With a median decode time of 41 $\mu s$, we see that the decode duration of ~10ms (from optimistic v1 submissions) can essentially be fully eliminated.

### v2 efficiency

[Ankit](https://twitter.com/ankitchiplunkar) came up with a useful framework for understanding the value of v2 optimistic relaying.

> V2 parallelizes the download of the payload (a builder-side operation) with the signing of the header (a proposer-side operation). –Ankit

The key here is that the proposer has two interactions with the relay. They first call `getHeader` to get the highest-bidding header delivered; they then sign the header and return it to the relay with a call to `getPayload`. The only data they need from the `getHeader` call is the actual `ExecutionPayloadHeader` ([see spec](https://github.com/ethereum/consensus-specs/blob/dev/specs/capella/beacon-chain.md#executionpayloadheader)), which doesn't include the transactions of the block. Allowing the proposer to sign the header while the builder is still delivering the payload improves the overall efficiency of the relay significantly. The figure below, from [*Time is Money: Strategic Timing Games in Proof-of-Stake Protocols*](https://arxiv.org/pdf/2305.09032.pdf), shows the timing of the `getHeader` and `getPayload` calls on the relay.

<img src="https://storage.googleapis.com/ethereum-hackmd/upload_04161efebf28755174005b03364776f4.png" width=90%>

We see that there is a significant delay between the `getHeader` and `getPayload` calls, with a median value of 400ms (***note** –* the p99 decode time for the builder submissions is 524ms). This is the time during which we can allow the builder to continue sending the payload to the relay. The figure below shows the sequential nature of the payload download and the bid signing for v1 optimistic relaying.

<img src="https://storage.googleapis.com/ethereum-hackmd/upload_e91561d2008a9d4289591fab881fa158.png" width=75%>

In this case, the payload download must complete before the bid is eligible and thus must complete before the `getHeader` call from the proposer. In v2, the two slow portions are done in parallel.

<img src="https://storage.googleapis.com/ethereum-hackmd/upload_944476a4ad84f58b18a48b7f600a0b71.png" width=75%>

With the signing and the download happening concurrently, we get much more efficient bid submissions.

### v2 safety

The biggest concern around v2 is that its failure mode is much worse than that of v1. In v1, if we can't get the block simulated, it is very likely that the block was indeed valid and the publication will result in a block onchain. However, in v2, if the connection is dropped or the payload cannot be downloaded for any reason, we are guaranteed to miss a slot if that bid wins the auction, because the proposer will have signed a header for a payload that we don't have access to. There are two things we can do to limit the probability of a missed slot:

1. **Only allow v2 submissions for builders that have been vetted for reliable payload submissions.** We can simulate v2 without actually activating the bids and set thresholds that the builders must meet (e.g., 100% of the payloads have to be delivered within 1s of the submission starting).
2. **Only allow v2 submissions before `t=2` into the slot.** Combined with (1), we can now be certain that the payloads will be available for the relay by `t=3` at the latest. This gives us enough time to publish before the attestation deadline at `t=4`.

The figure below shows the duration between when v2 bids were eligible (left) and received (right) and when the full payload was received by the relay. For example, values of 200 $\mu s$ and 400 $\mu s$ for `eligible at` and `received at`, respectively, are interpreted as, "The bid was received 400 $\mu s$ before the payload was downloaded; the bid was eligible in the auction 200 $\mu s$ before the payload was downloaded."

<img src="https://storage.googleapis.com/ethereum-hackmd/upload_522b8f313e8e2d0041d16d8a631a2c00.png" width=70%>

We see that all of the payloads were received less than 1s after the bids were marked as eligible. This implies that as long as the v2 bid is received early enough in the slot, we should have plenty of time to download the payload before the attestation deadline.

## Path to full v2

There are a few remaining items to derisk on the path to productionizing the v2 optimistic relay code.

1. Refactor the [v2 implementation](https://github.com/flashbots/mev-boost-relay/pull/466) to reuse most of the v1 optimistic path.
    - Split prechecks into separate functions that can be used in both places.
    - Highlight the v2 SSZ header-only parsing.
2. Collect more data on the builder submissions.
    - Understand what could make a payload not arrive on time.
    - For each builder who wants to use v2, start collecting preliminary data for qualification.
3. Collect more data on the `getHeader -> getPayload` time difference.
    - With UUIDs in the requests as of [v1.6](https://github.com/flashbots/mev-boost/releases/tag/v1.6), we should start seeing better data on the relays.
4. Decide on v2 timing deadlines.
    - Set a threshold for when v2 submissions are accepted.
