# Aiding EL sync while avoiding engine API deadlocks
As written in the CL core specs, both `process_execution_payload` (in `state_transition`) and `is_valid_terminal_pow_block` (in `on_block`) imply that the CL halts processing when they cannot be answered. In practice, with EL and CL separated by the engine API, we must assume that the CL keeps sending messages from further processing of the beacon chain even while the EL is `SYNCING` and cannot yet give a definitive response to earlier ones. These continued payload executions and head updates are *critical* for the EL to actually be able to sync in various contexts and sync paradigms.
Thus, although the strict CL state transition might not be fully validated, the CL should continue forward in both of these contexts (with some trade-offs).
tl;dr:
* In both cases, the beacon chain should optimistically transition and send `engine_executePayload` and `engine_forkchoiceUpdated`
* Subsequent `INVALID`\* responses can invalidate previously uncertain chains (resulting in re-orgs or even critical failures)
* These require notes on how to properly use the Engine API. The CL specs remain stable.
\*This doc assumes `most_recent_correct_ancestor` is added to `INVALID` returns from `engine_executePayload`.
## `process_execution_payload`
The CL spec makes clear that the assertions in `process_execution_payload` must hold for a beacon block transition to be valid. While this is true from an ultimate validity standpoint, for the engine API to function properly the CL must continue to send what look like good recent heads to the EL. This is critical for EL synchronization and failure recovery.
This implies that when the EL responds with `SYNCING` but all other beacon chain validations succeed, the CL should continue forward with subsequent descendant beacon blocks, triggering `executePayload` and `forkChoiceUpdated` in an *optimistic* fashion.
If a subsequent call to `executePayload` returns `INVALID:most_recent_correct_ancestor`, the CL should act accordingly:
* If `most_recent_correct_ancestor` is at a slot greater than latest finality, the invalid branch should be pruned from the CL block tree, potentially inducing a re-org
* If `most_recent_correct_ancestor` is at a slot less than or equal to latest finality, this is a critical error: the CL is on an invalid yet finalized chain. This should surface to the user that manual intervention is required (e.g. a new WS checkpoint from the recent network)
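The two cases above can be sketched roughly as follows. This is an illustration only, not spec code: `BlockTree`, `handle_invalid`, and `CriticalFinalityError` are hypothetical names, blocks are reduced to a root, a parent root, and a slot, and the sketch assumes the rejected block and its `most_recent_correct_ancestor` are both present in the tree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class CriticalFinalityError(Exception):
    """Hypothetical error: the invalid branch reaches into finalized history."""


@dataclass
class Block:
    root: str
    parent_root: Optional[str]
    slot: int


@dataclass
class BlockTree:
    finalized_slot: int
    blocks: Dict[str, Block] = field(default_factory=dict)

    def add(self, block: Block) -> None:
        self.blocks[block.root] = block

    def descendants(self, root: str) -> List[str]:
        """Roots of all blocks descending from `root`, excluding `root` itself."""
        found: List[str] = []
        frontier = [root]
        while frontier:
            parent = frontier.pop()
            children = [b.root for b in self.blocks.values() if b.parent_root == parent]
            found.extend(children)
            frontier.extend(children)
        return found

    def handle_invalid(self, invalid_root: str, most_recent_correct_ancestor: str) -> List[str]:
        """Apply an INVALID response for `invalid_root`; return the pruned roots."""
        ancestor = self.blocks[most_recent_correct_ancestor]
        # Rule as stated in this doc: at or before finality is a critical error.
        if ancestor.slot <= self.finalized_slot:
            raise CriticalFinalityError("invalid finalized chain, manual intervention required")
        # Walk up from the rejected block to the first invalid block,
        # i.e. the child of the last correct ancestor on this branch.
        first_invalid = self.blocks[invalid_root]
        while first_invalid.parent_root != ancestor.root:
            if first_invalid.parent_root not in self.blocks:
                raise ValueError("ancestor is not on the rejected block's branch")
            first_invalid = self.blocks[first_invalid.parent_root]
        pruned = [first_invalid.root] + self.descendants(first_invalid.root)
        for root in pruned:
            del self.blocks[root]
        return pruned
```

Pruning starts at the first invalid block rather than at the ancestor itself, so sibling branches that also build on the last correct ancestor are left untouched. Fork choice would then be re-run over the remaining tree.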
For blocks where `executePayload` returns `SYNCING`, the CL should not allow validators to perform duties against such blocks, and user beacon APIs should be designed not to expose this data as safe (or at all).
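One way to enforce this is to record the last engine response per block and gate duties on it. The sketch below is a minimal illustration; `OptimisticTracker` and its methods are hypothetical names, not part of any client or of the Engine API.

```python
from enum import Enum
from typing import Dict


class PayloadStatus(Enum):
    VALID = "VALID"
    INVALID = "INVALID"
    SYNCING = "SYNCING"


class OptimisticTracker:
    """Hypothetical bookkeeping of the latest executePayload status per block."""

    def __init__(self) -> None:
        self._status: Dict[str, PayloadStatus] = {}

    def record(self, block_root: str, status: PayloadStatus) -> None:
        self._status[block_root] = status

    def is_optimistic(self, block_root: str) -> bool:
        """True while the EL has only answered SYNCING for this block."""
        return self._status.get(block_root) is PayloadStatus.SYNCING

    def is_safe_for_duties(self, block_root: str) -> bool:
        """Only blocks the EL has confirmed VALID are exposed to validators."""
        return self._status.get(block_root) is PayloadStatus.VALID
```

Unknown blocks are treated as unsafe by default, so a missing response never lets a validator attest to or build on an unverified payload.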
## `is_valid_terminal_pow_block`
If the EL is not synced and the CL blocks on `is_valid_terminal_pow_block` while processing blocks in `on_block`, the CL will not be able to send any PoS execution payloads to the EL. In many contexts, these more recent heads beyond the PoS transition may well *be critical* to EL sync. If the EL needs heads past the PoS transition to sync but the CL is blocked on `is_valid_terminal_pow_block`, the result is a deadlock.
Ideally, the EL syncs chains to TTD even if it is unaware that the PoS transition has occurred. However, if the canonical chain is too far ahead of the transition, the EL won't be able to perform certain state sync modes to TTD because the only available PoW pivot blocks are too old.
The above implies that the CL should optimistically execute and provide `ExecutionPayload`s to EL so that EL does not get stuck, even when this optimism crosses the transition boundary. Thus in practice, when CL is crossing the boundary while EL is still syncing, it should move forward with continued `executePayload` and `forkChoiceUpdated` calls even if `is_valid_terminal_pow_block` cannot be ascertained.
Once the EL starts responding with `VALID` on `executePayload`, the CL can go back and fully validate `is_valid_terminal_pow_block` for that chain's transition block. If `INVALID:most_recent_correct_ancestor` is returned and `most_recent_correct_ancestor` is at an earlier depth than the transition block, this invalidates the entirety of that transition chain. If the invalidated depth is earlier than or equal to current beacon chain finality, this is a critical error and requires manual intervention by the user.
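The deferred check can be sketched as a small decision function. This is a hedged illustration rather than spec code: `TransitionState`, `on_execute_payload_response`, and the returned action strings are hypothetical, the terminal block check is passed in as a callable, and all depths are assumed to be expressed on one comparable scale.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TransitionState:
    transition_depth: int                # depth of this chain's transition block
    finalized_depth: int                 # depth of current beacon chain finality
    terminal_check_pending: bool = True  # is_valid_terminal_pow_block not yet run


def on_execute_payload_response(
    state: TransitionState,
    status: str,
    is_valid_terminal_pow_block: Callable[[], bool],
    ancestor_depth: Optional[int] = None,
) -> str:
    """Return the action the CL takes for one executePayload response."""
    if status == "SYNCING":
        # EL cannot answer yet: keep sending payloads and heads.
        return "continue_optimistically"
    if status == "VALID":
        # EL has caught up, so the deferred terminal block check can run now.
        if state.terminal_check_pending:
            if not is_valid_terminal_pow_block():
                return "invalidate_transition_chain"
            state.terminal_check_pending = False
        return "fully_validated"
    if status == "INVALID":
        if ancestor_depth is None:
            raise ValueError("INVALID requires most_recent_correct_ancestor")
        if ancestor_depth <= state.finalized_depth:
            return "critical_error"
        if ancestor_depth < state.transition_depth:
            return "invalidate_transition_chain"
        return "prune_invalid_branch"
    raise ValueError(f"unknown status: {status}")
```

The terminal block check runs at most once per chain here, on the first `VALID` response, which mirrors the idea that optimism across the transition boundary is only resolved once the EL can actually answer.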