# Decoupling global liveness and individual safety in DAS

*Note on terminology*: whenever talking about a post-DAS system, I am going to use *full node* to refer to a node which fully validates the chain (in this respect, equivalent to today's full nodes), but which does not fully download the blob data, and instead performs some form of DAS. Also, full node will implicitly mean *honest (protocol-following)* full node, unless otherwise specified. Nodes which download the entire data are interchangeably called *super full nodes*.

In a system with DAS, it is quite easy to achieve *global safety*, the property that at most a small minority of sampling nodes can be tricked into following an unavailable chain. Harder to achieve is *individual safety*, i.e., that no sampling node can be tricked except with negligible probability, as it requires the *unlinkability of sampling queries*. Also hard to achieve robustly is liveness.

If we do not find a single DAS construction which has both great individual safety and liveness properties, we can still perhaps get the best of both worlds by using two different constructions for different purposes, so as to take advantage of their specific trade-off choices. Given a DAS solution with strong liveness and global safety properties (DAS1) and a DAS solution with strong individual safety properties (DAS2), there are at least two ways that we could employ them:

- *Separate the DAS of validators and full nodes*: validators run DAS1 as their fork-choice filter, guiding their consensus participation, while full nodes run DAS2.
- *Separate the DAS used by the fork-choice rule and the confirmation rule*: DAS1 is used by all full nodes as a fork-choice filter, and DAS2 is used as an additional confirmation rule criterion.

In the first case, a liveness failure of DAS2 does not have any impact on consensus, because validators are unaffected, modulo potential network disruptions due to non-validator full nodes experiencing liveness issues in the fork-choice. Even better, in the second case a liveness failure of DAS2 has *no global impact at all*: by being confined to the confirmation rule, it can only result in an inability to confirm new transactions, and only for full nodes. Neither the p2p network nor validators are affected.

In either case, as long as DAS1 works, consensus is unaffected, and so is the ability of super full nodes to confirm transactions. This can make for a much more resilient system, and even reduce the incentive to attack DAS2 in the first place.

In the following, I first attempt to formalize security definitions for a post-DAS system, to capture the different notions of liveness and safety that we need to take into account. We then revisit and more concretely discuss the two options mentioned above.

## Security notions of DAS

### Security today

As usual, we want our system to be safe and live. That said, it is important to specify exactly what we mean by liveness and safety. Let's first start with such notions for a pre-DAS system, where all full nodes are super full nodes, and thus validity and availability are both fully checked. We assume that full nodes attempt to confirm transactions only by waiting for them to be contained in a valid, finalized chain. We want safety to hold *unconditionally*, without any networking assumptions or honesty assumptions on validators, but we have to accept that liveness requires network synchrony and honest majority (in this case even supermajority) assumptions.

1. **Safety:** valid, finalized chains are never reverted.
2. **Chain liveness**: the chain makes progress, i.e., valid blocks containing new transactions are regularly finalized.
3. **Client liveness**: a full node is able to regularly confirm new transactions, i.e., observe valid, finalized blocks containing them. *Client liveness = chain liveness + client not eclipsed.*

Note that chain liveness is an objective property, in the sense that someone with a god's-eye view of the network (in particular, including all local views of validators) would always be able to decide whether progress has been made or not. On the other hand, client liveness is subjective, and might hold for some clients but not others at a particular moment in time. Nonetheless, we normally do not distinguish between chain liveness and client liveness, for the simple reason that they coincide *almost* exactly: client liveness clearly requires, and thus implies, chain liveness, and the reverse is true *as long as a client is not eclipsed*. This is simply because a full node which is connected to the rest of the network will be able to confirm new transactions as long as new transactions do make it into the chain and finality does not stall, i.e., as long as chain liveness holds.

### Security in a post-DAS world

In a post-DAS world, where full nodes do not simply download the whole data to check availability, things are quite different. Guaranteeing the safety and liveness of the DA layer is an additional requirement, and the safety and liveness *of the DAS protocol itself* are an additional factor which might negatively affect *all* our security properties, including those unrelated to the DA layer, i.e., even the security of L1 users which do not care about L2s, or other global security properties.

#### Security of DAS

Let's first start with defining the security properties of a DAS protocol *in isolation*. Here, we have a meaningful distinction between global safety and client safety, where a client is just someone that samples according to the protocol.

1. **DAS global safety:** whp, unavailable data is refuted by *all except a small minority* of full nodes (concretely, we can set a maximum tolerable fraction of full nodes which might not refute unavailable data).
2. **DAS client safety:** a client cannot be convinced that unavailable data is available, except with negligible probability.
3. **DAS client liveness**: a client cannot be convinced that available data is unavailable, or in other words it can *convince itself* that data is available, when it is indeed so.

We formulate global safety of DAS in this way, allowing safety faults for a small minority, because this is in practice the relevant *global* safety notion for a DA layer: for the security of rollups it is essentially irrelevant whether a small minority can be convinced to accept an unavailable chain, because ultimately what matters is only the availability of the chain that ends up being canonical. With this definition, *global safety does not require client safety*, and in practice it is quite easy to construct a system which satisfies the former, but much harder to construct one which satisfies the latter.

#### System security

Let's now turn to the properties of the full system in a post-DAS world. Compared to the initial properties, we also have to consider the global security properties of the DA layer, as they directly translate to security properties for rollups. Moreover, we need to distinguish between client security properties *for full nodes and for super full nodes*, as for the latter DAS is only relevant insofar as it affects the validator set and its ability to come to consensus, i.e., chain liveness.

**Safety properties**:

1. **DA layer safety:** unavailable chains are refuted by all except a small minority of full nodes, i.e., same as DAS global safety.
2. **Client safety for super full nodes**: a valid, available, finalized chain is never reverted.
3. **Client safety for full nodes:** if a full node accepts a finalized chain as available, it is never reverted. *Client safety for full nodes = client safety for super full nodes + DAS client safety.*

**Liveness properties**:

1. **Chain liveness**: the chain makes progress, i.e., valid blocks containing new transactions are regularly finalized.
2. **DA layer liveness**: the DA layer makes progress, i.e., valid blocks containing *available blobs* are regularly finalized (stronger than chain liveness, since we could have chain liveness with a chain without blobs).
3. **Client liveness for super full nodes**: a super full node is able to regularly confirm new transactions. *Client liveness for super full nodes = chain liveness + client not eclipsed.*
4. **Client liveness for full nodes**: a full node is able to regularly confirm new transactions and blobs, i.e., observe valid, finalized blocks containing them and successfully do DAS checks for them. *Client liveness for full nodes = client liveness for super full nodes + DAS client liveness.*

Clearly, chain liveness is a precondition to any form of liveness. If in addition a super full node is not eclipsed, client liveness holds for it, just like client liveness in a pre-DAS system. Note however that this does not mean that the liveness guarantees are equally strong, because *chain liveness itself might require more assumptions in a post-DAS system.* For client liveness *for full nodes*, we additionally need DAS client liveness, since otherwise they might be convinced of the unavailability of a chain which is in fact available, and thus be forced to reject it. This would for example be the case if a node was simply unable to obtain responses for its queries due to failures of the networking architecture of the DAS protocol.
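To make "except with negligible probability" in DAS client safety concrete, here is a minimal back-of-the-envelope sketch (my illustration, not part of any spec). It assumes 1D erasure coding in which any 50% of the extended chunks suffice to reconstruct the data, and an adversary who cannot link the client's queries or answer them adaptively, which is exactly why query unlinkability matters for individual safety:

```python
from fractions import Fraction

def false_accept_prob(num_samples: int) -> Fraction:
    # If data is unavailable, then fewer than half of the extended chunks
    # can be available (otherwise the data would be reconstructible).
    # A client accepts only if every one of its uniformly random samples
    # happens to hit an available chunk, which occurs with probability
    # at most (1/2)^num_samples.
    return Fraction(1, 2) ** num_samples

# With 30 independent samples, the probability of being convinced that
# unavailable data is available is below one in a billion.
print(float(false_accept_prob(30)))  # < 1e-9
```

Sampling without replacement only improves this bound, so `(1/2)^k` is a conservative estimate. Against an adversary who *can* link queries to a client and answer just those, the bound collapses entirely, which is the gap between global and individual safety.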
In other words, the equivalence between client and chain liveness breaks down in a post-DAS world even for a full node which is not eclipsed, because it is not guaranteed to be able to confirm even valid and available finalized blocks: liveness failures of DAS imply liveness failures for full nodes, *even when the chain is live*.

## Solution 1: validator DAS and client DAS

#### DAS liveness failures $\neq$ chain liveness failures

Ideally, chain liveness and DA layer liveness would be as robust in a post-DAS world as they are in a pre-DAS world, and they would imply client liveness, even for full nodes, under conditions as mild as possible. On the other hand, it is important to acknowledge that *DAS client liveness is not necessarily a precondition for chain liveness or even DA layer liveness*, so it is in principle possible to design systems where the latter hold much more robustly than the former.

This is very clear in a system where validators are all super full nodes, not participating in the DAS protocol as clients and thus immune from client security failures of DAS. In such a system, chain liveness (and DA layer liveness) requires the same assumptions it does in a pre-DAS system, for example honest supermajority and network synchrony. Essentially, everything looks the same as in a pre-DAS system *from the perspective of the validators*, and DAS only affects the client security properties for full nodes.

#### Validator DAS

The perceived equivalence of client liveness and chain liveness is probably due to the fact that the Ethereum roadmap considers validators to be roughly equivalent to full nodes. In particular, they are explicitly not required to be super full nodes. The thought is then that, if DAS is not live for full nodes, it will not be live for validators either, resulting in their inability to vote for blocks containing blobs, and thus at least in a liveness failure of the DA layer.
This is indeed the case *if the same DAS protocol is utilized for transaction confirmation by clients and for consensus participation by validators*. In the example where validators are super full nodes, we implicitly do have two DAS protocols, with the DAS protocol for validators just being "the trivial one", i.e., downloading everything. Even in the Ethereum context, where validators are not required to be super full nodes, it is possible to have a separation between a validator DAS protocol and a client DAS protocol. In fact, there are some reasons suggesting that this approach might be sensible:

- The DAS protocol used as part of consensus does not need to provide any client safety guarantees, since its goal is just to ensure liveness and safety of the DA layer under honest majority assumptions on the validator set. On the other hand, providing good DAS client security properties for full nodes is very much desirable, since it directly affects the security of their transaction confirmations. In other words, the DAS protocol for clients has quite different security requirements.
- It might be ok for validators to be somewhere in between full nodes and super full nodes in terms of requirements. Higher requirements enable simpler, more robust networking solutions with better liveness guarantees for validator DAS, translating into better chain and DA layer liveness.
- The validator set is naturally sybil-resistant, and this can be leveraged to make a validator DAS more robust. See for example [proof of validator](https://ethresear.ch/t/proof-of-validator-a-simple-anonymous-credential-scheme-for-ethereums-dht/16454).

#### A possible construction

Concretely, what might such a "DAS protocol for validators" look like? For simplicity, let's assume that the DAS construction for full nodes is 1D, with 16 MBs of actual data extended to 32 MBs.
Regardless of how exactly the DAS protocol for full nodes works, one example of a DAS protocol for validators would be to split the extended data across (for example) 2048 subnets, and semi-permanently assign each validator to 16 of them, so that each validator downloads 1/128 of the data, or 256 KBs per slot. A validator only checks availability by downloading blob data on its assigned subnets, i.e., it considers blobs as available if all corresponding data on its assigned subnets is available. The fork-choice run by the validator to perform its duties, for example to decide what to vote for, is the usual fork-choice with an availability filter guided by precisely this notion of availability. See [subnetDAS](https://notes.ethereum.org/@fradamt/subnetDAS) for more details.

#### Properties

Crucially, we have the following properties:

1. If some data is not 1/2-available, only a small fraction of the validators see it as available. If whp the maximum such fraction is $\delta$, then $> \delta + 1/3$ of the validator set being honest implies that a finalized chain will be available whp, since no chain which is opposed by $> 1/3$ of validators can be finalized. Similarly, if network synchrony holds and $> \delta + 1/2$ of the validator set is honest, the canonical chain is available whp.
2. If some data is more than 1/2-available, *and reconstruction is fast* (for example if there is at least one honest super full node which quickly reconstructs the data when possible), then all subnets will quickly have the corresponding data, and all validators will see it as available.

Together with threshold honesty assumptions, 1. implies client safety for full nodes and safety of the DA layer, *regardless of what the DAS protocol for clients is*. This is highly desirable, though of course not sufficient, because we do not want safety to require threshold assumptions on the validator set (hence why we also want a DAS protocol for clients, with good client safety).
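To get a feel for the numbers, here is a small sketch using the example parameters above. It computes the per-validator bandwidth and, assuming subnet assignments are effectively random with respect to the adversary's choice of which subnets to serve, the probability that a single validator is fooled by not-1/2-available data; in expectation (and whp, by concentration) this is also the fooled fraction $\delta$ of the validator set:

```python
from math import comb

SUBNETS = 2048      # subnets the extended data is split across
ASSIGNED = 16       # subnets semi-permanently assigned to each validator
EXTENDED_KB = 32 * 1024  # 16 MB of data extended to 32 MB

# Bandwidth: each validator downloads its assigned share of the extension.
per_slot_kb = EXTENDED_KB * ASSIGNED // SUBNETS
print(per_slot_kb)  # 256 KB per slot, i.e. 1/128 of the data

# Safety: if data is not 1/2-available, at most half the subnets carry
# their data. A validator sees the data as available only if all 16 of
# its assigned subnets fall within that available half.
available = SUBNETS // 2
delta = comb(available, ASSIGNED) / comb(SUBNETS, ASSIGNED)
print(delta)  # just under (1/2)^16, i.e. about 1.5e-5
```

The closed form uses sampling without replacement, which is slightly below the simpler `(1/2)^16` estimate; either way, only a tiny fraction of honest validators can see not-1/2-available data as available, which is what makes the $\delta + 1/3$ and $\delta + 1/2$ thresholds above barely different from the usual $1/3$ and $1/2$.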
Also under threshold assumptions, plus that of fast reconstruction, 1. and 2. can give us chain liveness and DA layer liveness, roughly *if the underlying consensus protocol satisfies chain liveness in a pre-DAS world*, or in other words if it satisfies chain liveness when validators are super full nodes. The intuition here is simply that 1. and 2. ensure that availability does not split the views of honest validators in harmful ways. In particular, 1. means that the presence of unavailable data is not harmful, because the minority which is tricked into seeing the data as available will anyway follow the majority which does not, so all honest validators quickly end up on the same fork. Similarly, 2. means that the presence of reconstructible data, which is not immediately 100% available, is also not harmful, because the data is quickly reconstructed, at which point the availability checks pass for all honest validators. This way, we rule out interference of the availability filter in the work of honest validators, and thus should revert to the chain liveness properties of the underlying consensus protocol.

## Solution 2: DAS as a fork-choice filter and as a confirmation rule criterion

All nodes run a fork-choice rule to follow the canonical Ethereum chain and correctly participate in consensus (as validators) and/or in the p2p network, *and* a confirmation rule to confirm blocks (and transactions) in the canonical chain. We can think of the confirmation rule as a stricter fork-choice, which always outputs a (safer) prefix of the fork-choice output, i.e., the highest confirmed block. A major difference between a fork-choice and a confirmation rule is that the former has *global* security implications, for example as a major determinant of the safety and liveness of consensus, while the latter only affects the individual nodes that run it.
For example, a node can choose to use finality for its confirmation rule, while another node might prefer a less conservative synchronous confirmation rule, without these choices affecting the rest of the system in any way. Data availability checks necessarily impact both, because we need full nodes to reject chains which are not available. We usually think of availability as a *filter on the fork-choice*, an additional criterion which determines whether or not a chain is eligible to be canonical, which full nodes check through some form of DAS and super full nodes by downloading all the data. While such a fork-choice filter impacts the confirmation rule as well, we can reserve some of the availability checks *only* for the confirmation rule, without them impacting the fork-choice rule. In other words, we can use one DAS protocol (DAS1) as a fork-choice filter, and another DAS protocol (DAS2) as an additional confirmation rule criterion. The purpose of DAS1 is just to ensure DA layer safety and liveness, while the sole purpose of DAS2 is to endow the confirmation rule with stronger client security properties.

#### Comparison

This solution functions much like the previous one. In particular, the fork-choice of validators uses DAS1 as a filter and is not impacted by DAS2, so we can carry out the same analysis that we did in the previous section. A caveat is that we lose the advantages of designing DAS1 for validators instead of full nodes: we do not get sybil-resistance by default, and we have to work with the lower bandwidth requirements of full nodes.

On the other hand, this solution has the advantage that a liveness failure of DAS2 does not impact *anything* outside of the transaction confirmation of full nodes. Even non-validator full nodes would not be affected except in their inability to confirm transactions: they would all still follow the same canonical chain and be able to participate normally in the p2p network.
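The division of labor between the two DAS protocols can be sketched as a toy model (my illustration: `das1_ok` and `das2_ok` are hypothetical predicates standing in for the two protocols' availability checks, not an actual client implementation):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Block:
    slot: int
    finalized: bool

def fork_choice_head(chain: List[Block],
                     das1_ok: Callable[[Block], bool]) -> int:
    # DAS1 acts as a fork-choice *filter*: the head is the tip of the
    # longest prefix in which every block passes the DAS1 check.
    head = -1
    for i, block in enumerate(chain):
        if not das1_ok(block):
            break
        head = i
    return head

def confirmed_head(chain: List[Block],
                   das1_ok: Callable[[Block], bool],
                   das2_ok: Callable[[Block], bool]) -> int:
    # The confirmation rule is stricter: on top of being in the
    # fork-choice output and finalized, a block must also pass the DAS2
    # check, so its output is always a prefix of the fork-choice output.
    head = fork_choice_head(chain, das1_ok)
    confirmed = -1
    for i in range(head + 1):
        if not (chain[i].finalized and das2_ok(chain[i])):
            break
        confirmed = i
    return confirmed
```

In this model, a DAS2 liveness failure (`das2_ok` wrongly returning `False` for available data) only lowers `confirmed_head` for the node experiencing it; `fork_choice_head`, on which consensus participation and the p2p network depend, is computed without DAS2 and is untouched.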
Even a failure for all full nodes at once would not have any effect on the p2p network, chain liveness, or DA layer liveness. In the previous solution, full nodes would instead be affected, and thus so would the p2p network, while chain liveness would only be preserved if validators were still able to communicate efficiently with each other.