# Xatu CL Mimicry Ideas

This document contains ideas for achieving [Xatu consensus layer mimicry](https://github.com/ethpandaops/xatu/issues/218): a thin client that can stay connected to the beacon p2p network with minimal resources, with the goal of monitoring the network and exporting data for later analysis.

## Goals

#### 1. Export beacon p2p events for analysis

:::warning
Severity: **Required**
:::

##### Notes

- Probably to Xatu ([example Armiarma PR](https://github.com/migalabs/armiarma/pull/68))
- Needs to account for clock drift _before_ sending the event to Xatu. The above PR does NOT do this yet. See the sketch after this list.
- Improves on the current method of analysing propagation time for blocks/attestations/blobs/etc, which goes via the Beacon API event stream
- Sidesteps reluctance from client teams to implement timing metrics in the Beacon API event stream (which is fair enough)
- Prepares us for adding more p2p monitoring for things like PeerDAS.
- Separates our concerns so that the p2p code is focused on p2p, and metrics/data analysis is deferred elsewhere
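A minimal sketch of the clock drift correction, assuming an NTP-derived offset (e.g. via `github.com/beevik/ntp`). The event struct, field names, and NTP server are illustrative, not Xatu's actual schema:

```go
package main

import (
	"fmt"
	"time"

	"github.com/beevik/ntp"
)

// BeaconP2PEvent is an illustrative event wrapper; the fields are not
// Xatu's actual schema.
type BeaconP2PEvent struct {
	Topic         string
	PeerID        string
	EventDateTime time.Time
}

// adjustForClockDrift shifts a locally observed timestamp by the offset
// reported by an NTP server, so the exported event time is comparable
// across sentries with drifting clocks.
func adjustForClockDrift(observed time.Time, ntpServer string) (time.Time, error) {
	resp, err := ntp.Query(ntpServer)
	if err != nil {
		return observed, err
	}
	// ClockOffset is the estimated difference between the local clock
	// and the NTP server's clock.
	return observed.Add(resp.ClockOffset), nil
}

func main() {
	ev := BeaconP2PEvent{
		Topic:         "beacon_attestation_0",
		PeerID:        "example-peer-id",
		EventDateTime: time.Now(), // when the message was seen on the wire
	}

	if corrected, err := adjustForClockDrift(ev.EventDateTime, "pool.ntp.org"); err == nil {
		ev.EventDateTime = corrected
	}

	fmt.Println(ev.EventDateTime)
}
```

In practice the offset would be sampled periodically and cached rather than queried per event, since events can arrive at a high rate.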
#### 2. Connect to hundreds of peers

:::warning
Severity: **Required**
:::

##### Notes

- If we're connected to as many beacon nodes as possible we can take the peer_id from `min(event_date_time)`, look up their IPv4 address and get a half decent guess at which peer is _creating_ attestations/blocks. Although we need to be careful to not dox validators
- Needs to be able to subscribe to all attestation subnets while maintaining a high peer count. Mostly concerned about performance here.

##### Ideas

- Probably needs to be in Go or Rust. Strong preference for Go since EthPandaOps can maintain it long term.

#### 3. Stay peered with all beacon clients (prysm, lighthouse, teku, nimbus, lodestar)

:::warning
Severity: **Required**
:::

##### Notes

- Lighthouse & Prysm seem to have the strictest peer scoring

#### 4. Reduced CPU/Memory/Disk IOPS requirements compared to full nodes

:::info
Severity: **REALLY nice to have**
:::

##### Notes

- Disk IOPS are **really** expensive in the cloud
- Disk size isn't really an issue in comparison to IOPS
- The lighter the CPU/memory/disk requirements, the more instances we can run.
- We need a way to follow the correct chain without running the full chain. Deferring "trust" to a central server that we control is fine.

##### Ideas

- Verifying attestations seems to be the heaviest IOPS operation for a normal beacon node. Defer validation to a central server?
- Storing blocks/blobs locally is probably fine, even on an HDD. Maybe backfill the last `MIN_EPOCHS_FOR_BLOCK_REQUESTS` (~5 months) of blocks when the mimicry instance starts? Could fetch them via a centralized server that serves [ERA files](https://ethresear.ch/t/era-archival-files-for-block-and-consensus-data/13526).
- Could route all requests to a central beacon node, and keep a small cache of blocks/blobs in memory.

##### Questions

- Do we need to gossip attestations to peers to stay connected? Or is just blocks good enough? If we need to gossip attestations we will probably need to defer the verification responsibility to a central instance. If that instance is a beacon node then we might need to fork a CL to _not_ publish attestations it receives over the Beacon API. Otherwise this central node will probably be downscored/overloaded for publishing the same attestation `n` times.

#### 5. Disconnect bad peers

:::success
Severity: **Nice to have**
:::

##### Notes

- Not as much of a priority if we're targeting hundreds of peers

#### 6. Be a good peer

:::success
Severity: **Nice to have**
:::

##### Notes

- Participating in gossip is healthy for the network
- Verify incoming messages before sending them to peers. If we are targeted with invalid messages we'll be downscored by our peers

#### 7. Allow incoming connections

:::success
Severity: **Nice to have**
:::

##### Notes

- Allows us to discover more of the network. Especially important for discovering home stakers, since they are more likely to not have port forwarding set up and thus can't be connected to directly.

#### 8. Controllable peer set

:::success
Severity: **Nice to have**
:::

##### Notes

- Can view all of our connected peers as a global set, choosing to disconnect peers if we have too many connections to them to prevent accidental eclipse attacks.
- Could optimize for latency. E.g. disconnect a US peer from our EU instances if we are already connected to it from a US instance.

## Architecture ideas

#### 1. Adapt Erigon's CL Sentinel

Erigon has an embedded CL that defers requests to a full Erigon node: [link](https://giulioswamp.substack.com/p/erigon-embedded-consensus-module#%C2%A7modular-archittecture)

Could be a very viable option if it checks a lot of the boxes above. Needs investigation. It's possible that we will "just" need to add a way to export data to Xatu, and then check with Kurtosis that we can stay peered with all CLs.

ProbeLab is also investigating using Erigon for other libp2p analysis. We can discuss their findings with them.

Pros:
- Fastest & easiest way to our goal
- Defers a lot of long-term maintenance to Erigon (who will be doing it anyway)

Cons:
- Erigon is a moving target at the moment. v3 is being worked on and functionality may change.
- Documentation not amazing

Questions:
- Architecture needs investigating
- CPU/Mem/Disk requirements of the central Erigon node
- CPU/Mem/Disk requirements of the thin Erigon node
- Latency concerns
- Caching? Appears to be connected by gRPC

#### 2. Improve Armiarma

Armiarma is more of a "discovery" project at the moment. It tries to find peers, but doesn't really care about staying connected.

Pros:
- A lot of the hard work is already complete
- Proof of concept with Xatu already working

Cons:
- Can't stay peered with Prysm & Lighthouse
- Relies on Postgres. Not a deal breaker but somewhat annoying.
  - I think it uses Postgres for peer info/connections, and also for network analysis
- Using an old version of the libp2p Go modules.

Questions:
- Do we need to upgrade the libp2p Go modules, and if so: how much effort is it?

#### 3. Fork a consensus layer client

Prysm seems to be the most suitable, but it doesn't seem easy to stub out the "beacon" and run it in a thin mode.

Pros:
- Can definitely stay peered when running as a full node
- Peer scoring is already implemented

Cons:
- Fighting against a code base that assumes it is running as a full node
- Maintaining the fork while upstream Prysm evolves will probably prove pretty difficult

#### 4. Start from scratch in Xatu/something else

Pros:
- Completely control our own destiny
- Architecture can be optimal for our use case. "Xatu Server" is already a central server that we run. Could add more gRPC methods to implement our architecture.
- Could implement the "controllable peer set" functionality easily
- Already implemented in Xatu Server/Mimicry for the EL p2p network
- Could keep `n` recent blocks/blobs in memory, and go to S3 or a central server for everything else (see the sketch after this list)

Cons:
- The most amount of work possible
- 100% of the maintenance effort required
- Unknown unknowns
- The EthPandaOps team are not p2p experts
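A minimal sketch of the "recent blocks in memory, everything else deferred" idea. The `CentralStore` interface is hypothetical and stands in for wherever historical data lives (a central beacon node, an ERA file server, or extra Xatu Server gRPC methods); the in-memory cache uses `github.com/hashicorp/golang-lru/v2`:

```go
package mimicry

import (
	"context"

	lru "github.com/hashicorp/golang-lru/v2"
)

// CentralStore stands in for whatever serves historical blocks: a central
// beacon node, an ERA file server, or additional Xatu Server gRPC methods.
// The interface is hypothetical.
type CentralStore interface {
	BlockByRoot(ctx context.Context, root [32]byte) ([]byte, error)
}

// BlockProvider keeps the most recent blocks in memory and defers
// everything else to the central store.
type BlockProvider struct {
	cache   *lru.Cache[[32]byte, []byte]
	central CentralStore
}

func NewBlockProvider(recent int, central CentralStore) (*BlockProvider, error) {
	c, err := lru.New[[32]byte, []byte](recent)
	if err != nil {
		return nil, err
	}
	return &BlockProvider{cache: c, central: central}, nil
}

// OnGossipBlock caches blocks as they arrive over gossip so that recent
// by-root requests from peers can be answered without touching disk.
func (p *BlockProvider) OnGossipBlock(root [32]byte, sszBlock []byte) {
	p.cache.Add(root, sszBlock)
}

// BlockByRoot serves from the in-memory cache first, falling back to the
// central store for anything older.
func (p *BlockProvider) BlockByRoot(ctx context.Context, root [32]byte) ([]byte, error) {
	if block, ok := p.cache.Get(root); ok {
		return block, nil
	}
	return p.central.BlockByRoot(ctx, root)
}
```

Keying by block root means most by-root req/resp requests from peers would be served from memory, with only cache misses generating traffic to the central server. Blobs could follow the same pattern with a second cache.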