# Merge Testing Call Notes

## May 23

- Sam joined the calls: works in EF DevOps, currently working on sync tests.
- Sam demoed the sync test dashboard; all client teams should have access. If not, reach out to Sam or Pari.
    - It currently supports Goerli shadow fork 4 and Kiln, where sync tests run twice a week for all the different combinations of clients.
    - Currently using default sync modes for clients; will expand this moving forward.
    - If you have ideas for sync tests, reach out to Sam/Pari.
    - https://github.com/samcm/sync-test-coordinator
    - https://github.com/samcm/ethereum-sync-test-helm-chart
- EF Security ran a few combinations of validators instrumented with various sanitizers (TSAN, UBSAN, MSAN, ASAN) on a mainnet shadow fork and found some minor race conditions. Example: https://github.com/prysmaticlabs/prysm/pull/10722
    - Will start testing for Ropsten.
- Ropsten release status:
    - Nethermind: No release yet, late this week or next week
    - Besu: Released 22.4.1 with Ropsten support
    - Geth: Merged, started release notes. Perhaps today or tomorrow.
    - Teku: Released 22.5.1 with Ropsten support
    - Nimbus: Released
    - Lighthouse: Released
    - Prysm: ?
    - Lodestar: ?
- Pari to work on docs for Ropsten tomorrow.
- Shadow fork tracker: https://notes.ethereum.org/@parithosh/ryHXp5jUc
- It would be good to add tests that ensure CL clients conform to the spec for payloads.
- Pari has shared validator keys with the teams. Each client team has ~9% of keys.
- Kurtosis is meant to merge the Kubernetes changes today; it should then be possible to run larger-scale tests.
- Mario is working on Hive tests. Marcus joined, has started adding basic tests, and will work on state tests.
    - All state tests were re-added for the merge tests (19k tests); all worked fine.
- Consensus clients added to Hive: Lighthouse and Nimbus already added. Lodestar and Teku to be added this week. Prysm Docker client not working yet.
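The May 23 notes say sync tests run twice a week for all combinations of clients. As a rough sketch of how large that matrix is, the snippet below assumes every one of the four execution clients named in these notes is paired with every one of the five consensus clients, on both Goerli shadow fork 4 and Kiln. The client lists, `sync_test_matrix` and `runs_per_week` are illustrative assumptions, not the actual sync-test-coordinator configuration.

```python
from itertools import product

# Assumption: client lists taken from the clients named in these notes;
# the real sync-test-coordinator may test a different set.
EXECUTION_CLIENTS = ["geth", "nethermind", "besu", "erigon"]
CONSENSUS_CLIENTS = ["lighthouse", "prysm", "teku", "nimbus", "lodestar"]


def sync_test_matrix(el_clients, cl_clients):
    """Return every (execution client, consensus client) pairing."""
    return list(product(el_clients, cl_clients))


def runs_per_week(pairs, networks=2, runs_per_network=2):
    """Hypothetical weekly run count: every pair, on every network,
    `runs_per_network` times a week (defaults: 2 networks, twice a week)."""
    return len(pairs) * networks * runs_per_network


pairs = sync_test_matrix(EXECUTION_CLIENTS, CONSENSUS_CLIENTS)
print(len(pairs))            # 20 EL/CL pairs
print(runs_per_week(pairs))  # 80 runs under the stated assumptions
```

Adding non-default sync modes per client, as the notes plan to do, would multiply this count further.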
## May 16

- Mikhail: testing coverage updates
    - https://github.com/txrx-research/TestingTheMerge/tree/main/tests
- Marius: EL mock progress
    - Basic flows work; can insert faults
    - https://github.com/ethereum/hive/pull/540
- A new member of the testing team is starting to help out with testing efforts.
- Merge readiness checklist
    - Some items left with regard to stress testing:
        - Single client load/metrics
        - Network load testing
        - Larger blocks
        - Shorter slot times
    - Kurtosis working on implementing stress tests. Goal is to have something out this Friday.
    - Shorter slot times and larger blocks may need additional testing outside of Kurtosis.
- Sync tests are now automated. Currently running on a Goerli shadow fork. Erigon, Lodestar and Nimbus are not there yet, but it's possible to check the history of sync times. Some work to be done with regard to automating reports.
- Non-finality tests
    - Mainnet shadow fork 2: keep for sync benchmarks
    - Mainnet shadow fork 4: use for slashing tests
    - Goerli shadow fork 3: use for slashing tests
- Ropsten validator set and genesis files for The Merge being worked on today. If a client team does not want to validate, let Pari know.
    - Config files to be released in the next few hours.
- New mainnet shadow fork on Thursday
    - The main difference this week is that there will be an equal client split.
    - Any new images for the next shadow fork on Thursday? Let Pari know if there is one available.

## May 9

- Edge case sync issue found in Geth ([test case](https://github.com/ethereum/hive/pull/526))
    - If the same validator creates multiple blocks in a row, it could potentially force nodes from full sync to snap sync and feed them an invalid chain, partitioning nodes off the network.
    - In Geth today, this is mitigated because nodes aren't allowed to snap sync again after they've snap synced once.
    - Clients should not allow a fast/snap sync to be initiated if they have already switched to full sync.
- Marius has been working on the CL mocker.
- Terminal Block Hash Override
    - When and how do we test/implement this?
    - Unsure what the value is. Geth has a networking flag to whitelist a certain block.
    - It should either be in the spec and implemented, or not there at all.
    - The value is if the chain degrades before TTD is hit and we need to move to another block in an emergency.
    - Consensus is to build/test in parallel with testnet deployments. Don't want to delay testnets over this.
- Moving to testnets
    - Want to make sure the newest set of Mario's tests is passing before merging (but could choose the TTD/put client releases out before)
        - https://github.com/ethereum/hive/pull/535
        - https://github.com/ethereum/hive/pull/534
        - https://github.com/ethereum/hive/pull/526
    - Client teams feel _close_ to being able to fork the first testnet, but not 100% sure. Will see on ACD what happens!
    - Potential schedule:
        - May 13: Pick Ropsten TTD
        - May 16-18: Client releases out
        - May 18/19: Ropsten fork announcement blog post
        - June 1: Bellatrix on Ropsten
        - June ~8: Ropsten Merge
- Shadow fork viewing meeting
    - Likely to be scheduled at 15:00 UTC this Thursday
- [#BPFdoor](https://twitter.com/CraigHRowland/status/1523266585133457408?s=20&t=1enAJMYr7xgSLUzBVF_kwg) conversation overtime!
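The May 9 recommendation that clients must not start a fast/snap sync once they have switched to full sync amounts to a one-way state transition. Below is a minimal sketch of that rule, assuming a hypothetical `SyncGuard` class with illustrative method names; it is not taken from Geth or any other client's codebase.

```python
from enum import Enum, auto


class SyncMode(Enum):
    SNAP = auto()
    FULL = auto()


class SyncGuard:
    """Hypothetical guard: once a node has switched to full sync,
    it never re-enters snap sync."""

    def __init__(self) -> None:
        self.mode = SyncMode.SNAP
        self.switched_to_full = False

    def switch_to_full(self) -> None:
        # One-way transition: the flag is never cleared afterwards.
        self.mode = SyncMode.FULL
        self.switched_to_full = True

    def request_snap_sync(self) -> bool:
        """Return True if a snap sync may start; refuse it after full sync."""
        if self.switched_to_full:
            return False
        self.mode = SyncMode.SNAP
        return True


guard = SyncGuard()
print(guard.request_snap_sync())  # True: still in the initial sync phase
guard.switch_to_full()
print(guard.request_snap_sync())  # False: snap sync refused after full sync
```

In this sketch, a peer advertising a different chain after the switch cannot push the node back into snap sync, so any new blocks would have to go through full sync instead.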
## May 3

- Keep (at least half of) the call focused on merge testing for now.
- [Testing Coverage Doc](https://hackmd.io/@n0ble/merge-test-coverage) update
    - Went through all specs to check testing coverage
    - Consensus specs are the best covered
        - The transition process is not covered there, though
    - Engine API/transition coverage is not complete
        - Should focus on the Engine API first
        - Should test in Hive with a CL client mock
        - For the transition, need an EL client mock
    - An EL client mock is probably the #1 thing we are missing currently
        - Marius may be able to build something like this, with help from Mario/Mikhail
        - Prysm has [a feature](https://github.com/prysmaticlabs/prysm/pull/10533) similar to this; will share with folks to see if some of it can be reused
- ReTestEth status?
    - Mario was helping Dimitry on it, but hasn't had time to go back to it
- [Kurtosis Nightly Runs](https://notes.ethereum.org/@parithosh/r16MZJDrq)
    - Seeing stable patterns emerge
    - Will try to focus on getting Besu up and running on Kurtosis
    - Some ideas of features to add to Kurtosis:
        - CL pause @ TTD + unpause later
        - EL pause @ TTD + unpause later
        - Add CL/EL post-TTD and sync to head
        - Split network in half and rejoin (unsure if it's possible today with Kurtosis)
        - Reset a node/CL/EL, remove its data directory and let it resync
        - Better log output in the CLI
- Antithesis
    - Should have more bandwidth on their side in the coming weeks
- Hive tests
    - Marius made two PRs to fix the failing Geth tests
        - A new test with `latestValidHash` was failing on Geth, but a PR is now up to fix it. Once that is merged, Geth should be passing every test.
    - Mario talking with Erigon, helping them get tests passing
- Shadow forks
    - Clients should try to stop/start nodes around TTD to test weird syncing/DB scenarios during this week's shadow fork
    - Marius going to stop some MSF2 nodes and restart them this week
    - Big Yuga NFT drop: no one checked during the actual drop, but a few hours later things looked stable
- How many epochs do we have between Beacon Chain launch and TTD?
    - Next fork: ~1000 epochs
    - This means shadow forks don't reflect the size/history of the beacon chain
    - Could maybe use a copy of the Prater beacon chain?
    - Can we create a beacon chain 'that starts in the past'?
    - Could we use prymont?
        - Probably...! Would be a Goerli shadow fork.
- CL client feedback request on https://github.com/ethereum/consensus-specs/pull/2878
- Did all clients implement proposer boost? Can't know until Bellatrix

## April 11

- Mainnet shadow fork
    - Gas limit issue caught because of mainnet: https://github.com/ethereum/go-ethereum/pull/24680
        - Geth's default is 8M, so the block gas limit was voted down from 30M
    - Nethermind:
        - Found a syncing issue, still investigating
    - Besu:
        - Odd responses to payload messages from CLs
    - Nethermind seems to always be 12s behind the Geth nodes *on ethstats*
- Sync testing
    - Want to track sync times 1-2 weeks after the shadow fork
    - Pari has a few playbooks to get it done
    - The longer part of the process is figuring out the sync time and whether anything went wrong
- Goerli shadow fork
    - Geth still investigating the shadow fork issue
- Kurtosis
    - Mario wrote a post-merge verifier to check 10-15 different metrics for network health. A lot of tests are failing now, but it's unclear whether that's because the tests themselves are having issues running.
- Devconnect shadow forks
    - Goerli: April 19th
    - Mainnet: April 23rd

## April 4

Recording: https://ethereumfoundation.zoom.us/rec/share/TSbI_kGCmviCmuFiPLOtKZPJSkn86Vr9vT8BSjHP3RgojG3youW7r-bB56TpIHIT.SG0zDBPkhjKq89Dv

Passcode: Fav!4^!U

- Goerli shadow fork recap
    - Happened today at ~9am EU time
    - TTD and merge transition went smoothly
    - Besu had an issue during the transition; found it and working on a fix
    - Nethermind went smoothly
        - Smoothest shadow fork
    - Geth: noticed bad blocks; issues with pre-executing transactions led to wrong state in the DB during the last PoW block and first finalized block
        - Happens with transactions that deploy a contract
        - Why did this get hit during the merge?
            - Unsure yet, perhaps a race condition with payloads
        - How can we replicate this in nightly builds?
    - A lot of shadow fork issues have to do with the size of the state
    - Do we have tests for re-orging around the TTD?
        - Not really, still a WIP
- Mainnet shadow fork this week
    - Want to see performance on this (e.g. sync/stability after the fork)
    - Need an editable genesis for mainnet from all EL client teams
- Want traces of bad blocks that get hit
    - If the block PoW is accepted or the CL marks it as accepted, we should have the traces
    - Could clients just store a witness for it? Yes, but it might be a bit complex to implement, and there is no spec for it
    - ASAP: dump bad blocks as is, but also want to try and get a common format started
- [Test coverage doc](https://hackmd.io/vjgC9hV_TrK1ZuftVm1rZA)
    - Went through every spec document we have and wrote down all the checks we'd want to do based on each statement
- CL tooling overview by Pari (see last 12-15 mins of recording)
    - Pari's generic debugging doc: https://notes.ethereum.org/cmyGUbKVTTqhUGDg_GYThg#How-to-create-a-merge-testnet
    - Shared client dashboard: coming :soon:
    - https://ethereum.github.io/beacon-APIs/
    - https://ethtoolbox.com
    - https://github.com/wealdtech/chaind
    - https://github.com/wealdtech/ethereal
    - https://github.com/protolambda/eth2-val-tools
    - https://github.com/protolambda/zcli
    - https://github.com/wealdtech/ethdo

## March 28, 2022

- Goerli shadow fork recap
    - Nethermind: reworking sync, still need time
    - Besu: fork after TTD, need to sort out issues
        - Queuing up behavior changes that will happen post-TTD when `FORK_NEXT_VALUE` is hit.
    - Geth: syncing/reorg issues causing panics
    - Lighthouse: repeats same head on network for hours on end
    - Geth/Teku: Geth crashes because of Teku's RAM usage
- Two shadow forks of Goerli this week (Wednesday + Friday)
- Lighthouse has started fuzzing Bellatrix for all CLs
    - MergeFuzz not working with Besu
- Kiln
    - Lighthouse wants to deploy their "chaos" tooling to spam the CL
    - Going to deploy the bad blocks creator on the EL
- War rooms ([context](https://twitter.com/realLedgerwatch/status/1507286658202451985))
    - When stuff breaks, we need a lot more tooling and people to make sense of things.
    - Right now, a lot of this is dependent on Pari
    - Client teams should make sure they can monitor/triage issues
    - CL devs ask for EL info, and vice versa
- Next calls: deep dive into tooling on each side.
    - April 4th: deep dive into CL tooling
    - April 11th: deep dive into EL tooling

## March 21, 2022

### TODOs

- [x] Make #interop writable only by contributors
- [x] Shadow fork Goerli this week
- [ ] Kurtosis
    - [ ] Set up Kurtosis to run through the transition/shadow fork frequently
    - [ ] Set up Kurtosis to enable builds where each client is a majority client on the network
    - [ ] Improve Kurtosis to get more fine-grained metrics
    - [x] Set up team notifications for failed Actions: https://github.com/parithosh/nightly-kurtosis-test/actions
- [ ] Hive
    - [ ] Launch another Hive instance to run CL tests quicker
    - [ ] Add better test coverage for the various spec statements
- [ ] Network health metrics
    - [ ] Create a list of network health metrics we want to monitor. [WIP](https://notes.ethereum.org/cW0fTuBxRDicqwi4MvsM5Q?edit)
- [ ] Other
    - [ ] Add more clients to Prysm's E2E integration test suite
    - [ ] Test client combinations in "weird" network states (e.g. syncing, removing DB, etc.) to make sure that messaging works as expected
    - [ ] Test slow block processing time / large blocks on EL and make sure they work with CL implementations of the spec
    - [ ] Run a network through the merge (ideally a mainnet shadow fork) and let it run for several weeks to get benchmarks on running nodes
    - [ ] Test the terminal block hash override

### Useful Links

- Marius merge testing plan: https://hackmd.io/WKpg6SNzQbi1jVKNgrSgWg