# Merge Testing Call Notes
## September 10
* EF Devops node should be prioritized for Berlin folks
* Boulder folks to set up others
## September 5
* Pari: running the bad block creator PR on 10% of all nodes on a shadowfork
* Hive tests
* Erigon invalid blocks - work is being done to create tests, but so far has been unsuccessful.
* Invalid transactions p2p, will work on that for the next days
* Discussions around the bounty program
* Has there been any high value reports?
* Nothing critical, but reports about actual issues keep coming in, and have increased since we upped the bounty.
* Bellatrix Watch party?
* Eth R&D - Party Lounge
## August 29
* Introducing more tests that do cross-checks on payload creation. Nothing interesting so far. Not been able to introduce any invalid payloads.
* Shadowfork 11
* Invalid payloads from Erigon last week, fixed now.
* Shadowfork 12 on Wednesday.
* Shadowfork 13 to be launched (hit TTD between bellatrix on mainnet).
* Estimates on bordel.wtf looking good so far
## August 22
* Hive updates
* https://github.com/ethereum/hive/pull/638 , adds a test for the EL clients covering invalid execution on the terminal proof-of-work block
* https://github.com/ethereum/hive/pull/637 , for the CL clients, implements "No viable head due to optimistic sync". Lighthouse is passing without issues, Teku requires client whitelisting to be able to sync, Prysm is timing out while syncing, but it has not been investigated whether the culprit is a test issue
* Geth passes all Engine API tests, some other clients fail a few tests
* Besu, Nethermind, Erigon failing a variety of engine api, transition and sync tests.
* Mario is writing terminal_block_hash tests on hive, if anyone has anything they have implemented and want to test please reach out, Mario believes there are some scenarios we could test with these tests
* Client fuzzing
* Erigon: API used to fuzz no longer working, could this be looked into?
* Danny handling blog post for merge announcement
* Would be good to have client releases out within the next 24 hours.
* Shadowfork planned for next week.
* At least one mev relay on the next shadowfork
* Tests being done on mev-boost failures
* Engine API release
* We can do two shadowforks until the merge
* Chaos testing to be done
* Kurtosis looking good: https://github.com/parithosh/nightly-kurtosis-test/actions
* Antithesis: No issues found since last week
* In-Person Merge events:
* Berlin: If you're in Berlin during the merge, reach out to Pari to join
* Boulder: If you're in Boulder during the merge, then there will be refreshments at the new Boulder office
## August 15
* Sepolia fork
* Marius and mrabino both recalculated things and the 5875 number looks good
* ETA still 14/15
* Goerli looks good - 88% participation rate
* Lodestar attested to a bad block, still investigating what happened
* Shadow fork testing
## August 8
* DAG size >5gb
* Are we sure the DAG size increase happens on [August 17](https://minerstat.com/dag-size-calculator)?
* Pretty sure, but might also be possible to extend life of certain machines
* Want to make sure we don't hit an off-by-one error
* Confirmed: DAG size will be just under 5GB on Aug 17th, well above around Aug 21st, but might not
* Mainnet release ETAs?
* Could clients release on week of 22nd?
* Besu, teku yes, Nethermind, Prysm TBD, Geth likely OK
* Optimistic sync tests https://github.com/ethereum/consensus-specs/pull/2965
* Original proposal: https://github.com/ethereum/consensus-spec-tests/issues/28
* Will allow different forks with diff weights, payloads, etc.
* Recommended that CL teams implement
* One more scenario in Hive: https://github.com/txrx-research/TestingTheMerge/pull/12
* Terminal Block Hash
* Do all clients support TBH?
* Geth no (only for peering), Nethermind PR open, Besu not sure but has one for peering
* Should we do a shadow fork to test this ?
* Can try and schedule one for next week - will discuss more on CL call
## August 1
* Lighthouse: new release, 2.5.0, should have all features needed for the merge.
* Goerli: Any worries that we should address?
* Doc is out
* mev-boost testing
* standardize messaging? "ready for merge" will now be displayed in logs by Lighthouse if client is ready for merge.
* "No valid head due to optimistic sync" is being worked on by multiple teams
* Paul and Mario to collaborate on how to create additional Hive tests for this
* Next Goerli shadow fork: Configs came out today, Beacon chain on Wednesday, TTD on Thursday.
* Forkchoice deadlock fix:
* Worked on by teams, included in Lighthouse 2.5.0, looks to be ready in time for mainnet
## July 25
* Requirements for testing MEV-boost?
* Currently being tested on Ropsten/Kiln
* OK for happy case, but need to better test edge cases/relay failures, etc.
* Want to run during shadow fork transition to ensure that it's off during transition: will try on GSF6
* Will continue conversation in #block-construction
* Hive transition testing
* right now, download blocks from remote peer, but in reality we receive it from gossip network - would be good to test when we get >1 TTD block from gossip network, e.g. CL client re-orgs to a block with lower difficulty, 3-5 TTD blocks, etc.
* Hive test status
* Mario finishing up the test cases, should be doable to get them before Goerli fork
* TBH tests considered nice-to-have
* Goerli/Prater client releases
* Ideally, want to announce fork on Wed, so releases on Tuesday
* Prysm tomorrow
* Geth today/tomorrow
* Nethermind tuesday/wed
* Besu today/tomorrow
* Teku already
* Lighthouse already
* Nimbus release in process
* Lodestar already out
* Erigon already out
## July 18
* Goerli TTD: https://github.com/ethereum/execution-specs/pull/563
* Marius, Pari sanity checked it - good to merge!
* Going to pick Bellatrix epoch async this week on discord
* Goerli documentation
* WIP from Pari:
* Will try and add this on the launchpad, and link in the blog post announcement
* Script to check if you have Engine API open, valid JWT and correct TTD
* Some work on prototyping this, but unsure about whether this creates a false sense of security
* Alternative: provide a common string to look for across clients for error (or success?) logs
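
A minimal sketch of what such a check script could look like, written under several assumptions rather than taken from Pari's WIP. It assumes the Engine API's authentication scheme (an HS256 JWT with an `iat` claim, signed with a shared 32-byte hex secret) and uses `engine_exchangeTransitionConfigurationV1` to compare TTD values. The helper names (`make_engine_jwt`, `build_ttd_check_request`, `ttd_matches`) are hypothetical, and nothing is sent over the network here.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_engine_jwt(secret_hex: str, issued_at: int) -> str:
    """Build an HS256 JWT carrying only the `iat` claim (hypothetical helper)."""
    if secret_hex.startswith("0x"):
        secret_hex = secret_hex[2:]
    secret = bytes.fromhex(secret_hex)
    if len(secret) != 32:
        raise ValueError("JWT secret must be 32 bytes (64 hex characters)")
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
    claims = _b64url(json.dumps({"iat": issued_at}, separators=(",", ":")).encode())
    signing_input = f"{header}.{claims}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{claims}.{_b64url(signature)}"


def build_ttd_check_request(expected_ttd: int) -> dict:
    """JSON-RPC body asking the EL to report its transition configuration."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "engine_exchangeTransitionConfigurationV1",
        "params": [{
            "terminalTotalDifficulty": hex(expected_ttd),
            "terminalBlockHash": "0x" + "00" * 32,  # no terminal block hash override
            "terminalBlockNumber": "0x0",
        }],
    }


def ttd_matches(response: dict, expected_ttd: int) -> bool:
    """True only if the EL answered and reported the expected TTD."""
    result = response.get("result") or {}
    reported = result.get("terminalTotalDifficulty")
    return reported is not None and int(reported, 16) == expected_ttd
```

In use, the token would go into an `Authorization: Bearer <token>` header on a POST to the EL's authenticated Engine API port. A passing check only shows that the port, secret and TTD line up at that moment, which is the "false sense of security" concern raised above.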
## July 11
* Invalid signature on Sepolia
* No root cause identified yet, some people looking into it
* Merge Release Process: two step vs. one step
* Risk of misconfiguration > risk of hashpower attack
* Can also override TTD in the case of an attack
* Hashrate relative to ATH is a potential cause for worry (https://etherscan.io/chart/hashrate)
* Bellatrix before EL mainnet?
* Not great for a few reasons: EL will have error logs, would want both specs frozen at the same time
* One Step release process: ~3 weeks
* Choose Bellatrix epoch/TTD
* Blog post out a few days later
* Bellatrix 1-2 weeks later
* TTD ~2 weeks after that
* Two step release process: ~4 weeks
* Choose Bellatrix epoch
* Blog post out a few days later
* Bellatrix 1-2 weeks later
* TTD blog post 1-2 days later
* TTD hit ~2 weeks after that
* No resolution yet, need to discuss further.
## July 4
* Progress on testing:
* Geth: Hive tests, done pretty soon. Waiting for a PR in geth to make the test correct.
* Geth: Had a regression that caused some sync issues, it's fixed in the latest release now.
* Nethermind: Fixed a few things, waiting for mainnet shadow fork and Sepolia to verify everything is fine. Also waiting for Hive tests. Two Hive tests (getblockbynumber,..) expect a zero response while Nethermind returns 0x00..., so Nethermind is currently failing these tests.
* Lighthouse: Merged a pretty big refactor during the weekend, waiting for shadowforks and sepolia.
* Pari will update nodes to latest unstable.
* Discussion about what happens when a consensus node gets a missing head. A new test case will be available in Hive for this.
* Consensus Clients Hive Integration: https://notes.ethereum.org/gmI6nmcuRhy3LV7vr0-sLA?view
* Tests in Hive are currently run on full sync, Mario will work on getting tests run on both full sync and snap sync.
* A lot of tests are currently failing, Pari working on updating this with Rafael
* ACD on Friday: mergeNetsplitBlocks for ropsten and sepolia is added to the agenda. https://github.com/ethereum/pm/issues/562
## June 27
* `latestValidHash` attack scenario [presentation](https://docs.google.com/presentation/d/1nFmxSgsjZ5BI6RGps6oiGE7R5USHZUfbrULFjdajdb8/edit#slide=id.g137cde17e05_0_0) by Mikhail
* Not a must have for The Merge, but client teams should decide whether they want to address it before moving to Goerli. Ideally, no more spec-conformance changes after Goerli.
* TTD for Sepolia: `17000000000000000`
* Chose based on estimations [here](https://gist.github.com/taxmeifyoucan/0dbe0c51b643eee1f215a6b1d94f9333).
* **Client Teams:** please put out a release by Wed with this TTD. Announcement coming on Thu/Fri.
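
As a rough illustration of how a TTD estimate of this kind can be made (not the method used in the linked gist), the sketch below projects when a chain would reach a given TTD from its current total difficulty, an assumed average per-block difficulty and an assumed average block time. All input values in the usage note are hypothetical except the Sepolia TTD itself.

```python
from datetime import datetime, timedelta, timezone


def blocks_until_ttd(current_td: int, ttd: int, avg_block_difficulty: int) -> int:
    """Number of blocks needed for total difficulty to reach or exceed the TTD."""
    if avg_block_difficulty <= 0:
        raise ValueError("average block difficulty must be positive")
    remaining = ttd - current_td
    if remaining <= 0:
        return 0  # TTD already reached
    # Ceiling division on integers: the terminal block is the first one at or above TTD.
    return -(-remaining // avg_block_difficulty)


def estimate_ttd_time(
    now: datetime,
    current_td: int,
    ttd: int,
    avg_block_difficulty: int,
    avg_block_time_s: float,
) -> datetime:
    """Projected wall-clock time at which the TTD is hit, assuming constant averages."""
    blocks = blocks_until_ttd(current_td, ttd, avg_block_difficulty)
    return now + timedelta(seconds=blocks * avg_block_time_s)


if __name__ == "__main__":
    sepolia_ttd = 17000000000000000
    eta = estimate_ttd_time(
        now=datetime.now(timezone.utc),
        current_td=16_000_000_000_000_000,       # hypothetical current total difficulty
        ttd=sepolia_ttd,
        avg_block_difficulty=3_000_000_000_000,  # hypothetical average difficulty per block
        avg_block_time_s=13.0,                   # hypothetical average block time
    )
    print("Estimated TTD time:", eta.isoformat())
```

The projection is only as good as the averages fed into it. On a low-hashrate testnet a sudden change in hashrate moves the estimate substantially, so it is worth recomputing close to the fork.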
## June 20
* Progress on testing from client teams
* **Besu**: Getting through backlog of failing Hive tests.
* **Nethermind**: Currently fixing some bugs (transition), investigating issues with syncing states. Concerns about sync edge cases, would like to discuss next week.
* **Lighthouse**: Testing forkchoice. Looking at bringing Kurtosis into CI.
* **Hive tests**: Found a new edge case when attempting to produce the transition payload, where the `forkchoiceUpdated` payload attributes request a payload with the same timestamp as the terminal block, causing the execution client to return an error.
* Gray Glacier. Some concerns about information not going out to enough people with such a short time frame to update. Work has been done to further increase the information sharing.
* Would teams find it valuable to classify bugs as sync related or non-sync-related?
* Thoughts on setting TTD for Sepolia on ACD Friday?
* Would be okay, but perhaps 5 days isn't optimal as gray glacier will happen around the 29th?
* Lightclients - some discussions, currently working on making lightclients compatible with each other through the standard.
* Grafana Oncall - Tobias might demo this at some point.
## June 13
* Difficulty Bomb
* Is 500k the right amount?
* Means ~14s block times late August
* Doesn't leave a ton of room for error
* Post-merge sync still needs analysis
* Aiming for mid-September seems realistic
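
To make the effect of the delay amount concrete, here is a small sketch of the difficulty bomb term as defined by the bomb-delay EIPs: the term doubles every 100,000 blocks of the "fake" block number, which is the real block number minus the delay. It assumes the 500k above refers to blocks of delay. The delay values and block height in the usage note are hypothetical, chosen only to show the relationship.

```python
BOMB_PERIOD_BLOCKS = 100_000  # the bomb term doubles once per period


def bomb_term(block_number: int, delay_blocks: int) -> int:
    """Additive difficulty bomb term at a given block for a given delay."""
    fake_block_number = max(0, block_number - delay_blocks)
    exponent = fake_block_number // BOMB_PERIOD_BLOCKS - 2
    # Negative exponents give fractions below one, which truncate to zero.
    return 2 ** exponent if exponent >= 0 else 0


if __name__ == "__main__":
    height = 15_500_000          # hypothetical block height
    shorter_delay = 11_200_000   # hypothetical delay
    longer_delay = shorter_delay + 500_000
    ratio = bomb_term(height, shorter_delay) // bomb_term(height, longer_delay)
    print("Bomb term shrinks by a factor of", ratio)
```

Each extra 100,000 blocks of delay halves the bomb term at a given height, so a 500k difference changes it by a factor of 2^5. The resulting block time also depends on hashrate, which this sketch does not model.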
## June 6
- Empty blocks created by Besu/Nethermind
- Besu: found the root cause, PR up to fix.
- Nethermind: not a nethermind issue, caused by CL. Adjusted block production to be similar to Geth. Only hides CL bug, doesn't fix it.
- Question: how to disconnect EL PoW peers, post-merge?
- Can't blindly follow what's in the spec, because otherwise shadow forks won't work.
- Nethermind: asks peers if they have the same `head` block hash. Could perhaps use `finalized` instead?
- Geth does this differently, doesn't like this approach too much
- Would be nice if can standardize something
- Ropsten Merge
- CL Security has everything running, except Erigon - need to swap in to new version
- Community Guides
- Add TTD override to updated guides
- Hive Tests
- Not fixed yet, Marius has an idea for a fix. Will try and implement this week.
## May 30
- Beacon Chain Launch Status
- Should happen at 15:00 UTC
- Reaching out client teams to make sure they have the keys
- beaconcha.in will be live Friday
- Ropsten release status:
- Besu - released
- Nethermind - released
- Geth - manually build from master or use latest docker build, previous release TTD override
- Erigon - released
- Lighthouse - good
- Lodestar - good
- Nimbus -
- Prysm - good
- Teku - good
- Hive Test not passing for Geth
- Geth is passing all tests except the ones with sister chain re-org sync
- Test expects different functionality than what was implemented
- sync part of the chain, and take the other part of the chain from cache.
- find peer with invalid chain, sync and detect that you ended up on the invalid chain.
- Spec is okay.
- Test scenarios for hive are being transitioned, hopefully done by end of June (they will gradually appear) to then be implemented by client teams.
- Geth does not know how soon it can be done from their side.
- Besu hopes to pass hive tests in a couple of weeks
- Nethermind hopes to pass in a couple of months
- Docs on how to configure and run nodes should be improved given new complexity
- Tim looking at the possibility of having grant(s) created for documentation writing for the client teams.
- Technical writer in the EF working on improving documentation for launch pad and other areas.
- Goerli vs Sepolia
- Why was Sepolia chosen to be last?
- Primarily it was chosen as it would be nice to have Goerli running for a longer period of time due to Sepolia being so new (which means it's less likely to have issues).
- Difficulty Bomb
- Discussions about potential difficulty bomb delay being included, and how miners could react to this.
- Important to communicate clearly about the path towards PoS
- Discussions if it should be included in the Paris release or prior to the Paris release, if there were to be an additional difficulty bomb delay.
- Geth team is at ETHPrague if anyone wants to hang out
## May 23
- Sam joined the calls - working in EF devops, currently working on sync tests.
- Sam demoed the sync test dashboard, all client teams should have access. If not, reach out to Sam or Pari. It currently has Goerli shadow fork 4 and Kiln support, where sync tests run twice a week for all the different client combinations.
- Currently using default sync modes for clients, will expand this moving forward
- If you have ideas for sync tests, reach out to Sam/Pari
- EF Security ran a few combinations of validators instrumented with various sanitizers (TSAN, UBSAN, MSAN, ASAN) on mainnet shadowfork and found some minor race conditions. Example: https://github.com/prysmaticlabs/prysm/pull/10722
- Will start testing for Ropsten.
- Ropsten release status:
- Nethermind: No release yet, late this week or next week
- Besu released, 22.4.1 with Ropsten support
- Geth merged, started release notes. Perhaps today or tomorrow.
- Teku released, 22.5.1 got Ropsten support
- Nimbus: Released
- Lighthouse: Released
- Prysm: ?
- Lodestar: ?
- Pari to work on docs for Ropsten tomorrow
- Shadow fork tracker: https://notes.ethereum.org/@parithosh/ryHXp5jUc
- It would be good to have tests added that ensure that CL clients conform to spec for payloads.
- Pari has shared validator keys with the teams. Each client team has ~9% of keys.
- Kurtosis are meant to merge the kubernetes changes today, should be possible to run larger scale tests.
- Mario working with Hive tests. Marcus joined and has started adding basic tests, and will work on state tests.
- All state tests were re-added for the merge tests (19k tests), all worked fine.
- Consensus clients added to hive. Lighthouse added already. Nimbus added. Lodestar and Teku to be added this week. Prysm docker client not working yet.
## May 16
- Mikhail testing coverage updates
- Marius EL Mock progress
- Basic flows work, can insert faults
- New member of the testing team starting to help out with testing efforts.
- Merge readiness checklist
- Some left with regards to stress testing
* Single client load/metrics
* Network load testing
* Larger blocks
* Shorter slot times
- Kurtosis working on implementing stress tests. Goal is to have something out this Friday.
- Shorter slot times and Larger blocks may need additional testing outside of Kurtosis.
- Sync tests are now automated. Currently doing it on Goerli shadow fork. Erigon, Lodestar and Nimbus are not there, but it's possible to check history of sync times. Some work to be done with regard to automating reports.
- Non-finality tests
- Mainnet shadow fork 2 - keep for sync benchmarks
- Mainnet shadow fork 4 - use for slashing tests
- Goerli shadow fork 3 - use for slashing tests
- Ropsten validator set and genesis files for The Merge being worked on today. If a client team does not want to validate, let Pari know.
- Config files to be released in the next hours.
- New mainnet shadow fork on Thursday
- The main difference this week is that there will be an equal client split
- Any new images for the next Shadow fork on Thursday?
- Let Pari know if there is one available.
## May 9
- Edge case sync issue found in Geth - [test case](https://github.com/ethereum/hive/pull/526)
- If multiple blocks in a row are created by the same validator, that validator could potentially force nodes from full to snap sync and feed them an invalid chain, partitioning those nodes off the network.
- In geth today, this is mitigated because nodes aren't allowed to snap sync again after they've snap synced once.
- Clients should not allow a fast/snap sync to be initiated if they have already switched to full sync
- Marius has been working on the CL mocker
- Terminal Block Hash Override
- When and how do we test/implement this?
- Unsure what the value is. Geth has a networking flag to whitelist a certain block.
- It should either be in the spec + implemented, or not there at all
- Value is if chain degrades pre-TTD being hit and we need to move to another block in emergency
- Consensus is to build/test in parallel to testnet deployments. Don't want to delay testnets over this.
- Moving to testnets
- Want to make sure the newest set of Mario's tests is passing before merging (but could choose TTD/put client releases out before)
- Client teams feel _close_ to being able to fork the first testnet, but not 100% sure. Will see on ACD what happens!
- Potential schedule:
- May 13 - Pick Ropsten TTD
- May 16-18 - Client releases out
- May 18/19 - Ropsten Fork Announcement blog post
- June 1 - Bellatrix on Ropsten
- June ~8 - Ropsten Merge
- Shadow fork viewing meeting
- Likely to be scheduled at 15:00 UTC on this Thursday
- [#BPFdoor](https://twitter.com/CraigHRowland/status/1523266585133457408?s=20&t=1enAJMYr7xgSLUzBVF_kwg) conversation overtime!
## May 3
- Keep (at least half) the call focused on merge testing for now
- [Testing Coverage Doc](https://hackmd.io/@n0ble/merge-test-coverage) update
- Went through all specs to check testing coverage
- Consensus specs are the best covered
- Transition process is not covered there, though
- Engine API/Transition coverage not fully covered
- Should focus on Engine API first
- Should test in Hive with a CL client mock
- For Transition, need an EL client mock
- EL client mock is probably the #1 thing we are missing currently
- Marius may be able to build something like this, with help from Mario/Mikhail
- Prysm has [a feature](https://github.com/prysmaticlabs/prysm/pull/10533) similar to this, will share with folks to see if some of it can be re-used
- ReTestEth Status?
- Mario was helping Dimitry on it, but hasn't had time to go back to it
- [Kurtosis Nightly Runs](https://notes.ethereum.org/@parithosh/r16MZJDrq)
- Seeing stable patterns emerge
- Will try and focus on getting Besu up and running on Kurtosis
- Some ideas of features to add to Kurtosis:
- CL pause @ TTD + unpause later
- EL pause @ TTD + unpause later
- Add CL/EL post TTD and sync to head
- Split network in half and rejoin (Unsure if its possible today with kurtosis)
- Reset a node/CL/EL, remove their datadirectory and let them resync
- Better log output in the CLI
- Should be more bandwidth on their side in the coming weeks
- Hive tests
- Marius made two PRs to fix the failing Geth tests
- New test with `latestValidHash` was failing on Geth, but now a PR is up to fix it. Once that is merged, Geth should be passing every test
- Mario talking with Erigon, helping them to get tests passing
- Shadow Forks
- Clients should try and stop/start nodes around TTD to test weird syncing/DB scenarios during this week's Shadow Fork
- Marius going to stop some MSF2 nodes and restart them this week
- Big Yuga NFT drop: no one checked during the actual drop, but a few hours later things looked stable
- How many epochs do we have between Beacon Chain launch and TTD?
- Next fork, ~1000 epochs
- This means shadow forks don't reflect the size/history of the beacon chain
- Could maybe use a copy of the Prater beacon chain?
- Can we create a beacon chain 'that starts in the past'?
- Could we use Pyrmont?
- Probably...! Would be a Goerli Shadow fork.
- CL client feedback request on https://github.com/ethereum/consensus-specs/pull/2878
- Did clients all implement proposer boost? Can't know until Bellatrix
## April 11
- Mainnet shadow fork
- Gas limit issue caught because of mainnet: https://github.com/ethereum/go-ethereum/pull/24680
- Geth default is 8m, so block limit was downvoted from 30m
- Found a syncing issue, still investigating
- Odd responses to payload messages from CLs
- Seems like Nethermind is always 12s behind the geth nodes *on ethstats*
- Sync testing
- Want to track sync times 1-2 weeks after shadow fork
- Pari has a few playbooks to get it done
- Longer part of the process is figuring out sync time and if anything went wrong
- Goerli shadow fork
- Geth still investigating shadow fork issue
- Mario wrote a post-merge verifier to check 10-15 different metrics for network health. A lot of tests are failing now, but unclear if it's because the tests are having issues running.
- Devconnect shadow forks
- Goerli, April 19th
- Mainnet, April 23rd
## April 4
Recording: https://ethereumfoundation.zoom.us/rec/share/TSbI_kGCmviCmuFiPLOtKZPJSkn86Vr9vT8BSjHP3RgojG3youW7r-bB56TpIHIT.SG0zDBPkhjKq89Dv Passcode: Fav!4^!U
- Goerli Shadow Fork recap
- Happened today at ~9am EU time
- TTD and merge transition went smoothly
- Besu had an issue during transition, found it and working on a fix
- Nethermind went smoothly
- Smoothest shadow fork
- Geth: noticed bad blocks, issues with pre-executing transactions led to wrong state in the DB during last PoW block and first finalized block
- Happens with transactions that deploy a contract
- Why did this get hit during the merge?
- Unsure yet, perhaps a race condition with payloads
- How can we replicate this in nightly builds?
- A lot of shadow forks issues have to do with the size of the state
- Do we have tests for re-orging around the TTD?
- Not really, still a WIP
- Mainnet shadow fork this week
- Want to see performance on this (e.g. sync/stability after the fork)
- Need an editable genesis for mainnet from all EL client teams
- Want traces of bad blocks that get hit
- If the block PoW is accepted or CL marks it as accepted, we should have the traces
- Could clients just store a witness for it? Yes, but it might be a bit complex to implement + there is no spec for it
- ASAP: dump bad blocks as is, but also want to try and get the common format started
- [Test coverage doc](https://hackmd.io/vjgC9hV_TrK1ZuftVm1rZA)
- Went through every spec document we have, and wrote down all the checks we'd want to do based on each statement
- CL tooling overview by Pari (see last 12-15 mins of recording)
- Pari's generic debugging doc: https://notes.ethereum.org/cmyGUbKVTTqhUGDg_GYThg#How-to-create-a-merge-testnet
- Shared client dashboard: coming :soon:
## March 28, 2022
- Goerli Shadow Fork recap
- Nethermind: reworking sync, still need time
- Besu: fork after TTD, need to sort out issues
- Queuing up behavior changes that will happen post-TTD when `FORK_NEXT_VALUE` is hit.
- Geth: syncing/reorg issues causing panics
- Lighthouse: repeats same head on network for hours on end
- Geth/Teku: Geth crashes because of Teku's RAM usage
- Two shadow forks of Goerli this week (Wednesday + Friday)
- Lighthouse has started fuzzing Bellatrix for all CLs
- MergeFuzz not working with Besu
- Lighthouse wants to deploy their "chaos" tooling to spam the CL
- Going to deploy the Bad Blocks creator on the EL
- War Rooms ([context](https://twitter.com/realLedgerwatch/status/1507286658202451985))
- When stuff breaks, we need a lot more tooling and people to make sense of things.
- Right now, a lot of this is dependent on Pari
- Client teams should make sure they can monitor/triage issues
- CL devs ask for EL info, and vice versa
- Next calls: deep dive into tooling on each side.
- April 4th: deep dive in CL tooling
- April 11th: deep dive into EL tooling
## March 21, 2022
- [x] Make #interop only write for contributors
- [x] Shadow Fork Goerli this week
- [ ] Kurtosis
- [ ] Set up Kurtosis to run through the transition/shadow fork frequently
- [ ] Set up Kurtosis to enable builds where each client is a majority client on the network
- [ ] Improve Kurtosis to get more fine-grained metrics
- [X] Setup team notifications for failed Actions: https://github.com/parithosh/nightly-kurtosis-test/actions
- [ ] Hive
- [ ] Launch another Hive instance to run CL tests quicker
- [ ] Add better test coverage for the various spec statements
- [ ] Network Health Metrics
- [ ] Create a list of network health metrics we want to monitor. [WIP](https://notes.ethereum.org/cW0fTuBxRDicqwi4MvsM5Q?edit)
- [ ] Other
- [ ] Add more clients to Prysm's E2E integration test suite
- [ ] Test client combination in "weird" network states (e.g. syncing, removing DB, etc.) to make sure that messaging works as expected
- [ ] Test slow block processing time / large blocks on EL and make sure they work with CL implementation of the spec
- [ ] Run a network through the merge (ideally mainnet shadow fork) and let it run for several weeks to get benchmarks on nodes running
- [ ] Test the terminal block hash override
### Useful Links
* Marius merge testing plan: https://hackmd.io/WKpg6SNzQbi1jVKNgrSgWg