# Longstanding MC-testnet(s)
Over the coming months, there are two primary testnet efforts -- (1) longstanding multi-client testnets and (2) shorter-lived incentivized testnets. This document addresses (1). For details on (2), see [here](https://notes.ethereum.org/nM_8ZY70RnSBpBuiRkgLCw?view).
[toc]
## Goals
* Battle test multiple clients in a production (but limited adversarial) context
* Provide a hands-on experience that mimics mainnet as much as possible
* Induce and observe most expected behaviors in a production context (e.g. exits, slashings, etc)
* Monitor specific client performance and uptime to assess production readiness
* Serve as a long-standing testnet for the community well past the launch of mainnet
## Structure
A high-level view of the steps:
1. **[Spec/coordination]** Select exact version of spec and coordinate with primary clients and tools
2. **[Public release]** Release testnet contract on public eth1 testnet and general details of testnet to public
3. **[Genesis]** Monitor testnet and client performance throughout
4. **[Behaviors]** Induce behaviors over time, documenting successes and failures
5. **[Death and rebirth]** If any substantive changes make it into spec, choose a time to kill testnet and restart this process
### 1. Pre-coordination
Long-standing testnets should target the absolute latest version of the spec. State transition tweaks are done, but a number of small networking changes are still in the works. We should aim to stop fiddling with these ASAP (maybe second week of April).
_(Comment Afri: I think targeting the latest version of the spec is not desirable as this will always be a moving target. All client teams should agree on a very specific version, e.g. `v1.11.0`, and maintain a joint release strategy across the spec and all clients. Furthermore, it would be desirable if in the long run all clients maintain baseline compatibility in their master/development code. Breaking changes in spec/code should be subject to major/minor version bumps in the spec according to semantic versioning, i.e., all clients that implement a certain version of the `v1.11.x` spec should be compatible with any other client implementing any patch level of `v1.11.x`.)_
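The compatibility rule proposed in the comment above can be illustrated with a minimal sketch. It assumes spec versions are plain `vMAJOR.MINOR.PATCH` tags and that two clients count as compatible when major and minor match. `parse_version` and `is_compatible` are hypothetical helper names, not part of any client or spec tooling.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse a tag such as 'v1.11.0' into (major, minor, patch)."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_compatible(spec_a: str, spec_b: str) -> bool:
    """Assumed rule: same major and minor version, any patch level."""
    return parse_version(spec_a)[:2] == parse_version(spec_b)[:2]
```

Under this rule, a client on `v1.11.0` and one on `v1.11.3` would be expected to interoperate, while `v1.11.x` and `v1.12.x` would not.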
I expect the backbone of these testnets to largely be composed of Lighthouse, Prysm, and Nimbus. We should make sure all of these clients are on the target spec version, and we should perform smoke-tests and short trial testnets before a public testnet.
_(Comment Afri: I started a first [short-trial testnet](https://gist.github.com/q9f/d6eea3ea3356e41bde81864143284ce9) with Lighthouse. The Nimbus tooling is not ready to just hop on another running testnet, yet. Didn't have the chance to play around with Prysm, yet. What about Teku?)_
In addition to client coordination, we need to coordinate with some of the public web providers for eth2 (etherscan, beaconcha.in, eth2stats.io, etc) so that they have versions of their monitoring websites ready for the particular version and genesis. It probably makes sense to include these teams in a dry-run of the testnet start (before the public fanfare).
Testnet monitoring also needs to be put in place prior to genesis so we have good insight into the network from day 0. Each client currently has ways to monitor performance of particular nodes (prometheus/grafana). In addition to that, we will want tools on hand to peer into the state of the network, degree of forking, cli debuggers (e.g. lcli, zcli), etc.
### 2. Public release
We need to select a public eth1 testnet (likely Goerli) and launch a version of the eth2 deposit contract. We also need to define the parameters of the testnet (minimum genesis validator count, minimum deposit, genesis time, etc). From there we need to have some sort of campaign to generate awareness and excitement for the testnet (blog posts, twitter, instructions/docs).
_(Comment Afri: Happy to coordinate with exactly that. The Schlesi spec (linked above) could be used as a basis for discussion to review the parameters for a persistent MCT.)_
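As a discussion aid, the testnet parameters mentioned above (minimum genesis validator count, minimum deposit, genesis time) could be captured in one place. The sketch below is an assumption-laden simplification: `GenesisParams` and `genesis_can_start` are hypothetical names, the values used with them are placeholders rather than agreed parameters, and the real genesis trigger is defined by the spec version we select.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenesisParams:
    # All fields are placeholders for discussion, not agreed values.
    min_genesis_validator_count: int
    min_deposit_gwei: int
    min_genesis_time: int  # Unix timestamp


def genesis_can_start(params: GenesisParams, active_validators: int, now: int) -> bool:
    """Assumed rule: genesis needs both enough validators and the minimum time."""
    return (
        active_validators >= params.min_genesis_validator_count
        and now >= params.min_genesis_time
    )
```

Writing the parameters down this way makes it easy for client teams and explorers to confirm they are all configured against the same numbers before the public announcement.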
Along with the release of the first public multi-client testnet, we'll also release a deposit/educational web-interface. This is near completion and can be shared soon for review by testnet coordinators.
### 3. Genesis
Genesis! Chain starts. wooo
Monitoring and data collection go into place. We should expect issues and have resources available to debug and troubleshoot. We might see consensus/spec issues (e.g. consensus failure, network partitions) as well as specific client problems with crashes, resource consumption, etc.
We will likely try to keep this net going and mend it if possible as a simulation of keeping mainnet alive, but if it's time to die for some deep problem (especially if we will change the spec in response), we'll kill it and start again.
### 4. Behaviors
We want to see most of the logical paths in the state transition executed and most of the potential network messages sent on a near-mainnet testnet. Much of this will be covered through normal operation, but we should make a point to run and document each of the following at least once and likely many times.
* validator deposit
* validator exit
* proposer slashing
* attester slashing
* empty attestations included on chain
* (attempted) propagation of block with invalid signature
* propagation of block with valid signature but invalid data
* (attempted) propagation of attestation with invalid signature
* inactivity leak
    * can do this by taking a subset of nodes we control and cutting them off from the rest of the network for X amount of time, then allowing them to rejoin the network
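For the inactivity-leak item, the subset of nodes we cut off has to carry enough stake to stop finality. A minimal sketch, assuming the two-thirds supermajority rule for justification (finality stalls once the remaining online stake drops below 2/3 of total active stake). The function name and any stake figures passed to it are illustrative only.

```python
def partition_stalls_finality(offline_stake: int, total_stake: int) -> bool:
    """Return True if removing `offline_stake` leaves less than 2/3 of stake online."""
    if total_stake <= 0 or not 0 <= offline_stake <= total_stake:
        raise ValueError("stake values out of range")
    online_stake = total_stake - offline_stake
    # Integer form of online_stake / total_stake < 2/3, avoiding float rounding.
    return online_stake * 3 < total_stake * 2
```

This gives a quick sanity check on whether the validators we control are sufficient to induce the leak on their own, or whether we need to coordinate with other operators.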
### 5. Death and rebirth
The ultimate goal is to have a testnet live for an indefinite period of time to serve as a testbed/playground for eth2 mainnet.
But, these testnets might also inform us of breaking changes we must make to the spec. In the event of breaking changes, we will set an end date for the testnet and restart this process on the updated version/clients.
## Things to discuss
* Should we use Goerli as the eth1 backbone for this testnet?
    * Pros:
        * Already exists -- no overhead in launching
        * Wide access and use
    * Cons:
        * Cannot control who/how much ETH shows up. Goerli whales could easily attack the network (e.g. get 33% or 66% of staked ETH)
            * _Note Afri: We hold 65% of the Görli ETH. This can be actually an advantage._
        * If we want to mimic mainnet as much as possible, we should use as close to mainnet _everything_ as we can.
            * 32 ETH requirement puts a burden on Goerli ETH reserves
                * _Note Afri: We could create a Görli testnet deposit faucet that specifically gives users 32 GöETH to be used for MCT deposits only. Con: adds a little complexity._
            * We can put a drain function in the testnet deposit contract, but then have some divergence from mainnet (though likely not that important...)
                * _Note Afri: I don't think this will be necessary._
* Should we attempt to rate limit participation via some identity system (e.g. github account for 32 ETH faucet on some more controlled eth1 testnet)?
    * My intuition is to not do this for a long-standing testnet, but to investigate it more for the incentivized testnets
        * _Note Afri: Alternatively, there was always the idea of having "testnet exchanges", where you can "buy" testnet Ether for real-value tokens. This would not require an identity system and prevent spam. This could be used for incentivization of the testnets but needs a lot of thoughts._
* How should we coordinate to ensure that (the most viable) clients get a good representation/distribution of nodes on the network?
    * The worry here is that everyone will flock to Prysm and we will end up with an 85%+ node/validator representation in one client (this is a fear for mainnet too...). This will still be useful, but will not meet the goal of battle testing _many_ clients, and in the event of issues with e.g. fork-choice, the dominant client behavior might just win out instead of more interesting forking.
    * My intuition is we (1) advertise the importance of client distribution and trying out the lesser-known clients and (2) keep good track of % network client distribution so that we can push for particular client usage if necessary. Strongly encouraging the use of something like eth2stats.io will help with this.
* We should define what a "successful" mc-testnet looks like
    * client-distribution
        * ex: 3 clients with more than 20% each and no more than 50% for any one client
    * testnet-stats
        * 3 weeks without entering an "inactivity penalty" (more than 4 epochs since finality at any time)
        * x% of validator successful participation on some time frame
        * x% of successful blocks on some time frame
        * minimum degree of forking
        * minimum number of participating nodes
    * all induced-behaviors (see step 4 above)
        * validator deposit
        * validator exit
        * proposer slashing
        * attester slashing
        * empty attestations included on chain
        * (attempted) propagation of block with invalid signature
        * propagation of block with valid signature but invalid data
        * (attempted) propagation of attestation with invalid signature
        * inactivity leak
            * can do this by taking a subset of nodes we control and cutting them off from the rest of the network for X amount of time, then allowing them to rejoin the network
* We should define what a "production-ready" client looks like
    * client-benchmarks
        * The following metrics under 16k, 50k, 100k validator networks
            * Time for non-epoch-boundary state transition full of attestations
            * Time for epoch-boundary state transition full of attestations
            * etc... (will fill in soon)
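The example success criteria for client distribution and finality listed earlier in this section can be expressed as simple checks. This is a sketch only: the thresholds are the example numbers from this note (3 clients above 20%, none above 50%, no more than 4 epochs since finality), reading "3 clients" as "at least 3" is an assumption, and the function names and input shapes are hypothetical.

```python
from typing import Dict

MAX_EPOCHS_WITHOUT_FINALITY = 4  # example threshold taken from this note


def client_distribution_ok(
    node_counts: Dict[str, int],
    min_clients: int = 3,
    min_share: float = 0.20,
    max_share: float = 0.50,
) -> bool:
    """At least `min_clients` clients above `min_share`, none above `max_share`."""
    total = sum(node_counts.values())
    if total == 0:
        return False
    shares = [count / total for count in node_counts.values()]
    well_represented = sum(1 for share in shares if share > min_share)
    return well_represented >= min_clients and max(shares) <= max_share


def in_inactivity_penalty(current_epoch: int, finalized_epoch: int) -> bool:
    """True once more than 4 epochs have passed since the last finalized epoch."""
    return current_epoch - finalized_epoch > MAX_EPOCHS_WITHOUT_FINALITY
```

Checks like these could be run against whatever node and finality data the monitoring tools expose, so that the "successful testnet" definition is evaluated the same way by everyone.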