# Eth2 hacker start
Want to get started with Eth2 protocol things? This is the place! There are lots of interesting problems to work on, even with only a surface level understanding of Eth2.
All of these challenges are generally good projects to learn more, and start to BUIDL on Eth2.
If you join us at one of the awesome new online hackathons ([EthGlobal](https://www.ethglobal.co/), [Metacartel](https://hackathon.metacartel.org/) :eyes:), you can even earn a prize for completing a mini project!
Or, if you just feel like hacking we are always available to help you in the [Eth2 Discord](https://discord.gg/dtqe2TR)!
## Eth2 Information
It may be a lot to understand the full Eth2 picture, but learning it on a surface level is already a great start! You can generally get started with challenges after the TLDR.
- Devcon V Eth2 TLDR, sit back and learn: https://slideslive.com/38919931/eth-20-tldr
- Phase0 for humans, quick-read to understand the Phase0 spec: https://notes.ethereum.org/@djrtwo/Bkn3zpwxB
- Eth2 Specs repository; exact specs, networking Q&A, and more: https://github.com/ethereum/eth2.0-specs
- Eth2 INFO, a super aggregate of learning resources (more videos, diagrams, intro posts!): https://eth2.info
- Eth2 NEWS, weekly updates on Eth2 progress: https://eth2.news
## Eth2 Client Chats
If you are looking for a specific client team or people to help you get into Eth2 with a specific language, try these:
- Prysm (Go): https://discordapp.com/invite/KSA7rPr
- Lighthouse (Rust): https://discord.gg/uC7TuaH
- Lodestar (TS): https://discord.gg/Quv3nJX
- Nimbus (Nim): https://discord.gg/YbTCNat
- Teku (previously Artemis + Harmony) (Java): https://gitter.im/PegaSysEng/artemis
- Trinity (Py): https://gitter.im/ethereum/trinity
- Nethermind/cortex (C#): https://discord.gg/HQ4zTh
- Quilt/Phase2: https://t.me/eth2quilt
And the [Eth2 Discord](https://discord.gg/dtqe2TR) is the best common place to ask questions and find others working on Eth2 projects!
## Join and experiment with a testnet
- [Prysmatic public long-running testnet](https://prylabs.net/)
- [Lighthouse becoming a validator docs](https://lighthouse-book.sigmaprime.io/become-a-validator.html) and [EthDenver hacking guide](https://lighthouse-book.sigmaprime.io/ethdenver.html) (EthLondon guide up soon!)
- Other client teams occasionally have public testnets up, check the [eth2-testnets repo](https://github.com/eth2-clients/eth2-testnets) for details
## Phase0 Challenge Ideas
These challenges help you become familiar with different topics of Eth2 Phase0, while building a fun Eth2 app. No better way to learn Phase0, start hacking on things already :100:
Each of the challenges provides some background with useful links to understand the context of the challenge, and what is there for you to build with.
The background info should help a lot, but if a term is not obvious from the context, don't feel afraid to ask for help!
### Explore operation pools and gossip :loudspeaker:
#### What you learn
Learn how Libp2p GossipSub is used by Eth2 clients, and which information propagates through the Eth2 gossip network.
Eth2 spec about "The Gossip Domain": https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#the-gossip-domain-gossipsub
Nodes communicate "operations" (think: transactions) via GossipSub on the network.
GossipSub spec github Doc: https://github.com/libp2p/specs/tree/master/pubsub/gossipsub
GossipSub paper (with diagrams): https://research.protocol.ai/posts/201912-resnetlab-launch/PL-TechRep-gossipsub-v0.1-Dec30.pdf
Prysm testnet has been hurt by different Gossip and pooling usage bugs;
- Message IDs -> Instead of from + sequence nr, a content based ID works much better for Eth2. Can you see why?
- Dropping attestations out of their memory prematurely.
- Implement a validation pipeline, integrate with Go-libp2p GossipSub
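
To see why a content-based ID helps, here is a minimal sketch comparing the two ID styles. The payload bytes, the node names, and the exact hash-then-base64 derivation are assumptions for illustration; check the p2p spec linked above for the derivation clients actually use.

```python
import base64
import hashlib


def sender_based_id(sender: bytes, seq_no: int) -> bytes:
    """ID built from the publisher identity plus a per-publisher sequence number."""
    return sender + seq_no.to_bytes(8, "big")


def content_based_id(payload: bytes) -> bytes:
    """ID that depends only on the message bytes (SHA-256, base64-encoded)."""
    return base64.b64encode(hashlib.sha256(payload).digest())


# The same attestation bytes, published independently by two nodes.
attestation = b"attestation for slot 100"  # placeholder payload, not real SSZ
seen_sender_ids = {sender_based_id(b"node-A", 7), sender_based_id(b"node-B", 3)}
seen_content_ids = {content_based_id(attestation), content_based_id(attestation)}

print(len(seen_sender_ids))   # 2: the duplicate is not recognized
print(len(seen_content_ids))  # 1: the duplicate collapses into one entry
```

With sender-based IDs, the same attestation published by two nodes looks like two distinct messages and is propagated twice. With content-based IDs, a seen-cache can drop the second copy.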
Networking REPL: https://github.com/protolambda/rumor
#### Challenge: pool viewer
Like a block explorer, but only showing the most recent data: gossiped attestations, blocks and other operations.
To start off, use the networking REPL to join the Prysm testnet and log their blocks. Follow the commands in the README.
GossipSub is available in many different languages, so if you do not want to use the REPL gossip-logging (too easy?), you can use any of the following implementations:
- Go: https://github.com/libp2p/go-libp2p-pubsub
- Rust: https://github.com/libp2p/rust-libp2p
- Nim (more experimental): https://github.com/status-im/nim-libp2p
- JS: https://github.com/ChainSafe/gossipsub-js
- Java: https://github.com/libp2p/jvm-libp2p
Once you have established an input/log of gossip messages, you can build a dashboard to show them as they come in. Maybe even visualize them; blocks and attestations show how the chain is evolving.
If this works, you can also think of indexing the events: enable others to look up blocks, like a mini block-explorer!
Also take a look at the fork-choice client challenge if you want to work with others. Using the realtime attestation data to derive the chain head would be a very nice feature!
### Eth1-data tracking :eyes:
#### What you learn
Learn how the deposit contract works, and what the Eth1 -> Eth2 transition looks like for a validator. This is a great challenge if you want to learn about Eth2, but want to stay focused on web, dApp, and Eth1 tooling.
The Eth2 phase0 validators get into the system through a deposit from Eth1.
For this to keep functioning after the Eth2 genesis event, the Eth1 chain has to be kept track of.
Block proposers vote on Eth1 data, and with long voting periods the deposit-contract reference is updated, and new deposits are allowed in.
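
The voting rule can be sketched as a strict-majority count over a voting period. This is a simplified illustration, not the spec code: `SLOTS_PER_VOTING_PERIOD` is a hypothetical, tiny value, the function name is made up, and each Eth1 data vote is represented by a plain string.

```python
from collections import Counter
from typing import Hashable, List, Optional

# Hypothetical, tiny voting period for illustration; real networks use far more slots.
SLOTS_PER_VOTING_PERIOD = 8


def winning_eth1_data(votes: List[Hashable]) -> Optional[Hashable]:
    """Return the Eth1 data backed by more than half of the whole voting period, if any."""
    for eth1_data, count in Counter(votes).items():
        if count * 2 > SLOTS_PER_VOTING_PERIOD:
            return eth1_data
    return None


print(winning_eth1_data(["A", "A", "B", "A", "A"]))       # None: 4 of 8 is not a majority
print(winning_eth1_data(["A", "A", "B", "A", "A", "A"]))  # A: 5 of 8 is a majority
```

Note that the majority is measured against the full period length, not against the votes cast so far, so skipped slots make it harder to update the reference.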
Validator lifecycle: https://notes.ethereum.org/@hww/lifecycle
Deposit contract spec: https://github.com/ethereum/eth2.0-specs/tree/dev/deposit_contract
Prysm testnet deposit contract: https://goerli.etherscan.io/address/0x4689a3C63CE249355C8a573B5974db21D2d1b8Ef (Goerli testnet)
For the more web, Dapp or Eth1 focused developers this definitely is a more accessible challenge. With Eth1 tooling, and some Eth2 tooling from Lodestar or the py-spec, you can read and understand the deposit contract interactions.
Lodestar tooling: https://hackmd.io/@wemeetagain/rJTEOdqPS/%2FCcsWTnvRS_eiLUajr3gi9g
Using py-spec as a library: https://github.com/ethereum/eth2.0-specs/pull/1584
Or py-spec SSZ: https://github.com/protolambda/remerkleable
Start off by retrieving deposit logs from the Prysm deposit contract with your favorite Web3 library (if in doubt, ethers.js and web3.py are great).
Now that you can see the deposits on the Eth1 side, it is time to find them on the Eth2 side.
With the API of Prysm, you can extract blocks with their respective Eth1 votes. Prysm has a public API RPC endpoint ([Docs here](https://api.prylabs.network/)), if you do not want to try participating in the testnet yourself already.
- Each Eth2 block votes for an Eth1-data.
- Each Eth2 block includes deposits, if there are any unprocessed deposits left.
Combine the two deposit information sources, and you can hack together an Eth1-deposit focused monitoring service!
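
As a starting point for such a monitoring service, the sketch below classifies Eth1 deposit logs by how far they have progressed on the Eth2 side. It assumes you have already extracted two numbers from Eth2: the deposit count of the Eth1 data that was voted in, and the index of the next deposit to be processed. All names here (`DepositLog`, `classify_deposits`, the status labels) are hypothetical, and fetching the inputs from the contract and the Prysm API is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DepositLog:
    index: int        # position of the deposit in the contract's deposit list
    pubkey: str       # shortened placeholder, real keys are 48 bytes
    amount_gwei: int


def classify_deposits(
    logs: List[DepositLog],
    voted_deposit_count: int,
    processed_deposit_index: int,
) -> Dict[str, List[int]]:
    """Split Eth1 deposit logs by their progress on the Eth2 side."""
    status: Dict[str, List[int]] = {"processed": [], "pending": [], "not_yet_voted_in": []}
    for log in logs:
        if log.index < processed_deposit_index:
            status["processed"].append(log.index)
        elif log.index < voted_deposit_count:
            status["pending"].append(log.index)
        else:
            status["not_yet_voted_in"].append(log.index)
    return status


logs = [DepositLog(i, f"validator-{i}", 32_000_000_000) for i in range(5)]
report = classify_deposits(logs, voted_deposit_count=4, processed_deposit_index=2)
print(report)
```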
### Forkchoice client :fork_and_knife:
#### What you learn
If you would like to get some practical understanding of POS forkchoice in Eth2, and want to give your own approach a try, this is the challenge. You learn how GHOST works, what "LMD" means, and how it all connects together into the Eth2 fork-choice.
LMD-GHOST, a.k.a. latest-message-driven greedy-heaviest-observed-subtree. You can find a good explanation here: https://vitalik.ca/general/2018/12/05/cbc_casper.html
Collection of LMD-GHOST implementations for Eth2: https://github.com/protolambda/lmd-ghost/
LMD-GHOST proto-array; optimized design, recently implemented by Lighthouse and Prysmatic. Stateful, simple representation, fast lookups, fast batched propagation of state changes. [Python](https://github.com/protolambda/eth2-py-hacks/blob/ec9395d371903d855e1488d04b8fe89fd5be0ad9/proto_array.py), [Rust](https://github.com/sigp/lighthouse/blob/master/eth2/proto_array_fork_choice/src/proto_array.rs)
Advanced: if you want to dive really deep, take a look at this paper: https://arxiv.org/abs/2003.03052
An Eth2 "client", but not an active participant or dealing with storage, only following the fork-choice. Or in other words, a "readonly client". Use the network tooling to stay in sync with minimal dependencies!
- Experiment with forkchoice
- If you need a placeholder for forkchoice internals you can try one of the existing proto-array implementations. The Python implementation should be easy to customize to your own needs.
- Or, if you like to dive into the algorithm, you can implement your own:
- You will learn a lot from implementing one of the existing approaches, so feel free to try that first.
- Extract data directly from a testnet
- The REPL can join gossipsub and fetch blocks from other nodes.
- You can instrument the REPL with Python, try [Pyrum](https://github.com/protolambda/pyrum). [Example here](https://github.com/protolambda/eth2-py-hacks)
- Or call its functionalities from Go
- Or re-use client code (varying usability as library)
- Or ask a client instance for blocks/attestations, if you only want to implement forkchoice. Hint: Lighthouse has a nice websocket feed.
- Apply the data to forkchoice, show head changes as new data comes in!
- Take an incoming attestation
- Get the corresponding committee information. (validator indices matching each bit in the attestation bitfield)
- Update latest-messages: you track the latest vote (compare epoch) of each validator. The weight of this vote (stake of validator, or simplify and give everyone a weight of 1) is what adds up to decide which block has more weight.
- Navigate the tree, and find the head, following LMD-GHOST.
- Challenge extension; the sourced data and fork-choice is a great start for explorer features. E.g. show an attestation, visualize the graph of blocks.
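
The attestation-to-head steps above can be sketched as a deliberately naive LMD-GHOST. This is a toy under stated assumptions, not any client's implementation: blocks are plain string IDs, every validator has weight 1, committee lookup is skipped (votes arrive per validator), and the class and method names are made up. The proto-array designs linked earlier avoid the repeated tree walks done here.

```python
from typing import Dict, Optional, Tuple


class NaiveLmdGhost:
    def __init__(self, root: str) -> None:
        self.root = root
        self.parent: Dict[str, Optional[str]] = {root: None}
        self.latest: Dict[int, Tuple[int, str]] = {}  # validator -> (epoch, block)

    def add_block(self, block: str, parent: str) -> None:
        self.parent[block] = parent

    def on_vote(self, validator: int, epoch: int, block: str) -> None:
        """Keep only the latest message (highest epoch) of each validator."""
        if block not in self.parent:
            return  # vote for an unknown block, ignored in this sketch
        previous = self.latest.get(validator)
        if previous is None or epoch > previous[0]:
            self.latest[validator] = (epoch, block)

    def is_in_subtree(self, block: Optional[str], subtree_root: str) -> bool:
        while block is not None:
            if block == subtree_root:
                return True
            block = self.parent[block]
        return False

    def weight(self, block: str) -> int:
        """Count latest messages that point into the subtree of `block`."""
        return sum(1 for _, voted in self.latest.values() if self.is_in_subtree(voted, block))

    def head(self) -> str:
        head = self.root
        while True:
            children = [b for b, p in self.parent.items() if p == head]
            if not children:
                return head
            # Heaviest child wins; ties are broken by block ID to stay deterministic.
            head = max(children, key=lambda child: (self.weight(child), child))


fc = NaiveLmdGhost("genesis")
fc.add_block("A", "genesis")
fc.add_block("B", "A")
fc.add_block("C", "A")
fc.on_vote(0, 1, "B")
fc.on_vote(1, 1, "B")
fc.on_vote(2, 1, "C")
head_before = fc.head()  # B has two votes, C has one
fc.on_vote(0, 2, "C")    # validator 0 changes its mind in a later epoch
head_after = fc.head()   # now C has two votes, B has one
print(head_before, head_after)
```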
### Multi-node setup :computer: :computer:
#### What you learn
If you have some interest in running an Eth2 validator node, this is the challenge to get your hands dirty with. Learn how to run a validator node, and how its API works.
Into the rabbit hole of Beacon Node <> Validator client separation...
To run thousands of validators, block production has to be managed, and signatures have to be managed.
- The validator client notifies a beacon node of its presence, and its validator IDs
- The beacon-node tells the validator about its beacon-chain duties; when to propose, committee membership, etc.
- The beacon-node produces an unsigned block on validator proposals
- The validator signs and returns the block for propagation
- Two different block proposals for the same slot are BAD.
- The beacon-node produces attestation data to sign
- The validator signs and returns the attestation for propagation
- Two conflicting votes are BAD. Variants: double vote (same epoch, different data), or surround (conflicting justification). [See spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#is_slashable_attestation_data)
- The validator client holds the signing key, and protects itself from making double signatures
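
The self-protection in the last point can be sketched as a small guard that the validator client consults before signing anything. This is a minimal in-memory illustration with hypothetical names (`SigningGuard`, `may_propose`, `may_attest`); a real validator client would persist this history to disk and follow the exact spec conditions linked above.

```python
from typing import Dict, List, Tuple


class SigningGuard:
    def __init__(self) -> None:
        self.proposed: Dict[int, bytes] = {}              # slot -> block root
        self.attested: List[Tuple[int, int, bytes]] = []  # (source, target, data root)

    def may_propose(self, slot: int, block_root: bytes) -> bool:
        """Refuse a second, different block for a slot that was already signed."""
        previous = self.proposed.get(slot)
        if previous is not None and previous != block_root:
            return False
        self.proposed[slot] = block_root
        return True

    def may_attest(self, source: int, target: int, data_root: bytes) -> bool:
        """Refuse double votes and surround votes against the signing history."""
        for old_source, old_target, old_root in self.attested:
            if old_root == data_root:
                return True  # identical data, re-signing is harmless
            double = old_target == target
            surround = (old_source < source and target < old_target) or (
                source < old_source and old_target < target
            )
            if double or surround:
                return False
        self.attested.append((source, target, data_root))
        return True


guard = SigningGuard()
print(guard.may_propose(10, b"block-a"))  # True
print(guard.may_propose(10, b"block-b"))  # False: second block for slot 10
print(guard.may_attest(2, 3, b"x"))       # True
print(guard.may_attest(2, 3, b"y"))       # False: double vote for target 3
print(guard.may_attest(1, 4, b"z"))       # False: surrounds the 2 -> 3 vote
```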
Validator API: https://github.com/ethereum/eth2.0-APIs/blob/master/apis/validator/README.md
For a practical example where load-balancing was misunderstood, and things went wrong, see this Prysm testnet event: https://twitter.com/raulitojordan/status/1242176262904430592
When running two validator nodes, you really do not want the nodes to publish conflicting votes!
Adapt the validator API to:
- Connect to multiple beacon nodes for redundancy, but ensure that no conflicting votes are requested (multiple duties assigned by the different nodes)
- Connect multiple validator nodes to these different beacon nodes for redundancy, but ensure they do not vote for different things.
You can scale up just the first point, or also implement the second, to challenge yourself and implement redundancy with double-vote protection.
No double voting or it gets slashed!
- More uptime, replication
- More distribution channels for attestations and blocks
- Less susceptible to eclipse or DOS attacks
### State-sync-only client with Python :snake:
#### What you learn
Like Python? Want to learn more about running the spec, yet prefer the speed of a client like Lighthouse? Try this: learn about optimization efforts so far, and utilize tooling to script sync. Combine the best of both worlds. No uncontrollable big client, but as powerful, with just some scripting.
After this challenge you know better how the Phase 0 state-transition function works, and how clients stay in consensus.
For sync, state-transition optimizations are important, as that bottlenecks the speed of the sync.
Eth2 optimizations first started with a Go implementation focused on implementing the spec in an optimized form. This later evolved into ZRNT: https://github.com/protolambda/zrnt
However, this was done at a time when everyone was on a minimal configuration: a super tiny state!
Testnets have grown to as big as 100.000 validators, and the state does not have the same characteristics anymore.
To transition fast, and sync fast, memory has to be managed efficiently: no copies, no rollbacks. This is what [persistent/immutable data-structures](https://en.wikipedia.org/wiki/Persistent_data_structure) are great for!
This concept of "data sharing" is greatly under-utilized currently, but there is a Python library ready for you to use: https://github.com/protolambda/remerkleable
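
To make "data sharing" concrete, here is a minimal sketch of a persistent binary Merkle tree where an update creates only the nodes on the path to the changed leaf. It is a toy written for this guide, not remerkleable's API: the `Leaf`, `Pair`, `build` and `set_leaf` names are assumptions, and roots are recomputed on every call rather than cached.

```python
import hashlib
from typing import List, Union


class Leaf:
    def __init__(self, chunk: bytes) -> None:
        self.chunk = chunk

    def root(self) -> bytes:
        return self.chunk


class Pair:
    def __init__(self, left: "Node", right: "Node") -> None:
        self.left, self.right = left, right

    def root(self) -> bytes:
        return hashlib.sha256(self.left.root() + self.right.root()).digest()


Node = Union[Leaf, Pair]


def build(chunks: List[bytes]) -> Node:
    """Build a balanced tree; len(chunks) must be a power of two."""
    nodes: List[Node] = [Leaf(c) for c in chunks]
    while len(nodes) > 1:
        nodes = [Pair(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]


def set_leaf(node: Node, depth: int, index: int, chunk: bytes) -> Node:
    """Return a NEW tree with one leaf replaced; untouched subtrees are reused."""
    if depth == 0:
        return Leaf(chunk)
    assert isinstance(node, Pair)
    if (index >> (depth - 1)) & 1 == 0:
        return Pair(set_leaf(node.left, depth - 1, index, chunk), node.right)
    return Pair(node.left, set_leaf(node.right, depth - 1, index, chunk))


old_state = build([bytes([i]) * 32 for i in range(4)])
new_state = set_leaf(old_state, depth=2, index=3, chunk=b"\xff" * 32)

print(new_state.left is old_state.left)        # True: the left half is shared, not copied
print(new_state.root() != old_state.root())    # True: the roots differ
```

Both versions stay valid at the same time, so keeping the pre-state around for a "rollback" costs only the few nodes that changed.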
The Eth2 spec (dev branch) recently moved to utilize this, but is otherwise still the same slow naive implementation. (Except for some memoization tricks)
A second take has been made in ["`fast_spec.py`"](https://github.com/protolambda/eth2-py-hacks/blob/master/fast_spec.py), porting back pre-computation optimizations from ZRNT to Python.
Now, with this, and some [Py-libp2p](https://github.com/libp2p/py-libp2p) or [Pyrum](https://github.com/protolambda/pyrum), you can fetch blocks from the network and keep a local state in sync, without a client!
Keep in sync with a small/medium testnet, with minimal code!
The syntax is plain python and nice and readable, but with the right algorithmic optimizations and data-sharing, the transition can be faster than the average Eth2 client!
Start off with the spec, fast-spec, and Pyrum in a Python virtual environment:

    git clone https://github.com/protolambda/eth2-py-hacks
    cd eth2-py-hacks
    # Create and activate a venv to install the dependencies in
    python3 -m venv venv
    . venv/bin/activate
    pip install -r requirements.txt

From here you can run functions as defined in the spec to get an idea of the state-transition. See `minimal_transition.py` for a basic example.
Next you can collect the genesis state of the Lighthouse testnet (v0.10) and extract some blocks with Pyrum (included in example: `app.py`). You can find all testnet data here: https://github.com/eth2-clients/eth2-testnets/tree/master/lighthouse/testnet5
Load state and blocks with remerkleable (`deserialize` or `from_bytes`, provided as class-methods on each spec type), and try a transition.
Once you have a working transition, you can try swapping out parts:
- Try optimizing spec functions in `fast_spec.py` more. The epoch-transition can still improve. Eth2-docs diagrams the optimized epoch-transition: https://github.com/protolambda/eth2-docs
- Customize the transition: `app.py` is just a simple loop to fetch blocks and update state.
- You can make state-snapshots to continue from when restarting your sync script
- As you sync, write statistics of the finalized data to CSV or another format. Python is great for data-science!
With this put together well, you can be syncing state like a client, in just a few hundred lines of Python :clap:
And, if you like to work with other project ideas:
- The forkchoice read-only client challenge complements this well: if you like to learn *what to stay in sync with*, instead of trusting your peer, you need forkchoice.
- To trust where validators are coming from, you need Eth1-data. The eth1-data tracking challenge shows how a client learns about that.
- If you like the python scripting, building a malicious sync could be a fun way to test the resilience of an Eth2 client.
### DOS/eclipse testnets :imp:
#### What you learn
Learn what the Eth2 network looks like, and where we are actively working to harden its weaknesses. By trying to attack a network, you see it in action first-hand, and the results may be useful to inform future Eth2 networking improvements.
Resilience in a p2p network is important. GossipSub, peer-scoring and DOS protections are put in place to resist attacks. However, clients are imperfect and in a public testnet phase. Time to try and play with them!
Things you can look at:
- DOS: input vectors that stall/crash a testnet. Think of e.g. RPC sync hardening, or unlimited spamming of GossipSub control messages.
- Eclipsing: get on the network, become the network (per the view of an honest target node), and coordinate to make it misbehave.
- Network-partitioning: separate the network into two or more partitions, and monitor how it recovers. Hint: closely related to consensus liveness proofs.
P2p spec: https://github.com/ethereum/eth2.0-specs
Networking REPL: https://github.com/protolambda/rumor
Python scripting for the REPL: https://github.com/protolambda/pyrum
Lighthouse Denver testnet (realistic, yet small and temporary enough to attack): https://lighthouse-book.sigmaprime.io/become-a-validator.html
Prysm public testnet (realistic, more difficult to attack, but chaotic fun...): https://prylabs.net/
- Use the networking REPL and/or clients to model an attack, and show how the testnet deals with it or how it fails
- Set up your own testnet, and push limits such as latency, node stability and malicious inputs to learn about testnet vulnerabilities.
Also think of fun inputs or outputs:
- Inject non-existing blocks or attestations onto the gossip and see what happens
- Join one of the networks as a validator, and use its key to sign and publish an invalid block or attestation. Too big? Pointing to missing data? Be creative!
- Track what messages you get from other nodes, try to map out their identity.
### Eth2 client fishtank :tropical_fish:
#### What you learn
Learn how to run different Eth2 clients, and explore interoperability challenges.
*Sometimes software interactions are fascinating to watch.* https://xkcd.com/350/
So what happens when you put clients together anyway?
Clients have been trying to interop for a long time now, with varying success. It all started at the interop lock-in event where everything kind-of talked to each other, but far from stable.
Since then the clients have improved, and some are on the same networking spec.
Lighthouse, Nimbus, Prysm, Teku have a chance at networking together. And the REPL can help debug things this time.
Can you put the clients together, and build a fishtank with one or more client types, and nodes running a network?
Tip: start off by pushing a client onto one of the public testnets, and check configuration/fork-version/spec-version.
Public testnets data: https://github.com/eth2-clients/eth2-testnets
### Slashing detection :triangular_flag_on_post:
In Eth2, 100.000, and possibly up to 400.000, validators are signing attestations. Each of them is shuffled into a new committee every epoch, and signs some source/target data for FFG consensus.
If you want to keep things simple, think of an attestation as vote, with data `(source: int, target: int, ID: bytes32)`, by a single validator.
For `a.ID != b.ID` (i.e. two different attestations by the same validator):
- Surround votes: `a.source < b.source && b.target < a.target`
- Double votes: `a.target == b.target`
These two ways of voting (attesting) are objectively bad, as the 2nd attestation contradicts the 1st and vice versa. This is slashable behavior, which we can catch and enact a slashing for on-chain.
However, for POS to work, the votes need to be attributable and slashed if malicious. To do so, we need to *find* the offences in this sea of activity. Not an easy task!
Some naive numbers:
- Compare a new attestation against 8 months of data, to cover the maximum weak-subjectivity period. We want to slash as far back as we are able to.
- `8 months * 30 days * 24 hours * 60 minutes * 60 seconds / 12 second block time / 32 slots = 54000 epochs` worth of voting data
- Deal with ~400.000 validators
- 3 cases to check:
- Surround vote: surrounded-by, surrounding
- Double vote
- Naively 8 bytes per event
- `3*54000*400000*8 = 518.4 GB`
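
The naive numbers above can be reproduced in a few lines, which makes it easy to plug in your own assumptions. The variable names are made up for this sketch; the values are the ones from the list.

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32

seconds = 8 * 30 * 24 * 60 * 60  # 8 months of 30 days each
epochs = seconds // SECONDS_PER_SLOT // SLOTS_PER_EPOCH
validators = 400_000
cases = 3             # surrounded-by, surrounding, double vote
bytes_per_event = 8

total_bytes = cases * epochs * validators * bytes_per_event
print(epochs)             # 54000
print(total_bytes / 1e9)  # 518.4 (GB)
```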
Now, to improve this. Hypothetical best so far somewhere in the `2 - 3 GB` range.
- https://github.com/protolambda/eth2-surround (Focus on bottom section: min-max surround)
- Good write-up by Mikhail (TXRX research team): https://hackmd.io/nuLL7lHeQtSYV2G6g-ry0A
Also, matching speed is important too, as you want to match an attestation as quickly as possible to save resources.
Read through the proposed optimizations, think of improvements, and try and implement the first slashing-detection. (Prysm has one, but not nearly as efficient yet). If you like you can just start with the min-max span approach, and iterate on that.
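
If you start from the min-max span idea, the core can be sketched for a single validator as below. This is one reading of the approach, written for illustration and not taken from any of the linked implementations: `min_span[e]` holds the smallest `target - e` over recorded votes with a source after epoch `e`, and `max_span[e]` holds the largest `target - e` over recorded votes with a source before epoch `e`. The class and method names are hypothetical, double votes are not handled, and the update loop is intentionally unoptimized.

```python
from typing import Dict


class SpanDetector:
    """Surround-vote detection for ONE validator, storing two numbers per epoch."""

    def __init__(self) -> None:
        self.min_span: Dict[int, int] = {}
        self.max_span: Dict[int, int] = {}

    def check(self, source: int, target: int) -> str:
        distance = target - source
        if source in self.min_span and self.min_span[source] < distance:
            return "surrounding"  # the new vote surrounds a recorded vote
        if source in self.max_span and self.max_span[source] > distance:
            return "surrounded"   # a recorded vote surrounds the new vote
        return "ok"

    def record(self, source: int, target: int) -> None:
        for epoch in range(source):              # epochs before the source
            span = target - epoch
            if span < self.min_span.get(epoch, span + 1):
                self.min_span[epoch] = span
        for epoch in range(source + 1, target):  # epochs between source and target
            span = target - epoch
            if span > self.max_span.get(epoch, 0):
                self.max_span[epoch] = span


detector = SpanDetector()
detector.record(2, 3)
print(detector.check(1, 4))  # surrounding: 1 < 2 and 3 < 4
detector.record(1, 5)
print(detector.check(2, 4))  # surrounded: 1 < 2 and 4 < 5
print(detector.check(2, 6))  # ok
```

A lookup is then two dictionary reads per attestation, instead of a comparison against every earlier vote of that validator.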
### Beacon-chain state-sync :arrow_down:
#### What you learn
Learn how the state in Eth2 is represented as a binary tree, and how communicating branches of this tree can work as simple state-sync proof of concept.
As Eth1x develops, other things like Beamsync are being pioneered by clients such as Nethermind and Trinity. Similarly to a light-client, state is fetched on demand, but then gradually fills in a full state.
This can be replicated in Eth2, and more efficiently so! The Beacon-chain provides a structured way to prove batches of state and block-roots, and then maintains a history of batch-roots to compress state.
Additionally, SSZ is a clear binary-tree merkle format, supporting multi-proofs, partials and other fancy light-client toolbox items.
SSZ spec draft: https://github.com/protolambda/eth2.0-ssz/
Networking ReqResp: https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#the-reqresp-domain
SSZ python library, with multi-proof/partials support, backing the current pyspec: https://github.com/protolambda/remerkleable
Hack a simple merkle-proof serialization format together, put it on Req/Resp (p2p RPC), and draft a (partial) state sync.
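
Before designing a serialization format, it helps to have the basic proof logic in hand. The sketch below builds a small binary Merkle tree, extracts the branch (sibling hashes) for one leaf, and verifies it against the root. It is a generic single-leaf proof written for this guide, with made-up function names; SSZ multi-proofs and partials, as supported by remerkleable, are more general.

```python
import hashlib
from typing import List


def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def build_layers(leaves: List[bytes]) -> List[List[bytes]]:
    """Bottom-up layers of the tree; len(leaves) must be a power of two."""
    layers = [leaves]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([hash_pair(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers


def make_branch(layers: List[List[bytes]], index: int) -> List[bytes]:
    """Collect the sibling hash at every level, from the leaf up to the root."""
    branch = []
    for layer in layers[:-1]:
        branch.append(layer[index ^ 1])
        index //= 2
    return branch


def verify_branch(leaf: bytes, branch: List[bytes], index: int, root: bytes) -> bool:
    value = leaf
    for sibling in branch:
        if index % 2 == 0:
            value = hash_pair(value, sibling)
        else:
            value = hash_pair(sibling, value)
        index //= 2
    return value == root


leaves = [bytes([i]) * 32 for i in range(8)]
layers = build_layers(leaves)
root = layers[-1][0]
branch = make_branch(layers, 5)

print(len(branch))                                     # 3 hashes for a tree of 8 leaves
print(verify_branch(leaves[5], branch, 5, root))       # True
print(verify_branch(b"\x00" * 32, branch, 5, root))    # False: wrong leaf
```

A message of `(index, leaf, branch)` is then already enough for a first Req/Resp experiment: the receiver only needs a trusted state root to check it.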
Binary trees work exceptionally well with diffing: find common sub-state, and not sync/repeat that.
And not all state may be required, a "partial" may let you act as if the full thing is there, but only use the parts that are provided in the proof.
Remerkleable has an experimental "virtual node" feature, to lazy-load more sub-state. It could be used to write a slow but fully on-demand, "beam sync"-style state object.
Alternatively, a "merry-go-round" sync approach (inspired by Eth1x sync plans of Piper), would be fun: split the state into N messages, and gossip those on a special "merry-go-round" topic one after another. Then listen on this topic for a whole round of data, and become synced.
The state in the Eth2 beacon-chain is relatively tiny (finalized state size is linear in the number of validators, not the history), so experimenting with sync should be fast and fun.
### Open challenge :zap:
I am not here to say what to do, but to help you get into the exciting parts of Phase0! If you have your own ideas for projects, or want to spend more time on spec things, I am happy to help you get set up!
## Phase1 Challenges
Phase 1 of Eth2 has a nearly ready specification, but generally client implementers are focused on Phase0, and research on Phase2. If you have research questions about the Custody Game (data-availability in Eth2), shards as data for stateless Phase2, or feel like building something experimental beyond Phase0, let us know!
Phase 1 takes some more niche understanding, but if you are into it and like a challenge we love to help you :clap:
## Phase1.5 Challenges
There is an active effort to implement execution on Eth2 before a Phase2 ships. Have a look at [this roadmap diagram by Vitalik](https://twitter.com/VitalikButerin/status/1240365047421054976).
Part of the challenge here is how Eth1 functionalities are bridged into Eth2.
- Play with account abstraction
- Make your favorite client read blocks from a local source (as a placeholder for an Eth2 node that pulls down the shard blocks!)
- Implement and experiment with [the witness format of Alexey and Igor](https://github.com/ethereum/stateless-ethereum-specs/pull/1)
- Be creative, phase1.5 is experimental!
## Phase2 Challenges
If you like to learn what an "Execution Environment" is, and how running something on Eth2 is going to work, Phase2 is the place! Phase2 is in a research phase, but unlike Phase1 there is lots of stuff to build on and experiment with (Credits to the Quilt team).
Topics of interest:
- Account abstraction
- [DSA vs SSA](https://ethresear.ch/t/automated-detection-of-dynamic-state-access-in-solidity/7003) (Dynamic/Static State Access. A.k.a. state provisioning to block proposers)
- Witness formats for stateless execution
To learn about Phase2, start by [reading this Ethresearch post](https://ethresear.ch/t/a-short-history-and-a-way-forward-for-phase-2/6982) describing history, progress and challenges.
Then, join the [Phase 2 / Quilt Telegram](https://t.me/eth2quilt) to get looped in with Phase 2 researchers and devs, who can give you starter challenges to hack on. Phase 2 is moving fast, but there are always smaller things you can take on as a beginner.