# Inside the Shadow-Geth Grant: A Personal Journey into Making Ethereum Faster

*by jsvisa*
*September 2, 2025*

*You can find the full report here: https://hackmd.io/@jsvisa/geth-pebble-grant-report*

Running a full Ethereum node has always appealed to me because it embodies the spirit of decentralization — you verify everything yourself and help secure the network. Go-Ethereum (Geth) is the most widely used execution client, so its performance determines how viable it is for everyday node runners, researchers, and infrastructure providers. This is my story as an external contributor funded by an Ethereum Foundation grant to work on **PebbleDB performance, state size analysis, and code improvements**. It was a rare chance to peek under the hood of Geth, experiment at massive scale, and push forward the limits of what the client can do.

![](https://notes.ethereum.org/_uploads/SJW9__Eqgx.png)

## Why Benchmark and Analyse Geth?

The grant’s aim was simple: **measure where Geth struggles with very large databases and fix it**. Over time the Ethereum state has ballooned as more contracts, tokens and users join the network. By the start of 2025 the database was hundreds of gigabytes and projected to keep growing at ~48 GiB per year. Rather than rely on gut feelings, I wanted concrete data, so the project was split into four phases:

1. **Pebble performance benchmarking** – test Geth’s default database engine (Pebble) with datasets from 100 GB to 3 TB, build custom benchmark tools and identify bottlenecks.
2. **Chain height and state size analysis** – track how the state trie, storage and code have grown over the years and correlate growth with network events.
3. **Collaborative development with Bloatnet** – adapt Geth for large‑scale state testing and contribute features needed by the Bloatnet project.
4. **Pull requests and code contributions** – upstream performance improvements directly into Geth.
What follows is a recap of each phase and how they collectively make Geth faster and more scalable.

## Phase 1: PebbleDB Benchmarking and Tooling

The journey began with a clear goal: understand how Pebble, Geth’s underlying key-value database, behaves under the extreme demands of Ethereum.

**Building the tools**

My first step was to build a testing harness for Pebble. I wrote two utilities, [pdb‑writebench and pdb‑readbench](https://github.com/jsvisa/goleveldb-bench/tree/pebble), to simulate Geth’s write and read workloads across databases ranging from 100 GB to 3 TB. The benchmarks recorded throughput, latency and disk I/O, and exported Prometheus metrics so I could visualise performance in Grafana. A Docker Compose stack tied everything together, and Node Exporter gathered system statistics. With these tools in place it was easy to spin up tests and observe how performance changed as the database grew.

**Key findings**

The benchmarks revealed that **database reads dominate block processing time**. When importing real mainnet blocks on Geth v1.15.2, the average time to process a block was 78.8 ms, and over half of that (53 %) was spent reading state from disk. Storage reads alone consumed 22 ms (34 % of the time) and account reads another 12.3 ms (19 %). Reads vastly outnumbered writes: ~3,130 successful reads and ~2,800 failed reads per second, compared to only 61 writes per second, roughly 100 reads for every write.

Latency analysis highlighted how much slower Pebble gets as the database grows. The table below summarises my findings (averages are shown when the report gave ranges):

![](https://notes.ethereum.org/_uploads/H1xpsO_N5ee.png)

At **100 GB**, read‑only queries take about **149 µs**, while mixed read‑write workloads average **341 µs**. Scaling to **500 GB** pushes read‑only latency to around **232 µs** and mixed latency to **467 µs**, roughly a **1.5× slowdown**. At the extreme **3 TB** dataset, latencies balloon to **392 µs** (read‑only) and **658 µs** (read‑write).
These degradations occur because background compaction competes with foreground reads and writes; temporarily stopping compaction reduced average latency from 2,130 µs to 383 µs.

Other insights included:

- **On‑disk hit rates are low** – only 3.14 % of account reads and 8.79 % of storage reads hit the on‑disk database; most data was served from the diff layer or fast cache.
- **Failed reads matter** – reading non‑existent keys wastes effort. The benchmarks showed a significant portion of reads were misses, causing extra overhead.
- **Small tweaks help** – adding custom metrics to Pebble and building Grafana dashboards made it much easier to see and fix issues.

**Lessons learned**

This phase taught me that improving Geth’s performance isn’t just about switching databases; it’s about reducing read amplification and optimising code paths. Many of the later pull requests arose directly from these insights. I also gained a deep appreciation for the engineering behind Pebble — compaction algorithms, caching strategies and write batching all interplay in subtle ways.

## Phase 2: Chain Height and State Size Analysis

While benchmarking tells you where the current bottlenecks are, it doesn’t tell you why the database gets so large in the first place. In phase 2 I built a [**live tracer**](https://github.com/ethereum/go-ethereum/pull/31914) into Geth to record state diffs for every block. This tracer, developed on top of Gary B’s state‑float branch, streamed changes into JSON and added **state size metrics** directly into the client. I ran a full sync from genesis and generated monthly aggregated datasets, then built visualisations to track how the state evolved over time.

**What drives state growth?**

The data showed that **trienodes are the main driver of growth**. Over the lifetime of Ethereum they accounted for **62 %** of total state expansion (~238 GB of 383 GB). Trienode growth averaged **2.56 GiB** per month in 2024 and is projected to reach **20.02 GiB** for 2025.
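As an aside on mechanics, the monthly datasets behind these figures come from folding the tracer’s per-block diff records into per-month totals, roughly as in the following sketch. The `diffRecord` shape and its field names are illustrative assumptions of mine, not the tracer’s actual JSON schema.

```go
package main

import "fmt"

// diffRecord is an illustrative per-block state-diff record; the real
// tracer emits a richer JSON document for every block.
type diffRecord struct {
	Month         string // e.g. "2024-05", derived from the block timestamp
	TrienodeBytes int64
	StateBytes    int64
	CodeBytes     int64
}

// monthlyTotals folds per-block records into per-month growth totals,
// keyed by month, with one slot per category:
// [trienode, plain state, code].
func monthlyTotals(records []diffRecord) map[string][3]int64 {
	out := make(map[string][3]int64)
	for _, r := range records {
		t := out[r.Month]
		t[0] += r.TrienodeBytes
		t[1] += r.StateBytes
		t[2] += r.CodeBytes
		out[r.Month] = t
	}
	return out
}

func main() {
	recs := []diffRecord{
		{"2024-05", 1 << 20, 512 << 10, 64 << 10},
		{"2024-05", 2 << 20, 256 << 10, 0},
		{"2024-06", 3 << 20, 128 << 10, 32 << 10},
	}
	for month, t := range monthlyTotals(recs) {
		fmt.Printf("%s trienodes=%d state=%d code=%d\n", month, t[0], t[1], t[2])
	}
}
```

Aggregating this way over a full sync from genesis is what produces the category-by-category monthly growth series plotted below.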
By contrast, plain state data (storage slots and account balances) grows at roughly 1.14 GiB/month, and code growth is modest at 0.34 GiB/month. The stacked chart below visualises these categories year‑by‑year.

![](https://notes.ethereum.org/_uploads/Syhnd_Ecxe.png)

Three patterns stood out:

1. **Network events leave fingerprints** – the ICO boom of 2017, early DeFi between 2018–2020, the 2020–2022 DeFi Summer/NFT frenzy and the rise of Layer‑2s since 2022 each correspond to surges in state growth.
2. **Growth is accelerating** – although trienode growth dominates, the combination of state, trie and code is trending upward at ~48 GiB per year.
3. **Empty reads slow us down** – many trie accesses are for keys that no longer exist. By optimising how Geth handles these misses we can reduce wasted work.

These insights have direct implications for node operators. Storage requirements will continue to increase, so efficient pruning and indexing matter more than ever, and performance optimisation yields compounding benefits as the state grows.

## Phase 3: Collaboration with the Bloatnet Project

The grant initially included a plan to perform real‑world tests on a shadow fork of mainnet designed to stress Geth with enormous state sizes. Instead of reinventing the wheel, I joined forces with core developers already building [bloatnet](https://bloatnet.info), the state growth testing network. We collaborated on a set of changes to Geth tailored for large‑state experiments. My contributions included:

- [#542](https://github.com/gballet/go-ethereum/pull/542) – add support for running Geth with a `--bloatnet` override flag so it can join the testing network.
- [#545](https://github.com/gballet/go-ethereum/pull/545) – introduce state size metrics to measure how the state grows during tests.
- [#546](https://github.com/gballet/go-ethereum/pull/546) – fix a panic that occurred when handling exceptionally large state sizes.
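To give a flavour of the gauge pattern behind state size metrics like those in #545, here is a minimal sketch. It uses the standard library’s `expvar` package rather than go-ethereum’s own metrics registry, and the metric names and the `recordStateDiff` helper are illustrative assumptions, not Geth’s actual identifiers.

```go
package main

import (
	"expvar"
	"fmt"
)

// One gauge per state category; in Geth these would live in the client's
// metrics registry and be scraped by Prometheus. Names are illustrative.
var (
	accountSize  = expvar.NewInt("state/size/account")
	storageSize  = expvar.NewInt("state/size/storage")
	trienodeSize = expvar.NewInt("state/size/trienode")
)

// recordStateDiff applies a block's net state-size changes to the gauges.
// Deltas may be negative when state is deleted.
func recordStateDiff(accountDelta, storageDelta, trienodeDelta int64) {
	accountSize.Add(accountDelta)
	storageSize.Add(storageDelta)
	trienodeSize.Add(trienodeDelta)
}

func main() {
	// Pretend two blocks were processed, the second deleting some accounts.
	recordStateDiff(120, 4096, 9000)
	recordStateDiff(-40, 1024, 3500)
	fmt.Printf("account=%d storage=%d trienode=%d\n",
		accountSize.Value(), storageSize.Value(), trienodeSize.Value())
}
```

Running counters like these per category is what lets a dashboard show exactly where a stress test is inflating the state.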
These features allowed the testing team to track memory consumption and isolate failures during stress tests. While this phase was shorter than planned, it emphasised the importance of teamwork and sharing tools across projects.

## Phase 4: Pull Requests and Upstream Contributions

Armed with benchmark data and state insights, I turned to the Geth codebase itself. Over eight weeks I opened more than two dozen pull requests across several categories.

**Merged contributions include:**

- **Performance**: rewrote parts of the PathDB prefetcher, optimised filtermaps indexing, and reduced I/O stalls.
- **Metrics**: exposed Pebble cache hit ratios, state prefetcher stats, and compaction counters for Grafana dashboards.
- **CLI & UX**: new flags for state scheme selection, better defaults for dev/test usage.
- **Monitoring tools**: improved block import tracing to distinguish DB read/write paths.

Each PR was reviewed by Geth maintainers, who provided detailed feedback and merged the changes after iterations. For me, this was where the grant turned into real impact.

**Ongoing work**

Not all improvements are merged yet. Several **work‑in‑progress pull requests** add caching for historic states, remove unused accumulators, index trienodes in PathDB, prevent deep reorganisations below the oldest available state, implement partial state indexing in parallel and expose additional state size metrics. These patches will further reduce latency and memory usage once completed.

## Reflections on the Grant

This project has been a deeply rewarding one. By digging into Geth’s state management I learned how the structure of the trie influences storage and performance. The exploration taught me that careful instrumentation, data‑driven analysis and incremental improvements are powerful tools for optimising complex systems.
The Ethereum Foundation’s grants program isn’t just about money; it’s about enabling contributors like me to take risks, try things at scale, and then merge the best results upstream. Working alongside the Geth team, I learned low-level database internals and built benchmarks that others now rely on.

The improvements we shipped — better benchmarking, state analytics, Bloatnet support and numerous code enhancements — should make running a node smoother and more future‑proof. I hope this journey inspires other developers to dive into the code, share their findings and continue pushing the boundaries of what Geth can do.