# Perf track current status

Short summary per client:

**Geth:**
- Building blocks with a single huge and slow transaction does not work (60M Modexp not included)
- ECAdd slow (16 MGas/s - slowest scenario except Modexp, and 2x slower than the next slowest)

**Erigon:**
- ECAdd slow (16 MGas/s - slowest scenario except Modexp, and 2x slower than the next slowest)
- TxDataZero scenario (one big tx with a huge amount of zeros in its data) works fine at the 30M and 60M levels, but fails at 80M and above

**Besu:**
- Struggled with block production on the shadowfork
- Modexp scenarios on perfnet take over 3 seconds at 60M gas and already over 4 seconds at 90M gas, which was causing reward losses on head votes
- TxDataZero scenario (one big tx with a huge amount of zeros in its data) works fine at the 30M level, but fails at 60M and above

**Nethermind:**
- Unstable under huge RPC load (crashing and getting blacklisted by peers)
- Perf degradation noticed between the official release and master on newPayload; it seemed to be resolved but is still in place

**Reth:**
- Significant perf degradation on a few opcodes between the latest release and the performance branch
- Occasionally struggles to include big transactions during block production (yet to investigate; this was an issue on the main branch but is no longer on performance)

For the worst-cases analysis of opcodes, initial tests were executed on the official latest versions of the clients, then moved to the latest development branches, and finally all clients now run on `performance` branches. While making this move we noticed a few regressions/improvements worth exploring. To name a few:

**1. Regressions**
- *SelfBalance(reth)* FROM **277 MGas/s** TO **151 MGas/s**
- *Transfers(geth)* FROM **641 MGas/s** TO **415 MGas/s**
- *Transfers(reth)* FROM **979 MGas/s** TO **715 MGas/s**
- *IdentityFrom1ByteCACHABLE(reth)* FROM **350 MGas/s** TO **257 MGas/s**
- *Keccak256From32Bytes(reth)* FROM **120 MGas/s** TO **94.1 MGas/s**
- *Keccak256From8Bytes(reth)* FROM **119 MGas/s** TO **99.7 MGas/s**
- *Keccak256From1Byte(reth)* FROM **117 MGas/s** TO **99.0 MGas/s**
- *EcRecoverUNCACHABLE(reth)* FROM **65.1 MGas/s** TO **60.5 MGas/s**
- *Modexp208GasBalancedUNCACHABLE2(reth)* FROM **48.6 MGas/s** TO **43.1 MGas/s**

**2. Improvements**
- Reth in all cachable-precompile scenarios, with the Precompiles Cache implementation enabled by default
- Nethermind was struck by major regressions between 1.31.11 and master, which were partially addressed but are still visible in general block-processing performance
- Nethermind gained additional improvements after data analysis across multiple opcodes (80-130% improvement)
- Geth shows significant improvements on the "performance" branch: various opcodes improved by 30-85%
- Besu shows strength in warmed scenarios, gaining even more than 100% improvement (data collection in progress)

Additionally, [this table](https://grafana.observability.ethpandaops.io/d/feo4ronhsqv40d/opcodes-benchmarking?orgId=1&from=now-1d&to=now&timezone=browser&var-posgreSQL=benuragv7iuwwb&var-ClientName=$__all&var-TestTitle=$__all&refresh=auto&viewPanel=panel-24) shows that there are significant differences between the slowest and fastest ELs on various scenarios, which should also be good ground for some breakouts to discuss what can be improved in the slower clients to catch up with the fastest ones.
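To make the regressions above easier to compare, they can be expressed as percentage throughput drops. The sketch below is illustrative only; the `regression_pct` helper and the dictionary layout are ours, while the MGas/s figures come straight from the list above.

```python
# Illustrative helper: express a before/after MGas/s pair as a percentage drop.
# Figures are copied from the regressions list above; the code itself is a sketch.

def regression_pct(before_mgas: float, after_mgas: float) -> float:
    """Percentage throughput drop between two MGas/s measurements."""
    return (before_mgas - after_mgas) / before_mgas * 100

regressions = {
    "SelfBalance(reth)": (277, 151),
    "Transfers(geth)": (641, 415),
    "Transfers(reth)": (979, 715),
    "Keccak256From32Bytes(reth)": (120, 94.1),
}

# Print worst regressions first
for name, (before, after) in sorted(
    regressions.items(),
    key=lambda kv: regression_pct(*kv[1]),
    reverse=True,
):
    drop = regression_pct(before, after)
    print(f"{name}: {before} -> {after} MGas/s ({drop:.1f}% drop)")
```

Sorting by relative drop rather than absolute MGas/s makes it clear that SelfBalance(reth) is the steepest regression even though Transfers(reth) lost more raw throughput.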
There are scenarios like Keccak256From1Byte (the biggest gap: a 16x difference between Besu and Nethermind, though that may be due to a lack of proper warming), but also others where the slowest client is 5-7x slower than the fastest while also running below 60 MGas/s - improving those will make our journey to higher gas limits much smoother.
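As context for the TxDataZero failures noted above, it helps to size how much zero calldata a single block-filling transaction carries. The sketch below uses protocol constants rather than anything measured here: 21000 gas intrinsic transaction cost, the classic 4 gas per zero calldata byte, and the EIP-7623 floor price of 10 gas per zero byte (which applies to data-heavy transactions post-Pectra); which pricing is in force depends on the fork the test network runs.

```python
# Back-of-the-envelope sizing of the TxDataZero scenario: one tx whose calldata
# is all zero bytes, filling the whole block. Constants are protocol-level
# assumptions, not measurements from this report.

INTRINSIC_GAS = 21_000       # base cost of any transaction
ZERO_BYTE_GAS_CLASSIC = 4    # pre-EIP-7623 calldata pricing for a zero byte
ZERO_BYTE_GAS_FLOOR = 10     # EIP-7623 floor: 10 gas per token, 1 token per zero byte

def max_zero_bytes(block_gas_limit: int, per_byte_gas: int) -> int:
    """How many zero calldata bytes fit in a single tx filling the block."""
    return (block_gas_limit - INTRINSIC_GAS) // per_byte_gas

for limit in (30_000_000, 60_000_000, 80_000_000):
    classic = max_zero_bytes(limit, ZERO_BYTE_GAS_CLASSIC)
    floor = max_zero_bytes(limit, ZERO_BYTE_GAS_FLOOR)
    print(f"{limit // 1_000_000}M gas: ~{classic / 1e6:.1f} MB (classic) / "
          f"~{floor / 1e6:.1f} MB (EIP-7623 floor) of zero calldata")
```

Under classic pricing an 80M-gas block admits roughly 20 MB of zero calldata in one transaction, which suggests why clients that handle 30M and 60M fine can start failing at 80M: the single-tx payload to decode and propagate roughly scales with the gas limit.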