Paul's EVMn (Including EVM384) Update

# Paul's EVMn (Including EVM384) Update THIS DRAFT IS STILL BEING UPDATED. AN EARLY COPY WAS REQUESTED BY CLIENT DEVS. In this update, I share an EIP draft, some recent breakthroughs since [the last update](https://notes.ethereum.org/@poemm/evm384-update5), and a comparison with precompile gas costs and runtimes. ## EIP Draft The current EIP draft: https://notes.ethereum.org/@poemm/eip_draft_evm_modular_arithmetic_extensions I am still working on simplifications, and to make it more future-proof. This is an iterative process which includes implementing heavy crypto. And sometimes, breakthroughs are made. ## Recent Breakthroughs With EVMn **Breakthrough 1. (DoS mitigation and smaller bytecode.)** A [security analysis](https://notes.ethereum.org/@poemm/EVM384SecurityAnalysis) demonstrated a cache thrashing attack. This is mitigated by switching from four-byte to two-byte offsets, which limits addressable memory to 64kb. Smaller offsets resulted in smaller bytecode -- allowing more optimizations like loop unrolling! **Breakthrough 2. (Immediates without versioning.)** It has been [claimed](https://ethereum-magicians.org/t/eip-663-unlimited-swap-and-dup-instructions/3346/10) that multi-byte opcodes (including immediates) require versioning. This has been [refuted](https://discord.com/channels/595666850260713488/706868829900505180/817455521341636649). The solution is elegant: we wrap extra bytes in a `PUSH*` _after_ the opcode. This `PUSH*` is never executed, it just acts as a syntactic guard to prevent clobbering `JUMPDEST`s. Immediates are a major optimization because they allow us to avoid stack overhead. **Breakthrough 3. (Abstraction to arbitrary algorithms and bitlengths.)** A fourth opcode, `SETMOD`, is introduced to store the modulus and precompute some parameters. `SETMOD` adds statefulness, but allows (i) arbitrary algorithms (including subquadratic runtime), (ii) precomputing things to speed up runtime, (iii) trivial bitlength abstraction (including 256-bit, 384-bit and 768-bit), (iv) fewer runtime checks, (v) even moduli, and (vi) smaller bytecode size. `SETMOD` adds about 50 lines of C code, plus client boilerplate. **Breakthrough 4. (Faster memory copying.)** Cryptography is full of memcopy operations. Earlier prototypes did memcopies through the EVM stack, at high gas cost. Jared Wasinger had a breakthrough suggestion to use `ADDMONT` for memcopies. This only works for copying ring elements (smaller than the modulus), which is the common case in crypto! This has reduced gas costs greatly! **Breakthrough 5. (EVM-specific optimizations.)** In EVM, we can precompute things off-chain and pass them as calldata. See [this ethresearch post](https://ethresear.ch/t/surveying-precomputation-methods-in-cryptography-request-for-help/8606). This is still being explored. ## A side-by-side comparison of EIP-2537 precompiles and EVMn Gas Costs EIP-2537 gas costs are from: https://eips.ethereum.org/EIPS/eip-2537 EVMn gas costs are from executing tests in: https://github.com/poemm/EVMcurves/tree/master/tests (These gas costs may have <0.5% error.) (TODO: upload tests for map-to-g1 and ecadds.) --- | | Precompile Gas Cost | EVMn gas costs | Slowdown | Notes | |---| --- | --- | --- | --- | | `BLS12_PAIRINGS` | 108,000 | 240,570* | 2.23x | (*) The EVMn implementation does not include subgroup checks, a 15% overhead, but many cryptosystems don't need these checks. The EVMn miller loop can be optimized by 25% with line precomputation, which many cryptosystems allow e.g. BLS signatures, PLONK, and KZG. Other EVM optimizations may cut another 5%. | | `BLS12_MAP_FP_TO_G1` | 5,500 | 14,513 | 2.64x | The EVMn implementation can be cut to 11,619 if we skip the final jacobian->affine conversion, which many cryptosystems allow. A [precomputation trick for sqrt](https://ethresear.ch/t/surveying-precomputation-methods-in-cryptography-request-for-help/8606) can further reduce costs to ~9,500 gas. Other EVM optimizations may cut another 5%. | | `BLS12_MAP_FP2_TO_G2` | 75,000 | - | - | The algorithm has similar structure to `BLS12_MAP_FP_TO_G1`, so maybe the slowdown will be similar, and similar optimizations will be possible. | | `BLS12_G1ADD` | 500 | 3504 | 7.08x | Most of the EVMn cost is jacobian->affine conversion, which many cryptosystems can omit -- removing this conversion gets us down to 567 gas. Most of this remaining gas is setup cost, which we can amortize to get us _below_(!) the precompile cost. Other optimizations are possible if we know that it is a point doubling (i.e. same input) or an add of distinct points. | | `BLS12_G2ADD` | 800 | 4456 | 5.57x | Similar to `BLS12_G1ADD`. Without the jacobian->affine conversion, it costs 1406 gas, some of which is startup cost. Similar optimizations are possible as with `G1ADD`. | | `BLS12_G1MUL` | 12,000 | - | - | This precompile is not needed, it is a special case of `BLS12_G1MULTIEXP`. | | `BLS12_G2MUL` | 45,000 | - | - | This precompile is not needed, it is a special case of `BLS12_G2MULTIEXP`. | | `BLS12_G1MULTIEXP` | (a formula) | - | - | EVMn can use [precomputation tricks for multi-scalar mul](https://ethresear.ch/t/surveying-precomputation-methods-in-cryptography-request-for-help/8606). Plus GLV optimizations. I expect only a small slowdown relative to the precompile. | | `BLS12_G2MULTIEXP` | (a formula) | - | - | EVMn can use [precomputation tricks for multi-scalar mul](https://ethresear.ch/t/surveying-precomputation-methods-in-cryptography-request-for-help/8606). Plus GLV-GLS optimizations. I expect only a small slowdown relative to the precompile. | **Table 1.** Comparison of gas costs between EIP-2537 and EVM384. --- ## Benchmarks Besides gas, runtimes were also measured. These implementations are "hacky" (dummy `SETMOD`, hard-coded modulus, using blst's asm), but give an idea of what performance clients can expect, relative to the speed-record implementation blst, when they use fast implementations of the montgomery arithmetic. --- | | miller loop | final exponentiation | slowdown to blst | |---|---|---|---| |blst | 379us | 504us | 1x |evmone | 686us | 989us | 1.9x |geth | 1660us | 2190us | 4.3x **Table 2.** Runtimes on an i5-5300u (2.9GHz, launched Q1 2015). Numbers can be reproduced in [geth](https://gist.github.com/poemm/d3fd461da50b8981bfabe4068b82456c), [evmone](https://gist.github.com/poemm/56dc176ff9fe957cb2de4ff93c8eff5e), and [blst](https://gist.github.com/poemm/671dd777e023ab9bf740d4fc70400374). --- EVMn numbers can be improved in the following ways: - The opcode boilerplate (checks for memory bounds, offset preparation) can be optimized. - Cryptosystem-specific EVM optimizations, which aren't available to precompiles. (See table 1.) - Other EVM optimizations, which were not done yet for simplicity. (See table 1.)

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.