owned this note
owned this note
Published
Linked with GitHub
# Paul's EVMn (Including EVM384) Update
THIS DRAFT IS STILL BEING UPDATED. AN EARLY COPY WAS REQUESTED BY CLIENT DEVS.
In this update, I share an EIP draft, some recent breakthroughs since [the last update](https://notes.ethereum.org/@poemm/evm384-update5), and a comparison with precompile gas costs and runtimes.
## EIP Draft
The current EIP draft: https://notes.ethereum.org/@poemm/eip_draft_evm_modular_arithmetic_extensions
I am still working on simplifications, and to make it more future-proof. This is an iterative process which includes implementing heavy crypto. And sometimes, breakthroughs are made.
## Recent Breakthroughs With EVMn
**Breakthrough 1. (DoS mitigation and smaller bytecode.)** A [security analysis](https://notes.ethereum.org/@poemm/EVM384SecurityAnalysis) demonstrated a cache thrashing attack. This is mitigated by switching from four-byte to two-byte offsets, which limits addressable memory to 64kb. Smaller offsets resulted in smaller bytecode -- allowing more optimizations like loop unrolling!
**Breakthrough 2. (Immediates without versioning.)** It has been [claimed](https://ethereum-magicians.org/t/eip-663-unlimited-swap-and-dup-instructions/3346/10) that multi-byte opcodes (including immediates) require versioning. This has been [refuted](https://discord.com/channels/595666850260713488/706868829900505180/817455521341636649). The solution is elegant: we wrap extra bytes in a `PUSH*` _after_ the opcode. This `PUSH*` is never executed, it just acts as a syntactic guard to prevent clobbering `JUMPDEST`s. Immediates are a major optimization because they allow us to avoid stack overhead.
**Breakthrough 3. (Abstraction to arbitrary algorithms and bitlengths.)** A fourth opcode, `SETMOD`, is introduced to store the modulus and precompute some parameters. `SETMOD` adds statefulness, but allows (i) arbitrary algorithms (including subquadratic runtime), (ii) precomputing things to speed up runtime, (iii) trivial bitlength abstraction (including 256-bit, 384-bit and 768-bit), (iv) fewer runtime checks, (v) even moduli, and (vi) smaller bytecode size. `SETMOD` adds about 50 lines of C code, plus client boilerplate.
**Breakthrough 4. (Faster memory copying.)** Cryptography is full of memcopy operations. Earlier prototypes did memcopies through the EVM stack, at high gas cost. Jared Wasinger had a breakthrough suggestion to use `ADDMONT` for memcopies. This only works for copying ring elements (smaller than the modulus), which is the common case in crypto! This has reduced gas costs greatly!
**Breakthrough 5. (EVM-specific optimizations.)** In EVM, we can precompute things off-chain and pass them as calldata. See [this ethresearch post](https://ethresear.ch/t/surveying-precomputation-methods-in-cryptography-request-for-help/8606). This is still being explored.
## A side-by-side comparison of EIP-2537 precompiles and EVMn Gas Costs
EIP-2537 gas costs are from: https://eips.ethereum.org/EIPS/eip-2537
EVMn gas costs are from executing tests in: https://github.com/poemm/EVMcurves/tree/master/tests (These gas costs may have <0.5% error.) (TODO: upload tests for map-to-g1 and ecadds.)
---
| | Precompile Gas Cost | EVMn gas costs | Slowdown | Notes |
|---| --- | --- | --- | --- |
| `BLS12_PAIRINGS` | 108,000 | 240,570* | 2.23x | (*) The EVMn implementation does not include subgroup checks, a 15% overhead, but many cryptosystems don't need these checks. The EVMn miller loop can be optimized by 25% with line precomputation, which many cryptosystems allow e.g. BLS signatures, PLONK, and KZG. Other EVM optimizations may cut another 5%. |
| `BLS12_MAP_FP_TO_G1` | 5,500 | 14,513 | 2.64x | The EVMn implementation can be cut to 11,619 if we skip the final jacobian->affine conversion, which many cryptosystems allow. A [precomputation trick for sqrt](https://ethresear.ch/t/surveying-precomputation-methods-in-cryptography-request-for-help/8606) can further reduce costs to ~9,500 gas. Other EVM optimizations may cut another 5%. |
| `BLS12_MAP_FP2_TO_G2` | 75,000 | - | - | The algorithm has similar structure to `BLS12_MAP_FP_TO_G1`, so maybe the slowdown will be similar, and similar optimizations will be possible. |
| `BLS12_G1ADD` | 500 | 3504 | 7.08x | Most of the EVMn cost is jacobian->affine conversion, which many cryptosystems can omit -- removing this conversion gets us down to 567 gas. Most of this remaining gas is setup cost, which we can amortize to get us _below_(!) the precompile cost. Other optimizations are possible if we know that it is a point doubling (i.e. same input) or an add of distinct points. |
| `BLS12_G2ADD` | 800 | 4456 | 5.57x | Similar to `BLS12_G1ADD`. Without the jacobian->affine conversion, it costs 1406 gas, some of which is startup cost. Similar optimizations are possible as with `G1ADD`. |
| `BLS12_G1MUL` | 12,000 | - | - | This precompile is not needed, it is a special case of `BLS12_G1MULTIEXP`. |
| `BLS12_G2MUL` | 45,000 | - | - | This precompile is not needed, it is a special case of `BLS12_G2MULTIEXP`. |
| `BLS12_G1MULTIEXP` | (a formula) | - | - | EVMn can use [precomputation tricks for multi-scalar mul](https://ethresear.ch/t/surveying-precomputation-methods-in-cryptography-request-for-help/8606). Plus GLV optimizations. I expect only a small slowdown relative to the precompile. |
| `BLS12_G2MULTIEXP` | (a formula) | - | - | EVMn can use [precomputation tricks for multi-scalar mul](https://ethresear.ch/t/surveying-precomputation-methods-in-cryptography-request-for-help/8606). Plus GLV-GLS optimizations. I expect only a small slowdown relative to the precompile. |
**Table 1.** Comparison of gas costs between EIP-2537 and EVM384.
---
## Benchmarks
Besides gas, runtimes were also measured. These implementations are "hacky" (dummy `SETMOD`, hard-coded modulus, using blst's asm), but give an idea of what performance clients can expect, relative to the speed-record implementation blst, when they use fast implementations of the montgomery arithmetic.
---
| | miller loop | final exponentiation | slowdown to blst |
|---|---|---|---|
|blst | 379us | 504us | 1x
|evmone | 686us | 989us | 1.9x
|geth | 1660us | 2190us | 4.3x
**Table 2.** Runtimes on an i5-5300u (2.9GHz, launched Q1 2015). Numbers can be reproduced in [geth](https://gist.github.com/poemm/d3fd461da50b8981bfabe4068b82456c), [evmone](https://gist.github.com/poemm/56dc176ff9fe957cb2de4ff93c8eff5e), and [blst](https://gist.github.com/poemm/671dd777e023ab9bf740d4fc70400374).
---
EVMn numbers can be improved in the following ways:
- The opcode boilerplate (checks for memory bounds, offset preparation) can be optimized.
- Cryptosystem-specific EVM optimizations, which aren't available to precompiles. (See table 1.)
- Other EVM optimizations, which were not done yet for simplicity. (See table 1.)