# EVM384 Update 3
[TOC]
## Introduction
This EVM384 update is a follow-up to the [previous update](https://notes.ethereum.org/@poemm/evm384-interface-update). The discussion thread with more links is [here](https://ethereum-magicians.org/t/evm384-feedback-and-discussion/4533/7).
In this update, we compare runtimes for different endianness. We implement a major part of the pairing algorithm, the miller loop (up to a small part, explained below).
## Interface Endianness
In the [last update](https://notes.ethereum.org/@poemm/evm384-interface-update), we compared interfaces v2 through v7 for EVM384, and settled on interface v7. A remaining question was whether we want a little-endian encoding for EVM384 values. Perhaps we should use big-endian, like the rest of the EVM.
We designed interface v8 to use big-endian values. We modified our v7 bytecode and evmone implementation to use this big-endian interface. Our results follow.
![](https://github.com/ewasm/benchmarking/raw/evm384-update-3-charts/charts/evm384-endianness.png)
**Figure 1.** Runtimes of the f6m_mul synthetic pairing check using EVM384 versions 7 and 8.
Version 8 has a slowdown because it reverses the byte ordering of all 384-bit inputs and outputs of EVM384 opcodes. This 1.18x slowdown is large enough for us to favor v7 over v8. A problem remains: to operate on parts of EVM384 values with big-endian EVM opcodes, one may have to reverse the endianness -- if this becomes a significant bottleneck, a new `byteswap` EVM opcode could alleviate it. Please note however that, as an example, a pairing operation can be implemented on the little-endian interface with no byteswapping if the calldata is already in little-endian.
## A Note on Subgroup Checks and Optimizations
None of our pairing benchmarks, including from previous updates, consider subgroup checks. To be clear this includes the benchmark named `rust-eip1962` as well. This is to make a more apples-to-apples comparison. With EVM384, pairing implementations can sometimes omit subgroup checks for hard-coded inputs which are known to be in the correct subgroup.
But for the curious, to estimate overhead of subgroup checks, we measured that subgroup checks in wasmcurves add ~15% modular arithmetic to the pairing, so we approximate a 15% slowdown for subgroup checks.
To preserve our apples-to-apples comparision, we omit more optimized implementations of evm384 opcodes, in particular our `mulmodmont384` is not implemented in assembly.
## Miller Loop Synthetic Benchmark
Pairing algorithms have two major parts: the miller loop and final exponentiation. For this update, we [implemented](https://gist.github.com/poemm/87969fe13ec5656d63230ac95d56fc36) the miller loop algorithm from [blst](https://github.com/supranational/blst), with the exception of multiplication in $F_p^2$ -- we replace blst's `mul_fp2()` with [wasmcurves'](https://github.com/iden3/wasmcurves) `f2m_mul()`, which seem to be equivalent. The reason we don't use blst's `mul_fp2()` is because its optimizations require 768-bit arithmetic, which would be too expensive in EVM384.
It is known that pairing algorithms may have different intermediate and output values, but still preserve bilinearity allowing agreement on the evaluation of pairing equations. Our algorithm too gives different outputs than blst, but the outputs match when we swapped `mul_fp2()` with `f2m_mul()` in blst itself, so at least we are doing a similar amount of computation.
We remain unsure whether our miller loop implementation is correct, and we may not know until we implement the final exponentiation. But our miller loop implementation gives us a better approximation of the full pairing algorithm. So we design a new synthetic pairing benchmark. We note that the final exponentiation costs around 1.5x of a miller loop, eg [see numbers here](https://github.com/mratsim/constantine/tree/244f58350c795ca9306a1f1a14a7ad42bf7c3579#measuring-performance). And our benchmark is for a pairing equation check which includes two miller loops and one final exponentiation. So our new synthetic pairing algorithm is three miller loops, with the last miller loop having a 1.5x adjustment factor to approximate the final exponentiation. Or a 1.16x adjustment factor overall.
![](https://github.com/ewasm/benchmarking/raw/evm384-update-3-charts/charts/evm384-miller-vs-synth.png)
**Figure 2.** Runtimes of two point pairing check in native, EVM384 synthetic pairing check using miller loop, EVM384 synthetic pairing check using f6m_mul, and wasm pairing check.
In figure 2, observe that we approach the performance of a native implementation by using interpreters, with native code for bottlenecks.
## Conclusion
We continue to seek implementers of cryptographic algorithms in EVM384. We have demonstrated that we can implement (something which resembles) the miller loop. We are open to cryptographers implementing the rest of BLS12-381 pairings, or starting their own implementations for whatever cryptography they need. In-demand algorithms are BLS12-381 operations and BLS12-377 operations.