Loading ..

EVM384 Specs

Specs as Addenda to Yellowpaper Appendix H

We use notation from the yellowpaper.

value Mnemonic delta alpha Description
0xc0 ADDMOD384 (below) (below) (below)
0xc1 SUBMOD384 (below) (below) (below)
0xc2 MULMODMONT384 (below) (below) (below)

Recall from the yellowpaper that delta is number of elements popped from stack and alpha is number of elements pushed to stack. In the descriptions below, we replace yellowpaper symbols 𝛍s, 𝛍m, and 𝛍i with stack, memory, and memoryLength, respectively. In particular, stack[0] refers to the first item popped from the stack, stack[1] is the second item popped from the stack, and so on. stack'[0] and stack'[1] are the top and second-to-top item pushed to the stack by the operation. memory[i..i+48] refers to the memory bytes from byte i to i+48. memoryLength is the current number of 32-byte chunks in memory. Finally, function M(current_memoryLength, start_byte_offset, bytelength_accessed) is the β€œmemory expansion for range function” from the yellowpaper H.1.

Interface Version 0

Intuitively, version 0 operates on values in the stack and doesn’t touch memory.

ADDMOD384 | alpha = 6 | delta = 2 | Description:

x = (stack[0] | stack[1])[0..48]
y = (stack[2] | stack[3])[0..48]
mod = (stack[4] | stack[5])[0..48]

stack'[0] | stack'[1] =  = [ x + y        if x+y < mod
                           [ x + y - mod  otherwise

Where | is concatenation.

Where x, y, and mod are in little-endian and the result is little-endian.

SUBMOD384 | alpha = 6 | delta = 2 | Description:

x = (stack[0] | stack[1])[0..48]
y = (stack[2] | stack[3])[0..48]
mod = (stack[4] | stack[5])[0..48]

stack'[0] | stack'[1] =  = [ x - y        if x-y >= 0
                           [ x - y + mod  otherwise

Where | is concatenation.

Where x, y, and mod are in little-endian and the result is little-endian.

MULMODMONT384 | alpha = 6 | delta = 2 | Description:

x = (stack[0] | stack[1])[0..48]
y = (stack[2] | stack[3])[0..48]
mod[0:48] | r_inv[0:8] = ((stack[4] | stack[5])[0..58]

stack'[0] | stack'[1] = (x * y * r_inv) % mod

Where | is concatenation.

Where the multiplication operation is montgomery multiplication, even if modulus is even.

Where x, y, mod, and r_inv are in little-endian and the result is little-endian.

Interface Version 1

Intuitively, version 1 stack items are offsets to memory memory, and the output clobbers the first input value.

ADDMOD384 | alpha = 3 | delta = 0 | Description:

x = memory[stack[0]..(stack[0]+48)]
y = memory[stack[1]..(stack[1]+48)]
mod = memory[stack[2]..(stack[2]+48)]

memory[stack[0]..(stack[0]+48)] = [ x + y        if x+y < mod
                                  [ x + y - mod  otherwise
                                   
Where x, y, and mod are in little-endian and the result is little-endian.

memoryLength = M( M( M( memoryLength, stack[0], 48 ), stack[1], 48 ), stack[2], 48 )

SUBMOD384 | alpha = 3 | delta = 0 | Description:

x = memory[stack[0]..(stack[0]+48)]
y = memory[stack[1]..(stack[1]+48)]
mod = memory[stack[2]..(stack[2]+48)]
memory[stack[0]..(stack[0]+48)] = [ x - y        if x-y >= 0
                                   [ x - y + mod  otherwise

Where x, y, and mod are in little-endian and the result is little-endian.

memoryLength = M( M( M( memoryLength, stack[0], 48 ), stack[1], 48 ), stack[2], 48 )

MULMODMONT384 | alpha = 4 | delta = 0 | Description:

x = memory[stack[0]..(stack[0]+48)]
y = memory[stack[1]..(stack[1]+48)]
mod = memory[stack[2]..(stack[2]+48)]
r_inv = stack[3]
memory[stack[0]..(stack[0]+48)] = (x * y * r_inv) % mod

Where the multiplication operation is montgomery multiplication, even if modulus is even.
                                   
Where x, y, mod, and r_inv are in little-endian, and the result is little-endian.

memoryLength = M( M( M( memoryLength, stack[0], 48 ), stack[1], 48 ), stack[2], 48 )

Interface Versions 2 and 3

Intuivively, the stack items are offsets to values in memory. Note: Interface 3 is interface 2, but, unsafely, omits memoryLength checks and updates.

ADDMOD384 | alpha = 4 | delta = 0 | Description:

x = memory[stack[1]..(stack[1]+48)]
y = memory[stack[2]..(stack[2]+48)]
mod = memory[stack[3]..(stack[3]+48)]

memory[stack[0]..(stack[0]+48)] = [ x + y        if x+y < mod
                                  [ x + y - mod  otherwise
                                   
Where x, y, and mod are in little-endian and the result is little-endian.

memoryLength = M( M( M( memoryLength, stack[0], 48 ), stack[1], 48 ), stack[2], 48 )

SUBMOD384 | alpha = 4 | delta = 0 | Description:

x = memory[stack[1]..(stack[1]+48)]
y = memory[stack[2]..(stack[2]+48)]
mod = memory[stack[3]..(stack[3]+48)]
memory[stack[0]..(stack[0]+48)] = [ x - y        if x-y >= 0
                                  [ x - y + mod  otherwise

Where x, y, and mod are in little-endian and the result is little-endian.

memoryLength = M( M( M( memoryLength, stack[0], 48 ), stack[1], 48 ), stack[2], 48 )

MULMODMONT384 | alpha = 5 | delta = 0 | Description:

x = memory[stack[1]..(stack[1]+48)]
y = memory[stack[2]..(stack[2]+48)]
mod = memory[stack[3]..(stack[3]+48)]
r_inv = stack[4]

memory[stack[0]..(stack[0]+48)] = (x * y * r_inv) % mod

Where the multiplication operation is montgomery multiplication, even if modulus is even.
                                   
Where x, y, mod, and r_inv are in little-endian and the result is little-endian.

memoryLength = M( M( M( memoryLength, stack[0], 48 ), stack[1], 48 ), stack[2], 48 )

Interfaces Versions 4 and 5

Intuitively, this is version 2 and 3 but no stack items for mod and inv, they are somehow hardcoded. Version 5 is version 4, but, unsafely, omits memoryLength checks and updates.

ADDMOD384 | alpha = 3 | delta = 0 | Description:

x = memory[stack[1]..(stack[1]+48)]
y = memory[stack[2]..(stack[2]+48)]
mod = hard-coded value for modulus

memory[stack[0]..(stack[0]+48)] = [ x + y        if x+y < mod
                                  [ x + y - mod  otherwise
                                   
Where x, y, and mod are in little-endian and the result is little-endian.

memoryLength = M( M( M( memoryLength, stack[0], 48 ), stack[1], 48 ), stack[2], 48 )

SUBMOD384 | alpha = 3 | delta = 0 | Description:

x = memory[stack[1]..(stack[1]+48)]
y = memory[stack[2]..(stack[2]+48)]
mod = hard-coded value for modulus
memory[stack[0]..(stack[0]+48)] = [ x - y        if x-y >= 0
                                  [ x - y + mod  otherwise

Where x, y, and mod are in little-endian and the result is little-endian.

memoryLength = M( M( M( memoryLength, stack[0], 48 ), stack[1], 48 ), stack[2], 48 )

MULMODMONT384 | alpha = 3 | delta = 0 | Description:

x = memory[stack[1]..(stack[1]+48)]
y = memory[stack[2]..(stack[2]+48)]
mod = mod = hard-coded value for modulus
r_inv = hard-coded value for r_inv

memory[stack[0]..(stack[0]+48)] = (x * y * r_inv) % mod

Where the multiplication operation is montgomery multiplication, even if modulus is even.
                                   
Where x, y, mod, and r_inv are in little-endian and the result is little-endian.

memoryLength = M( M( M( memoryLength, stack[0], 48 ), stack[1], 48 ), stack[2], 48 )

Interface Version 6

Intuitively, the beginning of memory has 256 virtual registers, each 48 bytes long. For example, memory[0..48] is the 0th register and memory[12240..12288] is the 255th register. The 48-byte modulus must be in memory[12288..12336] and the 8-byte inverse must be in memory[12336..12344]. Each bigint opcode takes a single stack item whose least-significant bytes correspond to virtual register indicies of inputs and outputs.

ADDMOD384 | alpha = 1 | delta = 0 | Description:

out_idx = stack[0][29] i.e. the third least-significant byte in the top stack item in big-endian
x_idx = stack[0][30]   i.e. the second least-significant byte in the top stack item in big-endian
y_idx = stack[0][31]   i.e. the least-significant byte in the top stack item in big-endian

x = memory[x_idx*48..x_idx*48+48]
y = memory[y_idx*48..y_idx*48+48]
mod = memory[12288..12336]


memory[out_idx*48..out_idx*48+48] = [ x + y        if x+y < mod
                                    [ x + y - mod  otherwise

Where x, y, and mod are in little-endian and the result is little-endian.

Must also check memory length and possibly grow the memory.
memoryLength = M( memoryLength, 12288, 48 )

SUBMOD384 | alpha = 1 | delta = 0 | Description:

out_idx = stack[0][29]
x_idx = stack[0][30]
y_idx = stack[0][31]

x = memory[x_idx*48..x_idx*48+48]
y = memory[y_idx*48..y_idx*48+48]
mod = memory[12288..12336]


memory[out_idx*48..out_idx*48+48] = [ x - y        if x-y >= 0
                                    [ x - y + mod  otherwise

Where x, y, and mod are in little-endian and the result is little-endian.

Must also check memory length and possibly grow the memory.
memoryLength = M( memoryLength, 12288, 48 )

MULMODMONT384 | alpha = 1 | delta = 0 | Description:


out_idx = stack[0][29]
x_idx = stack[0][30]
y_idx = stack[0][31]

x = memory[x_idx*48..x_idx*48+48]
y = memory[y_idx*48..y_idx*48+48]
mod = memory[12288..12336]
r_inv = memory[12336..12344]

memory[out_idx*48..out_idx*48+48] =  (x * y * r_inv) % mod
Where the multiplication operations together are montgomery multiplication, even if modulus is even.

Where x, y, r_inv, and mod are in little-endian and the result is little-endian.

Must also check memory length and possibly grow the memory.
memoryLength = M( memoryLength, 12288, 56 )

Interface Version 7

Intuitively, this is 6 with packed 4-byte offsets instead of register indices, including an offset to mod and inv

ADDMOD384 | alpha = 1 | delta = 0 | Description:

out_offset = stack[0][16:20]
x_offset = stack[0][20:24]
y_offset = stack[0][24:28]
mod_offset = stack[0][28:32]  i.e. least-significant bytes of the big-endian stack item.

Where each offset is a big-endian unsigned 32-bit integer.

x = memory[x_offset:x_offset+48]
y = memory[y_offset:y_offset+48]
mod = memory[mod_offset:mod_offset+48]

memory[out_offset:out_offset+48] = [ x + y        if x+y < mod
                                   [ x + y - mod  otherwise

Where x, y, mod, and result are integers of at-most 384-bits stored in memory as little-endian.

Must also check memory length and possibly grow the memory.
memoryLength = M( M( M( M( memoryLength, x_offset, 48 ), y_offset, 48 ), out_offset, 48 ), mod_offset, 48 )

SUBMOD384 | alpha = 1 | delta = 0 | Description:

out_offset = stack[0][16:20]
x_offset = stack[0][20:24]
y_offset = stack[0][24:28]
mod_offset = stack[0][28:32]

Where each offset is a big-endian unsigned 32-bit integer.

x = memory[x_offset:x_offset+48]
y = memory[y_offset:y_offset+48]
mod = memory[mod_offset:mod_offset+48]

memory[out_offset:out_offset+48] = [ x - y        if x-y >= 0
                                   [ x - y + mod  otherwise

Where x, y, mod, and result are integers of at-most 384-bits stored in memory as little-endian.

Must also check memory length and possibly grow the memory.
memoryLength = M( M( M( M( memoryLength, x_offset, 48 ), y_offset, 48 ), out_offset, 48 ), mod_offset, 48 )

MULMODMONT384 | alpha = 1 | delta = 0 | Description:

out_offset = stack[0][16:20]
x_offset = stack[0][20:24]
y_offset = stack[0][24:28]
mod_and_inv_offset = stack[0][28:32]

Where each offset is a big-endian unsigned 32-bit integer.

x = memory[x_offset:x_offset+48]
y = memory[y_offset:y_offset+48]
mod = memory[mod_rinv_offset:mod_rinv_offset+48]
r_inv = memory[mod_rinv_offset+48:mod_rinv_offset+56]

memory[out_offset:out_offset+48] =  (x * y * r_inv) % mod
Where the multiplication operation is montgomery multiplication, even if modulus is even.

Where x, y, mod, and result are integers of at-most 384-bits stored in memory as little-endian. Similarly, mod_r is an integer of at-most 64-bits stored in memory as little-endian.

Must also check memory length and possibly grow the memory.
memoryLength = M( M( M( M( memoryLength, x_offset, 48 ), y_offset, 48 ), out_offset, 48 ), mod_and_inv_offset, 56 )

Interface Version 8

Like version 7, but using big-endian encoding for 384-bit values in memory.

Interface Version 9

Like version 7, but takes an immediate instead of a stack item.

ADDMOD384 | alpha = 1 | delta = 0 | Description:

input = c(𝛍pc+1,𝛍pc+2,...,𝛍pc+16)
where c() is defined in the yellowpaper when defining PUSH*.

out_offset = input[16:20]
x_offset = input[20:24]
y_offset = input[24:28]
mod_offset = input[28:32]  i.e. least-significant bytes of the big-endian stack item.

Where each offset is a big-endian unsigned 32-bit integer.

x = memory[x_offset:x_offset+48]
y = memory[y_offset:y_offset+48]
mod = memory[mod_offset:mod_offset+48]

memory[out_offset:out_offset+48] = [ x + y        if x+y < mod
                                   [ x + y - mod  otherwise

Where x, y, mod, and result are integers of at-most 384-bits stored in memory as little-endian.

Must also check memory length and possibly grow the memory.
memoryLength = M( M( M( M( memoryLength, x_offset, 48 ), y_offset, 48 ), out_offset, 48 ), mod_offset, 48 )

SUBMOD384 | alpha = 1 | delta = 0 | Description:

input = c(𝛍pc+1,𝛍pc+2,...,𝛍pc+16)
where c() is defined in the yellowpaper when defining PUSH*.

out_offset = input[16:20]
x_offset = input[20:24]
y_offset = input[24:28]
mod_offset = input[28:32]

Where each offset is a big-endian unsigned 32-bit integer.

x = memory[x_offset:x_offset+48]
y = memory[y_offset:y_offset+48]
mod = memory[mod_offset:mod_offset+48]

memory[out_offset:out_offset+48] = [ x - y        if x-y >= 0
                                   [ x - y + mod  otherwise

Where x, y, mod, and result are integers of at-most 384-bits stored in memory as little-endian.

Must also check memory length and possibly grow the memory.
memoryLength = M( M( M( M( memoryLength, x_offset, 48 ), y_offset, 48 ), out_offset, 48 ), mod_offset, 48 )

MULMODMONT384 | alpha = 1 | delta = 0 | Description:

input = c(𝛍pc+1,𝛍pc+2,...,𝛍pc+16)
where c() is defined in the yellowpaper when defining PUSH*.

out_offset = input[16:20]
x_offset = input[20:24]
y_offset = input[24:28]
mod_and_inv_offset = stack[0][28:32]

Where each offset is a big-endian unsigned 32-bit integer.

x = memory[x_offset:x_offset+48]
y = memory[y_offset:y_offset+48]
mod = memory[mod_rinv_offset:mod_rinv_offset+48]
r_inv = memory[mod_rinv_offset+48:mod_rinv_offset+56]

memory[out_offset:out_offset+48] =  (x * y * r_inv) % mod
Where the multiplication operation is montgomery multiplication, even if modulus is even.

Where x, y, mod, and result are integers of at-most 384-bits stored in memory as little-endian. Similarly, mod_r is an integer of at-most 64-bits stored in memory as little-endian.

Must also check memory length and possibly grow the memory.
memoryLength = M( M( M( M( memoryLength, x_offset, 48 ), y_offset, 48 ), out_offset, 48 ), mod_and_inv_offset, 56 )

Interface version 10

Interface 10 is like interface 7, but with two bytes per offset, instead of four. This is to reduce the address space, to reduce the possibility of DoS attacks investigated here.

Interface version 11

Recall Yellowpaper notation:


value Mnemonic 𝛅 𝛂 Description Gas cost
0xc0 ADDMONT 0 0 (below) 2 gas plus 1 gas after each 8 limbs
0xc1 SUBMONT 0 0 (below) 2 gas plus 1 gas after each 8 limbs
0xc2 MULMONT 0 0 (below) 2 gas plus floor((n^2)/8) gas for n limbs

Table 1. Data for Yelowpaper H.2, plus gas cost formulas.


The above gas cost formulas will go in Yellowpaper appendix H.1 function C(), but we also give some explicit costs for quick reference.


num_64bit_limbs: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 32 64 128 256
ADDMONT 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 5 9 17 33
SUBMONT 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 5 9 17 33
MULMONT 2 2 3 4 5 6 8 10 12 14 17 20 23 26 30 34 130 514 2050 8194
Table 2. Gas costs for certain numbers of 64-bit limbs.

Finally, the opcode descriptions.

ADDMONT description:

immediate = c(𝛍.pc+1,𝛍.pc+2,...,𝛍.pc+10)
where c() (defined with PUSH*) handles overflow as zeros.

Assert immediate[0] == 0x68 (i.e. the leftmost byte is PUSH9), otherwise halt with exception

n = 1 + immediate[1]
n is interpreted as the number of 8-byte words for our big integers.

out_offset = immediate[2..3]
x_offset = immediate[4..5]
y_offset = immediate[6..7]
mod_offset = immediate[8..9]
Each offset is interpreted as a little-endian integer.

x = memory[x_offset..x_offset+n*8]
y = memory[y_offset..y_offset+n*8]
mod = memory[mod_offset..mod_offset+n*8]
x, y, and mod are interpreted as little-endian integers.

memory[out_offset..out_offset+n*8] = [ x + y        if x+y < mod
                                     [ x + y - mod  otherwise

The result is stored in memory in little-endian, up to n*8 bytes.

Memory length is checked and possibly grown.
memoryLength = M( memoryLength, max(x_offset,y_offset,mod_offset,out_offset), n*8 )

SUBMONT description:

immediate = c(𝛍.pc+1,𝛍.pc+2,...,𝛍.pc+10)
where c() (defined with PUSH*) handles overflow as zeros.

Assert immediate[0] == 0x68 (i.e. the leftmost byte is PUSH9), otherwise halt with exception

n = 1 + immediate[1]
n is interpreted as the number of 8-byte words for our big integers.

out_offset = immediate[2..3]
x_offset = immediate[4..5]
y_offset = immediate[6..7]
mod_offset = immediate[8..9]
Each offset is interpreted as a little-endian integer.

x = memory[x_offset..x_offset+n*8]
y = memory[y_offset..y_offset+n*8]
mod = memory[mod_offset..mod_offset+n*8]
x, y, and mod are interpreted as little-endian integers.

memory[out_offset..out_offset+n*8] = [ x - y        if x-y >= 0
                                     [ x - y + mod  otherwise

The result is stored in memory in little-endian, up to n*8 bytes.

Memory length is checked and possibly grown.
memoryLength = M( memoryLength, max(x_offset,y_offset,mod_offset,out_offset), n*8 )

MULMONT description:

immediate = c(𝛍.pc+1,𝛍.pc+2,...,𝛍.pc+10)
where c() (defined with PUSH*) handles overflow as zeros.

Assert immediate[0] == 0x68 (i.e. the leftmost byte is PUSH9), otherwise halt with exception

n = 1 + immediate[1]
n is interpreted as the number of 8-byte words for our big integers.

out_offset = immediate[2..3]
x_offset = immediate[4..5]
y_offset = immediate[6..7]
mod_offset = immediate[8..9]
Each offset is interpreted as a little-endian integer.

x = memory[x_offset..x_offset+n*8]
y = memory[y_offset..y_offset+n*8]
mod = memory[mod_offset..mod_offset+n*8]
neg_mod_inv = memory[mod_offset+n*8..mod_offset+n*8+8]
x, y, mod, and neg_mod_inv are interpreted as a little-endian integers.

memory[out_offset..out_offset+n*8] =  REDC_multiprecision(x*y,mod,neg_mod_inv,n)
Where the intermediate multiplication operation result may have double the bits.

# Python code:
def REDC_multiprecision(T,N,Nprime,n):
  # following the original [Montgomery85, section 2] notation and algorithm
  # T is the bigint value to be reduced
  # N is the bigint modulus
  # Nprime is the 64-bit montgomery parameter
  # n is the number of limbs of the modulus

  b = 2**64   # the base

  # convert inputs to limbs in base b, least-significant limb first
  T_=[0]*(2*n+1)  # with extra limb for carries
  for i in range(2*n):  T_[i] = T%b;  T = T//b
  N_=[0]*n
  Nbigint = N  # must not destroy N since need it later
  for i in range(n):    N_[i] = Nbigint%b;  Nbigint = Nbigint//b

  # main loop
  c = 0
  for i in range(n):
    # from [Montgomery85]: (d T_{i+n-1} ... T_i)_b <- (0 T_{i+n-1} ... T_i)_b + N*(T_i*N' mod R)
    TixNprime = (T_[i]*Nprime) % b
    d = 0
    for j in range(n):
      temp = T_[i+j] + N_[j]*TixNprime + d
      T_[i+j] = temp % b
      d = temp // b
    # from [Montgomery85]: (c T_{i+n})_b <- c + d + T_{i+n}
    temp = c + d + T_[i+n]
    T_[i+n] = temp % b
    c = temp // b
  T_[2*n] += c

  # convert result T_ back to bigint, shifting (ignoring) lowest n limbs
  assert T==0
  for i in range(n+1): T += T_[n+i] * b**i

  # finally, subtract the modulus if we exceed it
  if T>=N:
    return T-N
  else:
    return T

The result is stored in memory in little-endian, up to n*8 bytes.

Memory length is checked and possibly grown.
memoryLength = M( memoryLength, max(x_offset,y_offset,mod_offset+8,out_offset), n*8 )

Jumpdest analysis

Finally, we modify Yellowpaper section 9.4.3 function N(i,w) to jump forward from bytecode offset i to i+11 if opcode w is ADDMONT, SUBMONT, or MULMONT.

Note that this means that the PUSH9 opcode in the immediate data right after *MONT opcodes is not actually executed, it is just there to not break backwards compatibility with immediates clobbering a JUMPDEST.

Interface 12

Like interface 11, except we have an extra opcode, SETMOD, which stores a new machine state entry, 𝛍 .mc, the modulus context, with entries 𝛍.mc.modulus and 𝛍.mc.length, where the length counts number of 64-bit limbs. At each transaction, the modulus context is initialized to 0 length, but it is not reset at each contract call.


value Mnemonic 𝛅 𝛂 Description Gas cost
0xc3 SETMOD 2 0 (below) 2 gas plus n*ceil(log2(n)) for n 64-bit limbs

Table 1. Data for Yelowpaper H.2, plus a gas cost formula.


The above gas cost formulas will go in Yellowpaper appendix H.1 function C(), but we also give some explicit costs for quick reference.


num_64bit_limbs: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 32 64 128 256
SETMOD 3 6 8 14 17 20 23 34 38 42 46 50 54 58 62 82 194 450 1026 2306
Table 2. Gas costs for certain numbers of 64-bit limbs.

SETMOD description:

offset = 𝛍.s[0]		i.e. top stack item
length = 𝛍.s[1]		i.e. second stack item
offset and length are interpreted as big-endian integers.

If length>2^14 (this may be increased later), only pop stack items and go to the next opcode.

𝛍.mc.length = ceil((max_{0<=i<length} 𝛍.m[offset:offset+i]!=0x00)/8), i.e. the most-significant byte index's 64-bit limb index
𝛍.mc.modulus = 𝛍.m[offset..offset+8*mu.mc.length]

The modulus is interpreted as a little-endian integer.

Implementer's note:
For odd modulus, it may be wise to also precompute things for fast `MULMONT`.
Using notation in [Montgomery85], N=modulus and R=2^(bytelength), the value to precompute is N' which satisfies R*Rinv-N*N'=1.
(Note, Rinv doesn't need to be computed.) This value N' can be computed using the extended Euclidean algorithm (TODO: details). 
Note that N' exists because R and N are coprime. 
Note that when using a multi-precision montgomery multiplication algorithm, it may be sufficient to compute just the least-significant 64-bit limb of N'. 
Store this N' in optional field mu.mc.N'.

ADD/SUB/MULMONT

These are similar, but they read 7 immediate bytes starting with 0x66 (i.e. push7), they don’t read mod from memory, instead using the modulus 𝛍.mc.modulus. Another important difference is that the number of 64-bit limbs for inputs and outputs equals the number of limbs for 𝛍.mc.modulus.

MULMONT is much easier to specify:

If mu.mc.modulus is odd:
𝛍.m[out_offset..out_offset+𝛍.mc.length*8] =  (x*y*Rinv) % mod
Where the intermediate multiplication operation may have more bits.
Where Rinv is the value which satisfies RRinv-NN'=1, bute we may never have to compute it.

If mu.mc.modulus is even: don't write anything to memory. (This may be changed.)

JUMPDEST Analysis

Finally, the jumpdest jumps forward 7 instead of 11 bytes for interface 11.

Interface 13

This is interface 12 plus covering the case for even modulus.

This is a WIP. Most recent draft is: https://notes.ethereum.org/@poemm/eip_draft_evm_modular_arithmetic_extensions