# Unified EOF specification

[toc]

## Preface

This unified specification is a guide to the changes proposed by the EVM Object Format (EOF). The individual EIPs remain the official specification; consult them if confusion arises:

- 📃[EIP-3540](https://eips.ethereum.org/EIPS/eip-3540): EOF - EVM Object Format v1 [_history_](https://github.com/ethereum/EIPs/commits/master/EIPS/eip-3540.md)
- 📃[EIP-3670](https://eips.ethereum.org/EIPS/eip-3670): EOF - Code Validation [_history_](https://github.com/ethereum/EIPs/commits/master/EIPS/eip-3670.md)
- 📃[EIP-4200](https://eips.ethereum.org/EIPS/eip-4200): EOF - Static relative jumps [_history_](https://github.com/ethereum/EIPs/commits/master/EIPS/eip-4200.md)
- 📃[EIP-4750](https://eips.ethereum.org/EIPS/eip-4750): EOF - Functions [_history_](https://github.com/ethereum/EIPs/commits/master/EIPS/eip-4750.md)
- 📃[EIP-5450](https://eips.ethereum.org/EIPS/eip-5450): EOF - Stack Validation [_history_](https://github.com/ethereum/EIPs/commits/master/EIPS/eip-5450.md)
- 📃[EIP-6206](https://eips.ethereum.org/EIPS/eip-6206): EOF - JUMPF instruction [_history_](https://github.com/ethereum/EIPs/commits/master/EIPS/eip-6206.md)

## Container

EVM bytecode is traditionally an unstructured sequence of instructions. EOF introduces the concept of a container, which brings structure to bytecode. The container consists of a header followed by several sections.

```
container := header, body
header := magic, version, kind_types, types_size, kind_code, num_code_sections, code_size+, kind_data, data_size, terminator
body := types_section, code_section+, data_section
types_section := (inputs, outputs, max_stack_height)+
```

_note: `,` is a concatenation operator and `+` should be interpreted as "one or more" of the preceding item_

While EOF is extensible, this document discusses only the first version, EOFv1.

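The header grammar above can be sketched as a small parser. This is a minimal illustration, not a reference implementation: the field widths and marker values are taken from the Header table in this document, while the names `parse_header`, `MINIMAL`, and the returned dictionary keys are hypothetical. It only reads the header layout and does not enforce the other container validation rules.

```python
MAGIC = bytes.fromhex("ef00")


def parse_header(container: bytes) -> dict:
    """Read the EOFv1 header fields in grammar order (illustrative helper)."""
    pos = 0

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(container):
            raise ValueError("truncated header")
        chunk = container[pos:pos + n]
        pos += n
        return chunk

    def expect(value: bytes, name: str) -> None:
        if take(len(value)) != value:
            raise ValueError(f"unexpected {name}")

    def u16() -> int:
        # Sizes and counts are 16-bit unsigned big-endian integers.
        return int.from_bytes(take(2), "big")

    expect(MAGIC, "magic")
    expect(b"\x01", "version")
    expect(b"\x01", "kind_types")
    types_size = u16()
    expect(b"\x02", "kind_code")
    num_code_sections = u16()
    # code_size+ : one size entry per code section.
    code_sizes = [u16() for _ in range(num_code_sections)]
    expect(b"\x03", "kind_data")
    data_size = u16()
    expect(b"\x00", "terminator")
    return {
        "types_size": types_size,
        "code_sizes": code_sizes,
        "data_size": data_size,
        "header_size": pos,
    }


# Hypothetical minimal container: one code section holding a single
# STOP (0x00) instruction and an empty data section.
MINIMAL = bytes.fromhex(
    "ef00 01 01 0004 02 0001 0001 03 0000 00"  # header
    "00 00 0000"  # types: 0 inputs, 0 outputs, max_stack_height 0
    "00"  # code section 0
)
header = parse_header(MINIMAL)
# Total size as stated in the container validation rules.
expected_size = (
    13
    + 2 * len(header["code_sizes"])
    + header["types_size"]
    + sum(header["code_sizes"])
    + header["data_size"]
)
```

For this container the parsed header is 15 bytes long, which matches the minimum valid header size, and `expected_size` equals `len(MINIMAL)`.
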
#### Header

| name              | length   | value         | description |
|-------------------|----------|---------------|-------------|
| magic             | 2 bytes  | 0xEF00        | EOF prefix  |
| version           | 1 byte   | 0x01          | EOF version |
| kind_types        | 1 byte   | 0x01          | kind marker for types size section |
| types_size        | 2 bytes  | 0x0004-0xFFFF | 16-bit unsigned big-endian integer denoting the length of the type section content |
| kind_code         | 1 byte   | 0x02          | kind marker for code size section |
| num_code_sections | 2 bytes  | 0x0001-0xFFFF | 16-bit unsigned big-endian integer denoting the number of the code sections |
| code_size         | 2 bytes  | 0x0001-0xFFFF | 16-bit unsigned big-endian integer denoting the length of the code section content |
| kind_data         | 1 byte   | 0x03          | kind marker for data size section |
| data_size         | 2 bytes  | 0x0000-0xFFFF | 16-bit unsigned big-endian integer denoting the length of the data section content |
| terminator        | 1 byte   | 0x00          | marks the end of the header |

#### Body

| name             | length   | value         | description |
|------------------|----------|---------------|-------------|
| types_section    | variable | n/a           | stores code section metadata |
| inputs           | 1 byte   | 0x00-0x7F     | number of stack elements the code section consumes |
| outputs          | 1 byte   | 0x00-0x7F     | number of stack elements the code section returns |
| max_stack_height | 2 bytes  | 0x0000-0x03FF | maximum number of elements ever placed onto the stack by the code section |
| code_section     | variable | n/a           | arbitrary sequence of bytes |
| data_section     | variable | n/a           | arbitrary sequence of bytes |

### Container Validation

The following validity constraints are placed on the container format:

- minimum valid header size is `15` bytes
- `version` must be `0x01`
- `types_size` must be divisible by `4`
- the number of code sections must be equal to `types_size / 4`
- the number of code sections must not exceed 1024
- `code_size` may not be 0
- the total size of the container must be `13 + 2*num_code_sections + types_size + code_size[0] + ... + code_size[num_code_sections-1] + data_size`

## Execution Semantics

Code executing within an EOF environment behaves differently than legacy code. These differences break down into i) changes to existing behavior and ii) introduction of new behavior.

### Modified Behavior

- Execution starts at the first byte of code section 0, and `pc` is set to 0.
- `pc` is scoped to the executing code section
- If `pc` exceeds the size of the code section in context, execution aborts with failure.
- `CODECOPY`/`CODESIZE`/`EXTCODECOPY`/`EXTCODESIZE`/`EXTCODEHASH` operate on the entire container.
- The instructions `CALLCODE`, `SELFDESTRUCT`, `JUMP`, `JUMPI`, `PC` are deprecated and therefore result in an exceptional abort.
- The instruction `JUMPDEST` is renamed to `NOP`; it still charges 1 gas and has no effect.
    - Note: jumpdest-analysis is not performed anymore.
- Perform validation on initcode.
    - If validation fails, only deduct the base cost of the op and push 0 to the stack if not part of a create tx.
    - See validation rules below.
- Perform validation on created code.
    - If validation fails, all creation gas is deducted, similar to exceptional abort during initcode execution, and push 0 to the stack if not part of a create tx.
    - See validation rules below.
- EOF contract may not deploy legacy code

### New Behavior

- `RJUMP (0x5c)` instruction
    - deduct 2 gas
    - read int16 operand `offset`, set `pc = offset + pc + 3`
- `RJUMPI (0x5d)` instruction
    - deduct 4 gas
    - pop one value, `condition`, from stack
    - set `pc += 3`
    - if `condition != 0`, read int16 operand `offset` and set `pc += offset`
- `RJUMPV (0x5e)` instruction
    - deduct 4 gas
    - read uint8 operand `count`
    - pop one value, `case`, from stack
    - set `pc += 2`
    - if `case >= count` (out-of-bounds case), fall through and set `pc += count * 2`
    - otherwise interpret 2 byte operand at `pc + case * 2` as int16, call it `offset`, and set `pc += (count * 2) + offset`
- introduce new vm context variables
    - `current_code_idx` which stores the actively executing code section index
    - new `return_stack` which stores the triple `(code_section, pc, stack_height)`.
        - when instantiating a vm context, push an initial value of `(0,0,0)` to `return_stack`
- `CALLF (0xb0)` instruction
    - deduct 5 gas
    - read uint16 operand `idx`
    - if `1024 <= len(stack) + types[idx].max_stack_height`, execution results in an exceptional halt
    - if `1024 <= len(return_stack)`, execution results in an exceptional halt
    - push new element to `return_stack` `(current_code_idx, pc+3, len(stack) - types[current_code_idx].inputs)`
    - update `current_code_idx` to `idx` and set `pc` to 0
- `RETF (0xb1)` instruction
    - deduct 4 gas
    - pops `val` from `return_stack` and sets `current_code_idx` to `val.code_section` and `pc` to `val.pc`
- `JUMPF (0xb2)` instruction
    - deduct 5 gas
    - read uint16 operand `section`
    - if `1024 <= len(stack) + types[section].max_stack_height`, execution results in an exceptional halt
    - set `current_code_idx` to `section`
    - set `pc = 0`

## Code Validation

- no unassigned instructions used
- the last opcode must be a terminating instruction (pending discussion)
- instructions with immediate operands must not be truncated at the end of a code section
- the first code section must have a type signature with 0 inputs and 0 outputs
- `RJUMP` / `RJUMPI` operands must not point to an immediate operand and may not point outside of code bounds
- `RJUMPV` `count` cannot be zero
- `CALLF` and `JUMPF` operand may not exceed `num_code_sections`
- `JUMPF` operand must point to a code section with equal number of outputs as the section in which it resides
- no instruction may access data stack items below `return_stack.top().stack_height`, validated by ensuring function
- no instruction may be unreachable
- data stack height is the same for all possible code paths going through the instruction
- during `CALLF`, the following must hold: `len(stack) >= return_stack.top().stack_height + types[current_code_idx].inputs`
- during `RETF`, the following must hold: `len(stack) == return_stack.top().stack_height + types[current_code_idx].outputs`
- `JUMPF` validation regarding inputs
- maximum data stack must not exceed 1024
- `types[section].max_stack_height` must match the maximum stack height observed during validation
- no section may have more than 127 inputs or outputs
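
Three of the rules above (truncated immediates, a non-zero `RJUMPV` `count`, and relative-jump targets) can be checked in a single pass over one code section. The following is a minimal sketch under stated assumptions, not the validation algorithm of any client. The names `immediate_size`, `validate_jumps`, and `is_valid` are hypothetical. It assumes `PUSH1` to `PUSH32` (`0x60` to `0x7f`) carry 1 to 32 immediate bytes as in legacy EVM, and it applies the target rule to `RJUMPV` table entries as well, which the list above states only for `RJUMP` / `RJUMPI`. It does not check unassigned opcodes, terminating instructions, or any stack rule.

```python
RJUMP, RJUMPI, RJUMPV = 0x5C, 0x5D, 0x5E
CALLF, JUMPF = 0xB0, 0xB2
PUSH1, PUSH32 = 0x60, 0x7F  # assumption: legacy PUSHn carries n immediate bytes


def immediate_size(code: bytes, pos: int) -> int:
    """Number of immediate bytes that follow the opcode at `pos`."""
    op = code[pos]
    if PUSH1 <= op <= PUSH32:
        return op - PUSH1 + 1
    if op in (RJUMP, RJUMPI, CALLF, JUMPF):
        return 2
    if op == RJUMPV:
        if pos + 1 >= len(code):
            raise ValueError("truncated RJUMPV")
        count = code[pos + 1]
        if count == 0:
            raise ValueError("RJUMPV count is zero")
        return 1 + 2 * count  # count byte plus one int16 per table entry
    return 0


def validate_jumps(code: bytes) -> None:
    """Raise ValueError if an immediate is truncated or a jump target is bad."""
    immediates = set()  # byte positions occupied by immediate operands
    targets = []
    pos = 0
    while pos < len(code):
        op = code[pos]
        end = pos + 1 + immediate_size(code, pos)
        if end > len(code):
            raise ValueError("truncated immediate")
        immediates.update(range(pos + 1, end))
        if op in (RJUMP, RJUMPI):
            table = range(pos + 1, end, 2)
        elif op == RJUMPV:
            table = range(pos + 2, end, 2)
        else:
            table = range(0)
        for at in table:
            offset = int.from_bytes(code[at:at + 2], "big", signed=True)
            # Offsets are relative to the pc just after the instruction.
            targets.append(end + offset)
        pos = end
    for target in targets:
        if not 0 <= target < len(code):
            raise ValueError("jump target outside of code bounds")
        if target in immediates:
            raise ValueError("jump target is an immediate operand")


def is_valid(code: bytes) -> bool:
    """Convenience wrapper that turns the exception into a boolean."""
    try:
        validate_jumps(code)
        return True
    except ValueError:
        return False
```

As a usage example, `is_valid(bytes.fromhex("5c000000"))` accepts an `RJUMP` with offset 0 that lands on the following byte, while `is_valid(bytes.fromhex("60005cfffc00"))` rejects an `RJUMP` whose offset of -4 lands on the immediate byte of the preceding `PUSH1`.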