EOF - Container simplification

# EOF - Container simplification [toc] The EOF EIPs were originally written in a sequential manner so that it would be possible to apply them individually in separate forks. With the lastest revelations around a fuller EOF-suite as an atomic package of EIPs, there are some decisions to reconsider. In this document, we will look at a few paths to simplifying the type and code header sections. ## Goals When considering the different approaches, the following attributes should be considered: * size -- although not the absolute most important thing, it is also critical. Superfluous data should be avoided unless it directly benefits parsing. * parsing simplicity -- even at the cost of a few extra bytes, parsing complexity should be kept low. This is important code in clients and the simpler, the easier to implement and audit. * usability -- the header should be descriptive enough that a lot of information can be quickly ascertained, i.e. if the total size is correct, the offset of certain code section, the offset of a certain code section's type definition, etc. Operations that occur frequently. * extensibility -- want to have optional section so we can functionality without bumping the version. WebAssembly has a version as a backup, but generally doesn't expect to update - instead just instroduce new section kinds. * verkle trees -- reads ~128 byte chunks, so streaming header contradicts this a bit as more chunks would need to be read. First byte of verkle chunk is used to point to first instruction (in case first few bytes are actually immediate push data). Don't need this anyone because jumps are validated at deploy. * compression -- resuse type defs? RLE (requires grouping functions by type sig)? ### Functions metadata - index -- used by CALLF - code size - code offset in container - type :num inputs/outputs (2 bytes) - validation - max stack height (2 bytes) - validation and execution Insight: - Large number of functions should be present (16 to 64 on an average contract) - Many functions could have a very similar signature - Some attributes are used only in validation ### Evaluation Create a data set and define metrics on which works the best: 1) Simple contract (1 function) 2) Average contract (64 functions, 1/3rd of the functions have identical type) 3) Lots of helpers (>256 functions, random in/outputs) Metrics: - Code size - How many 128-byte chunks is the header taking Can use https://github.com/ipsilon/eof/tree/eof-rs/eof-rs as the tool for this. --- ## A few attempts ### Status Quo Currently, the type and code sections are completely separate with a few overall validation criteria to ensure containers are well formed. Those checks mostly center around verifying that related header fields line up with other entries (e.g.`type_section_size == len(code_sections)*2`) ##### Header ``` magic || version || [ kind_type || size ]? || [ kind_code || size ]+ || [kind_data || data_size ]? || terminator || type_section? || code_section+ || data_section? ``` ##### Parser This parser is not able to make many assumptions about the EOF header, so it needs to constantly validate whether certain sections have already been read. In the alternative solutions, we'll see that the parser is able to avoid this generally by doing length checks. <details> <summary>Code</summary> ```go= func parse(b []byte) (*Container, error) { if !isEOF(b) { return nil, fmt.Errorf("not an eof container") } var ( c Container i = 3 err error typeRead int codeRead int dataRead int ) outer: for i < len(b) { switch int(b[i]) { case kindTerminator: i += 1 break outer case kindType: // Type section header must be read first. if codeRead != 0 || dataRead != 0 { return nil, fmt.Errorf("type section header not first") } // Only 1 type section is allowed. if typeRead != 0 { return nil, fmt.Errorf("expected only 1 type section") } // Size must be present. if c.typeSize, err = parseSectionSize(b, i+1); err != nil { return nil, fmt.Errorf("unable to read type section size") } // Type section size must not be 0. if c.typeSize == 0 { return nil, fmt.Errorf("type section size must not be zero") } typeRead += 1 case kindCode: // Size must be present. var size uint16 size, err = parseSectionSize(b, i+1) if err != nil { return nil, fmt.Errorf("unable to read code section size") } // Size must not be 0. if size == 0 { return nil, fmt.Errorf("code section size must not be zero") } c.codeSize = append(c.codeSize, size) codeRead += 1 case kindData: // Data section is allowed only after code section. if codeRead == 0 { return nil, fmt.Errorf("unexpected data section before code") } // Only 1 data section is allowed. if dataRead != 0 { return nil, fmt.Errorf("unable to read data section size") } // Size must be present. if c.dataSize, err = parseSectionSize(b, i+1); err != nil { return nil, fmt.Errorf("data section size must not be zero") } // Data section size must not be 0. if c.dataSize == 0 { return nil, fmt.Errorf("data section size must not be zero") } dataRead += 1 default: return nil, fmt.Errorf("unknown section kind: %d", b[i]) } i += 3 } // Must have type section if more than one code section. if len(c.codeSize) > 1 && c.typeSize == 0 { return nil, fmt.Errorf("type section missing but required") } // If type section, ensure type section size is 2n the number of code sections. if c.typeSize != 0 && len(c.codeSize) != int(c.typeSize/2) { return nil, fmt.Errorf("type section size does not match number of code sections") } // Parse type section. if c.typeSize != 0 { for j := 0; j < int(c.typeSize); j += 2 { sig := Annotation{ input: int(b[i+j]), output: int(b[i+j+1]), } c.types = append(c.types, sig) } if c.types[0].input != 0 || c.types[0].output != 0 { return nil, fmt.Errorf("first code section must have 0 inputs and 0 outputs") } i += int(c.typeSize) } // Parse code sections. for _, size := range c.codeSize { c.codeOffsets = append(c.codeOffsets, i) i += int(size) } // Parse data section. if c.dataSize != 0 { c.dataOffset = i } c.data = b return &c, nil } ``` </details> ### Array of Code Sections A natural simplification of the status quo is to condense the code section headers into a single header. This can be done by requiring the type section and deducing the size of the code section header. ##### Header ``` magic || version || type_section_size_size || code_section_size+ || [ kind_data || data_section_size ]? || terminator || type_section || code_section+ || data_section? ``` ##### Parser <details> <summary>Code</summary> ```go= func parse(b []byte) (*Container, error) { if len(b) < 7 || !isEOF(b) { return nil, fmt.Errorf("not an eof container") } var ( c Container i = 5 err error ) // Read type section size. c.typeSize, err = parseSectionSize(b, 3) if err != nil { return nil, fmt.Errorf("unable to read type size") } if len(b) < int(3+c.typeSize/2) { return nil, fmt.Errorf("not enough code section sizes") } // Read code section sizes. for j := 0; j < int(c.typeSize/2); j++ { size, err := parseSectionSize(b, 5+(2*j)) if err != nil { return nil, fmt.Errorf("unable to parse code section size") } c.codeSize = append(c.codeSize, size) } i += int(c.typeSize) // Read data section size (if present). if b[i] != 0x0 { if b[i] != 0x3 { return nil, fmt.Errorf("unknown section kind") } c.dataSize, err = parseSectionSize(b, i+1) if err != nil { return nil, fmt.Errorf("unable to parse data section size") } // Data section size must not be 0. if c.dataSize == 0 { return nil, fmt.Errorf("data section size must not be zero") } i += 3 } // Parse type section. if c.typeSize != 0 { if len(b) <= int(c.typeSize)+i { return nil, fmt.Errorf("invalid type size") } for j := 0; j < int(c.typeSize); j += 2 { sig := Annotation{ input: int(b[i+j]), output: int(b[i+j+1]), } c.types = append(c.types, sig) } if c.types[0].input != 0 || c.types[0].output != 0 { return nil, fmt.Errorf("first code section must have 0 inputs and 0 outputs") } i += int(c.typeSize) } i += 1 // Parse code sections. for _, size := range c.codeSize { c.codeOffsets = append(c.codeOffsets, i) i += int(size) } // Parse data section. if c.dataSize != 0 { c.dataOffset = i } return &c, nil } ``` </details> ### Embedded Type Section It's also possible to just embed the type section within the code section. This is done by prefixing the code with two bytes to denote the inputs and outputs, similar to the type section encoding. ##### Header ``` magic || version || num_code_sections || code_section_size+ || terminator || [ type_info || code_section ]+ || data_section? ``` ##### Parser <details> <summary>Code</summary> ```go= func parseEmbeddedTypeSection(b []byte) (*Container, error) { if len(b) < 7 || !isEOF(b) { return nil, fmt.Errorf("not an eof container") } var ( c Container i = 5 err error ) // Read the number of code sections. n, err := parseSectionSize(b, 3) if err != nil { return nil, fmt.Errorf("unable to read type size") } if int(3+n*2+1) >= len(b) { return nil, fmt.Errorf("invalid number of code sections") } // Read code section sizes. for j := 0; j < int(n); j++ { size, err := parseSectionSize(b, 5+(2*j)) if err != nil { return nil, fmt.Errorf("unable to parse code section size") } if size < 2 { return nil, fmt.Errorf("code size too small to for allow type info") } c.codeSize = append(c.codeSize, size) } i += int(2 * n) // Read data section size (if present). if b[i] != 0x0 { if b[i] != 0x3 { return nil, fmt.Errorf("unknown section kind") } c.dataSize, err = parseSectionSize(b, i+1) if err != nil { return nil, fmt.Errorf("unable to parse data section size") } // Data section size must not be 0. if c.dataSize == 0 { return nil, fmt.Errorf("data section size must not be zero") } i += 3 } i += 1 // Parse code sections. for _, size := range c.codeSize { if len(b) < int(size)+i { fmt.Println(i, c.codeSize, size, len(b)) return nil, fmt.Errorf("invalid code size") } sig := Annotation{ input: int(b[i]), output: int(b[i+1]), } c.types = append(c.types, sig) if c.types[0].input != 0 || c.types[0].output != 0 { return nil, fmt.Errorf("first code section must have 0 inputs and 0 outputs") } c.codeOffsets = append(c.codeOffsets, i+2) i += int(size) } // Parse data section. if c.dataSize != 0 { c.dataOffset = i } return &c, nil } ``` </details> ### Section kinds with arrays Extend the original EOF encoding by defining if it is followed by a scalar size or array of sizes. The array is encoded as 2-byte `<array_length>` followed by the content. - 01: code section kind with array of sizes - 02: data section kind with scalar size - 03: type section kind with scalar size Alternative 1: scalar vs array can be indicated by the top bit in the section kind. In the case code section kind is `0x81`. Alternative 2: always require the array. This wastes 2 bytes in case the array is to have single value. Benefits over original EOF encoding: - section kind for code sections not duplicated, - sections of different kinds cannot be mixed: in the original EOF you can imagine "code, data, code, data, data, code"; here all code sections must be next to each other if a particular section kind can occur only once. ##### Header ``` magic || version || type_section_kind || type_section_size || code_section_kind || num_code_sections || code_section_size* || data_section_kind || data_section_size || terminator ``` ### Multi-level header Separate the header into static and dynamic components. Benefits over original EOF encoding: - parsing static section is straightforward since each element is of the form `kind || size` ##### Header ``` magic || version || type_section_kind || type_section_size || code_section_kind || num_code_sections || data_section_kind || data_section_size || terminator || code_section_kind || code_section_size* ```

Read more

Partial history expiry tweet thread

Partial history expiry announcement

Preferences for Osaka

EL R&D