HackMD
    • Sharing Link copied
    • /edit
    • View mode
      • Edit mode
      • View mode
      • Book mode
      • Slide mode
      Edit mode View mode Book mode Slide mode
    • Note Permission
    • Read
      • Only me
      • Signed-in users
      • Everyone
      Only me Signed-in users Everyone
    • Write
      • Only me
      • Signed-in users
      • Everyone
      Only me Signed-in users Everyone
    • More (Comment, Invitee)
    • Publishing
    • Commenting Enable
      Disabled Forbidden Owners Signed-in users Everyone
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Invitee
    • No invitee
    • Options
    • Versions and GitHub Sync
    • Transfer ownership
    • Delete this note
    • Template
    • Save as template
    • Insert from template
    • Export
    • Google Drive Export to Google Drive
    • Gist
    • Import
    • Google Drive Import from Google Drive
    • Gist
    • Clipboard
    • Download
    • Markdown
    • HTML
    • Raw HTML
Menu Sharing Help
Menu
Options
Versions and GitHub Sync Transfer ownership Delete this note
Export
Google Drive Export to Google Drive Gist
Import
Google Drive Import from Google Drive Gist Clipboard
Download
Markdown HTML Raw HTML
Back
Sharing
Sharing Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
More (Comment, Invitee)
Publishing
More (Comment, Invitee)
Commenting Enable
Disabled Forbidden Owners Signed-in users Everyone
Permission
Owners
  • Forbidden
  • Owners
  • Signed-in users
  • Everyone
Invitee
No invitee
   owned this note    owned this note      
Published Linked with GitHub
Like BookmarkBookmarked
Subscribed
  • Any changes
    Be notified of any changes
  • Mention me
    Be notified of mention me
  • Unsubscribe
Subscribe
# Everything You Never Wanted To Know About Serialization I've been researching and thinking about serialization for a few weeks now. This document tries to: * Capture what I've learned * Suggest reasonable courses of action. ## Problem Statement When talking about serizalization needs we will be looking at the following two primary use cases. - **Networking**: The act of transporting objects between clients across a network. - **Consensus**: The process of manipulating objects within the logic of the protocol We wish to determine what scheme is appropriate for these use cases in the context of the Eth2.0 network, with consideration taken towards the Eth1.0 network. The Eth1.0 protocol uses RLP for each of these. The Eth2.0 protocol uses SSZ for the *Consensus* use case and does not have a defined scheme for the *Networking* use case. ## Evaluation Criteria In order to make a decision, we must know what the different criteria under which a serialization scheme should be evaluated. For the *Networking* use case I pose the following: TODO: Schemaless at the networking level - **Major**: - **compactness**: How space-efficient is the serialized (bytes) representation of the data. - **expressiveness**: Does the serialization support the data types we use. - **Minor**: - **streamability**: Does the serialization support streaming encoding and decoding For the *Consensus* use case I pose the following criteria. - **Major**: - **expressiveness**: Does the serialization support the data types we use. - **hashing**: Can data structures be efficiently hashed and re-hashed after minor modifications. - **Minor**: - **indexing**: The act of accessing the inner values of a data structure without fully deserializing it. ### On "Expressiveness" of data types This document works under the assumption that we **require** support for the following data types. - Booleans - Unsigned Integers: - 64 bits for beacon chain - 128 bits for shard chain - 256 bits for eth1.0 chain - Fixed length byte strings. - Dynamic length byte strings - Homogenous arrays - Heterogenous struct-likes It's worth noting the following nuances about data types. - Support for fixed-length byte strings allows - adding support for unsigned integers of any size - adding support for booleans ### On "Indexing" A full explanation of this can be found here: https://gist.github.com/karalabe/3a25832b1413ee98daad9f0c47be3632 An optimal serialization scheme would allow: - `O(1)` lookups for elements which are statically sized and who's outer data containers are also statically sized. - `O(log(N))` lookups for elements contained within dynamically sized data containers. ## Candidates Here is the evaluation of various serialization schemes I've looked at. - Protobuf: https://developers.google.com/protocol-buffers/ - MessagePack: https://msgpack.org/ - CBOR: http://cbor.io/ - BSON: TODO: http://bsonspec.org/spec.html - RLP: https://github.com/ethereum/wiki/wiki/RLP - SSZ: https://github.com/ethereum/eth2.0-specs/blob/bed888810d5c99cd114adc9907c16268a2a285a9/specs/simple-serialize.md - SOS: https://gist.github.com/karalabe/3a25832b1413ee98daad9f0c47be3632 - SSS: https://github.com/ethereum/bimini/blob/7c26efec585742ef870bf58ea5d96e2deb242775/spec.md ### Some popular formats that don't suit our needs Protobuf, MessagePack, and CBOR suffer from the same shortfals - No support for integers above 64 bits - No support for fixed size byte strings - No support for fixed size arrays - The *hashing* requirement for eth2.0 would have to be implemented at a second layer. > CBOR does support arbitrary bignums but no defined bit sizes above 64-bit > CBOR contains extraneous metadata in it's serialization scheme that isn't needed by our protocol. ### RLP https://github.com/ethereum/wiki/wiki/RLP #### Compactness RLP can be efficient under the condition that the data structure has relatively few elements. The *extraneous* bytes come from: - need for additional length bytes for fixed length data structures - fixed length byte strings - fixed size integers - lists of known length - structs Experiments show that this inefficiency increases serialized sizes by 5% on most Eth1 data structures, but can be as much as 40% on Eth2 data structures. #### Expressiveness RLP only has first class support for dynamic length byte strings and dynamic sized lists of dynamic length byte strings. All of the additional data types are supported via addidional abstraction layers provided by the various RLP libraries. This allows RLP to support for arbitrary data types but at the cost of a few extra bytes for length prefixes. #### Hashing The current approach to hashing RLP objects does not provide the performance gains needed for the Eth2.0 network. However, it should be easy to define an algorithm similar to [SSZ trie hash](https://github.com/ethereum/eth2.0-specs/blob/bed888810d5c99cd114adc9907c16268a2a285a9/specs/simple-serialize.md#tree-hash) that works well for RLP objects. #### Indexing RLP does not lend itself well to fast indexing into encoded data structures. can approach `O(N)` for long flat data structures. ### SSZ https://github.com/ethereum/eth2.0-specs/blob/bed888810d5c99cd114adc9907c16268a2a285a9/specs/simple-serialize.md #### Compactness SSZ is not efficient. This inefficiency seems to be largely due to the 4-byte length prefixes it uses for dynamic sized data structures and the length prefixes it uses for containers which are not needed. Measurements of Eth1.0 data structures showed SSZ serialized values being 30-50% larger than their RLP equivalents. #### Expressiveness SSZ supports all of the needed data types. #### Hashing SSZ defines [trie hash](https://github.com/ethereum/eth2.0-specs/blob/bed888810d5c99cd114adc9907c16268a2a285a9/specs/simple-serialize.md#tree-hash) which allows for efficient re-hashing of objects with minor modifications. #### Indexing SSZ allows for reasonably fast indexing. It contains some inefficiency here which my intuition says... *TODO* ### SOS https://gist.github.com/karalabe/3a25832b1413ee98daad9f0c47be3632 The SOS scheme has not been deeply researched, nor have firm numbers been measured to compare it to RLP os SSZ so the following is based largely on intuition. #### Compactness SOS should be on par or slightly more efficient than SSZ. The inefficiency of SOS should be attributable to the 4-byte offset indexes it uses to reference the locations of the dynamic elements. #### Expressiveness SSZ supports all of the needed data types. #### Hashing SOS does not define a fast hashing scheme but [SSZ trie hash](https://github.com/ethereum/eth2.0-specs/blob/bed888810d5c99cd114adc9907c16268a2a285a9/specs/simple-serialize.md#tree-hash) should be easily converted to work with SOS #### Indexing SOS provides reliably fast indexing ### SSS https://github.com/ethereum/bimini/blob/7c26efec585742ef870bf58ea5d96e2deb242775/spec.md This is an experimental scheme that I came up with #### Compactness SSS is the most compact of all of the schemes here. This is primarily achieved through the use of [LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128) encoding of integer values that are known to normally be well under their maximum bit width. SSS is able to gain roughly - 5% efficiency vs RLP - 30-50% efficiency vs SSZ #### Expressiveness SSS supports all of the needed data types. #### Hashing SSS does not define a fast hashing scheme but [SSZ trie hash](https://github.com/ethereum/eth2.0-specs/blob/bed888810d5c99cd114adc9907c16268a2a285a9/specs/simple-serialize.md#tree-hash) should be easily converted to work with SSS #### Indexing SSS does not provide fast indexing. ## Where we go from here Here is a table that summarizes the findings above. | | compact | expressiveness | hashing | indexing | |--------|----------|----------------|-----------|-----------| | RLP | yes | flexible | possible | no | | SSZ | no | yes | yes | poor | | SOS | no | yes | possible | yes | | SSS | yes | yes | possible | no | ### Option A: Two Serialization Formats One option is for us to use two formats. - RLP for *Networking* - SSZ for *Consensus* Using two different serialization schemes beneficial in that it appears that *indexing* and *compactness* appear to be somewhat mutually exclusive and that the existing schemes for proviing fast *indexing* preclude support for streaming encoding and decoding. We can improve this approach by bringing the offset format from SOS into SSZ. We can gain some efficiency at the networking level by modifying our use of RLP to include support for LEB128 encoded integers. ### Option B: RLP only The simplest option is to define trie-hash for RLP and accept the loss of fast indexing at the *Consensus* level and streamability when encoding decoding. We get a single established serialization scheme which is flexible and well known and serves all of our *major* requirements adequately. ### Option C: Hybrid SSZ/SOS/SSS As it stands, to pick from any of the proposed options would mean making a meaningful sacrifice in at least one of the evaluation criteria. I believe however that we *can* combine these formats to a single format which serves all of our needs. I propose the following high level approach - Trie Hashing from SSZ - Indexing from SOS - Efficiency from SSS This could be a *new* scheme or a modification to SSZ. Rough details of how this would work are: We split the concept of encoding into two concepts. - **value** encoding: - The encoded format without any metadata such as length prefixes - **structure** encoding: - The manner in which the structure is encoded, including both length prefixes and the layout of the encoded values. Everything would have a single scheme for *value* encodings. For *structure* encoding, we would define two schemes. - **ABI**: The highly efficient SOS style layout. - LEB128 length prefixes - Dynamic data encoded inline - **WIRE**: The highly compact SSS style layout. - Dynamic data referenced by offset and located at the end - Fixed size length prefixes (if applicable) - LEB128 encoded values get padded to full byte width The *Trie Hash* algorithm would be defined only for the *ABI* layout. ## Conclusion **Option A** is *good* because is takes the least work to implement, howeve it has the *downside* of leaving us with two different schemes. **Option B** leverages existing infrastructure, requires some non-trivial changes to the Eth2.0 spec, and requires us to make sacrifices on many of the evaluation criteria. **Option C** builds on the current Eth2.0 spec but with non-trivial changes should result in a single serialization scheme that is highly optimal across all criteria. My personal preferences in order of most preferred to least are: - Option C - Option A - Option B ## On Compression One question which has come up is roughly: > Can we choose a *single* serialization format that compromises on *compactness* (the serialized format is not *highly compact*) and make up those losses using compression (e.g. snappy). My experiments suggest that we *can* get improvements from using compression but that SSS style serialization still results in ~8% reduction in final size. - SSS+snappy vs SSZ+snappy - 8% size reduction for attestations - 9.5% size reduction for beacon blocks - 4% size reduction for beacon headers ## Lots of raw data Here is the latest raw data in comparing RLP, SSZ, and SSS https://gist.github.com/pipermerriam/449b0425a96197a731ed41f8f77ed42f

Import from clipboard

Advanced permission required

Your current role can only read. Ask the system administrator to acquire write and comment permission.

This team is disabled

Sorry, this team is disabled. You can't edit this note.

This note is locked

Sorry, only owner can edit this note.

Reach the limit

Sorry, you've reached the max length this note can be.
Please reduce the content or divide it to more notes, thank you!

Import from Gist

Import from Snippet

or

Export to Snippet

Are you sure?

Do you really want to delete this note?
All users will lost their connection.

Create a note from template

Create a note from template

Oops...
This template has been removed or transferred.


Upgrade

All
  • All
  • Team
No template.

Create a template


Upgrade

Delete template

Do you really want to delete this template?

This page need refresh

You have an incompatible client version.
Refresh to update.
New version available!
See releases notes here
Refresh to enjoy new features.
Your user state has changed.
Refresh to load new user state.

Sign in

Sign in via SAML

or

Sign in via GitHub

Help

  • English
  • 中文
  • 日本語

Documents

Tutorials

Book Mode Tutorial

Slide Example

YAML Metadata

Resources

Releases

Blog

Policy

Terms

Privacy

Cheatsheet

Syntax Example Reference
# Header Header 基本排版
- Unordered List
  • Unordered List
1. Ordered List
  1. Ordered List
- [ ] Todo List
  • Todo List
> Blockquote
Blockquote
**Bold font** Bold font
*Italics font* Italics font
~~Strikethrough~~ Strikethrough
19^th^ 19th
H~2~O H2O
++Inserted text++ Inserted text
==Marked text== Marked text
[link text](https:// "title") Link
![image alt](https:// "title") Image
`Code` Code 在筆記中貼入程式碼
```javascript
var i = 0;
```
var i = 0;
:smile: :smile: Emoji list
{%youtube youtube_id %} Externals
$L^aT_eX$ LaTeX
:::info
This is a alert area.
:::

This is a alert area.

Versions

Versions and GitHub Sync

Sign in to link this note to GitHub Learn more
This note is not linked with GitHub Learn more
 
Add badge Pull Push GitHub Link Settings
Upgrade now

Version named by    

More Less
  • Edit
  • Delete

Note content is identical to the latest version.
Compare with
    Choose a version
    No search result
    Version not found

Feedback

Submission failed, please try again

Thanks for your support.

On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

Please give us some advice and help us improve HackMD.

 

Thanks for your feedback

Remove version name

Do you want to remove this version name and description?

Transfer ownership

Transfer to
    Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

      Link with GitHub

      Please authorize HackMD on GitHub

      Please sign in to GitHub and install the HackMD app on your GitHub repo. Learn more

       Sign in to GitHub

      HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.

      Push the note to GitHub Push to GitHub Pull a file from GitHub

        Authorize again
       

      Choose which file to push to

      Select repo
      Refresh Authorize more repos
      Select branch
      Select file
      Select branch
      Choose version(s) to push
      • Save a new version and push
      • Choose from existing versions
      Available push count

      Upgrade

      Pull from GitHub

       
      File from GitHub
      File from HackMD

      GitHub Link Settings

      File linked

      Linked by
      File path
      Last synced branch
      Available push count

      Upgrade

      Danger Zone

      Unlink
      You will no longer receive notification when GitHub file changes after unlink.

      Syncing

      Push failed

      Push successfully