# P2P Message Serialization
## Why not RLP?
See https://github.com/ethereum/wiki/wiki/Wishlist#rlp
## Requirements
- Deterministic encoding, including field ordering. The same data must always serialize to the same bytes, so that it always maps to the same hash.
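
To illustrate the requirement, here is a minimal sketch of how field ordering alone can change a hash. JSON is used only as a stand-in because it ships with Python; it is not one of the candidates below, and the message fields (`slot`, `parent`) are hypothetical.

```python
import hashlib
import json


def naive_hash(message: dict) -> str:
    # Fields are written in insertion order, so the bytes depend on
    # how the dict happened to be built.
    encoded = json.dumps(message).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def canonical_hash(message: dict) -> str:
    # Sorting keys and fixing the separators yields a single byte
    # string per logical message.
    encoded = json.dumps(message, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


a = {"slot": 1, "parent": "0xabc"}
b = {"parent": "0xabc", "slot": 1}  # same data, different field order

print(a == b)                                  # the two messages are logically equal
print(naive_hash(a) == naive_hash(b))          # but their naive hashes differ
print(canonical_hash(a) == canonical_hash(b))  # canonical encoding agrees
```

Two peers that build the same message in a different field order only agree on its hash if the encoding fixes that order.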
## Candidates
| Candidate | Supported Languages | Deterministic | Pros | Cons |
| ------------ | ------------------- | ------------- | ---- | ------ |
| [Protobuf](https://developers.google.com/protocol-buffers/) | C++, Java, Python, Go, C#, Objective-C, JavaScript, and Rust | No [1] | | Deterministic serialization is not canonical across languages. Fields may be written in a different order, which breaks hashing. |
| [CBOR](http://cbor.io/) | | No [2] | | |
| [Cap’n Proto](https://capnproto.org/) | Serialization + RPC: C++, Erlang, Go, JavaScript, OCaml, Python, Rust. Serialization only: C, C#, D, Go, Java, JavaScript, Lua, Nim, Ruby, Scala | Supported [3] | Schema support | |
| [FlatBuffers](https://google.github.io/flatbuffers/) | C++, C#, C, Go, Java, JavaScript, TypeScript, PHP, and Python | No [4] | Schema support |Non-deterministic |
| [SimpleSerialize](https://github.com/ethereum/eth2.0-specs/blob/master/specs/simple-serialize.md#implementations) | Python, Rust, Nim, JavaScript | Always [5] | Simple | Needs more tests and implementation work |
| [MessagePack](https://msgpack.org/index.html) | C/C++, Java, Python, Rust, Go, JavaScript, Nim, and dozens of other languages | Supported [6] | | No schema support |
| Thrift | C++, C#, Go, Cocoa, D, Delphi, Erlang, Haskell, Java, OCaml, Perl, PHP, Python, Ruby, Smalltalk | Supported [7] | Schema support | |
| [Avro](https://avro.apache.org/docs/current/) | Java, C, C++, C# | [Only maps and arrays are non-deterministic](https://stackoverflow.com/questions/28129664/why-isnt-the-avrocoder-deterministic#28131766); there is an [attempt at a deterministic version](https://bitbucket.org/jlewi/dataflow/src/3fc7ae6d353fd8457d757eac72322ec2f768a6e5/dataflow/src/main/java/dataflow/AvroDeterministicCoder.java?at=master) | Uses JSON for schemas; dynamic typing, Untagged data (more compact, faster de/serialization) | Depends on schemas, extra support may be lacking / need development for some languages |
## Custom serialization suggestions
- Add a magic number
- Add major.minor versioning
- Add an "offset" field that tells the offset at which raw data starts.
- Schema as JSON
- Raw data (numbers always stored in specific endianness)
Inspired by the [NumPy file format](https://github.com/numpy/numpy/blob/master/numpy/lib/format.py).
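
The suggestions above could fit together roughly as follows. This is a minimal sketch rather than a proposed wire format: the magic bytes `P2PM`, the header layout, and the schema shape (a JSON list of `[field_name, struct_format_char]` pairs) are all assumptions made for illustration.

```python
import json
import struct

MAGIC = b"P2PM"    # hypothetical magic number
VERSION = (1, 0)   # hypothetical major.minor version
# Header layout: magic (4 bytes), major (1), minor (1), offset of raw data (uint32).
# "<" means little-endian with no padding, so every peer reads the same bytes.
HEADER = struct.Struct("<4sBBI")


def encode(schema: list, values: dict) -> bytes:
    """Encode values according to schema, a list of [name, struct code] pairs."""
    schema_bytes = json.dumps(schema, separators=(",", ":")).encode("utf-8")
    offset = HEADER.size + len(schema_bytes)
    fmt = "<" + "".join(code for _, code in schema)
    # Field order comes from the schema, never from the dict.
    raw = struct.pack(fmt, *(values[name] for name, _ in schema))
    return HEADER.pack(MAGIC, *VERSION, offset) + schema_bytes + raw


def decode(blob: bytes) -> dict:
    magic, major, minor, offset = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    if major != VERSION[0]:
        raise ValueError(f"unsupported major version {major}")
    schema = json.loads(blob[HEADER.size:offset].decode("utf-8"))
    fmt = "<" + "".join(code for _, code in schema)
    fields = struct.unpack_from(fmt, blob, offset)
    return {name: value for (name, _), value in zip(schema, fields)}


schema = [["slot", "Q"], ["shard", "H"]]  # uint64 and uint16, hypothetical fields
blob = encode(schema, {"shard": 3, "slot": 42})
print(decode(blob))
```

Because the schema fixes both the field order and the width of each number, the same values always produce the same bytes regardless of how the caller orders them.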
## Further reading
- https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
- Web search for further comparisons: https://duckduckgo.com/?q=comparision+of+data+serialization+formats&t=ffab&ia=web
## References
1. Protobuf GitHub issue, "Protobuf serialization is not canonical": https://github.com/google/protobuf/issues/3417
2. CBOR spec, "no requirement that all data formats be uniquely encoded": https://tools.ietf.org/html/rfc7049
3. Cap'n Proto docs, "it is possible to write code which canonicalizes a ... message": https://capnproto.org/encoding.html#canonicalization
4. FlatBuffers docs, has "flexibility" for ordering: https://google.github.io/flatbuffers/flatbuffers_internals.html
5. SimpleSerialize: Vitalik designed it for this purpose. No known reference, but it seems obvious.
6. MessagePack spec, supports "deterministic" serialized data in some cases: https://github.com/msgpack/msgpack/blob/master/spec.md
7. Thrift specification: https://thrift.apache.org/static/files/thrift-20070401.pdf