# P2P Message Serialization

## Why not RLP?

See https://github.com/ethereum/wiki/wiki/Wishlist#rlp

## Requirements

- Deterministic field ordering. The same data should always be mapped to the same hash.

## Candidates

| Candidate | Supported Languages | Deterministic | Pros | Cons |
| --------- | ------------------- | ------------- | ---- | ---- |
| [Protobuf](https://developers.google.com/protocol-buffers/) | C++, Java, Python, Go, C#, Objective-C, JavaScript and Rust | No [1] | | Deterministic serialization is NOT canonical across languages. Fields may be written in a different order, which poses hashing problems. |
| [CBOR](http://cbor.io/) | | No [2] | | |
| [Cap’n Proto](https://capnproto.org/) | Serialization + RPC: C++, Erlang, Go, JavaScript, OCaml, Python, Rust. Serialization only: C, C#, D, Go, Java, JavaScript, Lua, Nim, Ruby, Scala | Supported [3] | Schema support | |
| [FlatBuffers](https://google.github.io/flatbuffers/) | C++, C#, C, Go, Java, JavaScript, TypeScript, PHP, and Python | No [4] | Schema support | Non-deterministic |
| [SimpleSerialize](https://github.com/ethereum/eth2.0-specs/blob/master/specs/simple-serialize.md#implementations) | Python, Rust, Nim, JavaScript | Always [5] | Simple | Needs more tests and implementation work |
| [MessagePack](https://msgpack.org/index.html) | C/C++, Java, Python, Rust, Go, JavaScript, Nim and dozens of other languages | Supported [6] | | No schema support |
| Thrift | C++, C#, Go, Cocoa, D, Delphi, Erlang, Haskell, Java, OCaml, Perl, PHP, Python, Ruby, Smalltalk | Supported [7] | Schema support | |
| [Avro](https://avro.apache.org/docs/current/) | Java, C, C++, C# | [Only maps and arrays are non-deterministic](https://stackoverflow.com/questions/28129664/why-isnt-the-avrocoder-deterministic#28131766); there is an [attempt at a deterministic version](https://bitbucket.org/jlewi/dataflow/src/3fc7ae6d353fd8457d757eac72322ec2f768a6e5/dataflow/src/main/java/dataflow/AvroDeterministicCoder.java?at=master) | Uses JSON for schemas; dynamic typing; untagged data (more compact, faster de/serialization) | Depends on schemas; support may be lacking or need development for some languages |

## Custom serialization suggestions

- Add a magic number
- Add major.minor versioning
- Add an "offset" field that gives the offset at which the raw data starts
- Schema as JSON
- Raw data (numbers always stored in a specific endianness)

Inspired by the [NumPy file format](https://github.com/numpy/numpy/blob/master/numpy/lib/format.py).

## Further reading

- https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
- Other comparisons: https://duckduckgo.com/?q=comparision+of+data+serialization+formats&t=ffab&ia=web

## References

1. Protobuf GitHub issue, "Protobuf serialization is not canonical": https://github.com/google/protobuf/issues/3417
2. CBOR spec, "no requirement that all data formats be uniquely encoded": https://tools.ietf.org/html/rfc7049
3. Cap'n Proto docs, "it is possible to write code which canonicalizes a ... message": https://capnproto.org/encoding.html#canonicalization
4. FlatBuffers docs, which describe "flexibility" for ordering: https://google.github.io/flatbuffers/flatbuffers_internals.html
5. SimpleSerialize: Vitalik designed it for this purpose. No known reference, but it seems obvious.
6. MessagePack spec, which supports "deterministic" serialized data in some cases: https://github.com/msgpack/msgpack/blob/master/spec.md
7. Thrift specification: https://thrift.apache.org/static/files/thrift-20070401.pdf
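
## Sketch of the custom format

The layout listed under "Custom serialization suggestions" (magic number, major.minor version, offset field, JSON schema, raw data) could be sketched roughly as below. This is only an illustration under stated assumptions, not a proposed wire format: the magic bytes, the field widths, the little-endian choice, the schema contents, and the `encode`/`decode` names are all hypothetical. Writing the JSON schema with sorted keys is likewise an assumption, made here so that the same data always produces the same bytes and therefore the same hash.

```python
import hashlib
import json
import struct

# Hypothetical values: the suggestions only say "add a magic number" and
# "major.minor versioning"; the concrete bytes below are assumptions.
MAGIC = b"\x93P2PMSG"
VERSION = (1, 0)
# magic + major (1 byte) + minor (1 byte) + offset (4 bytes, little-endian)
HEADER_LEN = len(MAGIC) + struct.calcsize("<BBI")


def encode(schema: dict, raw: bytes) -> bytes:
    """Lay out: magic | major | minor | offset | JSON schema | raw data."""
    # Sorted keys and fixed separators make the JSON bytes independent of
    # dict insertion order (an assumption, not something the notes specify).
    schema_bytes = json.dumps(
        schema, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    offset = HEADER_LEN + len(schema_bytes)  # where the raw data starts
    header = MAGIC + struct.pack("<BBI", VERSION[0], VERSION[1], offset)
    return header + schema_bytes + raw


def decode(blob: bytes):
    """Return ((major, minor), schema, raw) or raise ValueError."""
    if blob[: len(MAGIC)] != MAGIC:
        raise ValueError("bad magic number")
    major, minor, offset = struct.unpack_from("<BBI", blob, len(MAGIC))
    if major != VERSION[0]:
        raise ValueError(f"unsupported major version {major}")
    schema = json.loads(blob[HEADER_LEN:offset].decode("utf-8"))
    return (major, minor), schema, blob[offset:]


# Two dicts with the same content but different key order.
schema_a = {"name": "Ping", "fields": [["nonce", "<u8"]]}
schema_b = {"fields": [["nonce", "<u8"]], "name": "Ping"}
raw = struct.pack("<Q", 42)  # numbers are always little-endian in this sketch

blob_a = encode(schema_a, raw)
blob_b = encode(schema_b, raw)
digest_a = hashlib.sha256(blob_a).hexdigest()
digest_b = hashlib.sha256(blob_b).hexdigest()
```

Because the schema is written with sorted keys, `blob_a` and `blob_b` are byte-identical and `digest_a == digest_b`, which is the property the Requirements section asks for. Calling `decode(blob_a)` returns the version tuple, the schema, and the raw bytes starting at the stored offset.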