Extensible Discovery V5 Sub-Protocol Architecture

There has been very little exploration of TALK-based sub-protocols in Discovery V5. The teams working on the portal network are the only ones that I (piper) am aware of that are building on these APIs.

One problem that I believe we should address is the amount of boilerplate needed to establish a new overlay network.

Motivation

In order to establish a new overlay network, a sub-protocol must implement the base messages:

  • PING
  • PONG
  • FIND_NODES
  • FOUND_NODES

These are the messages that nodes need in order to populate and maintain a routing table.

Any sub-protocol that builds functionality on a DHT needs these messages. As a result, each new sub-protocol has to define its own copies of these message types and write, or duplicate, all of the logic that goes into managing them.

I propose that we explore a generic sub-protocol that allows these messages to be re-used across multiple other sub-protocols.

The “overlay” protocol

This DiscV5 sub-protocol provides a generic approach for establishing any number of overlay networks using a single DiscV5 sub-protocol.

The protocol will use the TALK protocol_id: "overlay" for all TALKREQ and TALKRESP messages.

All messages in the “overlay” protocol contain a sub_protocol_id which identifies the sub-protocol that the message belongs to. Clients would configure which sub-protocols they support. The “overlay” protocol would then handle maintenance of an individual routing table for each of those sub-protocols, based on the sub_protocol_id of each message.
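As a rough illustration of the per-sub-protocol bookkeeping described above, here is a minimal sketch. The OverlayRouter class, its on_message method, and the plain dictionary standing in for a routing table are all hypothetical names chosen for this example; a real client would use a proper Kademlia style table.

```python
class OverlayRouter:
    """Hypothetical helper: one routing table per supported sub_protocol_id."""

    def __init__(self, supported_sub_protocols):
        self.supported = set(supported_sub_protocols)
        # sub_protocol_id -> {node_id: enr_seq}
        # A plain dict stands in for a real k-bucket routing table here.
        self.tables = {sub_id: {} for sub_id in self.supported}

    def on_message(self, sub_protocol_id, node_id, enr_seq):
        """Record the sender in the table for its sub-protocol.

        Returns False (and records nothing) when the sub-protocol is not
        one that this client was configured to support.
        """
        if sub_protocol_id not in self.supported:
            return False
        self.tables[sub_protocol_id][node_id] = enr_seq
        return True


router = OverlayRouter([b"portal:state"])
router.on_message(b"portal:state", b"\x01" * 32, 7)
```

The point of the sketch is only that the routing logic is written once and keyed by sub_protocol_id, rather than being duplicated per network.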

Wire Protocol

All messages in the protocol are transmitted using the TALKREQ and TALKRESP messages from the base protocol.

All messages have a message_id and encoded_message that are concatenated to form the payload for either a TALKREQ or TALKRESP message.

payload         := message_id | encoded_message
message_id      := uint8
encoded_message := bytes

The encoded_message component is the SSZ encoded payload for the message type as indicated by the message_id. Each message has its own sedes which dictates how it should be encoded and decoded.

The SSZ sedes byte_list is used to alias List[uint8, max_length=2048].

All messages have a type which is either request or response.

  • request messages MUST be sent using a TALKREQ
  • response messages MUST be sent using a TALKRESP
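The envelope and the request/response rule above can be sketched as follows. The function names are hypothetical, and the table of message types simply mirrors the “overlay” message definitions given in the sections that follow.

```python
# Message types for the "overlay" protocol, keyed by message_id.
OVERLAY_MESSAGE_TYPES = {1: "request", 2: "response", 3: "request", 4: "response"}


def encode_payload(message_id, encoded_message):
    """payload := message_id | encoded_message, with message_id as a uint8."""
    if not 0 <= message_id <= 0xFF:
        raise ValueError("message_id must fit in a uint8")
    return bytes([message_id]) + encoded_message


def decode_payload(payload):
    """Split a TALKREQ/TALKRESP payload into (message_id, encoded_message)."""
    if not payload:
        raise ValueError("payload is missing the message_id byte")
    return payload[0], payload[1:]


def talk_message_for(message_id):
    """Requests travel in a TALKREQ, responses in a TALKRESP."""
    if OVERLAY_MESSAGE_TYPES[message_id] == "request":
        return "TALKREQ"
    return "TALKRESP"
```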

Ping (0x01)

Request message to check if a node is reachable, communicate basic information about our node, and request basic information about the other node.

message_id := 1
type       := request
sedes      := Container(enr_seq: uint64, sub_protocol_id: byte_list, sub_protocol_payload: byte_list)
  • enr_seq: The sequence number of the sending node’s current ENR record
  • sub_protocol_id: The sub-protocol for this message
  • sub_protocol_payload: An opaque data payload specific to the sub-protocol.
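To make the sedes concrete, here is a hand-rolled sketch of how this container would serialize under the standard SSZ rules: the fixed-size uint64 is written in place as little-endian, each variable-size byte_list is replaced by a 4-byte little-endian offset, and the list contents follow the fixed part. The function names are hypothetical, and a real client would more likely use an SSZ library. Since Pong uses the same sedes, the same functions would apply to it.

```python
import struct

MAX_BYTE_LIST = 2048       # byte_list := List[uint8, max_length=2048]
FIXED_PART_SIZE = 8 + 4 + 4  # uint64 + two 4-byte offsets


def encode_ping(enr_seq, sub_protocol_id, sub_protocol_payload):
    """SSZ-encode Container(enr_seq, sub_protocol_id, sub_protocol_payload)."""
    if len(sub_protocol_id) > MAX_BYTE_LIST or len(sub_protocol_payload) > MAX_BYTE_LIST:
        raise ValueError("byte_list exceeds max_length=2048")
    offset_id = FIXED_PART_SIZE
    offset_payload = offset_id + len(sub_protocol_id)
    fixed = struct.pack("<QII", enr_seq, offset_id, offset_payload)
    return fixed + sub_protocol_id + sub_protocol_payload


def decode_ping(data):
    """Inverse of encode_ping; returns (enr_seq, sub_protocol_id, payload)."""
    if len(data) < FIXED_PART_SIZE:
        raise ValueError("message is shorter than its fixed part")
    enr_seq, offset_id, offset_payload = struct.unpack_from("<QII", data, 0)
    if offset_id != FIXED_PART_SIZE or not offset_id <= offset_payload <= len(data):
        raise ValueError("invalid SSZ offsets")
    sub_protocol_id = data[offset_id:offset_payload]
    payload = data[offset_payload:]
    if len(sub_protocol_id) > MAX_BYTE_LIST or len(payload) > MAX_BYTE_LIST:
        raise ValueError("byte_list exceeds max_length=2048")
    return enr_seq, sub_protocol_id, payload
```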

Pong (0x02)

Response message to Ping(0x01)

message_id := 2
type       := response
sedes      := Container(enr_seq: uint64, sub_protocol_id: byte_list, sub_protocol_payload: byte_list)
  • enr_seq: The sequence number of the sending node’s current ENR record
  • sub_protocol_id: The sub-protocol for this message
  • sub_protocol_payload: An opaque data payload specific to the sub-protocol.

Find Nodes (0x03)

Request nodes from the peer’s routing table at the given logarithmic distances. The distance of 0 indicates a request for the peer’s own ENR record.

message_id := 3
type       := request
sedes      := Container(sub_protocol_id: byte_list, distances: List[uint16, max_length=256])
  • sub_protocol_id: The sub-protocol for this message
  • distances: The list of distances for which the node is requesting ENR records.
    • Each distance MUST be within the inclusive range [0, 256]
    • Each distance in the list MUST be unique.
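The constraints on distances could be checked on receipt with something like the sketch below. The function name is hypothetical; the limits come directly from the sedes and the rules above.

```python
MAX_DISTANCES = 256  # from List[uint16, max_length=256]


def is_valid_distances(distances):
    """Return True if a FindNodes distances list satisfies the rules above."""
    if len(distances) > MAX_DISTANCES:
        return False
    # Each distance MUST be within the inclusive range [0, 256].
    if any(not 0 <= distance <= 256 for distance in distances):
        return False
    # Each distance in the list MUST be unique.
    return len(set(distances)) == len(distances)
```

A request such as [0, 255, 256] passes, while [5, 5] (duplicate) and [257] (out of range) would be rejected.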

Nodes (0x04)

Response message to FindNodes(0x03).

message_id := 4
type       := response
sedes      := Container(sub_protocol_id: byte_list, total: uint8, enrs: List[byte_list, max_length=32])
  • sub_protocol_id: The sub-protocol for this message
  • total: The total number of Nodes response messages being sent.
  • enrs: List of bytestrings, each of which is an RLP encoded ENR record.
    • Individual ENR records MUST correspond to one of the requested distances.
    • It is invalid to return multiple ENR records for the same node_id.

Note: If the number of ENR records cannot be encoded into a single message, then they should be sent back using multiple messages, with the total field representing the total number of messages that are being sent.
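One way a responder might split a large result across several Nodes messages is sketched below. The max_enr_bytes budget is a hypothetical parameter (the proposal does not fix a per-message size), the function name is made up for this example, and the dictionaries stand in for the SSZ containers.

```python
MAX_ENRS_PER_MESSAGE = 32  # from List[byte_list, max_length=32]


def split_into_nodes_messages(sub_protocol_id, enrs, max_enr_bytes=1000):
    """Group RLP encoded ENRs into Nodes messages that share one total."""
    batches = [[]]
    used = 0
    for enr in enrs:
        batch_full = len(batches[-1]) >= MAX_ENRS_PER_MESSAGE
        over_budget = bool(batches[-1]) and used + len(enr) > max_enr_bytes
        if batch_full or over_budget:
            batches.append([])
            used = 0
        batches[-1].append(enr)
        used += len(enr)
    total = len(batches)
    if total > 0xFF:
        raise ValueError("total must fit in a uint8")
    # Every message carries the same total so the requester knows how many
    # responses to wait for.
    return [
        {"sub_protocol_id": sub_protocol_id, "total": total, "enrs": batch}
        for batch in batches
    ]
```

With five 300-byte records and the default budget, this yields two messages holding three and two records, each with total set to 2. An empty result still produces a single message with total set to 1.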

The “content” protocol

This protocol contains a generic API for transmission of arbitrary data payloads for any number of sub-protocols using a single DiscV5 sub-protocol.

The protocol will use the TALK protocol_id: "content" for all TALKREQ and TALKRESP messages.

All messages in the “content” protocol contain a sub_protocol_id which identifies which sub-protocol this message belongs to. Clients would configure which sub-protocols they support.

The “content” protocol would

Wire Protocol

The message envelope is the same as in the “overlay” protocol: a message_id followed by the SSZ encoded message.

Find Content (0x01)

Request either the data payload for a specific piece of content on the network, or ENR records of nodes that are closer to the requested content.

message_id := 1
type       := request
sedes      := Container(sub_protocol_id: byte_list, content_key: byte_list)
  • sub_protocol_id: The sub-protocol for this message
  • content_key: The pre-image key for the content being requested…

Found Content (0x02)

Response message to Find Content (0x01).

This message can contain either the data payload for the requested content or a list of ENR records that are closer to the content than the responding node.

message_id := 2
type       := response
sedes      := Container(sub_protocol_id: byte_list, enrs: List[byte_list, max_length=32], payload: byte_list)
  • sub_protocol_id: The sub-protocol for this message
  • enrs: List of bytestrings, each of which is an RLP encoded ENR record.
    • Individual ENR records MUST be closer to the requested content than the responding node.
    • It is invalid to return multiple ENR records for the same node_id.
    • This field must be empty if payload is non-empty.
  • payload: bytestring of the requested content.
    • This field must be empty if enrs is non-empty.

A response with an empty payload and empty enrs indicates that the node is not aware of any closer nodes, nor does the node have the requested content.
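The three possible meanings of a Found Content response can be told apart with a small check like the one below. The function name and the returned labels are hypothetical and only illustrate the rules above.

```python
def classify_found_content(enrs, payload):
    """Interpret the enrs and payload fields of a Found Content response."""
    if enrs and payload:
        # The two fields are mutually exclusive.
        raise ValueError("enrs and payload must not both be non-empty")
    if payload:
        return "content"
    if enrs:
        return "closer_nodes"
    # Neither the content nor any closer nodes are known to the responder.
    return "not_found"
```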

Example of “overlay” and “content” protocol composition

For a protocol like the “state network” we would use sub_protocol_id = "portal:state".

The sub_protocol_payload for the PING/PONG messages would be the SSZ encoded Container(data_radius: uint256).
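As a sketch of what that payload would look like on the wire: under the standard SSZ rules, a container with a single fixed-size uint256 field serializes to just the 32-byte little-endian encoding of that integer. The function names here are hypothetical.

```python
def encode_state_ping_payload(data_radius):
    """SSZ-encode Container(data_radius: uint256) as 32 little-endian bytes."""
    if not 0 <= data_radius < 2**256:
        raise ValueError("data_radius must fit in a uint256")
    return data_radius.to_bytes(32, "little")


def decode_state_ping_payload(payload):
    """Recover data_radius from a "portal:state" sub_protocol_payload."""
    if len(payload) != 32:
        raise ValueError("expected exactly 32 bytes")
    return int.from_bytes(payload, "little")


# A node advertising the maximum radius:
sub_protocol_payload = encode_state_ping_payload(2**256 - 1)
```

The resulting bytes would be placed in the sub_protocol_payload field of a Ping or Pong whose sub_protocol_id is "portal:state".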

The "portal:state" sub-protocol would use both the “overlay” and “content” networks for the functionality they provide, and then implement any additional custom messages needed such as the advertisement of proof availability.
