# Extensible Discovery V5 Sub-Protocol Architecture

There has been very little exploration into the use of TALK-based sub-protocols in Discovery V5. The teams working on the portal network are the only projects that I (piper) am aware of that are leveraging and building on these APIs.

One problem that I believe we should look into addressing is reducing the boilerplate necessary for establishing new overlay networks.

## Motivation

In order to establish a new overlay network, a sub-protocol must implement the base messages:

- PING
- PONG
- FIND_NODES
- NODES

These messages are what nodes need to populate and maintain a routing table. Sub-protocols that wish to build DHT-based functionality need these messages. As a result, each new sub-protocol would have to define these custom message types for itself and duplicate all of the logic that goes into managing them.

I propose that we explore a generic sub-protocol that allows these messages to be reused across multiple other sub-protocols.

## The "overlay" protocol

This DiscV5 sub-protocol provides a generic approach for establishing any number of overlay networks using a single DiscV5 sub-protocol.

The protocol will use the TALK `protocol_id` `"overlay"` for all TALKREQ and TALKRESP messages.

All messages in the "overlay" protocol contain a `sub_protocol_id` which identifies the sub-protocol the message belongs to.

Clients would configure which sub-protocols they support. The "overlay" protocol would then handle maintenance of an individual routing table for each of those sub-protocols, based on the `sub_protocol_id` of each message.

### Wire Protocol

All messages in the protocol are transmitted using the `TALKREQ` and `TALKRESP` messages from the base protocol.

All messages have a `message_id` and an `encoded_message` that are concatenated to form the `payload` for either a `TALKREQ` or `TALKRESP` message.
```
payload := message_id | encoded_message
message_id := uint8
encoded_message := bytes
```

The `encoded_message` component is the SSZ encoded payload for the message type indicated by the `message_id`. Each message has its own `sedes` which dictates how it should be encoded and decoded.

The SSZ sedes `byte_list` is used as an alias for `List[uint8, max_length=2048]`.

All messages have a `type` which is either `request` or `response`.

* `request` messages **MUST** be sent using a `TALKREQ`
* `response` messages **MUST** be sent using a `TALKRESP`

#### Ping (0x01)

Request message to check if a node is reachable, communicate basic information about our node, and request basic information about the other node.

```
message_id := 1
type := request
sedes := Container(enr_seq: uint64, sub_protocol_id: byte_list, sub_protocol_payload: byte_list)
```

* `enr_seq`: The current sequence number of the sending node's ENR record
* `sub_protocol_id`: The sub-protocol for this message
* `sub_protocol_payload`: An opaque data payload specific to the sub-protocol.

#### Pong (0x02)

Response message to Ping (0x01).

```
message_id := 2
type := response
sedes := Container(enr_seq: uint64, sub_protocol_id: byte_list, sub_protocol_payload: byte_list)
```

* `enr_seq`: The current sequence number of the responding node's ENR record
* `sub_protocol_id`: The sub-protocol for this message
* `sub_protocol_payload`: An opaque data payload specific to the sub-protocol.

#### Find Nodes (0x03)

Request nodes from the peer's routing table at the given logarithmic distances. A distance of `0` indicates a request for the peer's own ENR record.

```
message_id := 3
type := request
sedes := Container(sub_protocol_id: byte_list, distances: List[uint16, max_length=256])
```

* `sub_protocol_id`: The sub-protocol for this message
* `distances`: A list of distances for which the node is requesting ENR records.
    * Each distance **MUST** be within the inclusive range `[0, 256]`
    * Each distance in the list **MUST** be unique.
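The envelope, the Ping sedes, and the Find Nodes distance rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the helper names (`wrap`, `unwrap`, `encode_ping`, `validate_distances`, `log_distance`) are hypothetical, the SSZ handling covers only the field types used by Ping (a `uint64` followed by two variable-size `byte_list` fields, each referenced by a 4-byte little-endian offset), and `log_distance` assumes the usual discv5 convention of taking the bit length of the XOR of two node IDs.

```python
import struct

# Hypothetical helpers sketching the "overlay" wire format; the names are
# illustrative and the SSZ handling covers only the field types used here.

MAX_BYTE_LIST = 2048  # byte_list aliases List[uint8, max_length=2048]
PING = 0x01


def wrap(message_id: int, encoded_message: bytes) -> bytes:
    # payload := message_id | encoded_message, where message_id is a uint8
    if not 0 <= message_id <= 0xFF:
        raise ValueError("message_id must fit in a uint8")
    return bytes([message_id]) + encoded_message


def unwrap(payload: bytes) -> tuple[int, bytes]:
    # Split a TALKREQ/TALKRESP payload back into (message_id, encoded_message).
    if not payload:
        raise ValueError("empty payload")
    return payload[0], payload[1:]


def encode_byte_list(value: bytes) -> bytes:
    # An SSZ List[uint8, N] serializes to its raw bytes; its length is
    # recovered from the surrounding container's offsets.
    if len(value) > MAX_BYTE_LIST:
        raise ValueError("byte_list exceeds 2048 bytes")
    return bytes(value)


def encode_ping(enr_seq: int, sub_protocol_id: bytes, sub_protocol_payload: bytes) -> bytes:
    # Container(enr_seq: uint64, sub_protocol_id: byte_list, sub_protocol_payload: byte_list)
    # Fixed part: the uint64 followed by one 4-byte little-endian offset per
    # variable-size field; the variable-size data is appended in field order.
    id_bytes = encode_byte_list(sub_protocol_id)
    payload_bytes = encode_byte_list(sub_protocol_payload)
    fixed_size = 8 + 4 + 4
    fixed = struct.pack("<QII", enr_seq, fixed_size, fixed_size + len(id_bytes))
    return fixed + id_bytes + payload_bytes


def validate_distances(distances: list[int]) -> None:
    # Find Nodes rules: at most 256 entries, each in [0, 256], no duplicates.
    if len(distances) > 256:
        raise ValueError("too many distances")
    if any(not 0 <= d <= 256 for d in distances):
        raise ValueError("distance out of range")
    if len(set(distances)) != len(distances):
        raise ValueError("duplicate distance")


def log_distance(a: bytes, b: bytes) -> int:
    # Assumed convention: bit length of the XOR of two 32-byte node IDs,
    # so identical IDs give 0 (the peer's own record) and the maximum is 256.
    return (int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).bit_length()


message = wrap(PING, encode_ping(7, b"portal:state", b""))
message_id, body = unwrap(message)
print(message_id, len(message))  # prints "1 29"
```

Because the `sub_protocol_id` lives inside each encoded message rather than in the TALK `protocol_id`, a receiver only needs to `unwrap` once and decode the sedes before it knows which sub-protocol's routing table the message applies to.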
#### Nodes (0x04)

Response message to Find Nodes (0x03).

```
message_id := 4
type := response
sedes := Container(sub_protocol_id: byte_list, total: uint8, enrs: List[byte_list, max_length=32])
```

* `sub_protocol_id`: The sub-protocol for this message
* `total`: The total number of `Nodes` response messages being sent.
* `enrs`: List of bytestrings, each of which is an RLP encoded ENR record.
    * Individual ENR records **MUST** correspond to one of the requested distances.
    * It is invalid to return multiple ENR records for the same `node_id`.

> Note: If the ENR records cannot all be encoded into a single message, then they should be sent back using multiple messages, with the `total` field representing the total number of messages that are being sent.

## The "content" protocol

This protocol contains a generic API for transmitting arbitrary data payloads for any number of sub-protocols using a single DiscV5 sub-protocol.

The protocol will use the TALK `protocol_id` `"content"` for all TALKREQ and TALKRESP messages.

All messages in the "content" protocol contain a `sub_protocol_id` which identifies the sub-protocol the message belongs to.

Clients would configure which sub-protocols they support. The "content" protocol would then route each message to the matching sub-protocol based on its `sub_protocol_id`.

### Wire Protocol

> Same as the "overlay" protocol in terms of message envelope.

#### Find Content (0x01)

Request either the data payload for a specific piece of content on the network, **or** ENR records of nodes that are closer to the requested content.

```
message_id := 1
type := request
sedes := Container(sub_protocol_id: byte_list, content_key: byte_list)
```

* `sub_protocol_id`: The sub-protocol for this message
* `content_key`: The pre-image key for the content being requested.

#### Found Content (0x02)

Response message to Find Content (0x01).

This message can contain **either** the data payload for the requested content *or* a list of ENR records that are closer to the content than the responding node.
```
message_id := 2
type := response
sedes := Container(sub_protocol_id: byte_list, enrs: List[byte_list, max_length=32], payload: byte_list)
```

* `sub_protocol_id`: The sub-protocol for this message
* `enrs`: List of bytestrings, each of which is an RLP encoded ENR record.
    * Individual ENR records **MUST** be closer to the requested content than the responding node.
    * It is invalid to return multiple ENR records for the same `node_id`.
    * This field **MUST** be empty if `payload` is non-empty.
* `payload`: Bytestring of the requested content.
    * This field **MUST** be empty if `enrs` is non-empty.

> A response with an empty `payload` and empty `enrs` indicates that the node is not aware of any closer nodes, *nor* does the node have the requested content.

## Example of "overlay" and "content" protocol composition

For a protocol like the "state network" we would use `sub_protocol_id = "portal:state"`. The `sub_protocol_payload` for the PING/PONG messages would be the SSZ encoded `Container[data_radius: uint256]`.

The `"portal:state"` sub-protocol would use both the "overlay" and "content" protocols for the functionality they provide, and then implement any additional custom messages it needs, such as the advertisement of proof availability.
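The composition above can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than part of the proposal: `encode_data_radius`, `decode_data_radius`, and `SubProtocolRegistry` are hypothetical names. The encoding relies on two SSZ facts: a container with a single fixed-size field serializes to that field alone, and a `uint256` is serialized as 32 little-endian bytes. The registry models the "clients would configure which sub-protocols they support" step by registering `portal:state` under both TALK protocol IDs.

```python
from dataclasses import dataclass, field

# Hypothetical helpers for the "portal:state" example: the PING/PONG
# sub_protocol_payload is the SSZ encoding of Container[data_radius: uint256].

UINT256_MAX = 2**256 - 1


def encode_data_radius(data_radius: int) -> bytes:
    # A container with a single fixed-size field serializes to that field alone;
    # SSZ encodes a uint256 as 32 little-endian bytes.
    if not 0 <= data_radius <= UINT256_MAX:
        raise ValueError("data_radius must fit in a uint256")
    return data_radius.to_bytes(32, "little")


def decode_data_radius(payload: bytes) -> int:
    if len(payload) != 32:
        raise ValueError("expected exactly 32 bytes")
    return int.from_bytes(payload, "little")


@dataclass
class SubProtocolRegistry:
    # Hypothetical client configuration: for each TALK protocol_id
    # ("overlay" or "content"), the set of sub_protocol_ids this client supports.
    supported: dict[bytes, set[bytes]] = field(default_factory=dict)

    def register(self, talk_protocol_id: bytes, sub_protocol_id: bytes) -> None:
        self.supported.setdefault(talk_protocol_id, set()).add(sub_protocol_id)

    def accepts(self, talk_protocol_id: bytes, sub_protocol_id: bytes) -> bool:
        # Assumption: a client could reject or ignore messages whose
        # sub_protocol_id it has not configured for that TALK protocol.
        return sub_protocol_id in self.supported.get(talk_protocol_id, set())


registry = SubProtocolRegistry()
for talk_protocol_id in (b"overlay", b"content"):
    registry.register(talk_protocol_id, b"portal:state")

ping_payload = encode_data_radius(2**255)
```

Keeping the registry keyed by TALK `protocol_id` reflects the idea that a sub-protocol opts into the shared "overlay" and "content" machinery independently, and only implements its own custom messages on top.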