# Extensible Discovery V5 Sub-Protocol Architecture
There has been very little exploration into the use of TALK-based sub-protocols in Discovery V5. The teams working on the Portal Network are the only projects that I (piper) am aware of that are leveraging and building on these APIs.
One problem I believe we should address is the amount of boilerplate needed to establish a new overlay network.
## Motivation
To establish a new overlay network, a sub-protocol must implement the base messages:
- PING
- PONG
- FIND_NODES
- NODES
These messages are what are needed for nodes to populate and maintain a routing table.
Sub-protocols that build functionality on top of a DHT need these messages. As things stand, each new sub-protocol must define its own custom versions of these message types and write (or duplicate) all of the logic that goes into managing them.
I propose that we should explore a generic sub-protocol that allows for re-use of these messages across multiple other sub-protocols.
## The "overlay" protocol
This DiscV5 sub-protocol provides a generic approach for establishing any number of overlay networks using a single DiscV5 sub-protocol.
The protocol will use the TALK `protocol_id`: `"overlay"` for all TALKREQ and TALKRESP messages.
All messages in the "overlay" protocol contain a `sub_protocol_id` that identifies the sub-protocol the message belongs to. Clients would configure which sub-protocols they support, and the "overlay" protocol would then maintain a separate routing table for each of those sub-protocols, selected by the `sub_protocol_id` of each message.
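A minimal sketch of the per-sub-protocol bookkeeping described above, assuming a deliberately simplified routing table (a mapping from log distance to a set of node IDs) rather than a real Kademlia table with bucket limits and eviction. The class name `OverlayProtocol`, its `handle` method, and the example IDs are hypothetical.

```python
from collections import defaultdict


class OverlayProtocol:
    """Hypothetical sketch: one "overlay" TALK handler that keeps a
    separate routing table for each configured sub-protocol."""

    def __init__(self, supported_sub_protocols: set[bytes]) -> None:
        # Hypothetical, simplified routing table per sub-protocol:
        # log distance -> set of node IDs seen at that distance.
        self.routing_tables: dict[bytes, dict[int, set[bytes]]] = {
            sub_id: defaultdict(set) for sub_id in supported_sub_protocols
        }

    def handle(self, sub_protocol_id: bytes, node_id: bytes, distance: int) -> bool:
        """Record a contact for the message's sub-protocol.

        Returns False (and records nothing) when the sub_protocol_id is
        not one this client was configured to support."""
        table = self.routing_tables.get(sub_protocol_id)
        if table is None:
            return False
        table[distance].add(node_id)
        return True
```

A message for `b"portal:state"` only ever touches that sub-protocol's table, so supporting another overlay is a matter of configuration rather than new message types.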
### Wire Protocol
All messages in the protocol are transmitted using the `TALKREQ` and `TALKRESP` messages from the base protocol.
All messages have a `message_id` and `encoded_message` that are concatenated to form the `payload` for either a `TALKREQ` or `TALKRESP` message.
```
payload := message_id | encoded_message
message_id := uint8
encoded_message := bytes
```
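The framing above amounts to a one-byte prefix. A minimal Python sketch follows; the function names are illustrative, and `encoded_message` is treated as already SSZ-encoded bytes.

```python
def encode_payload(message_id: int, encoded_message: bytes) -> bytes:
    """Build a TALKREQ/TALKRESP payload: uint8 message_id | encoded_message."""
    if not 0 <= message_id <= 255:
        raise ValueError("message_id must fit in a uint8")
    return bytes([message_id]) + encoded_message


def decode_payload(payload: bytes) -> tuple[int, bytes]:
    """Split a payload back into (message_id, encoded_message)."""
    if not payload:
        raise ValueError("empty payload: missing message_id")
    return payload[0], payload[1:]
```

For example, `encode_payload(1, b"\xaa\xbb")` yields `b"\x01\xaa\xbb"`, and `decode_payload` reverses it.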
The `encoded_message` component is the SSZ encoded payload for the message type as indicated by the `message_id`. Each message has its own `sedes` which dictates how it should be encoded and decoded.
Throughout this note, `byte_list` is an alias for the SSZ sedes `List[uint8, max_length=2048]`.
All messages have a `type` which is either `request` or `response`.
* `request` messages **MUST** be sent using a `TALKREQ`
* `response` messages **MUST** be sent using a `TALKRESP`
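One way a receiver might enforce the request/response rule is to look up each `message_id` and check which TALK message it arrived in. This is a sketch; the table covers only the "overlay" messages defined in this note, and `check_transport` is a hypothetical helper name.

```python
from enum import Enum


class MessageType(Enum):
    REQUEST = "request"
    RESPONSE = "response"


# message_id -> type for the "overlay" protocol messages defined in this note.
OVERLAY_MESSAGE_TYPES = {
    1: MessageType.REQUEST,   # Ping
    2: MessageType.RESPONSE,  # Pong
    3: MessageType.REQUEST,   # Find Nodes
    4: MessageType.RESPONSE,  # Nodes
}


def check_transport(message_id: int, via_talkreq: bool) -> None:
    """Raise if a message arrived in the wrong kind of TALK message."""
    try:
        expected = OVERLAY_MESSAGE_TYPES[message_id]
    except KeyError:
        raise ValueError(f"unknown message_id {message_id}") from None
    if (expected is MessageType.REQUEST) != via_talkreq:
        kind = "TALKREQ" if via_talkreq else "TALKRESP"
        raise ValueError(f"{expected.value} message {message_id} must not be sent via {kind}")
```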
#### Ping (0x01)
Request message to check if a node is reachable, communicate basic information about our node, and request basic information about the other node.
```
message_id := 1
type := request
sedes := Container(enr_seq: uint64, sub_protocol_id: byte_list, sub_protocol_payload: byte_list)
```
* `enr_seq`: The sending node's current ENR sequence number
* `sub_protocol_id`: The sub-protocol for this message
* `sub_protocol_payload`: An opaque data payload specific to the sub-protocol.
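To make the Ping sedes concrete, here is a hand-rolled SSZ encoding of the container. It is a sketch for illustration; a real client would more likely use an SSZ library. Per SSZ rules, the fixed-size part holds the 8-byte little-endian `enr_seq` followed by one 4-byte little-endian offset per variable-size field, and the `byte_list` contents are appended afterwards. Because Pong uses the same sedes, the same encoder applies to it.

```python
import struct

BYTE_LIST_MAX = 2048  # byte_list := List[uint8, max_length=2048]


def encode_ping(enr_seq: int, sub_protocol_id: bytes, sub_protocol_payload: bytes) -> bytes:
    """SSZ-encode Container(enr_seq: uint64, sub_protocol_id: byte_list,
    sub_protocol_payload: byte_list)."""
    for field in (sub_protocol_id, sub_protocol_payload):
        if len(field) > BYTE_LIST_MAX:
            raise ValueError("byte_list exceeds max_length=2048")
    fixed_size = 8 + 4 + 4  # uint64 + two 4-byte offsets
    first_offset = fixed_size  # sub_protocol_id starts right after the fixed part
    second_offset = first_offset + len(sub_protocol_id)
    return (
        struct.pack("<QII", enr_seq, first_offset, second_offset)
        + sub_protocol_id
        + sub_protocol_payload
    )
```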
#### Pong (0x02)
Response message to Ping(0x01)
```
message_id := 2
type := response
sedes := Container(enr_seq: uint64, sub_protocol_id: byte_list, sub_protocol_payload: byte_list)
```
* `enr_seq`: The sending node's current ENR sequence number
* `sub_protocol_id`: The sub-protocol for this message
* `sub_protocol_payload`: An opaque data payload specific to the sub-protocol.
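On the receiving side, the shared Ping/Pong container can be decoded by reading the fixed part and then slicing the variable-size fields using their offsets. The validation here (first offset equals the fixed-part size, offsets non-decreasing and in bounds) follows general SSZ decoding rules; the function name is illustrative.

```python
import struct

BYTE_LIST_MAX = 2048  # byte_list := List[uint8, max_length=2048]


def decode_ping_pong(data: bytes) -> tuple[int, bytes, bytes]:
    """Decode Container(enr_seq: uint64, sub_protocol_id: byte_list,
    sub_protocol_payload: byte_list) into its three fields."""
    fixed_size = 16  # uint64 + two 4-byte offsets
    if len(data) < fixed_size:
        raise ValueError("truncated fixed-size part")
    enr_seq, first_offset, second_offset = struct.unpack_from("<QII", data, 0)
    # The first variable field must start right after the fixed part, and
    # offsets must be non-decreasing and inside the buffer.
    if first_offset != fixed_size or not first_offset <= second_offset <= len(data):
        raise ValueError("invalid SSZ offsets")
    sub_protocol_id = data[first_offset:second_offset]
    sub_protocol_payload = data[second_offset:]
    if len(sub_protocol_id) > BYTE_LIST_MAX or len(sub_protocol_payload) > BYTE_LIST_MAX:
        raise ValueError("byte_list exceeds max_length=2048")
    return enr_seq, sub_protocol_id, sub_protocol_payload
```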
#### Find Nodes (0x03)
Request nodes from the peer's routing table at the given logarithmic distances. The distance of `0` indicates a request for the peer's own ENR record.
```
message_id := 3
type := request
sedes := Container(sub_protocol_id: byte_list, distances: List[uint16, max_length=256])
```
* `sub_protocol_id`: The sub-protocol for this message
* `distances`: A list of distances for which the node is requesting ENR records.
  * Each distance **MUST** be within the inclusive range `[0, 256]`.
  * Each distance in the list **MUST** be unique.
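A sketch of the distance rules, assuming the Discovery V5 convention that the logarithmic distance between two 32-byte node IDs is the bit length of their XOR. Under that convention a node is at distance `0` from itself, which matches the meaning of distance `0` above, and the maximum distance is `256`.

```python
def log_distance(node_id_a: bytes, node_id_b: bytes) -> int:
    """Bit length of (a XOR b) for 32-byte node IDs: 0 for identical IDs,
    up to 256 when the most significant bit differs."""
    if len(node_id_a) != 32 or len(node_id_b) != 32:
        raise ValueError("node IDs must be 32 bytes")
    xor = int.from_bytes(node_id_a, "big") ^ int.from_bytes(node_id_b, "big")
    return xor.bit_length()


def validate_distances(distances: list[int]) -> None:
    """Enforce the Find Nodes rules on the requested distances."""
    if len(distances) > 256:
        raise ValueError("too many distances (max_length=256)")
    if any(not 0 <= d <= 256 for d in distances):
        raise ValueError("distance outside the inclusive range [0, 256]")
    if len(set(distances)) != len(distances):
        raise ValueError("duplicate distance")
```

A responder could then select ENRs for which `log_distance(own_node_id, other_node_id)` is in the requested set.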
#### Nodes (0x04)
Response message to FindNodes(0x03).
```
message_id := 4
type := response
sedes := Container(sub_protocol_id: byte_list, total: uint8, enrs: List[byte_list, max_length=32])
```
* `sub_protocol_id`: The sub-protocol for this message
* `total`: The total number of `Nodes` response messages being sent.
* `enrs`: List of bytestrings, each of which is an RLP-encoded ENR record.
  * Individual ENR records **MUST** correspond to one of the requested distances.
  * It is invalid to return multiple ENR records for the same `node_id`.
> Note: If the number of ENR records cannot be encoded into a single message, then they should be sent back using multiple messages, with the `total` field representing the total number of messages that are being sent.
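A sketch of splitting a result set across multiple Nodes messages that all carry the same `total`. This simplification chunks only by the 32-entry `max_length` of `enrs`; in practice the limiting factor is likely the encoded size of a single TALKRESP, so a real implementation would need a byte budget as well. Answering an empty result with one empty message is also an assumption, and `NodesMessage` is a hypothetical container type.

```python
from dataclasses import dataclass

MAX_ENRS_PER_MESSAGE = 32  # enrs: List[byte_list, max_length=32]


@dataclass
class NodesMessage:
    sub_protocol_id: bytes
    total: int  # number of Nodes messages in this response batch
    enrs: list[bytes]


def build_nodes_responses(sub_protocol_id: bytes, enrs: list[bytes]) -> list[NodesMessage]:
    """Split ENRs into chunks of at most 32, tagging each with `total`."""
    chunks = [enrs[i:i + MAX_ENRS_PER_MESSAGE] for i in range(0, len(enrs), MAX_ENRS_PER_MESSAGE)]
    if not chunks:
        chunks = [[]]  # assumption: an empty result is still one message
    if len(chunks) > 255:
        raise ValueError("total must fit in a uint8")
    return [NodesMessage(sub_protocol_id, len(chunks), chunk) for chunk in chunks]
```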
## The "content" protocol
This protocol contains a generic API for transmission of arbitrary data payloads for any number of sub-protocols using a single DiscV5 sub-protocol.
The protocol will use the TALK `protocol_id`: `"content"` for all TALKREQ and TALKRESP messages.
All messages in the "content" protocol contain a `sub_protocol_id` which identifies which sub-protocol this message belongs to. Clients would configure which sub-protocols they support.
The "content" protocol would
### Wire Protocol
> Same as the "overlay" protocol in terms of message envelope.
#### Find Content (0x01)
Request either the data payload for a specific piece of content on the network, **or** ENR records of nodes that are closer to the requested content.
```
message_id := 1
type := request
sedes := Container(sub_protocol_id: byte_list, content_key: byte_list)
```
* `sub_protocol_id`: The sub-protocol for this message
* `content_key`: The pre-image key for the content being requested.
#### Found Content (0x02)
Response message to Find Content (0x01).
This message can contain **either** the data payload for the requested content *or* a list of ENR records that are closer to the content than the responding node.
```
message_id := 2
type := response
sedes := Container(sub_protocol_id: byte_list, enrs: List[byte_list, max_length=32], payload: byte_list)
```
* `sub_protocol_id`: The sub-protocol for this message
* `enrs`: List of bytestrings, each of which is an RLP-encoded ENR record.
  * Individual ENR records **MUST** be closer to the requested content than the responding node.
  * It is invalid to return multiple ENR records for the same `node_id`.
  * This field **MUST** be empty if `payload` is non-empty.
* `payload`: Bytestring of the requested content.
  * This field **MUST** be empty if `enrs` is non-empty.
> A response with an empty `payload` and empty `enrs` indicates that the node is not aware of any closer nodes, *nor* does the node have the requested content.
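The three possible outcomes of a Found Content response can be told apart with a small check. This is a sketch; the `FoundContent` dataclass and the returned labels are illustrative, and checking for duplicate `node_id`s is omitted because it would require decoding the ENRs.

```python
from dataclasses import dataclass


@dataclass
class FoundContent:
    sub_protocol_id: bytes
    enrs: list[bytes]
    payload: bytes


def interpret_found_content(msg: FoundContent) -> str:
    """Classify a response as 'content', 'closer_nodes' or 'not_found'."""
    if msg.enrs and msg.payload:
        raise ValueError("enrs and payload must not both be non-empty")
    if msg.payload:
        return "content"
    if msg.enrs:
        return "closer_nodes"
    return "not_found"  # neither the content nor any closer nodes
```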
## Example of "overlay" and "content" protocol composition
For a protocol like the "state network" we would use `sub_protocol_id = "portal:state"`.
The `sub_protocol_payload` for the PING/PONG messages would be the SSZ-encoded `Container(data_radius: uint256)`.
The `"portal:state"` sub-protocol would use both the "overlay" and "content" protocols for the functionality they provide, and would then implement any additional custom messages it needs, such as advertising proof availability.
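As an illustration of the composition, the state network's PING/PONG `sub_protocol_payload` is easy to produce by hand. `Container(data_radius: uint256)` has a single fixed-size field, so its SSZ encoding is just the 32-byte little-endian integer. Encoding `sub_protocol_id` as the ASCII bytes of `"portal:state"` is an assumption.

```python
STATE_SUB_PROTOCOL_ID = b"portal:state"  # assumption: ASCII bytes of the id


def encode_state_ping_payload(data_radius: int) -> bytes:
    """SSZ-encode Container(data_radius: uint256): 32 bytes, little-endian."""
    if not 0 <= data_radius < 2**256:
        raise ValueError("data_radius must fit in a uint256")
    return data_radius.to_bytes(32, "little")


def decode_state_ping_payload(data: bytes) -> int:
    """Inverse of encode_state_ping_payload."""
    if len(data) != 32:
        raise ValueError("expected exactly 32 bytes")
    return int.from_bytes(data, "little")
```

The resulting bytes would be carried as the `sub_protocol_payload` of a generic "overlay" Ping whose `sub_protocol_id` is `STATE_SUB_PROTOCOL_ID`, so the state network never defines its own Ping message.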