Latest block exchange module spec

Signed-off-by: Chrysostomos Nanakos <chris@include.gr>
2026-02-19 12:43:21 +00:00 · 2025-11-12 01:15:32 +02:00 · 2025-11-12 01:15:32 +02:00 · 782a283d21
commit 782a283d21
parent 4008ce553b
2 changed files with 244 additions and 18 deletions
--- a/Notes/Specs/Block
+++ b/Notes/Specs/Block
@@ -0,0 +1,32 @@
+At a high level, the Block Exchange Module is responsible for sending and receiving blocks related to the given content. Formally, a block is a tuple consisting of a sequence of bytes and the corresponding CID, or, in a pseudo-language: `(seq[byte], CID)`.
+
+When uploading the content to the network, the following steps are taken:
+
+1. The content is chunked into fixed-size blocks. The default block size is `64KiB`. The last block is padded with `0`s.
+2. For each chunk, the corresponding `CID` is created, using `(codex-block, 0xCD02)` as the [[Multicodec]] and `(sha2-256, 0x12)` as the [[Multihash]].
+3. The resulting block, `(seq[byte], CID)`, becomes the input to the Block Exchange Module `putBlock` operation.
+
+When downloading the content from the network:
+
+1. Given the Codex Manifest CID, the corresponding Codex Manifest is retrieved. Using the manifest attributes `datasetSize` and `blockSize`, the number of blocks, $blockCount$, is computed as $\lceil datasetSize/blockSize \rceil$.
+2. For each $blockIndex \in [0 .. blockCount - 1]$, a $BlockAddress$ is defined as a tuple $(treeCid, blockIndex)$ where $treeCid$ is an attribute in the Codex Manifest. $BlockAddress$ is the input of the Block Exchange Module `requestBlock` operation.
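The block count and address enumeration in the download steps can be sketched as follows. This is a minimal Python illustration, not Codex code: `block_count` and `block_addresses` are hypothetical helper names, and addresses are modeled as plain tuples with zero-based indices.

```python
def block_count(dataset_size: int, block_size: int) -> int:
    """Ceiling division using integers only, avoiding float rounding."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return (dataset_size + block_size - 1) // block_size


def block_addresses(tree_cid: str, dataset_size: int, block_size: int) -> list[tuple[str, int]]:
    """One (treeCid, blockIndex) pair per block, indices starting at 0."""
    return [(tree_cid, i) for i in range(block_count(dataset_size, block_size))]
```

For a 150,000-byte dataset with 64 KiB blocks, `block_addresses` yields three addresses, one per block.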
+
+The `putBlock` and `requestBlock` operations defined above form the external interface of the Block Exchange Module. The Block Exchange Module interacts directly with other modules, mainly the [[Discovery Module]] and the [[Repo Store]], used for node discovery and local storage respectively.
+
+## Functional Requirements
+
+Before defining the operational model of the Block Exchange Module, which describes *how* the exchange module achieves its goals, let's first define the functional requirements that state *what* the exchange module is expected to achieve.
+
+1. Given $BlockAddress = (treeCid, blockIndex)$, retrieve the corresponding block from the network. In the operational model, we will see that the exchange module will attempt $3000$ retries, waiting $500ms$ in each iteration for the block to be delivered. Thus, from the declarative perspective, we can claim that the exchange module should deliver the block within $3000 \times 500ms = 1500s = 25min$, or fail otherwise.
+2. When uploading content $C$, for each block $b \in C$, store $b$ in [[Repo Store]] and (1) announce the block *presence* to the peers signaling interest in $b$, (2) deliver $b$ to the peers that earlier requested $b$. Include a Merkle proof in the delivery if $b$ is not a Codex Manifest block.
+
+## Operational Model
+
+In this section we focus on how the exchange module operates to deliver on the [[#Functional Requirements]].
+
+In the image below, we provide a high-level description of the exchange module operational model. A more detailed discussion follows.
+
+![[BlockExchangeProtocol.svg]]
+
+> The image above is in SVG format, which allows high-resolution local rendering. You can also access an online version at: https://link.excalidraw.com/readonly/GLtqSUDCiRe38gDb2MOX.
+
--- a/Notes/Specs/Block
+++ b/Notes/Specs/Block
@@ -1,32 +1,226 @@
-At a high level, the Block Exchange Module is responsible for sending and receiving blocks related to the given content. Formally, a block is a tuple consisting of a sequence of bytes and the corresponding CID, or, in a pseudo-language: `(seq[byte], CID)`.
+# Codex Block Exchange Module Specification

-When uploading the content to the network, the following steps are taken:
+## Introduction

-1. The content is chunked into fixed-size blocks. The default block size is `64KiB`. The last block is padded with `0`s.
-2. For each chunk, the corresponding `CID` is created, using `(codex-block, 0xCD02)` as the [[Multicodec]] and `(sha2-256, 0x12)` as the [[Multihash]].
-3. The resulting block, `(seq[byte], CID)`, becomes the input to the Block Exchange Module `putBlock` operation.
+The Block Exchange Module is a core component of Codex responsible for peer-to-peer content distribution. It handles the sending and receiving of blocks across the network, enabling efficient data sharing between Codex nodes.

-When downloading the content from the network:
+## Overview

-1. Given the Codex Manifest CID, the corresponding Codex Manifest is retrieved. Using the manifest attributes `datasetSize` and `blockSize`, the number of blocks, $blockCount$, is computed as $\lceil datasetSize/blockSize \rceil$.
-2. For each $blockIndex \in [0 .. blockCount - 1]$, a $BlockAddress$ is defined as a tuple $(treeCid, blockIndex)$ where $treeCid$ is an attribute in the Codex Manifest. $BlockAddress$ is the input of the Block Exchange Module `requestBlock` operation.
+The Codex Block Exchange Protocol is a libp2p-based protocol for exchanging content blocks between Codex nodes.

-The `putBlock` and `requestBlock` operations defined above form the external interface of the Block Exchange Module. The Block Exchange Module interacts directly with other modules, mainly the [[Discovery Module]] and the [[Repo Store]], used for node discovery and local storage respectively.
+The protocol enables efficient peer-to-peer content distribution by:
+- Advertising block availability to interested peers
+- Requesting blocks from peers who have them
+- Delivering blocks with Merkle proofs for tree-structured data (both erasure-coded and regular datasets)

-## Functional Requirements
+### Block Format

-Before defining the operational model of the Block Exchange Module, which describes *how* the exchange module achieves its goals, let's first define the functional requirements that state *what* the exchange module is expected to achieve.
+In Codex, a block is formally defined as a tuple consisting of raw data and its content identifier: `(data: seq[byte], cid: Cid)`.

-1. Given $BlockAddress = (treeCid, blockIndex)$, retrieve the corresponding block from the network. In the operational model, we will see that the exchange module will attempt $3000$ retries, waiting $500ms$ in each iteration for the block to be delivered. Thus, from the declarative perspective, we can claim that the exchange module should deliver the block within $3000 \times 500ms = 1500s = 25min$, or fail otherwise.
-2. When uploading content $C$, for each block $b \in C$, store $b$ in [[Repo Store]] and (1) announce the block *presence* to the peers signaling interest in $b$, (2) deliver $b$ to the peers that earlier requested $b$. Include a Merkle proof in the delivery if $b$ is not a Codex Manifest block.
+**Block Creation:**
+- Content is chunked into fixed-size blocks (default: 64 KiB)
+- The last block is padded with zeros if necessary
+- Each block's CID uses:
+  - Multicodec: `codex-block` (0xCD02)
+  - Multihash: `sha2-256` (0x12)
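The block creation rules above can be sketched in Python. This is an illustration under stated assumptions, not the Codex implementation: the helper names (`varint`, `chunk`, `block_cid`, `make_blocks`) are hypothetical, and the CID is assumed to follow the binary CIDv1 layout (version, multicodec, multihash, each integer as an unsigned varint), which the text above does not spell out.

```python
import hashlib

BLOCK_SIZE = 64 * 1024        # default block size from the spec (64 KiB)
CODEX_BLOCK_CODEC = 0xCD02    # multicodec `codex-block`
SHA2_256_CODE = 0x12          # multihash code for sha2-256


def varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def chunk(content: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split content into fixed-size blocks, zero-padding the last one."""
    blocks = []
    for offset in range(0, len(content), block_size):
        piece = content[offset:offset + block_size]
        blocks.append(piece.ljust(block_size, b"\x00"))
    return blocks


def block_cid(data: bytes) -> bytes:
    """Build CID bytes for a block (CIDv1 layout is an assumption)."""
    digest = hashlib.sha256(data).digest()
    multihash = varint(SHA2_256_CODE) + varint(len(digest)) + digest
    return varint(1) + varint(CODEX_BLOCK_CODEC) + multihash


def make_blocks(content: bytes) -> list[tuple[bytes, bytes]]:
    """Return (data, cid) tuples, mirroring the block tuple defined above."""
    return [(data, block_cid(data)) for data in chunk(content)]
```

Note that the CID is computed over the padded block, so two inputs that differ only in trailing zeros within the last block produce the same final block in this sketch.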

-## Operational Model
+**Block Addressing:**
+Blocks can be addressed in two ways:
+- **Standalone blocks**: Direct CID reference
+- **Tree blocks**: Reference by `(treeCid, blockIndex)` for blocks within Merkle tree structures (both regular files and erasure-coded datasets)

-In this section we focus on how the exchange module operates to deliver on the [[#Functional Requirements]].
+### Module Context

-In the image below, we provide a high-level description of the exchange module operational model. A more detailed discussion follows.
+The Block Exchange Protocol is the wire protocol used by Codex's Block Exchange Module. The module provides the following public interface:

-![[BlockExchangeProtocol.svg]]
+- `requestBlock(address: BlockAddress): Future[?!Block]` - Request a single block by address
+- `requestBlock(cid: Cid): Future[?!Block]` - Request a single block by CID
+- `requestBlocks(addresses: seq[BlockAddress]): SafeAsyncIter[Block]` - Request multiple blocks as an async iterator

-> The image above is in SVG format, which allows high-resolution local rendering. You can also access an online version at: https://link.excalidraw.com/readonly/GLtqSUDCiRe38gDb2MOX.
+The module integrates with:
+- **Discovery Module**: DHT-based peer discovery for content
+- **Local Store (Repo Store)**: Persistent block storage
+- **Advertiser**: Announces block availability to the DHT
+- **Network Layer**: libp2p-based peer connections and message transport

+## Protocol Identifier
+
+The Block Exchange Protocol uses the following libp2p protocol identifier:
+
+```
+/codex/blockexc/1.0.0
+```
+
+## Connection Model
+
+The protocol operates over libp2p streams. When a node wants to communicate with a peer:
+
+1. The initiating node dials the peer using the protocol identifier
+2. A bidirectional stream is established
+3. Both sides can send and receive messages on this stream
+4. Messages are encoded using Protocol Buffers
+5. The stream remains open for the duration of the exchange session
+6. Peers track active connections in a peer context store
+
+The protocol handles peer lifecycle events:
+- **Peer Joined**: When a peer connects, it is added to the active peer set
+- **Peer Departed**: When a peer disconnects gracefully, its context is cleaned up
+- **Peer Dropped**: When a peer connection fails, it is removed from the active set
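One way to picture the peer context store and the three lifecycle events is the Python sketch below. The class and method names (`PeerContext`, `PeerContextStore`, `on_peer_joined`, and so on) and the fields kept per peer are assumptions made for illustration; the spec only states that contexts are added on join and removed on departure or drop.

```python
from dataclasses import dataclass, field


@dataclass
class PeerContext:
    """Per-peer state; the `wants` field is an illustrative assumption."""
    peer_id: str
    wants: set = field(default_factory=set)


class PeerContextStore:
    """Tracks the set of currently active peers."""

    def __init__(self) -> None:
        self._peers: dict[str, PeerContext] = {}

    def on_peer_joined(self, peer_id: str) -> PeerContext:
        # Add the peer to the active set; a repeated join keeps existing state.
        return self._peers.setdefault(peer_id, PeerContext(peer_id))

    def on_peer_departed(self, peer_id: str) -> None:
        # Graceful disconnect: clean up the peer's context.
        self._peers.pop(peer_id, None)

    def on_peer_dropped(self, peer_id: str) -> None:
        # Connection failure: remove the peer from the active set.
        self._peers.pop(peer_id, None)

    def active_peers(self) -> list[str]:
        return sorted(self._peers)
```

In this sketch departure and drop have the same effect on the store; a real node might additionally log or penalize dropped peers, which the text above does not specify.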
+
+## Message Format
+
+All messages exchanged between peers use Protocol Buffers encoding.
+
+### Message Structure
+
+```protobuf
+message Message {
+  Wantlist wantlist = 1;
+  repeated BlockDelivery payload = 3;
+  repeated BlockPresence blockPresences = 4;
+  int32 pendingBytes = 5;
+  AccountMessage account = 6;
+  StateChannelUpdate payment = 7;
+}
+```
+
+**Field Descriptions:**
+
+- `wantlist`: Requests for blocks (presence checks or full block requests)
+- `payload`: Block deliveries being sent to the peer
+- `blockPresences`: Block presence information (have/don't have)
+- `pendingBytes`: Number of bytes currently pending delivery to this peer
+- `account`: Ethereum account information for receiving payments
+- `payment`: Nitro state channel update for micropayments
+
+### Block Addressing
+
+Codex uses a block addressing scheme that supports both simple content-addressed blocks and blocks within Merkle tree structures.
+
+```protobuf
+message BlockAddress {
+  bool leaf = 1;
+  bytes treeCid = 2;    // Present when leaf = true
+  uint64 index = 3;     // Present when leaf = true
+  bytes cid = 4;        // Present when leaf = false
+}
+```
+
+**Addressing Modes:**
+
+- **Simple Block** (`leaf = false`): Direct CID reference to a standalone content block
+- **Tree Block** (`leaf = true`): Reference to a block within a Merkle tree by tree CID and index. The tree may represent either an erasure-coded dataset or a regular uploaded file organized in a tree structure
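The two addressing modes imply an invariant: tree addresses carry `treeCid` and `index` but no `cid`, and simple addresses carry only `cid`. The Python sketch below enforces that invariant in memory. It is not generated protobuf code; the class layout and the `tree`/`simple` constructors are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockAddress:
    leaf: bool
    tree_cid: Optional[bytes] = None   # set only when leaf is True
    index: Optional[int] = None        # set only when leaf is True
    cid: Optional[bytes] = None        # set only when leaf is False

    def __post_init__(self) -> None:
        if self.leaf:
            if self.tree_cid is None or self.index is None or self.cid is not None:
                raise ValueError("tree address needs tree_cid and index, and no cid")
            if not 0 <= self.index < 2**64:
                raise ValueError("index must fit in uint64")
        elif self.cid is None or self.tree_cid is not None or self.index is not None:
            raise ValueError("simple address needs cid only")

    @classmethod
    def tree(cls, tree_cid: bytes, index: int) -> "BlockAddress":
        return cls(leaf=True, tree_cid=tree_cid, index=index)

    @classmethod
    def simple(cls, cid: bytes) -> "BlockAddress":
        return cls(leaf=False, cid=cid)
```

Because the dataclass is frozen, addresses are hashable and can serve as dictionary keys, for example in a want list keyed by address.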
+
+### WantList
+
+The WantList allows a peer to request blocks or check for block availability.
+
+```protobuf
+message Wantlist {
+  enum WantType {
+    wantBlock = 0;
+    wantHave = 1;
+  }
+
+  message Entry {
+    BlockAddress address = 1;
+    int32 priority = 2;
+    bool cancel = 3;
+    WantType wantType = 4;
+    bool sendDontHave = 5;
+  }
+
+  repeated Entry entries = 1;
+  bool full = 2;
+}
+```
+
+**Field Descriptions:**
+
+- **Entry.address**: The block being requested
+- **Entry.priority**: Request priority (currently always 0)
+- **Entry.cancel**: If true, cancels a previous want for this block
+- **Entry.wantType**:
+  - `wantHave (1)`: Only check if peer has the block
+  - `wantBlock (0)`: Request full block data
+- **Entry.sendDontHave**: If true, peer should respond even if it doesn't have the block
+- **full**: If true, this WantList replaces all previous wants; if false, it's a delta update
+
+**Delta WantList Updates:**
+
+The protocol supports efficient delta updates where only changes to the WantList are transmitted:
+- New blocks are added to the peer's want list
+- Cancelled blocks are removed from the peer's want list
+- Delta updates use `full = false`
+- Full replacements use `full = true`
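The difference between a full replacement and a delta update can be sketched as a function that folds an incoming WantList into the wants already stored for a peer. This is a simplified Python model: `Entry` and `apply_wantlist` are hypothetical names, addresses are plain strings standing in for `BlockAddress`, and `priority` is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    address: str                  # stand-in for a BlockAddress
    cancel: bool = False
    want_type: str = "wantBlock"  # or "wantHave"
    send_dont_have: bool = False


def apply_wantlist(current: dict, entries: list, full: bool) -> dict:
    """Return the peer's want list after applying one WantList message."""
    # A full WantList discards all previous wants; a delta starts from them.
    updated = {} if full else dict(current)
    for entry in entries:
        if entry.cancel:
            updated.pop(entry.address, None)   # cancelled blocks are removed
        else:
            updated[entry.address] = entry     # new or updated wants are stored
    return updated
```

The function returns a new dictionary instead of mutating `current`, which keeps the sketch easy to test; an implementation could just as well update the peer context in place.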
+
+### Block Delivery
+
+Block deliveries contain the actual block data with Merkle proofs for tree blocks.
+
+```protobuf
+message BlockDelivery {
+  bytes cid = 1;
+  bytes data = 2;
+  BlockAddress address = 3;
+  bytes proof = 4;  // Present only when address.leaf = true
+}
+```
+
+**Field Descriptions:**
+
+- `cid`: Content identifier of the block
+- `data`: Raw block data (up to 100 MiB)
+- `address`: The address that was requested
+- `proof`: Merkle proof (CodexProof) verifying block correctness (required for tree blocks)
+
+**Merkle Proof Verification:**
+
+When delivering tree blocks (`address.leaf = true`):
+- The delivery must include a Merkle proof (CodexProof)
+- The proof verifies that the block at the given index is correctly part of the Merkle tree identified by the tree CID
+- This applies to all tree-structured data, whether erasure-coded or not
+- Recipients must verify the proof before accepting the block
+- Invalid proofs result in block rejection
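To illustrate what "verifying that the block at the given index is part of the tree" means, here is a generic binary Merkle proof check in Python. It is explicitly not the CodexProof format: the hash function (SHA-256), the absence of domain separation, the power-of-two leaf count, and the function names are all assumptions made for the sketch.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def next_level(level: list[bytes]) -> list[bytes]:
    """Hash adjacent pairs to form the parent level."""
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(blocks: list[bytes]) -> bytes:
    """Root over block hashes; the block count must be a power of two."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        level = next_level(level)
    return level[0]


def merkle_proof(blocks: list[bytes], index: int) -> list[bytes]:
    """Sibling hashes from the leaf at `index` up to the root."""
    level = [h(b) for b in blocks]
    path = []
    while len(level) > 1:
        path.append(level[index ^ 1])  # sibling of the current node
        level = next_level(level)
        index //= 2
    return path


def verify(block: bytes, index: int, path: list[bytes], root: bytes) -> bool:
    """Recompute the root from the block and its path, then compare."""
    node = h(block)
    for sibling in path:
        # An even index means the node is a left child, odd means right.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root
```

A recipient following the rules above would call something like `verify` with the delivered data, `address.index`, the decoded proof, and the root derived from the tree CID, and reject the block when it returns `False`.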
+
+### Block Presence
+
+Block presence messages indicate whether a peer has specific blocks.
+
+```protobuf
+enum BlockPresenceType {
+  presenceHave = 0;
+  presenceDontHave = 1;
+}
+
+message BlockPresence {
+  BlockAddress address = 1;
+  BlockPresenceType type = 2;
+  bytes price = 3;
+}
+```
+
+**Field Descriptions:**
+
+- `address`: The block address being referenced
+- `type`: Whether the peer has the block or not
+- `price`: Price for the block (UInt256 format)
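How `wantHave` entries, `sendDontHave`, and the two presence types fit together can be sketched as follows. This Python model is an assumption-laden illustration: `presence_replies` is a hypothetical helper, addresses are plain strings, and the `price` field is left out.

```python
def presence_replies(entries: list[tuple[str, bool]], local_blocks: set[str]) -> list[tuple[str, str]]:
    """Build presence replies for wantHave entries.

    `entries` holds (address, send_dont_have) pairs; `local_blocks` is the
    set of addresses available in the local store.
    """
    replies = []
    for address, send_dont_have in entries:
        if address in local_blocks:
            replies.append((address, "presenceHave"))
        elif send_dont_have:
            # Only answer negatively when the requester asked for it.
            replies.append((address, "presenceDontHave"))
    return replies
```

With this behavior a requester that sets `sendDontHave` gets an explicit answer for every entry, while one that leaves it unset only hears about blocks the peer actually has.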
+
+### Payment Messages
+
+Payment-related messages for micropayments using Nitro state channels.
+
+```protobuf
+message AccountMessage {
+  bytes address = 1;  // Ethereum address to which payments should be made
+}
+
+message StateChannelUpdate {
+  bytes update = 1;   // Signed Nitro state, serialized as JSON
+}
+```
+
+**Field Descriptions:**
+
+- `AccountMessage.address`: Ethereum address for receiving payments
+- `StateChannelUpdate.update`: Nitro state channel update containing payment information