From 9c46e1148d581b5330005148f4aa56e628835bb5 Mon Sep 17 00:00:00 2001 From: pablo Date: Sun, 19 Oct 2025 15:44:17 +0300 Subject: [PATCH 01/10] feat: add segmentation spec already implemented in https://github.com/waku-org/nim-chat-sdk/blob/main/chat_sdk/segmentation.nim --- standards/application/segmentation.md | 187 ++++++++++++++++++++++++++ 1 file changed, 187 insertions(+) create mode 100644 standards/application/segmentation.md diff --git a/standards/application/segmentation.md b/standards/application/segmentation.md new file mode 100644 index 0000000..8156778 --- /dev/null +++ b/standards/application/segmentation.md @@ -0,0 +1,187 @@ +--- +title: Message Segmentation and Reconstruction over Waku +name: Message Segmentation and Reconstruction +tags: [waku-application, segmentation] +version: 0.1 +status: draft +--- + +## Abstract + +This specification defines an application-layer protocol for **segmentation** and **reconstruction** of messages carried over Waku when the original payload exceeds the maximum Waku's message size. Applications partition the payload into multiple Waku envelopes and reconstruct the original on receipt, even when segments arrive out of order or up to **12.5%** of segments are lost. The protocol uses **Reed–Solomon** erasure coding for fault tolerance. Messages whose payload size is **≤ `segmentSize`** are sent unmodified. + +## Motivation + +Waku Relay deployments typically propagate envelopes up to **1 MB**. To support larger application payloads (e.g., up to **10 MiB** or more), a segmentation layer is required. This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver. Erasure-coded parity segments provide resilience against partial loss or reordering. + +## Terminology + +- **original message**: the full application payload before segmentation. +- **data segment**: one of the partitioned chunks of the original message payload. +- **parity segment**: an erasure-coded segment derived from the set of data segments. +- **segment message**: a Waku envelope whose `payload` field carries a serialized `SegmentMessageProto`. +- **`segmentSize`**: configured maximum size in bytes of each data segment’s `payload` chunk (before protobuf serialization). +- **sender public key**: the origin identifier used for indexing persistence. + +The key words **“MUST”**, **“MUST NOT”**, **“REQUIRED”**, **“SHALL”**, **“SHALL NOT”**, **“SHOULD”**, **“SHOULD NOT”**, **“RECOMMENDED”**, **“NOT RECOMMENDED”**, **“MAY”**, and **“OPTIONAL”** in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). + +## Segmentation + +### Sending + +When the original payload exceeds `segmentSize`, the sender **MUST**: + +- Compute a 32-byte `entire_message_hash = Keccak256(originalPayload)`. +- Split the payload into one or more **data segments**, each of size up to `segmentSize` bytes. +- Optionally generate **parity segments** using Reed–Solomon erasure coding, at a fixed parity rate of 12.5%. + Implementations **MUST NOT** produce more than 256 total segments (data + parity). +- Encode each segment as a `SegmentMessageProto` with: + - The `entire_message_hash` + - Either data-segment indices (`segments_count`, `index`) or parity-segment indices (`parity_segments_count`, `parity_segment_index`) + - The raw payload data +- Send all segments as individual Waku envelopes, preserving application-level metadata (e.g., content topic). 
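As an illustration of the splitting step only, a minimal sketch in Python (the reference implementation is written in Nim); the helper names and the injected `keccak256` callable are assumptions made for this example, and parity-segment generation is omitted:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataSegment:
    entire_message_hash: bytes  # Keccak256 of the original payload (32 bytes)
    index: int                  # zero-based position of this chunk
    segments_count: int         # total number of data segments
    payload: bytes              # chunk of at most segment_size bytes


def split_into_data_segments(
    original_payload: bytes,
    segment_size: int,
    keccak256: Callable[[bytes], bytes],  # any Keccak-256 implementation
) -> List[DataSegment]:
    """Split a payload larger than segment_size into data segments."""
    if len(original_payload) <= segment_size:
        raise ValueError("payloads <= segmentSize are sent unmodified")
    digest = keccak256(original_payload)
    chunks = [
        original_payload[i:i + segment_size]
        for i in range(0, len(original_payload), segment_size)
    ]
    return [
        DataSegment(digest, idx, len(chunks), chunk)
        for idx, chunk in enumerate(chunks)
    ]
```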
+ +Messages smaller than or equal to `segmentSize` **SHALL** be transmitted unmodified. + +### Receiving + +Upon receiving a segmented message, the receiver **MUST**: + +- Validate each segment according to [Wire Format → Validation](#validation). +- Persist received segments, indexed by `entire_message_hash` and sender. +- Attempt reconstruction when the number of available (data + parity) segments equals or exceeds the data segment count. +- Reconstruct by: + - Concatenating data segments if all are present, or + - Applying Reed–Solomon decoding if parity segments are available. +- Verify `Keccak256(reconstructedPayload)` matches `entire_message_hash`. + On mismatch, the message **MUST** be discarded and logged as invalid. +- Once verified, the reconstructed payload **SHALL** be delivered to the application. +- Incomplete reconstructions **SHOULD** be garbage-collected after a timeout. + +## Wire Format + +```protobuf +syntax = "proto3"; + +message SegmentMessageProto { + // Keccak256(original payload), 32 bytes + bytes entire_message_hash = 1; + + // Data segment indexing + uint32 index = 2; // valid only if segments_count > 0 + uint32 segments_count = 3; // number of data segments (>= 2) + + // Segment payload (data or parity shard) + bytes payload = 4; + + // Parity segment indexing (used iff segments_count == 0) + uint32 parity_segment_index = 5; + uint32 parity_segments_count = 6; // number of parity segments (> 0) +} +``` + +### Validation + +Receivers **MUST** enforce: + +- `entire_message_hash.length == 32` +- **Data segments:** `segments_count >= 2` **AND** `index < segments_count` +- **Parity segments:** `segments_count == 0` **AND** `parity_segments_count > 0` **AND** `parity_segment_index < parity_segments_count` + +No other combinations are permitted. + +--- + +### Constants + +- **Parity rate:** `0.125` (12.5%) +- **Max total segments (data + parity):** `256` (library limitation) +- **Overhead targets:** + - Bandwidth overhead from parity segments ≤ **12.5%** overall + - Serialization/metadata overhead ≤ **100 bytes** per segment (implementation target) + +> **Note:** With a parity rate of 12.5%, reconstruction is possible if **all data segments** are received or if **any combination of data + parity** totals at least `dataSegments` (i.e., up to 12.5% loss tolerated). + +--- + +## Implementation + +### Reed–Solomon + +Implementations that apply parity **SHALL** use fixed-size shards of length `segmentSize`. +The last data chunk **MUST** be padded to `segmentSize` for encoding. +The reference implementation uses **nim-leopard** (Leopard-RS) with a maximum of **256 total shards**. + +### Storage / Persistence + +Segments **MUST** be persisted (e.g., SQLite) and indexed by `entire_message_hash` and sender public key. +Implementations **SHOULD** support: + +- Duplicate detection and idempotent saves +- Completion flags to prevent duplicate processing +- Timeout-based cleanup of incomplete reconstructions +- Per-sender quotas for stored bytes and concurrent reconstructions + +### Configuration + +- `segmentSize` — **REQUIRED** +- `parityRate` — fixed at **0.125** + +**API simplicity:** Libraries **SHOULD** require only `segmentSize` from the application for normal operation. 
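As a worked example of these parameters, the sketch below derives segment counts from `segmentSize` and the fixed parity rate; the ceiling-based rounding is an assumption of this example, not a requirement of the specification:

```python
import math


def segment_counts(payload_len: int, segment_size: int, parity_rate: float = 0.125):
    """Derive data/parity segment counts; rounding up is an assumption here."""
    data_segments = math.ceil(payload_len / segment_size)
    parity_segments = math.ceil(data_segments * parity_rate)
    return data_segments, parity_segments


# Example: a 1 MiB payload with segmentSize = 100 000 bytes gives
# 11 data segments and, under this rounding, 2 parity segments
# (13 segments in total, well below the 256-segment limit).
print(segment_counts(1_048_576, 100_000))  # -> (11, 2)
```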
+ +### Supportability + +- **Language / Package:** Nim; **Nimble** package manager +- **Intended for:** all Waku nodes at the application layer + +--- + +## Security Considerations + +### Privacy + +`entire_message_hash` enables correlation of segments that belong to the same original message but does not reveal content. +Applications **SHOULD** encrypt payloads before segmentation. +Traffic analysis may still identify segmented flows. + +### Integrity + +Implementations **MUST** verify the Keccak256 hash post-reconstruction and discard on mismatch. + +### Denial of Service + +To mitigate resource exhaustion: + +- Limit concurrent reconstructions and per-sender storage +- Enforce timeouts and size caps +- Validate segment counts (≤ 256) +- Consider rate-limiting using [17/WAKU2-RLN-RELAY](https://rfc.vac.dev/waku/standards/core/17/rln-relay) + +### Compatibility + +Nodes that do **not** implement this specification cannot reconstruct large messages. + +--- + +## Deployability + +- Bandwidth overhead ≈ **12.5%** from parity (if enabled) +- Additional per-segment overhead ≤ **100 bytes** (protobuf + metadata) +- Larger messages increase gossip traffic and storage; operators **SHOULD** consider policy limits + +--- + +## References + +1. [10/WAKU2 – Waku](https://rfc.vac.dev/waku/standards/core/10/waku2) +2. [11/WAKU2-RELAY – Relay](https://rfc.vac.dev/waku/standards/core/11/relay) +3. [14/WAKU2-MESSAGE – Message](https://rfc.vac.dev/waku/standards/core/14/message) +4. [nim-leopard](https://github.com/status-im/nim-leopard) – Nim bindings for Leopard-RS (Reed–Solomon) +5. [Leopard-RS](https://github.com/catid/leopard) – Fast Reed–Solomon erasure coding library +6. [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) – Key words for use in RFCs to Indicate Requirement Levels + +--- + +## Changelog + +- **0.1 — Initial draft** \ No newline at end of file From 7866a68d6ee177832cae43210eea4ec0e22b20c9 Mon Sep 17 00:00:00 2001 From: pablo Date: Sun, 19 Oct 2025 15:56:33 +0300 Subject: [PATCH 02/10] fix: dictionary --- .wordlist.txt | 8 ++++++++ standards/application/segmentation.md | 12 +++--------- 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/.wordlist.txt b/.wordlist.txt index 8e754de..3ccef6b 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -10,14 +10,22 @@ GossipSub https iana IANA +Keccak libp2p md +Nim +nim +parityRate +protobuf pubsub rfc RFC +RLN +segmentSize SHARDING subnets Waku +Waku's WAKU www ZXCV \ No newline at end of file diff --git a/standards/application/segmentation.md b/standards/application/segmentation.md index 8156778..8686fe0 100644 --- a/standards/application/segmentation.md +++ b/standards/application/segmentation.md @@ -12,7 +12,7 @@ This specification defines an application-layer protocol for **segmentation** an ## Motivation -Waku Relay deployments typically propagate envelopes up to **1 MB**. To support larger application payloads (e.g., up to **10 MiB** or more), a segmentation layer is required. This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver. Erasure-coded parity segments provide resilience against partial loss or reordering. +Waku Relay deployments typically propagate envelopes up to **1 MB**. To support larger application payloads (e.g., up to **10 MB** or more), a segmentation layer is required. This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver. 
Erasure-coded parity segments provide resilience against partial loss or reordering. ## Terminology @@ -129,7 +129,7 @@ Implementations **SHOULD** support: **API simplicity:** Libraries **SHOULD** require only `segmentSize` from the application for normal operation. -### Supportability +### Support - **Language / Package:** Nim; **Nimble** package manager - **Intended for:** all Waku nodes at the application layer @@ -163,7 +163,7 @@ Nodes that do **not** implement this specification cannot reconstruct large mess --- -## Deployability +## Deploy - Bandwidth overhead ≈ **12.5%** from parity (if enabled) - Additional per-segment overhead ≤ **100 bytes** (protobuf + metadata) @@ -179,9 +179,3 @@ Nodes that do **not** implement this specification cannot reconstruct large mess 4. [nim-leopard](https://github.com/status-im/nim-leopard) – Nim bindings for Leopard-RS (Reed–Solomon) 5. [Leopard-RS](https://github.com/catid/leopard) – Fast Reed–Solomon erasure coding library 6. [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) – Key words for use in RFCs to Indicate Requirement Levels - ---- - -## Changelog - -- **0.1 — Initial draft** \ No newline at end of file From 6384633d1bb5bccbec675049b4d976a0d4320fd0 Mon Sep 17 00:00:00 2001 From: pablo Date: Sun, 19 Oct 2025 16:01:13 +0300 Subject: [PATCH 03/10] fix: spell --- .wordlist.txt | 7 +++++++ standards/application/segmentation.md | 6 +++--- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/.wordlist.txt b/.wordlist.txt index 3ccef6b..febd226 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -1,6 +1,9 @@ ALLOC +Changelog creativecommons danielkaiser +dataSegments +Deployability DHT DoS github @@ -16,14 +19,18 @@ md Nim nim parityRate +proto protobuf pubsub rfc RFC RLN +SegmentMessageProto segmentSize SHARDING subnets +uint +waku Waku Waku's WAKU diff --git a/standards/application/segmentation.md b/standards/application/segmentation.md index 8686fe0..4f98bef 100644 --- a/standards/application/segmentation.md +++ b/standards/application/segmentation.md @@ -31,7 +31,7 @@ The key words **“MUST”**, **“MUST NOT”**, **“REQUIRED”**, **“SHALL When the original payload exceeds `segmentSize`, the sender **MUST**: -- Compute a 32-byte `entire_message_hash = Keccak256(originalPayload)`. +- Compute a 32-byte `entire_message_hash = Keccak256(original_payload)`. - Split the payload into one or more **data segments**, each of size up to `segmentSize` bytes. - Optionally generate **parity segments** using Reed–Solomon erasure coding, at a fixed parity rate of 12.5%. Implementations **MUST NOT** produce more than 256 total segments (data + parity). @@ -53,7 +53,7 @@ Upon receiving a segmented message, the receiver **MUST**: - Reconstruct by: - Concatenating data segments if all are present, or - Applying Reed–Solomon decoding if parity segments are available. -- Verify `Keccak256(reconstructedPayload)` matches `entire_message_hash`. +- Verify `Keccak256(reconstructed_payload)` matches `entire_message_hash`. On mismatch, the message **MUST** be discarded and logged as invalid. - Once verified, the reconstructed payload **SHALL** be delivered to the application. - Incomplete reconstructions **SHOULD** be garbage-collected after a timeout. 
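A minimal reconstruction sketch in Python, assuming all data segments have arrived; Reed–Solomon recovery of missing chunks from parity segments (e.g., via a Leopard-RS binding) and the `keccak256` helper are left abstract, and the function names are assumptions for this example:

```python
from typing import Callable, Dict, Optional


def try_reconstruct(
    data_segments: Dict[int, bytes],   # index -> payload chunk
    segments_count: int,
    entire_message_hash: bytes,
    keccak256: Callable[[bytes], bytes],
) -> Optional[bytes]:
    """Reassemble from data segments alone; recovery of missing chunks
    from parity segments via Reed-Solomon decoding is omitted here."""
    if len(data_segments) < segments_count:
        return None  # not enough data segments yet; wait or fall back to parity
    reconstructed = b"".join(data_segments[i] for i in range(segments_count))
    if keccak256(reconstructed) != entire_message_hash:
        raise ValueError("hash mismatch: discard and log as invalid")
    return reconstructed
```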
@@ -74,7 +74,7 @@ message SegmentMessageProto { // Segment payload (data or parity shard) bytes payload = 4; - // Parity segment indexing (used iff segments_count == 0) + // Parity segment indexing (used if segments_count == 0) uint32 parity_segment_index = 5; uint32 parity_segments_count = 6; // number of parity segments (> 0) } From 83182e848d18c9adbd61c80b58397227dd24fd90 Mon Sep 17 00:00:00 2001 From: pablo Date: Tue, 21 Oct 2025 17:36:57 +0300 Subject: [PATCH 04/10] fix: pr feedback --- standards/application/segmentation.md | 48 ++++++++++++++------------- 1 file changed, 25 insertions(+), 23 deletions(-) diff --git a/standards/application/segmentation.md b/standards/application/segmentation.md index 4f98bef..af1adc2 100644 --- a/standards/application/segmentation.md +++ b/standards/application/segmentation.md @@ -8,11 +8,11 @@ status: draft ## Abstract -This specification defines an application-layer protocol for **segmentation** and **reconstruction** of messages carried over Waku when the original payload exceeds the maximum Waku's message size. Applications partition the payload into multiple Waku envelopes and reconstruct the original on receipt, even when segments arrive out of order or up to **12.5%** of segments are lost. The protocol uses **Reed–Solomon** erasure coding for fault tolerance. Messages whose payload size is **≤ `segmentSize`** are sent unmodified. +This specification defines an application-layer protocol for **segmentation** and **reconstruction** of messages carried over Waku when the original payload exceeds the maximum Waku's message size. Applications partition the payload into multiple Waku envelopes and reconstruct the original on receipt, even when segments arrive out of order or up to a **predefined percentage** of segments are lost. The protocol uses **Reed–Solomon** erasure coding for fault tolerance. Messages whose payload size is **≤ `segmentSize`** are sent unmodified. ## Motivation -Waku Relay deployments typically propagate envelopes up to **1 MB**. To support larger application payloads (e.g., up to **10 MB** or more), a segmentation layer is required. This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver. Erasure-coded parity segments provide resilience against partial loss or reordering. +Waku Relay deployments typically propagate envelopes up to **150 KB** as per [64/WAKU2-NETWORK - Message Size](https://rfc.vac.dev/waku/standards/core/64/network#message-size). To support larger application payloads, a segmentation layer is required. This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver. Erasure-coded parity segments provide resilience against partial loss or reordering. ## Terminology @@ -33,7 +33,7 @@ When the original payload exceeds `segmentSize`, the sender **MUST**: - Compute a 32-byte `entire_message_hash = Keccak256(original_payload)`. - Split the payload into one or more **data segments**, each of size up to `segmentSize` bytes. -- Optionally generate **parity segments** using Reed–Solomon erasure coding, at a fixed parity rate of 12.5%. +- Optionally generate **parity segments** using Reed–Solomon erasure coding, at the predefined parity rate. Implementations **MUST NOT** produce more than 256 total segments (data + parity). - Encode each segment as a `SegmentMessageProto` with: - The `entire_message_hash` @@ -92,18 +92,6 @@ No other combinations are permitted. 
--- -### Constants - -- **Parity rate:** `0.125` (12.5%) -- **Max total segments (data + parity):** `256` (library limitation) -- **Overhead targets:** - - Bandwidth overhead from parity segments ≤ **12.5%** overall - - Serialization/metadata overhead ≤ **100 bytes** per segment (implementation target) - -> **Note:** With a parity rate of 12.5%, reconstruction is possible if **all data segments** are received or if **any combination of data + parity** totals at least `dataSegments` (i.e., up to 12.5% loss tolerated). - ---- - ## Implementation ### Reed–Solomon @@ -124,8 +112,16 @@ Implementations **SHOULD** support: ### Configuration -- `segmentSize` — **REQUIRED** -- `parityRate` — fixed at **0.125** +**Required parameters:** + +- `segmentSize` — **REQUIRED** configurable parameter; maximum size in bytes of each data segment's payload chunk (before protobuf serialization). + +**Fixed parameters:** + +- `parityRate` — fixed at **0.125** (12.5%) +- `maxTotalSegments` — **256** (library limitation for data + parity segments combined) + +**Reconstruction capability:** With the predefined parity rate, reconstruction is possible if **all data segments** are received or if **any combination of data + parity** totals at least `dataSegments` (i.e., up to the predefined percentage of loss tolerated). **API simplicity:** Libraries **SHOULD** require only `segmentSize` from the application for normal operation. @@ -163,10 +159,15 @@ Nodes that do **not** implement this specification cannot reconstruct large mess --- -## Deploy +## Deployment Considerations -- Bandwidth overhead ≈ **12.5%** from parity (if enabled) +**Overhead:** + +- Bandwidth overhead ≈ the predefined parity rate from parity (if enabled) - Additional per-segment overhead ≤ **100 bytes** (protobuf + metadata) + +**Network impact:** + - Larger messages increase gossip traffic and storage; operators **SHOULD** consider policy limits --- @@ -175,7 +176,8 @@ Nodes that do **not** implement this specification cannot reconstruct large mess 1. [10/WAKU2 – Waku](https://rfc.vac.dev/waku/standards/core/10/waku2) 2. [11/WAKU2-RELAY – Relay](https://rfc.vac.dev/waku/standards/core/11/relay) -3. [14/WAKU2-MESSAGE – Message](https://rfc.vac.dev/waku/standards/core/14/message) -4. [nim-leopard](https://github.com/status-im/nim-leopard) – Nim bindings for Leopard-RS (Reed–Solomon) -5. [Leopard-RS](https://github.com/catid/leopard) – Fast Reed–Solomon erasure coding library -6. [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) – Key words for use in RFCs to Indicate Requirement Levels +3. [14/WAKU2-MESSAGE – Message](https://rfc.vac.dev/waku/standards/core/14/message) +4. [64/WAKU2-NETWORK](https://rfc.vac.dev/waku/standards/core/64/network#message-size) +5. [nim-leopard](https://github.com/status-im/nim-leopard) – Nim bindings for Leopard-RS (Reed–Solomon) +6. [Leopard-RS](https://github.com/catid/leopard) – Fast Reed–Solomon erasure coding library +7. 
[RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) – Key words for use in RFCs to Indicate Requirement Levels From fdb95957e4e211f93cfb2abf3f069f5d03200510 Mon Sep 17 00:00:00 2001 From: pablo Date: Tue, 21 Oct 2025 17:40:35 +0300 Subject: [PATCH 05/10] fix: spelling --- .wordlist.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/.wordlist.txt b/.wordlist.txt index febd226..0fe9c8e 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -15,6 +15,7 @@ iana IANA Keccak libp2p +maxTotalSegments md Nim nim From fa2993b427f12796356a232c54be75814fac5d98 Mon Sep 17 00:00:00 2001 From: pablo Date: Tue, 21 Oct 2025 18:01:57 +0300 Subject: [PATCH 06/10] fix: pr feedback --- standards/application/segmentation.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/standards/application/segmentation.md b/standards/application/segmentation.md index af1adc2..a7b4c91 100644 --- a/standards/application/segmentation.md +++ b/standards/application/segmentation.md @@ -12,7 +12,7 @@ This specification defines an application-layer protocol for **segmentation** an ## Motivation -Waku Relay deployments typically propagate envelopes up to **150 KB** as per [64/WAKU2-NETWORK - Message Size](https://rfc.vac.dev/waku/standards/core/64/network#message-size). To support larger application payloads, a segmentation layer is required. This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver. Erasure-coded parity segments provide resilience against partial loss or reordering. +Waku Relay deployments typically propagate envelopes up to **150 KB** as per [64/WAKU2-NETWORK - Message](https://rfc.vac.dev/waku/standards/core/64/network#message-size). To support larger application payloads, a segmentation layer is required. This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver. Erasure-coded parity segments provide resilience against partial loss or reordering. ## Terminology @@ -137,7 +137,6 @@ Implementations **SHOULD** support: ### Privacy `entire_message_hash` enables correlation of segments that belong to the same original message but does not reveal content. -Applications **SHOULD** encrypt payloads before segmentation. Traffic analysis may still identify segmented flows. ### Integrity From 9b5e7b9ac36d1bf192f52925be65277bf0e71d19 Mon Sep 17 00:00:00 2001 From: pablo Date: Wed, 22 Oct 2025 12:03:00 +0300 Subject: [PATCH 07/10] fix: pr feedback --- standards/application/segmentation.md | 115 +++++++++++++++----------- 1 file changed, 67 insertions(+), 48 deletions(-) diff --git a/standards/application/segmentation.md b/standards/application/segmentation.md index a7b4c91..130e4c2 100644 --- a/standards/application/segmentation.md +++ b/standards/application/segmentation.md @@ -1,5 +1,5 @@ --- -title: Message Segmentation and Reconstruction over Waku +title: Message Segmentation and Reconstruction name: Message Segmentation and Reconstruction tags: [waku-application, segmentation] version: 0.1 @@ -8,19 +8,27 @@ status: draft ## Abstract -This specification defines an application-layer protocol for **segmentation** and **reconstruction** of messages carried over Waku when the original payload exceeds the maximum Waku's message size. Applications partition the payload into multiple Waku envelopes and reconstruct the original on receipt, even when segments arrive out of order or up to a **predefined percentage** of segments are lost. 
The protocol uses **Reed–Solomon** erasure coding for fault tolerance. Messages whose payload size is **≤ `segmentSize`** are sent unmodified. +This specification defines an application-layer protocol for **segmentation** and **reconstruction** of messages carried over a message transport/delivery services with size limitation, when the original payload exceeds said limitation. +Applications partition the payload into multiple wire-messages envelopes and reconstruct the original on receipt, +even when segments arrive out of order or up to a **predefined percentage** of segments are lost. +The protocol uses **Reed–Solomon** erasure coding for fault tolerance. +Messages whose payload size is **≤ `segmentSize`** are sent unmodified. ## Motivation -Waku Relay deployments typically propagate envelopes up to **150 KB** as per [64/WAKU2-NETWORK - Message](https://rfc.vac.dev/waku/standards/core/64/network#message-size). To support larger application payloads, a segmentation layer is required. This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver. Erasure-coded parity segments provide resilience against partial loss or reordering. +Waku Relay deployments typically propagate envelopes up to **150 KB** as per [64/WAKU2-NETWORK - Message](https://rfc.vac.dev/waku/standards/core/64/network#message-size). +To support larger application payloads, +a segmentation layer is required. +This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver. +Erasure-coded parity segments provide resilience against partial loss or reordering. ## Terminology -- **original message**: the full application payload before segmentation. -- **data segment**: one of the partitioned chunks of the original message payload. -- **parity segment**: an erasure-coded segment derived from the set of data segments. -- **segment message**: a Waku envelope whose `payload` field carries a serialized `SegmentMessageProto`. -- **`segmentSize`**: configured maximum size in bytes of each data segment’s `payload` chunk (before protobuf serialization). +- **original message**: the full application payload before segmentation. +- **data segment**: one of the partitioned chunks of the original message payload. +- **parity segment**: an erasure-coded segment derived from the set of data segments. +- **segment message**: a wire-message whose `payload` field carries a serialized `SegmentMessageProto`. +- **`segmentSize`**: configured maximum size in bytes of each data segment's `payload` chunk (before protobuf serialization). - **sender public key**: the origin identifier used for indexing persistence. The key words **“MUST”**, **“MUST NOT”**, **“REQUIRED”**, **“SHALL”**, **“SHALL NOT”**, **“SHOULD”**, **“SHOULD NOT”**, **“RECOMMENDED”**, **“NOT RECOMMENDED”**, **“MAY”**, and **“OPTIONAL”** in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). @@ -29,33 +37,36 @@ The key words **“MUST”**, **“MUST NOT”**, **“REQUIRED”**, **“SHALL ### Sending -When the original payload exceeds `segmentSize`, the sender **MUST**: +When the original payload exceeds `segmentSize`, the sender: -- Compute a 32-byte `entire_message_hash = Keccak256(original_payload)`. -- Split the payload into one or more **data segments**, each of size up to `segmentSize` bytes. -- Optionally generate **parity segments** using Reed–Solomon erasure coding, at the predefined parity rate. 
+- **MUST** compute a 32-byte `entire_message_hash = Keccak256(original_payload)`. +- **MUST** split the payload into one or more **data segments**, + each of size up to `segmentSize` bytes. +- **MAY** use Reed–Solomon erasure coding at the predefined parity rate. Implementations **MUST NOT** produce more than 256 total segments (data + parity). - Encode each segment as a `SegmentMessageProto` with: - The `entire_message_hash` - Either data-segment indices (`segments_count`, `index`) or parity-segment indices (`parity_segments_count`, `parity_segment_index`) - The raw payload data -- Send all segments as individual Waku envelopes, preserving application-level metadata (e.g., content topic). +- Send all segments as individual Waku envelopes, + preserving application-level metadata (e.g., content topic). Messages smaller than or equal to `segmentSize` **SHALL** be transmitted unmodified. ### Receiving -Upon receiving a segmented message, the receiver **MUST**: +Upon receiving a segmented message, the receiver: -- Validate each segment according to [Wire Format → Validation](#validation). -- Persist received segments, indexed by `entire_message_hash` and sender. -- Attempt reconstruction when the number of available (data + parity) segments equals or exceeds the data segment count. -- Reconstruct by: +- **MUST** validate each segment according to [Wire Format → Validation](#validation). +- **MUST** cache received segments +- **MUST** attempt reconstruction when the number of available (data + parity) segments equals or exceeds the data segment count: - Concatenating data segments if all are present, or - Applying Reed–Solomon decoding if parity segments are available. -- Verify `Keccak256(reconstructed_payload)` matches `entire_message_hash`. - On mismatch, the message **MUST** be discarded and logged as invalid. -- Once verified, the reconstructed payload **SHALL** be delivered to the application. +- **MUST** verify `Keccak256(reconstructed_payload)` matches `entire_message_hash`. + On mismatch, + the message **MUST** be discarded and logged as invalid. +- Once verified, + the reconstructed payload **SHALL** be delivered to the application. - Incomplete reconstructions **SHOULD** be garbage-collected after a timeout. ## Wire Format @@ -69,7 +80,7 @@ message SegmentMessageProto { // Data segment indexing uint32 index = 2; // valid only if segments_count > 0 - uint32 segments_count = 3; // number of data segments (>= 2) + uint32 segment_count = 3; // number of data segments (>= 2) // Segment payload (data or parity shard) bytes payload = 4; @@ -85,49 +96,56 @@ message SegmentMessageProto { Receivers **MUST** enforce: - `entire_message_hash.length == 32` -- **Data segments:** `segments_count >= 2` **AND** `index < segments_count` -- **Parity segments:** `segments_count == 0` **AND** `parity_segments_count > 0` **AND** `parity_segment_index < parity_segments_count` +- **Data segments:** + `segments_count >= 2` **AND** `index < segments_count` +- **Parity segments:** + `segments_count == 0` **AND** `parity_segments_count > 0` **AND** `parity_segment_index < parity_segments_count` No other combinations are permitted. --- -## Implementation +## Implementation Suggestions ### Reed–Solomon -Implementations that apply parity **SHALL** use fixed-size shards of length `segmentSize`. -The last data chunk **MUST** be padded to `segmentSize` for encoding. +Implementations that apply parity **SHALL** use fixed-size shards of length `segmentSize`. 
+The last data chunk **MUST** be padded to `segmentSize` for encoding. The reference implementation uses **nim-leopard** (Leopard-RS) with a maximum of **256 total shards**. ### Storage / Persistence -Segments **MUST** be persisted (e.g., SQLite) and indexed by `entire_message_hash` and sender public key. +Segments **MAY** be persisted (e.g., SQLite) and indexed by `entire_message_hash` and sender public key. Implementations **SHOULD** support: -- Duplicate detection and idempotent saves -- Completion flags to prevent duplicate processing -- Timeout-based cleanup of incomplete reconstructions +- Duplicate detection and idempotent saves +- Completion flags to prevent duplicate processing +- Timeout-based cleanup of incomplete reconstructions - Per-sender quotas for stored bytes and concurrent reconstructions ### Configuration **Required parameters:** -- `segmentSize` — **REQUIRED** configurable parameter; maximum size in bytes of each data segment's payload chunk (before protobuf serialization). +- `segmentSize` — **REQUIRED** configurable parameter; + maximum size in bytes of each data segment's payload chunk (before protobuf serialization). **Fixed parameters:** - `parityRate` — fixed at **0.125** (12.5%) - `maxTotalSegments` — **256** (library limitation for data + parity segments combined) -**Reconstruction capability:** With the predefined parity rate, reconstruction is possible if **all data segments** are received or if **any combination of data + parity** totals at least `dataSegments` (i.e., up to the predefined percentage of loss tolerated). +**Reconstruction capability:** +With the predefined parity rate, +reconstruction is possible if **all data segments** are received or if **any combination of data + parity** totals at least `dataSegments` (i.e., up to the predefined percentage of loss tolerated). -**API simplicity:** Libraries **SHOULD** require only `segmentSize` from the application for normal operation. +**API simplicity:** +Libraries **SHOULD** require only `segmentSize` from the application for normal operation. ### Support -- **Language / Package:** Nim; **Nimble** package manager +- **Language / Package:** Nim; + **Nimble** package manager - **Intended for:** all Waku nodes at the application layer --- @@ -136,25 +154,25 @@ Implementations **SHOULD** support: ### Privacy -`entire_message_hash` enables correlation of segments that belong to the same original message but does not reveal content. +`entire_message_hash` enables correlation of segments that belong to the same original message but does not reveal content. Traffic analysis may still identify segmented flows. ### Integrity -Implementations **MUST** verify the Keccak256 hash post-reconstruction and discard on mismatch. +Implementations **MUST** verify the Keccak256 hash post-reconstruction and discard on mismatch. ### Denial of Service To mitigate resource exhaustion: -- Limit concurrent reconstructions and per-sender storage -- Enforce timeouts and size caps -- Validate segment counts (≤ 256) +- Limit concurrent reconstructions and per-sender storage +- Enforce timeouts and size caps +- Validate segment counts (≤ 256) - Consider rate-limiting using [17/WAKU2-RLN-RELAY](https://rfc.vac.dev/waku/standards/core/17/rln-relay) ### Compatibility -Nodes that do **not** implement this specification cannot reconstruct large messages. +Nodes that do **not** implement this specification cannot reconstruct large messages. 
--- @@ -162,21 +180,22 @@ Nodes that do **not** implement this specification cannot reconstruct large mess **Overhead:** -- Bandwidth overhead ≈ the predefined parity rate from parity (if enabled) -- Additional per-segment overhead ≤ **100 bytes** (protobuf + metadata) +- Bandwidth overhead ≈ the predefined parity rate from parity (if enabled) +- Additional per-segment overhead ≤ **100 bytes** (protobuf + metadata) **Network impact:** -- Larger messages increase gossip traffic and storage; operators **SHOULD** consider policy limits +- Larger messages increase gossip traffic and storage; + operators **SHOULD** consider policy limits --- ## References -1. [10/WAKU2 – Waku](https://rfc.vac.dev/waku/standards/core/10/waku2) -2. [11/WAKU2-RELAY – Relay](https://rfc.vac.dev/waku/standards/core/11/relay) +1. [10/WAKU2 – Waku](https://rfc.vac.dev/waku/standards/core/10/waku2) +2. [11/WAKU2-RELAY – Relay](https://rfc.vac.dev/waku/standards/core/11/relay) 3. [14/WAKU2-MESSAGE – Message](https://rfc.vac.dev/waku/standards/core/14/message) 4. [64/WAKU2-NETWORK](https://rfc.vac.dev/waku/standards/core/64/network#message-size) -5. [nim-leopard](https://github.com/status-im/nim-leopard) – Nim bindings for Leopard-RS (Reed–Solomon) -6. [Leopard-RS](https://github.com/catid/leopard) – Fast Reed–Solomon erasure coding library +5. [nim-leopard](https://github.com/status-im/nim-leopard) – Nim bindings for Leopard-RS (Reed–Solomon) +6. [Leopard-RS](https://github.com/catid/leopard) – Fast Reed–Solomon erasure coding library 7. [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) – Key words for use in RFCs to Indicate Requirement Levels From f1804a8dff77989d7d345396e189b19d44163529 Mon Sep 17 00:00:00 2001 From: pablo Date: Mon, 27 Oct 2025 08:59:44 +0200 Subject: [PATCH 08/10] fix: remove limit from spec, leave in the implementation --- standards/application/segmentation.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/standards/application/segmentation.md b/standards/application/segmentation.md index 130e4c2..bc6d855 100644 --- a/standards/application/segmentation.md +++ b/standards/application/segmentation.md @@ -43,7 +43,6 @@ When the original payload exceeds `segmentSize`, the sender: - **MUST** split the payload into one or more **data segments**, each of size up to `segmentSize` bytes. - **MAY** use Reed–Solomon erasure coding at the predefined parity rate. - Implementations **MUST NOT** produce more than 256 total segments (data + parity). 
- Encode each segment as a `SegmentMessageProto` with: - The `entire_message_hash` - Either data-segment indices (`segments_count`, `index`) or parity-segment indices (`parity_segments_count`, `parity_segment_index`) @@ -133,7 +132,7 @@ Implementations **SHOULD** support: **Fixed parameters:** - `parityRate` — fixed at **0.125** (12.5%) -- `maxTotalSegments` — **256** (library limitation for data + parity segments combined) +- `maxTotalSegments` — **256** **Reconstruction capability:** With the predefined parity rate, From 831af4563d476ea4d30e277711ac7592f5286fd7 Mon Sep 17 00:00:00 2001 From: pablo Date: Mon, 27 Oct 2025 10:10:46 +0200 Subject: [PATCH 09/10] fix: add wire format before descriptions --- standards/application/segmentation.md | 83 ++++++++++++++++----------- 1 file changed, 48 insertions(+), 35 deletions(-) diff --git a/standards/application/segmentation.md b/standards/application/segmentation.md index bc6d855..e5a0c7c 100644 --- a/standards/application/segmentation.md +++ b/standards/application/segmentation.md @@ -31,7 +31,54 @@ Erasure-coded parity segments provide resilience against partial loss or reorder - **`segmentSize`**: configured maximum size in bytes of each data segment's `payload` chunk (before protobuf serialization). - **sender public key**: the origin identifier used for indexing persistence. -The key words **“MUST”**, **“MUST NOT”**, **“REQUIRED”**, **“SHALL”**, **“SHALL NOT”**, **“SHOULD”**, **“SHOULD NOT”**, **“RECOMMENDED”**, **“NOT RECOMMENDED”**, **“MAY”**, and **“OPTIONAL”** in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). +The key words **"MUST"**, **"MUST NOT"**, **"REQUIRED"**, **"SHALL"**, **"SHALL NOT"**, **"SHOULD"**, **"SHOULD NOT"**, **"RECOMMENDED"**, **"NOT RECOMMENDED"**, **"MAY"**, and **"OPTIONAL"** in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). + +## Wire Format + +Each segmented message is encoded as a `SegmentMessageProto` protobuf message: + +```protobuf +syntax = "proto3"; + +message SegmentMessageProto { + // Keccak256(original payload), 32 bytes + bytes entire_message_hash = 1; + + // Data segment indexing + uint32 index = 2; // zero-based sequence number; valid only if segments_count > 0 + uint32 segment_count = 3; // number of data segments (>= 2) + + // Segment payload (data or parity shard) + bytes payload = 4; + + // Parity segment indexing (used if segments_count == 0) + uint32 parity_segment_index = 5; // zero-based sequence number for parity segments + uint32 parity_segments_count = 6; // number of parity segments (> 0) +} +``` + +**Field descriptions:** + +- `entire_message_hash`: A 32-byte Keccak256 hash of the original complete payload, used to identify which segments belong together and verify reconstruction integrity. +- `index`: Zero-based sequence number identifying this data segment's position (0, 1, 2, ..., segments_count - 1). +- `segment_count`: Total number of data segments the original message was split into. +- `payload`: The actual chunk of data or parity information for this segment. +- `parity_segment_index`: Zero-based sequence number for parity segments. +- `parity_segments_count`: Total number of parity segments generated. + +A message is either a **data segment** (when `segment_count > 0`) or a **parity segment** (when `segment_count == 0`). 
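For illustration, the sketch below checks a decoded segment message against the validation rules in the next subsection; the plain Python dataclass stands in for generated protobuf bindings, and the attribute names follow the rules below rather than any particular binding:

```python
from dataclasses import dataclass


@dataclass
class SegmentMessage:
    """Plain stand-in for a decoded SegmentMessageProto."""
    entire_message_hash: bytes   # field 1
    index: int                   # field 2
    segments_count: int          # field 3 (data-segment count)
    payload: bytes               # field 4
    parity_segment_index: int    # field 5
    parity_segments_count: int   # field 6


def is_valid_segment(msg: SegmentMessage) -> bool:
    """Accept only the two permitted field combinations."""
    if len(msg.entire_message_hash) != 32:
        return False
    if msg.segments_count > 0:   # data segment
        return msg.segments_count >= 2 and msg.index < msg.segments_count
    # parity segment
    return (msg.parity_segments_count > 0
            and msg.parity_segment_index < msg.parity_segments_count)
```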
+ +### Validation + +Receivers **MUST** enforce: + +- `entire_message_hash.length == 32` +- **Data segments:** + `segments_count >= 2` **AND** `index < segments_count` +- **Parity segments:** + `segments_count == 0` **AND** `parity_segments_count > 0` **AND** `parity_segment_index < parity_segments_count` + +No other combinations are permitted. ## Segmentation @@ -68,40 +115,6 @@ Upon receiving a segmented message, the receiver: the reconstructed payload **SHALL** be delivered to the application. - Incomplete reconstructions **SHOULD** be garbage-collected after a timeout. -## Wire Format - -```protobuf -syntax = "proto3"; - -message SegmentMessageProto { - // Keccak256(original payload), 32 bytes - bytes entire_message_hash = 1; - - // Data segment indexing - uint32 index = 2; // valid only if segments_count > 0 - uint32 segment_count = 3; // number of data segments (>= 2) - - // Segment payload (data or parity shard) - bytes payload = 4; - - // Parity segment indexing (used if segments_count == 0) - uint32 parity_segment_index = 5; - uint32 parity_segments_count = 6; // number of parity segments (> 0) -} -``` - -### Validation - -Receivers **MUST** enforce: - -- `entire_message_hash.length == 32` -- **Data segments:** - `segments_count >= 2` **AND** `index < segments_count` -- **Parity segments:** - `segments_count == 0` **AND** `parity_segments_count > 0` **AND** `parity_segment_index < parity_segments_count` - -No other combinations are permitted. - --- ## Implementation Suggestions From 0ff0c97842d56d5a2a6537157d74f5efd6d035f1 Mon Sep 17 00:00:00 2001 From: pablo Date: Mon, 27 Oct 2025 10:15:02 +0200 Subject: [PATCH 10/10] fix: rename --- standards/application/segmentation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standards/application/segmentation.md b/standards/application/segmentation.md index e5a0c7c..3d1d128 100644 --- a/standards/application/segmentation.md +++ b/standards/application/segmentation.md @@ -24,7 +24,7 @@ Erasure-coded parity segments provide resilience against partial loss or reorder ## Terminology -- **original message**: the full application payload before segmentation. +- **original payload**: the full application payload before segmentation. - **data segment**: one of the partitioned chunks of the original message payload. - **parity segment**: an erasure-coded segment derived from the set of data segments. - **segment message**: a wire-message whose `payload` field carries a serialized `SegmentMessageProto`.