Merge 0ff0c97842d56d5a2a6537157d74f5efd6d035f1 into c5fe03e5166e5f8032c445d02a23d57a88a5fe81

2026-03-14 11:03:36 +00:00 · 2025-11-21 17:53:07 +02:00 · 2025-11-21 17:53:07 +02:00 · 8ee28982f1
commit 8ee28982f1
parent c5fe03e516 0ff0c97842
2 changed files with 229 additions and 0 deletions
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -1,6 +1,9 @@
 ALLOC
+Changelog
 creativecommons
 danielkaiser
+dataSegments
+Deployability
 DHT
 DoS
 github
@@ -10,14 +13,27 @@ GossipSub
 https
 iana
 IANA
+Keccak
 libp2p
+maxTotalSegments
 md
+Nim
+nim
+parityRate
+proto
+protobuf
 pubsub
 rfc
 RFC
+RLN
+SegmentMessageProto
+segmentSize
 SHARDING
 subnets
+uint
+waku
 Waku
+Waku's
 WAKU
 www
 ZXCV
--- a/standards/application/segmentation.md
+++ b/standards/application/segmentation.md
@@ -0,0 +1,213 @@
+---
+title: Message Segmentation and Reconstruction
+name: Message Segmentation and Reconstruction
+tags: [waku-application, segmentation]
+version: 0.1
+status: draft
+---
+
+## Abstract
+
+This specification defines an application-layer protocol for **segmentation** and **reconstruction** of messages carried over a message transport or delivery service that limits message size, for cases where the original payload exceeds that limit.
+Applications partition the payload into multiple wire-message envelopes and reconstruct the original on receipt,
+even when segments arrive out of order or up to a **predefined percentage** of segments are lost.
+The protocol uses **Reed–Solomon** erasure coding for fault tolerance.
+Messages whose payload size is **≤ `segmentSize`** are sent unmodified.
+
+## Motivation
+
+Waku Relay deployments typically propagate envelopes of up to **150 KB**, as per [64/WAKU2-NETWORK - Message](https://rfc.vac.dev/waku/standards/core/64/network#message-size).
+To support larger application payloads,
+a segmentation layer is required.
+This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver.
+Erasure-coded parity segments provide resilience against partial loss or reordering.
+
+## Terminology
+
+- **original payload**: the full application payload before segmentation.
+- **data segment**: one of the partitioned chunks of the original message payload.
+- **parity segment**: an erasure-coded segment derived from the set of data segments.
+- **segment message**: a wire-message whose `payload` field carries a serialized `SegmentMessageProto`.
+- **`segmentSize`**: configured maximum size in bytes of each data segment's `payload` chunk (before protobuf serialization).
+- **sender public key**: the origin identifier used for indexing persistence.
+
+The key words **"MUST"**, **"MUST NOT"**, **"REQUIRED"**, **"SHALL"**, **"SHALL NOT"**, **"SHOULD"**, **"SHOULD NOT"**, **"RECOMMENDED"**, **"NOT RECOMMENDED"**, **"MAY"**, and **"OPTIONAL"** in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
+
+## Wire Format
+
+Each segmented message is encoded as a `SegmentMessageProto` protobuf message:
+
+```protobuf
+syntax = "proto3";
+
+message SegmentMessageProto {
+  // Keccak256(original payload), 32 bytes
+  bytes  entire_message_hash    = 1;
+
+  // Data segment indexing
+  uint32 index                  = 2; // zero-based sequence number; valid only if segments_count > 0
+  uint32 segments_count         = 3; // number of data segments (>= 2); 0 marks a parity segment
+
+  // Segment payload (data or parity shard)
+  bytes  payload                = 4;
+
+  // Parity segment indexing (used if segments_count == 0)
+  uint32 parity_segment_index   = 5; // zero-based sequence number for parity segments
+  uint32 parity_segments_count  = 6; // number of parity segments (> 0)
+}
+```
+
+**Field descriptions:**
+
+- `entire_message_hash`: A 32-byte Keccak256 hash of the original complete payload, used to identify which segments belong together and verify reconstruction integrity.
+- `index`: Zero-based sequence number identifying this data segment's position (0, 1, 2, ..., segments_count - 1).
+- `segments_count`: Total number of data segments the original message was split into; `0` marks a parity segment.
+- `payload`: The actual chunk of data or parity information for this segment.
+- `parity_segment_index`: Zero-based sequence number for parity segments.
+- `parity_segments_count`: Total number of parity segments generated.
+
+A segment message is either a **data segment** (when `segments_count > 0`) or a **parity segment** (when `segments_count == 0`).
+
+### Validation
+
+Receivers **MUST** enforce:
+
+- `entire_message_hash.length == 32`
+- **Data segments:**
+  `segments_count >= 2` **AND** `index < segments_count`
+- **Parity segments:**
+  `segments_count == 0` **AND** `parity_segments_count > 0` **AND** `parity_segment_index < parity_segments_count`
+
+No other combinations are permitted.
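+
+The rules above can be sketched as a single predicate.
+This is a minimal illustration, not part of the specification: the `Segment` dataclass and the `is_valid` function are hypothetical names that mirror the `SegmentMessageProto` fields,
+and the sketch assumes that parity fields on a data segment are simply ignored.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Segment:
+    """Hypothetical in-memory mirror of the SegmentMessageProto fields."""
+    entire_message_hash: bytes
+    index: int = 0
+    segments_count: int = 0
+    payload: bytes = b""
+    parity_segment_index: int = 0
+    parity_segments_count: int = 0
+
+
+def is_valid(seg: Segment) -> bool:
+    """Return True only for the two field combinations the rules allow."""
+    if len(seg.entire_message_hash) != 32:
+        return False
+    if seg.segments_count >= 2:
+        # Data segment: index must address one of the data segments.
+        return seg.index < seg.segments_count
+    if seg.segments_count == 0:
+        # Parity segment: the parity index must address one of the parity segments.
+        return (seg.parity_segments_count > 0
+                and seg.parity_segment_index < seg.parity_segments_count)
+    # segments_count == 1 matches neither rule and is rejected.
+    return False
+```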
+
+## Segmentation
+
+### Sending
+
+When the original payload exceeds `segmentSize`, the sender:
+
+- **MUST** compute a 32-byte `entire_message_hash = Keccak256(original_payload)`.
+- **MUST** split the payload into two or more **data segments**,
+  each of size up to `segmentSize` bytes.
+- **MAY** use Reed–Solomon erasure coding at the predefined parity rate.
+- **MUST** encode each segment as a `SegmentMessageProto` with:
+  - The `entire_message_hash`
+  - Either data-segment indices (`segments_count`, `index`) or parity-segment indices (`parity_segments_count`, `parity_segment_index`)
+  - The segment's payload chunk (data or parity shard)
+- **MUST** send all segments as individual Waku envelopes,
+  preserving application-level metadata (e.g., content topic).
+
+Messages smaller than or equal to `segmentSize` **SHALL** be transmitted unmodified.
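+
+The data-segment part of the sending steps can be sketched as follows.
+This is an illustration under stated assumptions: `split_into_data_segments` and `placeholder_hash` are hypothetical names, segments are plain dictionaries rather than serialized protobuf messages, and parity generation is left out.
+Python's standard library has no Keccak256, so the default hash is a SHA3-256 stand-in that produces different digests; a real sender would pass a Keccak256 function as `hash_fn`.
+
+```python
+import hashlib
+from typing import Callable, Dict, List, Union
+
+HashFn = Callable[[bytes], bytes]
+
+
+def placeholder_hash(data: bytes) -> bytes:
+    # ASSUMPTION: stand-in only. SHA3-256 is NOT Keccak256 (the padding differs);
+    # a real sender must plug in a Keccak256 implementation here.
+    return hashlib.sha3_256(data).digest()
+
+
+def split_into_data_segments(
+    payload: bytes,
+    segment_size: int,
+    hash_fn: HashFn = placeholder_hash,
+) -> List[Dict[str, Union[bytes, int]]]:
+    """Return the data segments for `payload`, or [] if it fits in one message."""
+    if segment_size <= 0:
+        raise ValueError("segment_size must be positive")
+    if len(payload) <= segment_size:
+        # Small payloads are sent unmodified, so there is nothing to segment.
+        return []
+    digest = hash_fn(payload)
+    chunks = [payload[i:i + segment_size]
+              for i in range(0, len(payload), segment_size)]
+    return [
+        {
+            "entire_message_hash": digest,
+            "index": i,
+            "segments_count": len(chunks),
+            "payload": chunk,
+        }
+        for i, chunk in enumerate(chunks)
+    ]
+```
+
+Only the last chunk can be shorter than `segment_size`; every segment carries the same hash so the receiver can group them.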
+
+### Receiving
+
+Upon receiving a segment message, the receiver:
+
+- **MUST** validate each segment according to [Wire Format → Validation](#validation).
+- **MUST** cache received segments.
+- **MUST** attempt reconstruction when the number of available (data + parity) segments equals or exceeds the data segment count, by either:
+  - concatenating the data segments in `index` order if all are present, or
+  - applying Reed–Solomon decoding if data segments are missing and parity segments are available.
+- **MUST** verify `Keccak256(reconstructed_payload)` matches `entire_message_hash`.
+  On mismatch,
+  the message **MUST** be discarded and logged as invalid.
+- Once verified,
+  the reconstructed payload **SHALL** be delivered to the application.
+- Incomplete reconstructions **SHOULD** be garbage-collected after a timeout.
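+
+The concatenation path of the receiving steps can be sketched as below.
+The names `reconstruct_from_data` and `placeholder_hash` are hypothetical, the input is assumed to be already validated and grouped by `entire_message_hash`, and Reed–Solomon decoding is not covered.
+As Python's standard library has no Keccak256, the default hash is a SHA3-256 stand-in; a real receiver would pass a Keccak256 function as `hash_fn`.
+
+```python
+import hashlib
+from typing import Callable, Dict, Iterable, Optional, Tuple
+
+
+def placeholder_hash(data: bytes) -> bytes:
+    # ASSUMPTION: stand-in only. SHA3-256 is NOT Keccak256 (the padding differs).
+    return hashlib.sha3_256(data).digest()
+
+
+def reconstruct_from_data(
+    expected_hash: bytes,
+    segments_count: int,
+    received: Iterable[Tuple[int, bytes]],
+    hash_fn: Callable[[bytes], bytes] = placeholder_hash,
+) -> Optional[bytes]:
+    """Rebuild the payload from (index, payload) pairs of data segments.
+
+    Returns None while data segments are still missing and raises
+    ValueError when the rebuilt payload does not match the hash.
+    """
+    by_index: Dict[int, bytes] = {}
+    for index, payload in received:
+        if index < segments_count:
+            # Arrival order does not matter; the first copy of a duplicate wins.
+            by_index.setdefault(index, payload)
+    if len(by_index) < segments_count:
+        return None
+    candidate = b"".join(by_index[i] for i in range(segments_count))
+    if hash_fn(candidate) != expected_hash:
+        raise ValueError("hash mismatch: discard the message and log it as invalid")
+    return candidate
+```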
+
+---
+
+## Implementation Suggestions
+
+### Reed–Solomon
+
+Implementations that apply parity **SHALL** use fixed-size shards of length `segmentSize`.
+The last data chunk **MUST** be padded to `segmentSize` for encoding.
+The reference implementation uses **nim-leopard** (Leopard-RS) with a maximum of **256 total shards**.
+
+### Storage / Persistence
+
+Segments **MAY** be persisted (e.g., SQLite) and indexed by `entire_message_hash` and sender public key.
+Implementations **SHOULD** support:
+
+- Duplicate detection and idempotent saves
+- Completion flags to prevent duplicate processing
+- Timeout-based cleanup of incomplete reconstructions
+- Per-sender quotas for stored bytes and concurrent reconstructions
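+
+The four behaviours listed above can be sketched with an in-memory store.
+This is an illustration only: `SegmentStore` and its methods are hypothetical names, the store keeps everything in dictionaries instead of a database, and the quota covers stored bytes but not the number of concurrent reconstructions.
+
+```python
+from typing import Dict, Set, Tuple
+
+Key = Tuple[bytes, bytes]   # (sender public key, entire_message_hash)
+Slot = Tuple[bool, int]     # (is_parity, index within its kind)
+
+
+class SegmentStore:
+    """Hypothetical in-memory segment cache with quota and timeout."""
+
+    def __init__(self, timeout_s: float, max_bytes_per_sender: int) -> None:
+        self.timeout_s = timeout_s
+        self.max_bytes_per_sender = max_bytes_per_sender
+        self._segments: Dict[Key, Dict[Slot, bytes]] = {}
+        self._first_seen: Dict[Key, float] = {}
+        self._completed: Set[Key] = set()  # grows unbounded in this sketch
+        self._bytes_by_sender: Dict[bytes, int] = {}
+
+    def save(self, sender: bytes, msg_hash: bytes, is_parity: bool,
+             index: int, payload: bytes, now: float) -> bool:
+        """Store a segment; return False for duplicates, completed messages,
+        or when the sender's byte quota would be exceeded."""
+        key = (sender, msg_hash)
+        if key in self._completed:
+            return False
+        slot = (is_parity, index)
+        bucket = self._segments.get(key, {})
+        if slot in bucket:
+            return False  # idempotent save: the duplicate is ignored
+        used = self._bytes_by_sender.get(sender, 0)
+        if used + len(payload) > self.max_bytes_per_sender:
+            return False
+        bucket[slot] = payload
+        self._segments[key] = bucket
+        self._first_seen.setdefault(key, now)
+        self._bytes_by_sender[sender] = used + len(payload)
+        return True
+
+    def _drop(self, key: Key) -> None:
+        bucket = self._segments.pop(key, {})
+        self._first_seen.pop(key, None)
+        freed = sum(len(p) for p in bucket.values())
+        self._bytes_by_sender[key[0]] = self._bytes_by_sender.get(key[0], 0) - freed
+
+    def mark_complete(self, sender: bytes, msg_hash: bytes) -> None:
+        """Set the completion flag and release the cached segments."""
+        key = (sender, msg_hash)
+        self._drop(key)
+        self._completed.add(key)
+
+    def expire(self, now: float) -> int:
+        """Drop reconstructions older than the timeout; return how many."""
+        stale = [k for k, t in self._first_seen.items()
+                 if now - t > self.timeout_s]
+        for key in stale:
+            self._drop(key)
+        return len(stale)
+```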
+
+### Configuration
+
+**Required parameters:**
+
+- `segmentSize` — **REQUIRED** configurable parameter;
+  maximum size in bytes of each data segment's payload chunk (before protobuf serialization).
+
+**Fixed parameters:**
+
+- `parityRate` — fixed at **0.125** (12.5%)
+- `maxTotalSegments` — **256**
+
+**Reconstruction capability:**
+With the predefined parity rate,
+reconstruction is possible if **all data segments** are received or if **any combination of data + parity** segments totals at least `dataSegments` (i.e., as many segments can be lost as there are parity segments).
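+
+The arithmetic behind this can be sketched as follows.
+The function names are hypothetical, and the rounding rule is an assumption: this specification does not state how the parity count is rounded, so the sketch rounds down,
+which means fewer than 8 data segments yield no parity segment at all.
+
+```python
+PARITY_DENOMINATOR = 8      # parityRate 0.125 == 1/8
+MAX_TOTAL_SEGMENTS = 256    # maxTotalSegments
+
+
+def data_segment_count(payload_len: int, segment_size: int) -> int:
+    # Ceiling division: the last data segment may be only partly filled.
+    return -(-payload_len // segment_size)
+
+
+def parity_segment_count(data_segments: int) -> int:
+    # ASSUMPTION: round down; the rounding rule is not given in the text.
+    return data_segments // PARITY_DENOMINATOR
+
+
+def plan(payload_len: int, segment_size: int) -> dict:
+    """Return segment counts and the number of losses that can be tolerated."""
+    data = data_segment_count(payload_len, segment_size)
+    parity = parity_segment_count(data)
+    total = data + parity
+    if total > MAX_TOTAL_SEGMENTS:
+        raise ValueError("payload needs more than maxTotalSegments segments")
+    # Any `data` of the `total` segments suffice, so `parity` may be lost.
+    return {"data": data, "parity": parity, "total": total, "max_lost": parity}
+```
+
+For example, under the round-down assumption a 1,000,000-byte payload with a `segmentSize` of 100,000 gives 10 data segments and 1 parity segment, so any single segment may be lost.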
+
+**API simplicity:**
+Libraries **SHOULD** require only `segmentSize` from the application for normal operation.
+
+### Support
+
+- **Language / Package:** Nim;
+  **Nimble** package manager
+- **Intended for:** all Waku nodes at the application layer
+
+---
+
+## Security Considerations
+
+### Privacy
+
+`entire_message_hash` enables correlation of segments that belong to the same original message but does not reveal content.
+Traffic analysis may still identify segmented flows.
+
+### Integrity
+
+Implementations **MUST** verify the Keccak256 hash post-reconstruction and discard on mismatch.
+
+### Denial of Service
+
+To mitigate resource exhaustion:
+
+- Limit concurrent reconstructions and per-sender storage
+- Enforce timeouts and size caps
+- Validate segment counts (≤ 256)
+- Consider rate-limiting using [17/WAKU2-RLN-RELAY](https://rfc.vac.dev/waku/standards/core/17/rln-relay)
+
+### Compatibility
+
+Nodes that do **not** implement this specification cannot reconstruct large messages.
+
+---
+
+## Deployment Considerations
+
+**Overhead:**
+
+- Parity segments add bandwidth overhead roughly equal to the predefined parity rate (if parity is enabled)
+- Additional per-segment overhead ≤ **100 bytes** (protobuf + metadata)
+
+**Network impact:**
+
+- Larger messages increase gossip traffic and storage;
+  operators **SHOULD** consider policy limits
+
+---
+
+## References
+
+1. [10/WAKU2 – Waku](https://rfc.vac.dev/waku/standards/core/10/waku2)
+2. [11/WAKU2-RELAY – Relay](https://rfc.vac.dev/waku/standards/core/11/relay)
+3. [14/WAKU2-MESSAGE – Message](https://rfc.vac.dev/waku/standards/core/14/message)
+4. [64/WAKU2-NETWORK](https://rfc.vac.dev/waku/standards/core/64/network#message-size)
+5. [nim-leopard](https://github.com/status-im/nim-leopard) – Nim bindings for Leopard-RS (Reed–Solomon)
+6. [Leopard-RS](https://github.com/catid/leopard) – Fast Reed–Solomon erasure coding library
+7. [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) – Key words for use in RFCs to Indicate Requirement Levels