---
title: Message Segmentation and Reconstruction over Waku
name: Message Segmentation and Reconstruction
tags: [waku-application, segmentation]
version: 0.1
status: draft
---
## Abstract
This specification defines an application-layer protocol for **segmentation** and **reconstruction** of messages carried over Waku when the original payload exceeds Waku's maximum message size. Applications partition the payload into multiple Waku envelopes and reconstruct the original on receipt, even when segments arrive out of order or up to **12.5%** of segments are lost. The protocol uses **Reed–Solomon** erasure coding for fault tolerance. Messages whose payload size is **≤ `segmentSize`** are sent unmodified.
## Motivation
Waku Relay deployments typically propagate envelopes up to **1 MB**. To support larger application payloads (e.g., up to **10 MB** or more), a segmentation layer is required. This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver. Erasure-coded parity segments provide resilience against partial loss or reordering.
## Terminology
- **original message**: the full application payload before segmentation.
- **data segment**: one of the partitioned chunks of the original message payload.
- **parity segment**: an erasure-coded segment derived from the set of data segments.
- **segment message**: a Waku envelope whose `payload` field carries a serialized `SegmentMessageProto`.
- **`segmentSize`**: configured maximum size in bytes of each data segment’s `payload` chunk (before protobuf serialization).
- **sender public key**: the origin identifier used for indexing persistence.
The key words **“MUST”**, **“MUST NOT”**, **“REQUIRED”**, **“SHALL”**, **“SHALL NOT”**, **“SHOULD”**, **“SHOULD NOT”**, **“RECOMMENDED”**, **“NOT RECOMMENDED”**, **“MAY”**, and **“OPTIONAL”** in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
## Segmentation
### Sending
When the original payload exceeds `segmentSize`, the sender **MUST**:
- **Max total segments (data + parity):** `256` (library limitation)
- **Overhead targets:**
  - Bandwidth overhead from parity segments ≤ **12.5%** overall
  - Serialization/metadata overhead ≤ **100 bytes** per segment (implementation target)
> **Note:** With a parity rate of 12.5%, reconstruction is possible if **all data segments** are received or if **any combination of data + parity** totals at least `dataSegments` (i.e., up to 12.5% loss tolerated).
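
The limits and the note above can be illustrated with a small planning sketch. It is not part of the specification: the function names are hypothetical, and the rounding rule for the parity count (rounding down) is an assumption, since the text above does not state one.

```python
import math

PARITY_RATE = 0.125        # fixed by this specification
MAX_TOTAL_SEGMENTS = 256   # data + parity (library limitation)


def plan_segments(payload_len: int, segment_size: int) -> tuple[int, int]:
    """Return (data_segments, parity_segments) for a payload that needs segmentation.

    ASSUMPTION: the parity count is floor(data_segments * PARITY_RATE).
    The specification text does not state the rounding rule.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if payload_len <= segment_size:
        # Payloads that fit in one segment are sent unmodified.
        raise ValueError("payload fits in one segment; no segmentation needed")
    data_segments = (payload_len + segment_size - 1) // segment_size  # ceiling division
    parity_segments = math.floor(data_segments * PARITY_RATE)
    if data_segments + parity_segments > MAX_TOTAL_SEGMENTS:
        raise ValueError("payload would need more than 256 total segments")
    return data_segments, parity_segments


def can_reconstruct(received_data: int, received_parity: int, data_segments: int) -> bool:
    """Reconstruction needs at least `data_segments` segments of any kind."""
    return received_data + received_parity >= data_segments
```

For example, under these assumptions a 10 MB payload with a 100 kB `segmentSize` yields 100 data segments and 12 parity segments, so any 100 of the 112 segments would suffice.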
---
## Implementation
### Reed–Solomon
Implementations that apply parity **SHALL** use fixed-size shards of length `segmentSize`.
The last data chunk **MUST** be padded to `segmentSize` for encoding.
The reference implementation uses **nim-leopard** (Leopard-RS) with a maximum of **256 total shards**.
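The fixed-size shard and padding requirements can be sketched as follows. This is a minimal illustration, not the reference implementation: `split_into_shards` is a hypothetical helper, and zero-byte padding is an assumption, since the text above does not name a padding value. The Reed–Solomon encoding step itself is left out.

```python
def split_into_shards(payload: bytes, segment_size: int) -> list[bytes]:
    """Split `payload` into shards of exactly `segment_size` bytes.

    ASSUMPTION: the last shard is padded with zero bytes. The receiver
    would also need to learn the original length to strip the padding;
    how that is conveyed is outside this sketch.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    shards = [
        payload[offset:offset + segment_size]
        for offset in range(0, len(payload), segment_size)
    ]
    if shards and len(shards[-1]) < segment_size:
        # Pad only the final chunk so every shard has the same length.
        shards[-1] = shards[-1].ljust(segment_size, b"\x00")
    return shards
```

Equal-length shards are what an erasure coder operates on, which is why the padding is applied before encoding rather than after.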
### Storage / Persistence
Segments **MUST** be persisted (e.g., SQLite) and indexed by `entire_message_hash` and sender public key.
Implementations **SHOULD** support:
- Duplicate detection and idempotent saves
- Completion flags to prevent duplicate processing
- Timeout-based cleanup of incomplete reconstructions
- Per-sender quotas for stored bytes and concurrent reconstructions
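
One possible shape for such a store is sketched below using Python's built-in `sqlite3` module. The table and column names (other than `entire_message_hash`) and all function names are hypothetical, and per-sender quotas are not covered.

```python
import sqlite3

# Hypothetical schema: segments are keyed by message hash and sender public key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS segments (
    entire_message_hash BLOB    NOT NULL,
    sender_public_key   BLOB    NOT NULL,
    is_parity           INTEGER NOT NULL,
    segment_index       INTEGER NOT NULL,
    payload             BLOB    NOT NULL,
    received_at         INTEGER NOT NULL,
    PRIMARY KEY (entire_message_hash, sender_public_key, is_parity, segment_index)
);
CREATE TABLE IF NOT EXISTS completed_messages (
    entire_message_hash BLOB NOT NULL,
    sender_public_key   BLOB NOT NULL,
    PRIMARY KEY (entire_message_hash, sender_public_key)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def is_completed(conn, msg_hash: bytes, sender: bytes) -> bool:
    row = conn.execute(
        "SELECT 1 FROM completed_messages "
        "WHERE entire_message_hash = ? AND sender_public_key = ?",
        (msg_hash, sender),
    ).fetchone()
    return row is not None


def save_segment(conn, msg_hash, sender, index, is_parity, payload, now) -> bool:
    """Idempotent save: returns True only if the segment was newly stored."""
    if is_completed(conn, msg_hash, sender):
        return False  # completion flag prevents duplicate processing
    cur = conn.execute(
        "INSERT OR IGNORE INTO segments VALUES (?, ?, ?, ?, ?, ?)",
        (msg_hash, sender, int(is_parity), index, payload, now),
    )
    conn.commit()
    return cur.rowcount == 1


def mark_completed(conn, msg_hash: bytes, sender: bytes) -> None:
    """Set the completion flag and drop the now-unneeded segments."""
    conn.execute(
        "INSERT OR IGNORE INTO completed_messages VALUES (?, ?)", (msg_hash, sender)
    )
    conn.execute(
        "DELETE FROM segments WHERE entire_message_hash = ? AND sender_public_key = ?",
        (msg_hash, sender),
    )
    conn.commit()


def cleanup_expired(conn, cutoff: int) -> int:
    """Timeout-based cleanup: delete segments received before `cutoff`."""
    cur = conn.execute("DELETE FROM segments WHERE received_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

The composite primary key is what makes saves idempotent here: a repeated segment hits the key constraint and is silently ignored.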
### Configuration
- `segmentSize`: **REQUIRED**
- `parityRate`: fixed at **0.125**
**API simplicity:** Libraries **SHOULD** require only `segmentSize` from the application for normal operation.
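
A library following this guidance might expose configuration roughly as below. This is only a sketch: the class name `SegmentationConfig` and its field names are hypothetical and are not defined by this specification.

```python
from dataclasses import dataclass

PARITY_RATE = 0.125  # fixed by this specification, not configurable


@dataclass(frozen=True)
class SegmentationConfig:
    """Hypothetical configuration object: only `segment_size` is supplied."""

    segment_size: int  # REQUIRED: maximum bytes per data segment payload chunk

    def __post_init__(self) -> None:
        if self.segment_size <= 0:
            raise ValueError("segment_size must be a positive number of bytes")

    @property
    def parity_rate(self) -> float:
        # Exposed read-only so callers cannot override the fixed rate.
        return PARITY_RATE
```

An application would then construct it with a single argument, for example `SegmentationConfig(segment_size=100_000)`.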