This specification defines an application-layer protocol for **segmentation** and **reconstruction** of messages carried over a transport or delivery service that limits message size, for use when the original payload exceeds that limit.
Applications partition the payload into multiple wire-message envelopes and reconstruct the original on receipt,
even when segments arrive out of order or up to a **predefined percentage** of segments are lost.
The protocol uses **Reed–Solomon** erasure coding for fault tolerance.
Messages whose payload size is **≤ `segmentSize`** are sent unmodified.
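
As an illustration of the size rule above, the following is a minimal sketch of how a sender might decide between passing a payload through unmodified and cutting it into data segments. The function name `split_payload` and the plain fixed-size chunking are assumptions made for this example; padding of the final chunk and generation of parity segments are not shown.

```python
from typing import List


def split_payload(payload: bytes, segment_size: int) -> List[bytes]:
    """Cut payload into chunks of at most segment_size bytes.

    Hypothetical helper, not part of the specification. A payload that
    already fits (len <= segment_size) is returned as a single,
    untouched element, mirroring the "sent unmodified" rule.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if len(payload) <= segment_size:
        return [payload]
    # Consecutive slices; only the last one may be shorter than segment_size.
    return [
        payload[offset:offset + segment_size]
        for offset in range(0, len(payload), segment_size)
    ]
```

Concatenating the returned chunks in order yields the original payload, which is the property the receiver relies on once all data segments are available.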
Waku Relay deployments typically propagate envelopes up to **150 KB** as per [64/WAKU2-NETWORK - Message](https://rfc.vac.dev/waku/standards/core/64/network#message-size).
To support larger application payloads,
a segmentation layer is required.
This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver.
Erasure-coded parity segments provide resilience against partial loss or reordering.
The key words **"MUST"**, **"MUST NOT"**, **"REQUIRED"**, **"SHALL"**, **"SHALL NOT"**, **"SHOULD"**, **"SHOULD NOT"**, **"RECOMMENDED"**, **"NOT RECOMMENDED"**, **"MAY"**, and **"OPTIONAL"** in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
## Wire Format
Each segmented message is encoded as a `SegmentMessageProto` protobuf message:
```protobuf
syntax = "proto3";
message SegmentMessageProto {
  // Keccak256(original payload), 32 bytes
  bytes entire_message_hash = 1;

  // Data segment indexing
  uint32 index = 2;          // zero-based sequence number; valid only if segments_count > 0
  uint32 segments_count = 3; // number of data segments (>= 2)

  // Segment payload (data or parity shard)
  bytes payload = 4;

  // Parity segment indexing (used if segments_count == 0)
  uint32 parity_segment_index = 5;  // zero-based sequence number for parity segments
  uint32 parity_segments_count = 6; // number of parity segments (> 0)
}
```
**Field descriptions:**
- `entire_message_hash`: A 32-byte Keccak256 hash of the original complete payload, used to identify which segments belong together and to verify the integrity of the reconstructed payload.
- `index`: Zero-based sequence number identifying this data segment's position (`0, 1, 2, ..., segments_count - 1`).
- `segments_count`: Total number of data segments the original message was split into.
- `payload`: The chunk of data or parity information carried by this segment.
- `parity_segment_index`: Zero-based sequence number identifying this parity segment's position.
- `parity_segments_count`: Total number of parity segments generated.
A message is either a **data segment** (when `segments_count > 0`) or a **parity segment** (when `segments_count == 0`).
Reconstruction is possible if **all data segments** are received, or if **any combination of distinct data and parity segments** totals at least the number of data segments (`segments_count`), i.e., as long as loss stays within the predefined percentage.
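
The data/parity distinction and the reconstruction condition can be sketched as follows. This is an illustrative sketch, not a normative algorithm: the `Segment` dataclass simply mirrors the protobuf fields, and `is_parity`, `is_valid` and `can_reconstruct` are hypothetical helper names. It assumes the caller has already grouped segments by `entire_message_hash`, that parity indices are bounded by `parity_segments_count`, and that at least one data segment has arrived so the number of data segments is known. Actual Reed–Solomon decoding is not shown.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Segment:
    """Plain mirror of the protobuf fields (illustrative only)."""
    entire_message_hash: bytes
    index: int = 0
    segments_count: int = 0
    payload: bytes = b""
    parity_segment_index: int = 0
    parity_segments_count: int = 0


def is_parity(seg: Segment) -> bool:
    # A segment with segments_count == 0 is treated as a parity segment.
    return seg.segments_count == 0


def is_valid(seg: Segment) -> bool:
    # Field constraints taken from the proto comments; the upper bound on
    # parity_segment_index is an assumption of this sketch.
    if len(seg.entire_message_hash) != 32:
        return False
    if is_parity(seg):
        return (seg.parity_segments_count > 0
                and seg.parity_segment_index < seg.parity_segments_count)
    return seg.segments_count >= 2 and seg.index < seg.segments_count


def can_reconstruct(segments: Iterable[Segment]) -> bool:
    """True if enough distinct segments of one message have arrived."""
    data_indices = set()
    parity_indices = set()
    segments_count = 0
    for seg in segments:
        if not is_valid(seg):
            continue  # ignore malformed segments
        if is_parity(seg):
            parity_indices.add(seg.parity_segment_index)
        else:
            segments_count = seg.segments_count
            data_indices.add(seg.index)
    if segments_count == 0:
        # No data segment seen yet, so the required total is unknown.
        return False
    # Duplicates collapse in the sets, so only distinct segments count.
    return len(data_indices) + len(parity_indices) >= segments_count
```

Using sets rather than counters means duplicate deliveries of the same segment never count twice toward the threshold.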