Mirror of https://github.com/logos-messaging/specs.git, synced 2026-01-05 23:53:12 +00:00.

Commit 8ee28982f1: Merge 0ff0c97842d56d5a2a6537157d74f5efd6d035f1 into c5fe03e5166e5f8032c445d02a23d57a88a5fe81.

Word list, hunk `@@ -1,6 +1,9 @@`: adds `Changelog`, `dataSegments`, and `Deployability`.

Word list, hunk `@@ -10,14 +13,27 @@ GossipSub`: adds `Keccak`, `maxTotalSegments`, `Nim`, `nim`, `parityRate`, `proto`, `protobuf`, `RLN`, `SegmentMessageProto`, `segmentSize`, `uint`, `waku`, and `Waku's`.

`standards/application/segmentation.md` (normal file, 213 lines added, hunk `@@ -0,0 +1,213 @@`):

---
title: Message Segmentation and Reconstruction
name: Message Segmentation and Reconstruction
tags: [waku-application, segmentation]
version: 0.1
status: draft
---

## Abstract

This specification defines an application-layer protocol for **segmentation** and **reconstruction** of messages carried over message transport or delivery services that impose a size limit, when the original payload exceeds that limit.
Applications partition the payload into multiple wire-message envelopes and reconstruct the original on receipt,
even when segments arrive out of order or up to a **predefined percentage** of segments are lost.
The protocol uses **Reed–Solomon** erasure coding for fault tolerance.
Messages whose payload size is **≤ `segmentSize`** are sent unmodified.

## Motivation

Waku Relay deployments typically propagate envelopes of up to **150 KB**, as per [64/WAKU2-NETWORK - Message](https://rfc.vac.dev/waku/standards/core/64/network#message-size).
To support larger application payloads,
a segmentation layer is required.
This specification enables larger messages by partitioning them into multiple envelopes and reconstructing them at the receiver.
Segment indices make reassembly independent of arrival order, and erasure-coded parity segments provide resilience against partial loss.

## Terminology

- **original payload**: the full application payload before segmentation.
- **data segment**: one of the chunks into which the original payload is partitioned.
- **parity segment**: an erasure-coded segment derived from the set of data segments.
- **segment message**: a wire-message whose `payload` field carries a serialized `SegmentMessageProto`.
- **`segmentSize`**: configured maximum size in bytes of each data segment's `payload` chunk (before protobuf serialization).
- **sender public key**: the identifier of the message origin, used to index persisted segments.

The key words **"MUST"**, **"MUST NOT"**, **"REQUIRED"**, **"SHALL"**, **"SHALL NOT"**, **"SHOULD"**, **"SHOULD NOT"**, **"RECOMMENDED"**, **"NOT RECOMMENDED"**, **"MAY"**, and **"OPTIONAL"** in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).

## Wire Format

Each segment is encoded as a `SegmentMessageProto` protobuf message, carried in the `payload` field of a segment message:

```protobuf
syntax = "proto3";

message SegmentMessageProto {
  // Keccak256(original payload), 32 bytes
  bytes entire_message_hash = 1;

  // Data segment indexing
  uint32 index = 2; // zero-based sequence number; valid only if segments_count > 0
  uint32 segments_count = 3; // number of data segments (>= 2)

  // Segment payload (data or parity shard)
  bytes payload = 4;

  // Parity segment indexing (used if segments_count == 0)
  uint32 parity_segment_index = 5; // zero-based sequence number for parity segments
  uint32 parity_segments_count = 6; // number of parity segments (> 0)
}
```

**Field descriptions:**

- `entire_message_hash`: A 32-byte Keccak256 hash of the complete original payload, used to identify which segments belong together and to verify the integrity of the reconstruction.
- `index`: Zero-based sequence number identifying this data segment's position (0, 1, 2, ..., `segments_count` - 1).
- `segments_count`: Total number of data segments the original payload was split into.
- `payload`: The chunk of data or parity bytes carried by this segment.
- `parity_segment_index`: Zero-based sequence number identifying this parity segment (0, 1, ..., `parity_segments_count` - 1).
- `parity_segments_count`: Total number of parity segments generated.

A message is either a **data segment** (when `segments_count > 0`) or a **parity segment** (when `segments_count == 0`).
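
To make the wire layout concrete, here is a minimal Python sketch that serializes and parses `SegmentMessageProto` by hand using the standard proto3 wire format (varint keys, wire type 0 for `uint32`, wire type 2 for `bytes`), without generated protobuf code. The `SegmentMessage` dataclass and the `encode`/`decode` helpers are hypothetical names for illustration; a real implementation would normally use code generated from the schema above.

```python
from dataclasses import dataclass


@dataclass
class SegmentMessage:
    """Plain-Python mirror of SegmentMessageProto (field numbers 1-6)."""
    entire_message_hash: bytes = b""
    index: int = 0
    segments_count: int = 0
    payload: bytes = b""
    parity_segment_index: int = 0
    parity_segments_count: int = 0


# Field number -> attribute name, as declared in the .proto schema.
FIELDS = {1: "entire_message_hash", 2: "index", 3: "segments_count",
          4: "payload", 5: "parity_segment_index", 6: "parity_segments_count"}


def _varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def _read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode(msg: SegmentMessage) -> bytes:
    """Serialize in field order; proto3 omits fields holding their default (0 or empty)."""
    out = bytearray()
    for number, name in FIELDS.items():
        value = getattr(msg, name)
        if not value:
            continue
        if isinstance(value, bytes):  # wire type 2: length-delimited
            out += _varint(number << 3 | 2) + _varint(len(value)) + value
        else:                         # wire type 0: varint (uint32)
            out += _varint(number << 3 | 0) + _varint(value)
    return bytes(out)


def decode(buf: bytes) -> SegmentMessage:
    """Parse a serialized SegmentMessageProto, skipping unknown field numbers."""
    msg, pos = SegmentMessage(), 0
    while pos < len(buf):
        key, pos = _read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = _read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = _read_varint(buf, pos)
            value, pos = bytes(buf[pos:pos + length]), pos + length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in FIELDS:
            setattr(msg, FIELDS[number], value)
    return msg
```

Because proto3 omits zero-valued fields, a parity segment's `segments_count == 0` is signalled simply by the field being absent. For a 100,000-byte segment with `index = 200` and `segments_count = 228`, the `SegmentMessageProto` framing alone adds 44 bytes, consistent with the ≤ 100-byte per-segment figure under Deployment Considerations.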

### Validation

Receivers **MUST** enforce:

- `entire_message_hash.length == 32`
- **Data segments:**
  `segments_count >= 2` **AND** `index < segments_count`
- **Parity segments:**
  `segments_count == 0` **AND** `parity_segments_count > 0` **AND** `parity_segment_index < parity_segments_count`

No other combinations are permitted.
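
The validation rules reduce to a single predicate. This Python sketch takes the decoded field values as plain arguments (the function name `is_valid_segment` is hypothetical) and implements only the checks listed above.

```python
def is_valid_segment(entire_message_hash: bytes, index: int, segments_count: int,
                     parity_segment_index: int, parity_segments_count: int) -> bool:
    """Apply the receiver-side checks from Wire Format -> Validation."""
    if len(entire_message_hash) != 32:
        return False
    if segments_count > 0:
        # Data segment: at least two data segments and an in-range index.
        return segments_count >= 2 and index < segments_count
    # segments_count == 0: parity segment with a positive count and in-range index.
    return parity_segments_count > 0 and parity_segment_index < parity_segments_count
```

Segment-count caps from Denial of Service (`maxTotalSegments`) can be layered on top of this predicate.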

## Segmentation

### Sending

When the original payload exceeds `segmentSize`, the sender:

- **MUST** compute a 32-byte `entire_message_hash = Keccak256(original_payload)`.
- **MUST** split the payload into **data segments** of at most `segmentSize` bytes each;
  because the payload exceeds `segmentSize`, this always yields at least two data segments.
- **MAY** additionally generate **parity segments** using Reed–Solomon erasure coding at the predefined parity rate.
- Encode each segment as a `SegmentMessageProto` with:
  - the `entire_message_hash`;
  - either the data-segment fields (`segments_count`, `index`) or the parity-segment fields (`parity_segments_count`, `parity_segment_index`);
  - the segment's payload bytes.
- Send each segment as an individual Waku envelope,
  preserving application-level metadata (e.g., content topic).

Messages smaller than or equal to `segmentSize` **SHALL** be transmitted unmodified.
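
A minimal Python sketch of the data-segment part of these sending steps follows. Assuming `Keccak256` here means the original (pre-FIPS 202) Keccak-256, as the name conventionally does, it differs from the standardized SHA3-256 only in its padding byte (0x01 instead of 0x06), so Python's `hashlib.sha3_256` is **not** a drop-in replacement; the sketch therefore includes a compact pure-Python Keccak-256 (slow, for illustration only). Parity generation is omitted, segments are returned as plain dictionaries rather than serialized protobuf, and the helper names (`keccak256`, `build_data_segments`) are hypothetical.

```python
# Keccak-f[1600] round constants (iota step).
_RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]
_MASK = (1 << 64) - 1


def _rho_pi_tables():
    """Derive rho rotation offsets and pi lane order by walking (x, y) -> (y, 2x + 3y)."""
    order, rot, x, y = [], [], 1, 0
    for t in range(24):
        x, y = y, (2 * x + 3 * y) % 5
        order.append(x + 5 * y)
        rot.append((t + 1) * (t + 2) // 2 % 64)
    return order, rot


_PI_ORDER, _RHO = _rho_pi_tables()


def _rotl(v: int, n: int) -> int:
    return ((v << n) | (v >> (64 - n))) & _MASK


def _keccak_f(st: list[int]) -> None:
    """Apply the 24-round Keccak-f[1600] permutation in place (lane index = x + 5y)."""
    for rc in _RC:
        c = [st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20] for i in range(5)]
        for i in range(5):  # theta
            d = c[(i + 4) % 5] ^ _rotl(c[(i + 1) % 5], 1)
            for j in range(0, 25, 5):
                st[j + i] ^= d
        t = st[1]
        for pos, r in zip(_PI_ORDER, _RHO):  # rho + pi
            st[pos], t = _rotl(t, r), st[pos]
        for j in range(0, 25, 5):  # chi
            row = st[j:j + 5]
            for i in range(5):
                st[j + i] = row[i] ^ (~row[(i + 1) % 5] & row[(i + 2) % 5])
        st[0] ^= rc  # iota


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (pad byte 0x01), not FIPS 202 SHA3-256 (pad byte 0x06)."""
    rate = 136  # bytes absorbed per block: (1600 - 2 * 256) / 8
    padded = bytearray(data) + b"\x01" + bytes((-len(data) - 1) % rate)
    padded[-1] |= 0x80
    st = [0] * 25
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            st[i] ^= int.from_bytes(padded[off + 8 * i:off + 8 * i + 8], "little")
        _keccak_f(st)
    return b"".join(lane.to_bytes(8, "little") for lane in st[:4])


def build_data_segments(payload: bytes, segment_size: int) -> list[dict]:
    """Split payload into data segments of at most segment_size bytes each."""
    if len(payload) <= segment_size:
        raise ValueError("payloads of at most segment_size bytes are sent unmodified")
    digest = keccak256(payload)
    chunks = [payload[i:i + segment_size] for i in range(0, len(payload), segment_size)]
    return [
        {"entire_message_hash": digest, "index": i,
         "segments_count": len(chunks), "payload": chunk}
        for i, chunk in enumerate(chunks)
    ]
```

For example, a 250-byte payload with `segment_size = 100` yields three data segments of 100, 100, and 50 bytes, all sharing the same hash and `segments_count = 3`.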

### Receiving

Upon receiving a segment message, the receiver:

- **MUST** validate each segment according to [Wire Format → Validation](#validation).
- **MUST** cache received segments, grouped by `entire_message_hash`.
- **MUST** attempt reconstruction once the number of available segments (data + parity) is at least the data segment count (`segments_count`), by:
  - concatenating the data segments in `index` order if all of them are present, or
  - applying Reed–Solomon decoding if some data segments are missing and parity segments are available.
- **MUST** verify that `Keccak256(reconstructed_payload)` matches `entire_message_hash`.
  On mismatch,
  the message **MUST** be discarded and logged as invalid.
- Once verified,
  the reconstructed payload **SHALL** be delivered to the application.
- Incomplete reconstructions **SHOULD** be garbage-collected after a timeout.
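
The receiving rules can be sketched as a small in-memory reassembly cache in Python. Assumptions: segments have already passed validation; only the all-data-present path (concatenation in `index` order) is implemented, with Reed–Solomon decoding left as a marked hook; the hash function is injected so the sketch stays self-contained (a real deployment must pass Keccak-256). `Reassembler` and its methods are hypothetical names.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

log = logging.getLogger("segmentation")


@dataclass
class _Pending:
    segments_count: int = 0                      # 0 until a data segment has been seen
    data: dict = field(default_factory=dict)     # index -> payload
    parity: dict = field(default_factory=dict)   # parity_segment_index -> payload
    first_seen: float = field(default_factory=time.monotonic)


class Reassembler:
    """In-memory cache keyed by entire_message_hash (hypothetical sketch)."""

    def __init__(self, hash_fn: Callable[[bytes], bytes], timeout_s: float = 60.0):
        self.hash_fn = hash_fn        # must be Keccak-256 in a real deployment
        self.timeout_s = timeout_s
        self.pending: dict[bytes, _Pending] = {}
        self.completed: set[bytes] = set()

    def add(self, entire_message_hash: bytes, index: int, segments_count: int,
            payload: bytes, parity_segment_index: int = 0) -> Optional[bytes]:
        """Cache one validated segment; return the payload once rebuilt and verified."""
        if entire_message_hash in self.completed:
            return None  # late duplicate of an already delivered message
        entry = self.pending.setdefault(entire_message_hash, _Pending())
        if segments_count > 0:
            entry.segments_count = segments_count
            entry.data.setdefault(index, payload)  # ignore duplicates
        else:
            entry.parity.setdefault(parity_segment_index, payload)
        n = entry.segments_count
        if n == 0 or len(entry.data) + len(entry.parity) < n:
            return None  # not enough segments yet
        if any(i not in entry.data for i in range(n)):
            return None  # hook: Reed-Solomon decoding would run here (not implemented)
        rebuilt = b"".join(entry.data[i] for i in range(n))
        del self.pending[entire_message_hash]
        if self.hash_fn(rebuilt) != entire_message_hash:
            log.warning("discarding segmented message: hash mismatch")
            return None
        self.completed.add(entire_message_hash)
        return rebuilt

    def collect_garbage(self, now: Optional[float] = None) -> int:
        """Drop incomplete reconstructions older than timeout_s; return how many."""
        now = time.monotonic() if now is None else now
        expired = [h for h, e in self.pending.items() if now - e.first_seen > self.timeout_s]
        for h in expired:
            del self.pending[h]
        return len(expired)
```

Keeping a set of completed hashes mirrors the completion flags suggested under Storage / Persistence: late duplicates of an already delivered message are ignored instead of starting a new reconstruction.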

---

## Implementation Suggestions

### Reed–Solomon

Implementations that apply parity **SHALL** use fixed-size shards of length `segmentSize`.
The last data chunk **MUST** be padded to `segmentSize` for encoding.
The reference implementation uses **nim-leopard** (Leopard-RS) with a maximum of **256 total shards**.
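
To illustrate shard padding and erasure recovery, the following Python sketch implements a small systematic Reed–Solomon-style erasure code over GF(2^8) built from a Cauchy matrix (Cauchy Reed–Solomon), for which any `k` of the `k + m` shards suffice. It is **not** Leopard-RS: Leopard uses a different, FFT-based construction, so shards produced here are not interoperable with nim-leopard. The function names are hypothetical, zero-byte padding is an assumption, and the byte-at-a-time arithmetic is far too slow for production.

```python
# GF(2^8) arithmetic with the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
EXP, LOG = [0] * 512, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a: int) -> int:
    return EXP[255 - LOG[a]]  # a must be non-zero


def cauchy_coef(k: int, parity_index: int, data_index: int) -> int:
    """Cauchy entry 1 / (x_j + y_i) with x_j = k + j, y_i = i; addition in GF(2^8) is XOR."""
    return gf_inv((k + parity_index) ^ data_index)


def pad_to(chunk: bytes, size: int) -> bytes:
    """Zero-pad the last data chunk to the fixed shard length (padding value assumed)."""
    return chunk + bytes(size - len(chunk))


def rs_encode(data_shards: list[bytes], parity_count: int) -> list[bytes]:
    """Compute parity shards; all data shards must already have the same length."""
    k, size = len(data_shards), len(data_shards[0])
    if k + parity_count > 256:
        raise ValueError("GF(2^8) supports at most 256 total shards")
    parity = []
    for j in range(parity_count):
        shard = bytearray(size)
        for i, d in enumerate(data_shards):
            c = cauchy_coef(k, j, i)
            for b in range(size):
                shard[b] ^= gf_mul(c, d[b])
        parity.append(bytes(shard))
    return parity


def rs_decode(k: int, data: dict[int, bytes], parity: dict[int, bytes]) -> list[bytes]:
    """Recover all k data shards from any k available data/parity shards."""
    rows = [([int(c == i) for c in range(k)], bytearray(s)) for i, s in data.items()]
    rows += [([cauchy_coef(k, j, c) for c in range(k)], bytearray(s)) for j, s in parity.items()]
    if len(rows) < k:
        raise ValueError("need at least k shards")
    rows = rows[:k]
    for col in range(k):  # Gauss-Jordan elimination over GF(2^8)
        p = next(r for r in range(col, k) if rows[r][0][col])
        rows[col], rows[p] = rows[p], rows[col]
        coefs, shard = rows[col]
        f = gf_inv(coefs[col])
        coefs[:] = [gf_mul(f, c) for c in coefs]
        shard[:] = bytes(gf_mul(f, x) for x in shard)
        for r in range(k):
            g = rows[r][0][col]
            if r != col and g:
                for c in range(k):
                    rows[r][0][c] ^= gf_mul(g, coefs[c])
                for b in range(len(shard)):
                    rows[r][1][b] ^= gf_mul(g, shard[b])
    return [bytes(row[1]) for row in rows]
```

Every square submatrix of a Cauchy matrix is invertible, which is why any `k` surviving shards (data or parity) determine the data shards. The recovered last shard still carries its padding, so it has to be trimmed before the data shards are concatenated.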

### Storage / Persistence

Segments **MAY** be persisted (e.g., in SQLite) and indexed by `entire_message_hash` and sender public key.
Implementations **SHOULD** support:

- Duplicate detection and idempotent saves
- Completion flags to prevent duplicate processing
- Timeout-based cleanup of incomplete reconstructions
- Per-sender quotas for stored bytes and concurrent reconstructions
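
One way to meet these suggestions with SQLite is sketched below in Python using the standard `sqlite3` module. The schema, table and column names, and helper functions are hypothetical: a composite primary key makes saves idempotent, a `completed` table acts as the completion flag, stale incomplete reconstructions are purged per message, and `stored_bytes` supports a per-sender byte quota.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS segments (
    entire_message_hash BLOB NOT NULL,
    sender_public_key   BLOB NOT NULL,
    is_parity           INTEGER NOT NULL,  -- 0 = data segment, 1 = parity segment
    seg_index           INTEGER NOT NULL,  -- index or parity_segment_index
    payload             BLOB NOT NULL,
    received_at         REAL NOT NULL,
    PRIMARY KEY (entire_message_hash, sender_public_key, is_parity, seg_index)
);
CREATE TABLE IF NOT EXISTS completed (
    entire_message_hash BLOB NOT NULL,
    sender_public_key   BLOB NOT NULL,
    PRIMARY KEY (entire_message_hash, sender_public_key)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_segment(conn, msg_hash, sender, is_parity, index, payload, now=None) -> bool:
    """Idempotent save; returns False for duplicates and for already-completed messages."""
    done = conn.execute(
        "SELECT 1 FROM completed WHERE entire_message_hash = ? AND sender_public_key = ?",
        (msg_hash, sender)).fetchone()
    if done:
        return False
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO segments VALUES (?, ?, ?, ?, ?, ?)",
            (msg_hash, sender, int(is_parity), index, payload,
             time.time() if now is None else now))
    return cur.rowcount == 1


def mark_completed(conn, msg_hash, sender) -> None:
    """Set the completion flag and drop the now-unneeded segments."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO completed VALUES (?, ?)", (msg_hash, sender))
        conn.execute(
            "DELETE FROM segments WHERE entire_message_hash = ? AND sender_public_key = ?",
            (msg_hash, sender))


def purge_stale(conn, timeout_s, now=None) -> int:
    """Delete every segment of reconstructions whose oldest segment exceeded the timeout."""
    cutoff = (time.time() if now is None else now) - timeout_s
    with conn:
        cur = conn.execute(
            "DELETE FROM segments WHERE (entire_message_hash, sender_public_key) IN ("
            " SELECT entire_message_hash, sender_public_key FROM segments"
            " GROUP BY entire_message_hash, sender_public_key HAVING MIN(received_at) < ?)",
            (cutoff,))
    return cur.rowcount


def stored_bytes(conn, sender) -> int:
    """Bytes currently held for one sender, for enforcing a per-sender quota."""
    return conn.execute(
        "SELECT COALESCE(SUM(LENGTH(payload)), 0) FROM segments WHERE sender_public_key = ?",
        (sender,)).fetchone()[0]
```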

### Configuration

**Required parameters:**

- `segmentSize`: **REQUIRED** configurable parameter;
  the maximum size in bytes of each data segment's payload chunk (before protobuf serialization).

**Fixed parameters:**

- `parityRate`: fixed at **0.125** (12.5%)
- `maxTotalSegments`: **256** (data and parity segments combined)

**Reconstruction capability:**
With the predefined parity rate,
reconstruction is possible if **all data segments** are received or if **any combination of data + parity segments** totals at least `dataSegments`, the number of data segments (`segments_count`).
In other words, the loss of up to as many segments as there are parity segments (roughly `parityRate` of the data segment count) is tolerated.
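
The reconstruction arithmetic can be made concrete with a short Python sketch. The specification fixes `parityRate` and `maxTotalSegments` but does not state how the parity count is rounded; the sketch **assumes** `floor(dataSegments × parityRate)`, under which payloads with fewer than 8 data segments get no parity at all. `parity_count`, `segment_plan`, and `max_data_segments` are hypothetical helpers.

```python
import math

PARITY_RATE = 0.125       # parityRate (fixed)
MAX_TOTAL_SEGMENTS = 256  # maxTotalSegments: data + parity segments


def parity_count(data_segments: int) -> int:
    """ASSUMED rounding rule: floor(dataSegments * parityRate)."""
    return math.floor(data_segments * PARITY_RATE)


def segment_plan(payload_len: int, segment_size: int) -> dict:
    data = -(-payload_len // segment_size)  # ceil division
    parity = parity_count(data)
    return {
        "data_segments": data,
        "parity_segments": parity,
        "needed_for_reconstruction": data,  # any `data` of the data + parity segments
        "losses_tolerated": parity,         # Reed-Solomon is MDS: any `parity` losses are recoverable
        "within_limit": data + parity <= MAX_TOTAL_SEGMENTS,
    }


def max_data_segments() -> int:
    """Largest data-segment count whose data + parity total fits in maxTotalSegments."""
    d = MAX_TOTAL_SEGMENTS
    while d + parity_count(d) > MAX_TOTAL_SEGMENTS:
        d -= 1
    return d
```

For example, with `segmentSize = 100_000` a 1,000,000-byte payload yields 10 data segments and 1 parity segment under this assumption, so the loss of any single segment is tolerated; the 256-segment cap then allows at most 228 data segments (228 + 28 = 256) per message when parity is used.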

**API simplicity:**
Libraries **SHOULD** require only `segmentSize` from the application for normal operation.

### Support

- **Language / Package:** Nim;
  **Nimble** package manager
- **Intended for:** all Waku nodes at the application layer

---

## Security Considerations

### Privacy

`entire_message_hash` enables correlation of segments that belong to the same original message; it does not reveal the content directly, although an observer who can guess a candidate payload can confirm the guess by hashing it.
Traffic analysis may still identify segmented flows.

### Integrity

Implementations **MUST** verify the Keccak256 hash post-reconstruction and discard on mismatch.

### Denial of Service

To mitigate resource exhaustion:

- Limit concurrent reconstructions and per-sender storage
- Enforce timeouts and size caps
- Validate segment counts against `maxTotalSegments` (≤ 256 data + parity segments)
- Consider rate-limiting using [17/WAKU2-RLN-RELAY](https://rfc.vac.dev/waku/standards/core/17/rln-relay)
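
Several of these measures can be combined into a per-sender admission check run before a segment is cached. This Python sketch is hypothetical: the class name and the byte and concurrency limits are illustrative defaults, not values defined by this specification; only the 256-segment cap comes from `maxTotalSegments`.

```python
from collections import defaultdict


class SenderQuota:
    """Per-sender admission control for segment reassembly (hypothetical sketch)."""

    MAX_TOTAL_SEGMENTS = 256  # maxTotalSegments

    def __init__(self, max_bytes: int = 8_000_000, max_concurrent: int = 4):
        self.max_bytes = max_bytes            # illustrative per-sender storage cap
        self.max_concurrent = max_concurrent  # illustrative concurrent-reconstruction cap
        self.bytes_used = defaultdict(int)    # sender -> bytes currently cached
        self.active = defaultdict(set)        # sender -> hashes being reconstructed

    def admit(self, sender: bytes, msg_hash: bytes, payload_len: int,
              segments_count: int, parity_segments_count: int) -> bool:
        """Return True and charge the quota if this segment may be cached."""
        if segments_count + parity_segments_count > self.MAX_TOTAL_SEGMENTS:
            return False
        if msg_hash not in self.active[sender] and len(self.active[sender]) >= self.max_concurrent:
            return False  # would start one reconstruction too many
        if self.bytes_used[sender] + payload_len > self.max_bytes:
            return False  # would exceed the sender's storage quota
        self.active[sender].add(msg_hash)
        self.bytes_used[sender] += payload_len
        return True

    def release(self, sender: bytes, msg_hash: bytes, freed_bytes: int) -> None:
        """Call after a reconstruction completes, fails, or times out."""
        self.active[sender].discard(msg_hash)
        self.bytes_used[sender] = max(0, self.bytes_used[sender] - freed_bytes)
```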

### Compatibility

Nodes that do **not** implement this specification cannot reconstruct large messages.

---

## Deployment Considerations

**Overhead:**

- Bandwidth overhead from parity segments (if enabled) ≈ `parityRate` (12.5%)
- Additional per-segment overhead ≤ **100 bytes** (protobuf + metadata)

**Network impact:**

- Larger messages increase gossip traffic and storage;
  operators **SHOULD** consider policy limits.

---

## References

1. [10/WAKU2](https://rfc.vac.dev/waku/standards/core/10/waku2) – Waku
2. [11/WAKU2-RELAY](https://rfc.vac.dev/waku/standards/core/11/relay) – Relay
3. [14/WAKU2-MESSAGE](https://rfc.vac.dev/waku/standards/core/14/message) – Message
4. [64/WAKU2-NETWORK](https://rfc.vac.dev/waku/standards/core/64/network#message-size) – Network (message size)
5. [nim-leopard](https://github.com/status-im/nim-leopard) – Nim bindings for Leopard-RS (Reed–Solomon)
6. [Leopard-RS](https://github.com/catid/leopard) – Fast Reed–Solomon erasure coding library
7. [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) – Key words for use in RFCs to Indicate Requirement Levels