From f7970ee404a1b29fc3953b43d4fcb6877e0813bf Mon Sep 17 00:00:00 2001
From: Igor Sirotin
Date: Tue, 5 May 2026 22:37:30 +0100
Subject: [PATCH] address comments, iteration 1

---
 standards/application/segmentation.md | 43 ++++++++++++++-------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/standards/application/segmentation.md b/standards/application/segmentation.md
index e1db2cb..fc7d46d 100644
--- a/standards/application/segmentation.md
+++ b/standards/application/segmentation.md
@@ -44,15 +44,15 @@ message SegmentMessageProto {
   bytes entire_message_hash = 1;
 
   // Data segment indexing
-  uint32 index = 2; // zero-based sequence number for data segments
-  uint32 segments_count = 3; // number of data segments (>= 1)
+  uint32 data_segment_index = 2; // zero-indexed sequence number for data segments
+  uint32 data_segment_count = 3; // number of data segments (>= 1)
 
   // Segment payload (data or parity shard)
   bytes payload = 4;
 
   // Parity segment indexing
   uint32 parity_segment_index = 5; // zero-based sequence number for parity segments
-  uint32 parity_segments_count = 6; // number of parity segments
+  uint32 parity_segment_count = 6; // number of parity segments
 
   // Segment type
   bool is_parity = 7; // true for parity segments, false (default) for data segments
@@ -62,11 +62,11 @@ message SegmentMessageProto {
 **Field descriptions:**
 
 - `entire_message_hash`: A 32-byte Keccak256 hash of the original complete payload, used to identify which segments belong together and verify reconstruction integrity.
-- `index`: Zero-based sequence number identifying this data segment's position (0, 1, 2, ..., segments_count - 1). Set only on data segments.
-- `segments_count`: Total number of data segments the original message was split into. Set on every segment (data and parity).
+- `data_segment_index`: Zero-indexed sequence number identifying this data segment's position (0, 1, 2, ..., data_segment_count - 1). Set only on data segments.
+- `data_segment_count`: Total number of data segments the original message was split into. Set on every segment (data and parity).
 - `payload`: The actual chunk of data or parity information for this segment.
 - `parity_segment_index`: Zero-based sequence number for parity segments. Set only on parity segments.
-- `parity_segments_count`: Total number of parity segments generated. Set on every segment (data and parity) when Reed–Solomon parity is used; `0` (default) otherwise.
+- `parity_segment_count`: Total number of parity segments generated. Set on every segment (data and parity) when Reed–Solomon parity is used; `0` (default) otherwise.
 - `is_parity`: Explicit segment type marker. `false` (default) for data segments; `true` for parity segments.
 
 A message is either a **data segment** (when `is_parity == false`) or a **parity segment** (when `is_parity == true`).
@@ -76,14 +76,15 @@ A message is either a **data segment** (when `is_parity == false`) or a **parity
 Receivers **MUST** enforce:
 
 - `entire_message_hash.length == 32`
-- `segments_count >= 1`
+- `data_segment_count >= 1`
+- `data_segment_count + parity_segment_count < maxTotalSegments`
 - **Data segments** (`is_parity == false`):
-  `index < segments_count`
+  `data_segment_index < data_segment_count`
 - **Parity segments** (`is_parity == true`):
-  `parity_segments_count > 0` AND `parity_segment_index < parity_segments_count`
+  `parity_segment_count > 0` AND `parity_segment_index < parity_segment_count`
 
 No other combinations are permitted.
-A `SegmentMessageProto` with `segments_count == 1` and `index == 0` is a valid single-segment data message: the `payload` field carries the entire original payload (see [Sending](#sending)).
+A `SegmentMessageProto` with `data_segment_count == 1` and `data_segment_index == 0` is a valid single-segment data message: the `payload` field carries the entire original payload (see [Sending](#sending)).
 
 ## Segmentation
@@ -94,17 +95,16 @@ To transmit a payload, the sender:
 
 - **MUST** compute a 32-byte `entire_message_hash = Keccak256(original_payload)`.
 - **MUST** split the payload into one or more **data segments**, each of size up to `segmentSize` bytes.
-  A payload of size ≤ `segmentSize` produces a single data segment (`segments_count == 1`).
+  A payload of size ≤ `segmentSize` produces a single data segment (`data_segment_count == 1`).
 - **MAY** use Reed–Solomon erasure coding at the predefined parity rate.
 - **MUST** encode every segment as a `SegmentMessageProto` with:
   - The `entire_message_hash`
-  - `segments_count` (total number of data segments, always set)
-  - When Reed–Solomon parity is used, `parity_segments_count` (total number of parity segments, set on every segment)
-  - For data segments: `is_parity = false`, `index`
+  - `data_segment_count` (total number of data segments, always set)
+  - When Reed–Solomon parity is used, `parity_segment_count` (total number of parity segments, set on every segment)
+  - For data segments: `is_parity = false`, `data_segment_index`
   - For parity segments: `is_parity = true`, `parity_segment_index`
   - The raw payload data
-- Send each segment as an individual transport message according to the underlying transport protocol,
-  preserving application-level metadata (e.g., content topic).
+- Send each segment as an individual transport message according to the underlying transport service.
 
 This yields a deterministic wire format: every transmitted payload is a `SegmentMessageProto`.
 
@@ -114,8 +114,8 @@ Upon receiving a segmented message, the receiver:
 
 - **MUST** validate each segment according to [Wire Format → Validation](#validation).
 - **MUST** cache received segments
-- **MUST** attempt reconstruction once at least `segments_count` distinct segments (data and parity combined) have been received:
-  - If all data segments are present, concatenate their `payload` fields in `index` order.
+- **MUST** attempt reconstruction once at least `data_segment_count` distinct segments (data and parity combined) have been received:
+  - If all data segments are present, concatenate their `payload` fields in `data_segment_index` order.
   - Otherwise, recover the payload via Reed–Solomon decoding over the available data and parity segments.
 - **MUST** verify `Keccak256(reconstructed_payload)` matches `entire_message_hash`.
   On mismatch,
@@ -154,7 +154,7 @@ Implementations **SHOULD** support:
 Implementation-specific parameter, fixed. The reference implementation uses **256**.
 
 **Reconstruction capability:**
-With the predefined parity rate, reconstruction is possible if **all data segments** are received or if **any combination of data + parity** totals at least `segments_count` (i.e., up to the predefined percentage of loss tolerated).
+With the predefined parity rate, reconstruction is possible if **all data segments** are received or if **any combination of data + parity** totals at least `data_segment_count` (i.e., up to the predefined percentage of loss tolerated).
 
 **API simplicity:**
 Libraries **SHOULD** require only `segmentSize` from the application for normal operation.
@@ -190,14 +190,15 @@ Implementations **MUST** verify the Keccak256 hash post-reconstruction and disca
 
 To mitigate resource exhaustion:
 
-- Limit concurrent reconstructions and per-sender storage
+- Limit total concurrent reconstructions and aggregate buffered bytes
+  - When sender identity is available, apply the same two limits per sender
 - Enforce timeouts and size caps
 - Validate segment counts (≤ 256)
 - Consider rate-limiting at the transport layer (for example, via [17/WAKU2-RLN-RELAY](https://rfc.vac.dev/waku/standards/core/17/rln-relay) on Waku)
 
 ### Compatibility
 
-Nodes that do **not** implement this specification cannot reconstruct large messages.
+Nodes that do **not** implement this specification cannot reconstruct any messages.
 
 ---