---
title: RELIABLE-CHANNEL-API
name: Reliable Channel API
category: Standards Track
status: raw
tags: [reliability, application, api]
editor: Franck Royer
contributors:
- Franck Royer
---

## Table of contents

* [Table of contents](#table-of-contents)
* [Abstract](#abstract)
* [Motivation](#motivation)
* [Syntax](#syntax)
* [API design](#api-design)
  * [IDL](#idl)
  * [Primitive types and general guidelines](#primitive-types-and-general-guidelines)
* [Architecture](#architecture)
  * [SDS Integration](#sds-integration)
  * [Message Segmentation (Future)](#message-segmentation-future)
  * [Rate Limit Management (Future)](#rate-limit-management-future)
* [The Reliable Channel API](#the-reliable-channel-api)
  * [Create Reliable Channel](#create-reliable-channel)
    * [Type definitions](#type-definitions)
    * [Function definitions](#function-definitions)
    * [Predefined values](#predefined-values)
    * [Extended definitions](#extended-definitions)
  * [Send messages](#send-messages)
    * [Function definitions](#function-definitions-1)
  * [Event handling](#event-handling)
    * [Type definitions](#type-definitions-1)
    * [Extended definitions](#extended-definitions-1)
  * [Channel lifecycle](#channel-lifecycle)
    * [Function definitions](#function-definitions-2)
* [Implementation Suggestions](#implementation-suggestions)
  * [SDS MessageChannel integration](#sds-messagechannel-integration)
  * [Message retries](#message-retries)
  * [Missing message retrieval](#missing-message-retrieval)
  * [Synchronization messages](#synchronization-messages)
  * [Query on connect](#query-on-connect)
  * [Performance considerations](#performance-considerations)
* [Security/Privacy Considerations](#securityprivacy-considerations)
* [Copyright](#copyright)

## Abstract

This document specifies an Application Programming Interface (API) for Reliable Channel,
a high-level abstraction that provides eventual message consistency guarantees for all participants in a channel,
as well as message segmentation and rate limit management when
using an underlying rate limited delivery protocol with message size restrictions such as [WAKU2](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/10/waku2.md).

The Reliable Channel is built on top of:

- [WAKU-API](/standards/application/waku-api.md) for Waku protocol integration
- [SDS](https://github.com/vacp2p/rfc-index/blob/main/vac/raw/sds.md) (Scalable Data Sync) for causal ordering and acknowledgments
- Message segmentation for handling large payloads (TBD)
- Rate limit management for RLN compliance (TBD)

The Reliable Channel API ensures that:

- All messages sent in a channel are eventually received by all participants
- Senders are notified when messages are acknowledged by other participants
- Missing messages are automatically detected and retrieved
- Message delivery is retried until acknowledged or maximum retry attempts are reached
- Messages are causally ordered using Lamport timestamps
- Large messages can be segmented to fit transport constraints (TBD)

## Motivation

While protocols like [SDS](https://github.com/vacp2p/rfc-index/blob/main/vac/raw/sds.md) provide the mechanisms for achieving reliability
(causal ordering, acknowledgments, missing message detection),
and [WAKU-API](/standards/application/waku-api.md) provides the transport layer,
there is a need for an opinionated, high-level API that makes these capabilities accessible and easy to use.

The Reliable Channel API provides this accessibility by:

- **Simplifying integration**: Wraps SDS, Waku protocols, and other components (segmentation, rate limiting) behind a single, cohesive interface
- **Providing sane defaults**: Pre-configures SDS parameters, retry strategies, and sync intervals for common use cases
- **Event-driven model**: Exposes message lifecycle through intuitive events rather than requiring manual polling of SDS state
- **Automatic task scheduling**: Handles the periodic execution of SDS tasks (sync, buffer sweeps) internally
- **Abstracting complexity**: Hides the details of:
  - SDS message wrapping/unwrapping
  - Store queries for missing messages
  - Query-on-connect behavior
  - Message segmentation for large payloads (TBD)
  - Rate limit compliance when using RLN (TBD)

The goal is to enable application developers to achieve end-to-end reliability with minimal configuration and without deep knowledge of the underlying protocols.
This follows the same philosophy as [WAKU-API](/standards/application/waku-api.md):
providing an opinionated, accessible interface to powerful but complex underlying mechanisms.

## Syntax

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in [RFC2119](https://www.ietf.org/rfc/rfc2119.txt).

## API design

### IDL

This specification uses the same custom Interface Definition Language (IDL) in YAML as defined in [WAKU-API](/standards/application/waku-api.md).

### Primitive types and general guidelines

The primitive types and general guidelines are the same as defined in [WAKU-API](/standards/application/waku-api.md#primitive-types-and-general-guidelines).

## Architecture

The Reliable Channel is a layered architecture that combines multiple components:

```
┌─────────────────────────────────────────┐
│          Reliable Channel API           │ ← Application-facing event-driven API
├─────────────────────────────────────────┤
│      Message Segmentation (future)      │ ← Large message splitting/reassembly
├─────────────────────────────────────────┤
│       Rate Limit Manager (future)       │ ← RLN compliance and pacing
├─────────────────────────────────────────┤
│        SDS (Scalable Data Sync)         │ ← Causal ordering & acknowledgments
├─────────────────────────────────────────┤
│    WAKU-API (LightPush/Filter/Store)    │ ← Message transport layer
└─────────────────────────────────────────┘
```

### SDS Integration

The Reliable Channel wraps the [SDS](https://github.com/vacp2p/rfc-index/blob/main/vac/raw/sds.md) `MessageChannel` to provide:

- **Causal ordering**: Using Lamport timestamps to establish message order
- **Acknowledgments**: Via causal history (definitive) and bloom filters (probabilistic)
- **Missing message detection**: By tracking gaps in causal history
- **Buffering**: For unacknowledged outgoing messages and incoming messages with unmet dependencies

The Reliable Channel handles the integration between SDS and Waku protocols:

- Wrapping user payloads in SDS messages before encoding
- Unwrapping SDS messages after decoding
- Scheduling SDS periodic tasks (sync, buffer sweeps, process tasks)
- Mapping SDS events to user-facing events

### Message Segmentation (Future)

For messages exceeding transport limits (e.g., 150 KiB for Waku with RLN):

- Messages SHOULD be split into multiple segments
- Each segment SHOULD be tracked independently through SDS
- Segments SHOULD be reassembled before delivery to the application
- Partial message state SHOULD be managed to handle segment loss

### Rate Limit Management (Future)

When using [RLN-RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/17/rln-relay.md):

- The Reliable Channel SHOULD pace message
  sending to comply with rate limits
- Messages exceeding the rate limit SHOULD be queued
- RLN proofs SHOULD be generated for each message/segment
- Rate limit errors SHOULD be surfaced through events

## The Reliable Channel API

```yaml
api_version: "0.0.1"
library_name: "waku-reliable-channel"
description: "Reliable Channel: an event-driven API for eventual message consistency in Waku channels."
```

### Create Reliable Channel

#### Type definitions

```yaml
types:
  ReliableChannel:
    type: object
    description: "A Reliable Channel instance that provides eventual consistency guarantees."

  ReliableChannelOptions:
    type: object
    fields:
      sync_min_interval_ms:
        type: uint
        default: 30000
        description: "The minimum interval between 2 sync messages in the channel (in milliseconds). This is a shared responsibility between channel participants. Set to 0 to disable automatic sync messages."
      retry_interval_ms:
        type: uint
        default: 30000
        description: "How long to wait before re-sending a message that has not been acknowledged (in milliseconds)."
      max_retry_attempts:
        type: uint
        default: 10
        description: "How many times to attempt resending messages that were not acknowledged."
      retrieve_frequency_ms:
        type: uint
        default: 10000
        description: "How often store queries are done to retrieve missing messages (in milliseconds)."
      sweep_in_buf_interval_ms:
        type: uint
        default: 5000
        description: "How often the SDS message channel incoming buffer is swept (in milliseconds)."
      query_on_connect:
        type: bool
        default: true
        description: "Whether to automatically do a store query after connection to store nodes."
      auto_start:
        type: bool
        default: true
        description: "Whether to automatically start the message channel."
      process_task_min_elapse_ms:
        type: uint
        default: 1000
        description: "The minimum elapsed time between calling the underlying channel process task for incoming messages. This prevents overload when processing many messages."
      causal_history_size:
        type: uint
        description: "The number of recent messages to include in causal history. Passed to the underlying SDS MessageChannel."
      bloom_filter_size:
        type: uint
        description: "The size of the bloom filter for probabilistic acknowledgments. Passed to the underlying SDS MessageChannel."

  ChannelId:
    type: string
    description: "An identifier for the channel. All participants of the channel MUST use the same id."

  SenderId:
    type: string
    description: "An identifier for the sender. SHOULD be unique per participant and persisted between sessions to ensure acknowledgments are only valid when originating from different senders."

  MessageId:
    type: string
    description: "A unique identifier for a message, derived from the message payload."

  DecodedMessage:
    type: object
    description: "A decoded Waku message with unwrapped payload."
    fields:
      payload:
        type: array
        description: "The unwrapped message content."
      timestamp:
        type: uint
        description: "The message timestamp."
      content_topic:
        type: string
        description: "The content topic of the message."
      pubsub_topic:
        type: string
        description: "The pubsub topic of the message."
      hash:
        type: array
        description: "The message hash."
      ephemeral:
        type: bool
        description: "Whether the message is ephemeral."
      meta:
        type: array
        description: "Optional metadata."
```

#### Function definitions

```yaml
functions:
  createReliableChannel:
    description: "Create a new Reliable Channel instance. All participants in the channel MUST be able to decrypt messages and MUST subscribe to the same content topic(s)."
    parameters:
      - name: waku_node
        type: WakuNode
        description: "The Waku node instance to use for sending and receiving messages."
      - name: channel_id
        type: ChannelId
        description: "An identifier for the channel. All participants MUST use the same id."
      - name: sender_id
        type: SenderId
        description: "An identifier for this sender. SHOULD be unique and persisted between sessions."
      - name: encoder
        type: Encoder
        description: "The encoder for messages.
          All messages in the channel use the same encryption layer."
      - name: decoder
        type: Decoder
        description: "The decoder for messages. All messages in the channel use the same encryption layer."
      - name: options
        type: ReliableChannelOptions
        description: "Configuration options for the Reliable Channel."
    returns:
      type: result
```

#### Predefined values

```yaml
values:
  DefaultReliableChannelOptions:
    type: ReliableChannelOptions
    fields:
      sync_min_interval_ms: 30000
      retry_interval_ms: 30000
      max_retry_attempts: 10
      retrieve_frequency_ms: 10000
      sweep_in_buf_interval_ms: 5000
      query_on_connect: true
      auto_start: true
      process_task_min_elapse_ms: 1000
```

#### Extended definitions

**`channel_id` and `sender_id`**:

The `channel_id` MUST be the same for all participants in a channel.
The `sender_id` SHOULD be unique for each participant and SHOULD be persisted between sessions to ensure proper acknowledgment tracking.

**`encoder` and `decoder`**:

A Reliable Channel operates within a single encryption layer.
All messages sent and received in the channel MUST use the same encoder and decoder.
This ensures that:

- All participants can decrypt all messages
- Messages are sent to the same content topic(s)

**`options.auto_start`**:

If set to `true` (default), the Reliable Channel SHOULD automatically call `start()` during creation.
If set to `false`, the application MUST call `start()` before the channel will process messages.

**`options.query_on_connect`**:

If set to `true` (default) and the Waku node has store capability, the Reliable Channel SHOULD automatically query the store for missing messages when connecting to store nodes.
This helps ensure message consistency when participants come online after being offline.

### Send messages

#### Function definitions

```yaml
functions:
  send:
    description: "Send a message in the channel. The message will be retried if not acknowledged by other participants."
    parameters:
      - name: message_payload
        type: array
        description: "The message content to send (before SDS wrapping)."
    returns:
      type: MessageId
      description: "A unique identifier for the message, used to track events."

  getMessageId:
    description: "Get the message ID for a given payload. Used to track message events before sending."
    static: true
    parameters:
      - name: message_payload
        type: array
        description: "The message content (before SDS wrapping)."
    returns:
      type: MessageId
      description: "The unique identifier that will be used for this message."
```

### Event handling

The Reliable Channel uses an event-driven model to notify applications about message lifecycle events.

#### Type definitions

```yaml
types:
  ReliableChannelEvents:
    type: object
    description: "Events emitted by the Reliable Channel."
    events:
      sending-message:
        type: event
        description: "Emitted when a message is being sent over the wire. MAY be emitted multiple times if the retry mechanism kicks in."
      message-sent:
        type: event
        description: "Emitted when a message has been sent over the wire but has not been acknowledged yet. MAY be emitted multiple times if the retry mechanism kicks in."
      message-possibly-acknowledged:
        type: event
        description: "Emitted when a bloom filter indicates the message was possibly received by another party. This is probabilistic. The retry mechanism will wait longer before trying again."
      message-acknowledged:
        type: event
        description: "Emitted when a message was fully acknowledged by other members of the channel (present in their causal history)."
      sending-message-irrecoverable-error:
        type: event
        description: "Emitted when a message could not be sent due to a non-recoverable error (likely an internal error)."
      message-received:
        type: event
        description: "Emitted when a new message has been received from another participant."
      irretrievable-message:
        type: event
        description: "Emitted when the channel is aware of a missing message but failed to retrieve it successfully."
  PossibleAcknowledgment:
    type: object
    fields:
      message_id:
        type: MessageId
        description: "The message ID that was possibly acknowledged."
      possible_ack_count:
        type: uint
        description: "The number of possible acknowledgments detected."

  MessageError:
    type: object
    fields:
      message_id:
        type: MessageId
        description: "The message ID that encountered an error."
      error:
        type: error
        description: "The error that occurred."

  HistoryEntry:
    type: object
    description: "An entry in the message history that could not be retrieved."
```

#### Extended definitions

**Event lifecycle**:

For each message sent, the following event sequence is expected:

1. `sending-message`: Emitted when the message encoding and sending begins
2. `message-sent`: Emitted when the message has been successfully sent over the network
3. One of:
   - `message-possibly-acknowledged`: (Optional, probabilistic) Emitted when bloom filters suggest acknowledgment
   - `message-acknowledged`: Emitted when causal history confirms acknowledgment
   - `sending-message-irrecoverable-error`: Emitted if an unrecoverable error occurs

Events 1-2 MAY be emitted multiple times if the retry mechanism is activated due to lack of acknowledgment.

**Irrecoverable errors**:

The following errors are considered irrecoverable and will trigger `sending-message-irrecoverable-error`:

- Encoding failed
- Empty payload
- Message size too large
- RLN proof generation failed

When an irrecoverable error occurs, the retry mechanism SHOULD NOT attempt to resend the message.

### Channel lifecycle

#### Function definitions

```yaml
functions:
  start:
    description: "Start the Reliable Channel. Sets up event listeners, begins sync loop, starts missing message retrieval, and subscribes to messages."
    returns:
      type: result
      description: "True if successfully started, error otherwise."

  stop:
    description: "Stop the Reliable Channel. Stops sync loop, missing message retrieval, and clears intervals."
    returns:
      type: void

  isStarted:
    description: "Check if the Reliable Channel is currently started."
    returns:
      type: bool
      description: "True if the channel is started, false otherwise."
```

## Implementation Suggestions

This section provides practical implementation guidance based on the [js-waku](https://github.com/waku-org/js-waku) implementation.

### SDS MessageChannel integration

The Reliable Channel MUST use the [SDS](https://github.com/vacp2p/rfc-index/blob/main/vac/raw/sds.md) `MessageChannel` as its core reliability mechanism.

**Reference**: `js-waku/packages/sds/src/message_channel/message_channel.ts:1`

**Key integration points**:

1. **Message wrapping**: User payloads MUST be wrapped in SDS `ContentMessage` before sending:

   ```
   User payload → SDS ContentMessage → Waku Message → Network
   ```

2. **Message unwrapping**: Received Waku messages MUST be unwrapped to extract user payloads:

   ```
   Network → Waku Message → SDS ContentMessage → User payload
   ```

3. **SDS configuration**: Implementations SHOULD configure the SDS MessageChannel with:
   - `causalHistorySize`: Number of recent message IDs in causal history (default: 200)
   - `bloomFilterSize`: Bloom filter capacity for probabilistic ACKs (default: 10,000 messages)
   - `bloomFilterErrorRate`: False positive rate (default: 0.001)

4. **Task scheduling**: Implementations MUST periodically call SDS methods:
   - `processTasks()`: Process queued send/receive operations
   - `sweepIncomingBuffer()`: Deliver messages with met dependencies
   - `sweepOutgoingBuffer()`: Identify messages for retry

5.
   **Event mapping**: SDS events SHOULD be mapped to Reliable Channel events:
   - `OutMessageSent` → `message-sent`
   - `OutMessageAcknowledged` → `message-acknowledged`
   - `OutMessagePossiblyAcknowledged` → `message-possibly-acknowledged`
   - `InMessageReceived` → `message-received`
   - `InMessageMissing` → Missing message retrieval trigger

**Default SDS configuration values** (from js-waku):

- Bloom filter capacity: 10,000 messages
- Bloom filter error rate: 0.001 (0.1% false positive rate)
- Causal history size: 200 message IDs (≈12.8 KB overhead per message)
- Possible ACKs threshold: 2 bloom filter hits before considering acknowledged

### Message retries

The retry mechanism SHOULD use a simple fixed-interval retry strategy:

- When a message is sent, start a retry timer
- Every `retry_interval_ms`, attempt to resend the message
- Stop retrying when:
  - The message is acknowledged (via causal history)
  - `max_retry_attempts` is reached
  - An irrecoverable error occurs

**Reference**: `js-waku/packages/sdk/src/reliable_channel/retry_manager.ts:1`

**Implementation notes**:

- Retry intervals SHOULD be configurable (default: 30 seconds)
- Maximum retry attempts SHOULD be configurable (default: 10 attempts)
- Future implementations MAY implement exponential back-off strategies
- Implementations SHOULD NOT retry sending if the node is known to be offline

### Missing message retrieval

The Reliable Channel SHOULD implement automatic detection and retrieval of missing messages:

1. **Detection**: When processing incoming messages, the SDS layer detects gaps in causal history
2. **Tracking**: Missing messages are tracked with their message IDs and retrieval hints (Waku message hashes)
3. **Retrieval**: Periodic store queries retrieve missing messages using retrieval hints

4.
   **Processing**: Retrieved messages are processed through the normal message pipeline

**Reference**: `js-waku/packages/sdk/src/reliable_channel/missing_message_retriever.ts:1`

**Implementation notes**:

- Missing message checks SHOULD run periodically (default: every 10 seconds)
- Store queries SHOULD use message hashes for targeted retrieval
- Retrieved messages SHOULD be removed from the missing messages list once received
- If a message cannot be retrieved, implementations MAY emit an `irretrievable-message` event

### Synchronization messages

Sync messages are empty messages that carry only causal history and bloom filter information.
They serve to:

- Acknowledge received messages without sending new content
- Keep the channel active
- Allow participants to learn about each other's message state

**Implementation notes**:

- Sync messages SHOULD be sent periodically with randomized delays
- The sync interval is a shared responsibility: `sync_interval = random() * sync_min_interval_ms`
- When a content message is received, sync SHOULD be scheduled sooner (multiplier: 0.5)
- When a content message is sent, sync SHOULD be rescheduled at normal interval (multiplier: 1.0)
- When a sync message is received, sync SHOULD be rescheduled at normal interval
- After failing to send a sync, retry SHOULD use a longer interval (multiplier: 2.0)

**Reference**: `js-waku/packages/sdk/src/reliable_channel/reliable_channel.ts:515-529`

### Query on connect

When enabled, the Reliable Channel SHOULD automatically query the store when connecting to store nodes.
This helps participants catch up on missed messages after being offline.
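
As a rough illustration, the decision of whether a connectivity change should trigger a store query could be sketched as below. This is not taken from js-waku: the names `ConnectivityEvent`, `QueryOnConnectState`, and `shouldQueryStore` are hypothetical, and the way the trigger conditions are combined is an assumption made for the example. The 5-minute threshold is the default given in this section's implementation notes.

```typescript
// Hypothetical sketch: none of these names come from js-waku or this specification.

/** Connectivity signals an implementation might observe (assumed names). */
type ConnectivityEvent = "store-node-connected" | "node-back-online";

interface QueryOnConnectState {
  /** Mirrors the `query_on_connect` option. */
  enabled: boolean;
  /** Whether the Waku node has store capability. */
  hasStore: boolean;
  /** Time of the last store query in milliseconds, if any was made. */
  lastQueryAtMs?: number;
}

/** Default threshold from the implementation notes: 5 minutes. */
const DEFAULT_QUERY_THRESHOLD_MS = 5 * 60 * 1000;

function shouldQueryStore(
  state: QueryOnConnectState,
  event: ConnectivityEvent,
  nowMs: number,
  thresholdMs: number = DEFAULT_QUERY_THRESHOLD_MS
): boolean {
  // Nothing to do if the feature is disabled or no store is available.
  if (!state.enabled || !state.hasStore) return false;
  // Assumption: coming back online always warrants a catch-up query.
  if (event === "node-back-online") return true;
  // First store connection: no query has been made yet.
  if (state.lastQueryAtMs === undefined) return true;
  // Assumption: otherwise query again only once the threshold has elapsed.
  return nowMs - state.lastQueryAtMs >= thresholdMs;
}
```

An implementation could call such a check from its connection and health listeners, and record the time of each query so that repeated store connections within the threshold do not cause redundant queries.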

**Implementation notes**:

- Query on connect SHOULD be triggered when:
  - A store node becomes available
  - The node reconnects after being offline (health status changes)
  - A configurable time threshold has elapsed since the last query (default: 5 minutes)
- Queries SHOULD stop when finding a message with causal history from the same channel
- Queries SHOULD continue if all retrieved messages are from different channels

**Reference**: `js-waku/packages/sdk/src/reliable_channel/reliable_channel.ts:183-196`

### Performance considerations

To avoid overload when processing many messages:

1. **Throttled processing**: Don't call the SDS process task more frequently than `process_task_min_elapse_ms` (default: 1 second)
2. **Batched sweeping**: Sweep the incoming buffer at regular intervals rather than per-message (default: every 5 seconds)
3. **Lazy task execution**: Queue process tasks with a minimum elapsed time between executions

**Reference**: `js-waku/packages/sdk/src/reliable_channel/reliable_channel.ts:460-473`

## Security/Privacy Considerations

1. **Encryption**: All participants in a Reliable Channel MUST be able to decrypt messages.
   Implementations SHOULD use the same encryption layer (encoder/decoder) for all messages.

2. **Sender identity**: The `sender_id` is used to differentiate acknowledgments.
   Implementations SHOULD ensure that acknowledgments are only considered valid when they originate from a different sender.

3. **Channel isolation**: Messages in different channels are isolated.
   A participant SHOULD only process messages that match their channel ID.

4. **Message ordering**: While the Reliable Channel ensures eventual consistency, it does not guarantee strict message ordering across participants.

5. **Resource exhaustion**: Implementations SHOULD implement limits on:
   - Number of missing messages tracked
   - Number of active retry attempts
   - Frequency of store queries

6. **Privacy**: Store queries reveal message interest patterns.
   Implementations MAY consider privacy-preserving retrieval strategies in the future.
   See [WAKU2-ADVERSARIAL-MODELS](https://github.com/waku-org/specs/blob/master/informational/adversarial-models.md).

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).