mirror of
https://github.com/logos-messaging/js-waku.git
synced 2026-01-09 17:23:11 +00:00
364 lines
16 KiB
Markdown
364 lines
16 KiB
Markdown
|
|
---
|
|||
|
|
title: SDS
|
|||
|
|
name: Scalable Data Sync protocol for distributed logs
|
|||
|
|
status: raw
|
|||
|
|
editor: Hanno Cornelius <hanno@status.im>
|
|||
|
|
contributors:
|
|||
|
|
- Akhil Peddireddy <akhil@status.im>
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Abstract
|
|||
|
|
|
|||
|
|
This specification introduces the Scalable Data Sync (SDS) protocol
|
|||
|
|
to achieve end-to-end reliability
|
|||
|
|
when consolidating distributed logs in a decentralized manner.
|
|||
|
|
The protocol is designed for a peer-to-peer (p2p) topology
|
|||
|
|
where an append-only log is maintained by each member of a group of nodes
|
|||
|
|
who may individually append new entries to their local log at any time and
|
|||
|
|
is interested in merging new entries from other nodes in real-time or close to real-time
|
|||
|
|
while maintaining a consistent order.
|
|||
|
|
The outcome of the log consolidation procedure is
|
|||
|
|
that all nodes in the group eventually reflect in their own logs
|
|||
|
|
the same entries in the same order.
|
|||
|
|
The protocol aims to scale to very large groups.
|
|||
|
|
|
|||
|
|
## Motivation
|
|||
|
|
|
|||
|
|
A common application that fits this model is a p2p group chat (or group communication),
|
|||
|
|
where the participants act as log nodes
|
|||
|
|
and the group conversation is modelled as the consolidated logs
|
|||
|
|
maintained on each node.
|
|||
|
|
The problem of end-to-end reliability can then be stated as
|
|||
|
|
ensuring that all participants eventually see the same sequence of messages
|
|||
|
|
in the same causal order,
|
|||
|
|
despite the challenges of network latency, message loss,
|
|||
|
|
and scalability present in any communications transport layer.
|
|||
|
|
The rest of this document will assume the terminology of a group communication:
|
|||
|
|
log nodes being the _participants_ in the group chat
|
|||
|
|
and the logged entries being the _messages_ exchanged between participants.
|
|||
|
|
|
|||
|
|
## Design Assumptions
|
|||
|
|
|
|||
|
|
We make the following simplifying assumptions for a proposed reliability protocol:
|
|||
|
|
|
|||
|
|
* **Broadcast routing:**
|
|||
|
|
Messages are broadcast disseminated by the underlying transport.
|
|||
|
|
The selected transport takes care of routing messages
|
|||
|
|
to all participants of the communication.
|
|||
|
|
* **Store nodes:**
|
|||
|
|
There are high-availability caches (a.k.a. Store nodes)
|
|||
|
|
from which missed messages can be retrieved.
|
|||
|
|
These caches maintain the full history of all messages that have been broadcast.
|
|||
|
|
This is an optional element in the protocol design,
|
|||
|
|
but improves scalability by reducing direct interactions between participants.
|
|||
|
|
* **Message ID:**
|
|||
|
|
Each message has a globally unique, immutable ID (or hash).
|
|||
|
|
Messages can be requested from the high-availability caches or
|
|||
|
|
other participants using the corresponding message ID.
|
|||
|
|
* **Participant ID:**
|
|||
|
|
Each participant has a globally unique, immutable ID
|
|||
|
|
visible to other participants in the communication.
|
|||
|
|
|
|||
|
|
## Wire protocol
|
|||
|
|
|
|||
|
|
The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”,
|
|||
|
|
“SHOULD NOT”, “RECOMMENDED”, “MAY”, and
|
|||
|
|
“OPTIONAL” in this document are to be interpreted as described in [2119](https://www.ietf.org/rfc/rfc2119.txt).
|
|||
|
|
|
|||
|
|
### Message
|
|||
|
|
|
|||
|
|
Messages MUST adhere to the following meta structure:
|
|||
|
|
|
|||
|
|
```protobuf
|
|||
|
|
syntax = "proto3";
|
|||
|
|
|
|||
|
|
message HistoryEntry {
|
|||
|
|
string message_id = 1; // Unique identifier of the SDS message, as defined in `Message`
|
|||
|
|
optional bytes retrieval_hint = 2; // Optional information to help remote parties retrieve this SDS message; For example, A Waku deterministic message hash or routing payload hash
|
|||
|
|
}
|
|||
|
|
|
|||
|
|
message Message {
|
|||
|
|
string sender_id = 1; // Participant ID of the message sender
|
|||
|
|
string message_id = 2; // Unique identifier of the message
|
|||
|
|
string channel_id = 3; // Identifier of the channel to which the message belongs
|
|||
|
|
optional int32 lamport_timestamp = 10; // Logical timestamp for causal ordering in channel
|
|||
|
|
repeated HistoryEntry causal_history = 11; // List of preceding message IDs that this message causally depends on. Generally 2 or 3 message IDs are included.
|
|||
|
|
optional bytes bloom_filter = 12; // Bloom filter representing received message IDs in channel
|
|||
|
|
optional bytes content = 20; // Actual content of the message
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The sending participant MUST include its own globally unique identifier in the `sender_id` field.
|
|||
|
|
In addition, it MUST include a globally unique identifier for the message in the `message_id` field,
|
|||
|
|
likely based on a message hash.
|
|||
|
|
The `channel_id` field MUST be set to the identifier of the channel of group communication
|
|||
|
|
that is being synchronized.
|
|||
|
|
For simple group communications without individual channels,
|
|||
|
|
the `channel_id` SHOULD be set to `0`.
|
|||
|
|
The `lamport_timestamp`, `causal_history` and
|
|||
|
|
`bloom_filter` fields MUST be set according to the [protocol steps](#protocol-steps)
|
|||
|
|
set out below.
|
|||
|
|
These fields MAY be left unset in the case of [ephemeral messages](#ephemeral-messages).
|
|||
|
|
The message `content` MAY be left empty for [periodic sync messages](#periodic-sync-message),
|
|||
|
|
otherwise it MUST contain the application-level content
|
|||
|
|
|
|||
|
|
> **_Note:_** Close readers may notice that, outside of filtering messages originating from the sender itself,
|
|||
|
|
the `sender_id` field is not used for much.
|
|||
|
|
Its importance is expected to increase once a p2p retrieval mechanism is added to SDS, as is planned for the protocol.
|
|||
|
|
|
|||
|
|
### Participant state
|
|||
|
|
|
|||
|
|
Each participant MUST maintain:
|
|||
|
|
|
|||
|
|
* A Lamport timestamp for each channel of communication,
|
|||
|
|
initialized to current epoch time in nanosecond resolution.
|
|||
|
|
* A bloom filter for received message IDs per channel.
|
|||
|
|
The bloom filter SHOULD be rolled over and
|
|||
|
|
recomputed once it reaches a predefined capacity of message IDs.
|
|||
|
|
Furthermore,
|
|||
|
|
it SHOULD be designed to minimize false positives through an optimal selection of
|
|||
|
|
size and hash functions.
|
|||
|
|
* A buffer for unacknowledged outgoing messages
|
|||
|
|
* A buffer for incoming messages with unmet causal dependencies
|
|||
|
|
* A local log (or history) for each channel,
|
|||
|
|
containing all message IDs in the communication channel,
|
|||
|
|
ordered by Lamport timestamp.
|
|||
|
|
|
|||
|
|
Messages in the unacknowledged outgoing buffer can be in one of three states:
|
|||
|
|
|
|||
|
|
1. **Unacknowledged** - there has been no acknowledgement of message receipt
|
|||
|
|
by any participant in the channel
|
|||
|
|
2. **Possibly acknowledged** - there has been ambiguous indication that the message
|
|||
|
|
has been _possibly_ received by at least one participant in the channel
|
|||
|
|
3. **Acknowledged** - there has been sufficient indication that the message
|
|||
|
|
has been received by at least some of the participants in the channel.
|
|||
|
|
This state will also remove the message from the outgoing buffer.
|
|||
|
|
|
|||
|
|
### Protocol Steps
|
|||
|
|
|
|||
|
|
For each channel of communication,
|
|||
|
|
participants MUST follow these protocol steps to populate and interpret
|
|||
|
|
the `lamport_timestamp`, `causal_history` and `bloom_filter` fields.
|
|||
|
|
|
|||
|
|
#### Send Message
|
|||
|
|
|
|||
|
|
Before broadcasting a message:
|
|||
|
|
|
|||
|
|
* the participant MUST increase its local Lamport timestamp by `1` and
|
|||
|
|
include this in the `lamport_timestamp` field.
|
|||
|
|
* the participant MUST determine the preceding few message IDs in the local history
|
|||
|
|
and include these in an ordered list in the `causal_history` field.
|
|||
|
|
The number of message IDs to include in the `causal_history` depends on the application.
|
|||
|
|
We recommend a causal history of two message IDs.
|
|||
|
|
* the participant MAY include a `retrieval_hint` in the `HistoryEntry`
|
|||
|
|
for each message ID in the `causal_history` field.
|
|||
|
|
This is an application-specific field to facilitate retrieval of messages,
|
|||
|
|
e.g. from high-availability caches.
|
|||
|
|
* the participant MUST include the current `bloom_filter`
|
|||
|
|
state in the broadcast message.
|
|||
|
|
|
|||
|
|
After broadcasting a message,
|
|||
|
|
the message MUST be added to the participant’s buffer
|
|||
|
|
of unacknowledged outgoing messages.
|
|||
|
|
|
|||
|
|
#### Receive Message
|
|||
|
|
|
|||
|
|
Upon receiving a message,
|
|||
|
|
|
|||
|
|
* the participant SHOULD ignore the message if it has a `sender_id` matching its own.
|
|||
|
|
* the participant MAY deduplicate the message by comparing its `message_id` to previously received message IDs.
|
|||
|
|
* the participant MUST [review the ACK status](#review-ack-status) of messages
|
|||
|
|
in its unacknowledged outgoing buffer
|
|||
|
|
using the received message's causal history and bloom filter.
|
|||
|
|
* if the message has a populated `content` field,
|
|||
|
|
the participant MUST include the received message ID in its local bloom filter.
|
|||
|
|
* the participant MUST verify that all causal dependencies are met
|
|||
|
|
for the received message.
|
|||
|
|
Dependencies are met if the message IDs in the `causal_history` of the received message
|
|||
|
|
appear in the local history of the receiving participant.
|
|||
|
|
|
|||
|
|
If all dependencies are met and the message has a populated `content` field,
|
|||
|
|
the participant MUST [deliver the message](#deliver-message).
|
|||
|
|
If dependencies are unmet,
|
|||
|
|
the participant MUST add the message to the incoming buffer of messages
|
|||
|
|
with unmet causal dependencies.
|
|||
|
|
|
|||
|
|
#### Deliver Message
|
|||
|
|
|
|||
|
|
Triggered by the [Receive Message](#receive-message) procedure.
|
|||
|
|
|
|||
|
|
If the received message’s Lamport timestamp is greater than the participant's
|
|||
|
|
local Lamport timestamp,
|
|||
|
|
the participant MUST update its local Lamport timestamp to match the received message.
|
|||
|
|
The participant MUST insert the message ID into its local log,
|
|||
|
|
based on Lamport timestamp.
|
|||
|
|
If one or more message IDs with the same Lamport timestamp already exists,
|
|||
|
|
the participant MUST follow the [Resolve Conflicts](#resolve-conflicts) procedure.
|
|||
|
|
|
|||
|
|
#### Resolve Conflicts
|
|||
|
|
|
|||
|
|
Triggered by the [Deliver Message](#deliver-message) procedure.
|
|||
|
|
|
|||
|
|
The participant MUST order messages with the same Lamport timestamp
|
|||
|
|
in ascending order of message ID.
|
|||
|
|
If the message ID is implemented as a hash of the message,
|
|||
|
|
this means the message with the lowest hash would precede
|
|||
|
|
other messages with the same Lamport timestamp in the local log.
|
|||
|
|
|
|||
|
|
#### Review ACK Status
|
|||
|
|
|
|||
|
|
Triggered by the [Receive Message](#receive-message) procedure.
|
|||
|
|
|
|||
|
|
For each message in the unacknowledged outgoing buffer,
|
|||
|
|
based on the received `bloom_filter` and `causal_history`:
|
|||
|
|
|
|||
|
|
* the participant MUST mark all messages in the received `causal_history` as **acknowledged**.
|
|||
|
|
* the participant MUST mark all messages included in the `bloom_filter`
|
|||
|
|
as **possibly acknowledged**.
|
|||
|
|
If a message appears as **possibly acknowledged** in multiple received bloom filters,
|
|||
|
|
the participant MAY mark it as acknowledged based on probabilistic grounds,
|
|||
|
|
taking into account the bloom filter size and hash number.
|
|||
|
|
|
|||
|
|
#### Periodic Incoming Buffer Sweep
|
|||
|
|
|
|||
|
|
The participant MUST periodically check causal dependencies for each message
|
|||
|
|
in the incoming buffer.
|
|||
|
|
For each message in the incoming buffer:
|
|||
|
|
|
|||
|
|
* the participant MAY attempt to retrieve missing dependencies from the Store node
|
|||
|
|
(high-availability cache) or other peers.
|
|||
|
|
It MAY use the application-specific `retrieval_hint` in the `HistoryEntry` to facilitate retrieval.
|
|||
|
|
* if all dependencies of a message are met,
|
|||
|
|
the participant MUST proceed to [deliver the message](#deliver-message).
|
|||
|
|
|
|||
|
|
If a message's causal dependencies have failed to be met
|
|||
|
|
after a predetermined amount of time,
|
|||
|
|
the participant MAY mark them as **irretrievably lost**.
|
|||
|
|
|
|||
|
|
#### Periodic Outgoing Buffer Sweep
|
|||
|
|
|
|||
|
|
The participant MUST rebroadcast **unacknowledged** outgoing messages
|
|||
|
|
after a set period.
|
|||
|
|
The participant SHOULD use distinct resend periods for **unacknowledged** and
|
|||
|
|
**possibly acknowledged** messages,
|
|||
|
|
prioritizing **unacknowledged** messages.
|
|||
|
|
|
|||
|
|
#### Periodic Sync Message
|
|||
|
|
|
|||
|
|
For each channel of communication,
|
|||
|
|
participants SHOULD periodically send sync messages to maintain state.
|
|||
|
|
These sync messages:
|
|||
|
|
|
|||
|
|
* MUST be sent with empty content
|
|||
|
|
* MUST include an incremented Lamport timestamp
|
|||
|
|
* MUST include causal history and bloom filter according to regular message rules
|
|||
|
|
* MUST NOT be added to the unacknowledged outgoing buffer
|
|||
|
|
* MUST NOT be included in causal histories of subsequent messages
|
|||
|
|
* MUST NOT be included in bloom filters
|
|||
|
|
* MUST NOT be added to the local log
|
|||
|
|
|
|||
|
|
Since sync messages are not persisted,
|
|||
|
|
they MAY have non-unique message IDs without impacting the protocol.
|
|||
|
|
To avoid network activity bursts in large groups,
|
|||
|
|
a participant MAY choose to only send periodic sync messages
|
|||
|
|
if no other messages have been broadcast in the channel after a random backoff period.
|
|||
|
|
|
|||
|
|
Participants MUST process the causal history and bloom filter of these sync messages
|
|||
|
|
following the same steps as regular messages,
|
|||
|
|
but MUST NOT persist the sync messages themselves.
|
|||
|
|
|
|||
|
|
#### Ephemeral Messages
|
|||
|
|
|
|||
|
|
Participants MAY choose to send short-lived messages for which no synchronization
|
|||
|
|
or reliability is required.
|
|||
|
|
These messages are termed _ephemeral_.
|
|||
|
|
|
|||
|
|
Ephemeral messages SHOULD be sent with `lamport_timestamp`, `causal_history`, and
|
|||
|
|
`bloom_filter` unset.
|
|||
|
|
Ephemeral messages SHOULD NOT be added to the unacknowledged outgoing buffer
|
|||
|
|
after broadcast.
|
|||
|
|
Upon reception,
|
|||
|
|
ephemeral messages SHOULD be delivered immediately without buffering for causal dependencies
|
|||
|
|
or including in the local log.
|
|||
|
|
|
|||
|
|
## Implementation Suggestions
|
|||
|
|
|
|||
|
|
This section provides practical guidance based on the js-waku implementation of SDS.
|
|||
|
|
|
|||
|
|
### Default Configuration Values
|
|||
|
|
|
|||
|
|
The js-waku implementation uses the following defaults:
|
|||
|
|
- **Bloom filter capacity**: 10,000 messages
|
|||
|
|
- **Bloom filter error rate**: 0.001 (0.1% false positive rate)
|
|||
|
|
- **Causal history size**: 200 message IDs
|
|||
|
|
- **Possible ACKs threshold**: 2 bloom filter hits before considering a message acknowledged
|
|||
|
|
|
|||
|
|
With 200 messages in causal history, assuming 32-byte message IDs and 32-byte retrieval hints (e.g., Waku message hashes),
|
|||
|
|
each message carries 200 × 64 bytes = 12.8 KB of causal history overhead.
|
|||
|
|
|
|||
|
|
### External Task Scheduling
|
|||
|
|
|
|||
|
|
The js-waku implementation delegates periodic task scheduling to the library consumer by providing methods:
|
|||
|
|
|
|||
|
|
- `processTasks()`: Process queued send/receive operations
|
|||
|
|
- `sweepIncomingBuffer()`: Check and deliver messages with met dependencies, returns missing dependencies
|
|||
|
|
- `sweepOutgoingBuffer()`: Return unacknowledged and possibly acknowledged messages for retry
|
|||
|
|
- `pushOutgoingSyncMessage(callback)`: Send a sync message
|
|||
|
|
|
|||
|
|
The implementation does not include internal timers,
|
|||
|
|
allowing applications to integrate SDS with their existing scheduling infrastructure.
|
|||
|
|
|
|||
|
|
### Message Processing
|
|||
|
|
|
|||
|
|
#### Handling Missing Dependencies
|
|||
|
|
|
|||
|
|
When `sweepIncomingBuffer()` returns missing dependencies,
|
|||
|
|
the implementation emits an `InMessageMissing` event with `HistoryEntry[]` containing:
|
|||
|
|
- `messageId`: The missing message identifier
|
|||
|
|
- `retrievalHint`: Optional bytes to help retrieve the message (e.g., transport-specific hash)
|
|||
|
|
|
|||
|
|
#### Timeout for Lost Messages
|
|||
|
|
|
|||
|
|
The `timeoutForLostMessagesMs` option allows marking messages as irretrievably lost after a timeout.
|
|||
|
|
When configured, the implementation emits an `InMessageLost` event after the timeout expires.
|
|||
|
|
|
|||
|
|
### Events Emitted
|
|||
|
|
|
|||
|
|
The js-waku implementation uses a `TypedEventEmitter` pattern to emit events for:
|
|||
|
|
- **Incoming messages**: received, delivered, missing dependencies, lost (after timeout)
|
|||
|
|
- **Outgoing messages**: sent, acknowledged, possibly acknowledged
|
|||
|
|
- **Sync messages**: sent, received
|
|||
|
|
- **Errors**: task execution failures
|
|||
|
|
|
|||
|
|
### SDK Usage: ReliableChannel
|
|||
|
|
|
|||
|
|
The SDK provides a high-level `ReliableChannel` abstraction that wraps the core SDS `MessageChannel` with automatic task scheduling and Waku protocol integration:
|
|||
|
|
|
|||
|
|
#### Configuration
|
|||
|
|
|
|||
|
|
The ReliableChannel uses these default intervals:
|
|||
|
|
- **Sync message interval**: 30 seconds minimum between sync messages (randomized backoff)
|
|||
|
|
- **Retry interval**: 30 seconds for unacknowledged messages
|
|||
|
|
- **Max retry attempts**: 10 attempts before giving up
|
|||
|
|
- **Store query interval**: 10 seconds for missing message retrieval
|
|||
|
|
|
|||
|
|
#### Task Scheduling Implementation
|
|||
|
|
|
|||
|
|
The SDK automatically schedules SDS periodic tasks:
|
|||
|
|
- **Sync messages**: Uses exponential backoff with randomization; sent faster (0.5x multiplier) after receiving content to acknowledge others
|
|||
|
|
- **Outgoing buffer sweeps**: Triggered after each retry interval for unacknowledged messages
|
|||
|
|
- **Incoming buffer sweeps**: Performed after each incoming message and during missing message retrieval
|
|||
|
|
- **Process tasks**: Called immediately after sending/receiving messages and during sync
|
|||
|
|
|
|||
|
|
#### Integration with Waku Protocols
|
|||
|
|
|
|||
|
|
ReliableChannel integrates SDS with Waku:
|
|||
|
|
- **Sending**: Uses LightPush or Relay protocols; includes Waku message hash as retrieval hint (32 bytes)
|
|||
|
|
- **Receiving**: Subscribes via Filter protocol; unwraps SDS messages before passing to application
|
|||
|
|
- **Missing message retrieval**: Queries Store nodes using retrieval hints from causal history
|
|||
|
|
- **Query on connect**: Automatically queries Store when connecting to new peers (enabled by default)
|
|||
|
|
|
|||
|
|
## Copyright
|
|||
|
|
|
|||
|
|
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
|