From 6f0587b7672aaf7b0991a70468c2ed618765a0bc Mon Sep 17 00:00:00 2001 From: jm-clius Date: Thu, 5 Jun 2025 19:59:15 +0100 Subject: [PATCH] docs: updates to the Store Sync and Waku Sync specs --- standards/core/store-sync.md | 113 ++++++++++--- standards/core/sync.md | 305 ++++++++++++++++++++++++----------- 2 files changed, 308 insertions(+), 110 deletions(-) diff --git a/standards/core/store-sync.md b/standards/core/store-sync.md index 892a0a0..6928df3 100644 --- a/standards/core/store-sync.md +++ b/standards/core/store-sync.md @@ -7,38 +7,111 @@ contributors: ## Abstract -This document describe the strategy Waku Store node will employ to stay synchronized. -The goal being that all store nodes eventually archive the same set of messages. +This document describes a strategy to keep [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) nodes synchronised, +using a combination of [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) queries +and the [WAKU-SYNC](../core/sync.md) protocol. ## Background / Rationale / Motivation -Message propagation in the network is not perfect, -even with GossipSub mechanisms to detect missed messages. -Nodes can also go offline for various reason outside our control. -Store nodes that want to provide a good service must be able to remedy situations like these. -By having store nodes synchronize with each other through various protocols, -the set of archived messages network wide will be eventually consistent. +Message propagation in [10/WAKU2](https://rfc.vac.dev/waku/standards/core/10/waku2) networks is not perfect. +Even with [peer-to-peer reliability](../application/p2p-reliability.md) mechanisms, +a certain amount of routing losses are always expected between Waku nodes. +For example, nodes could experience brief, undetected disconnections, +undergo restarts in order to update software, +or suffer losses due to resource constraints. + +Whatever the source of the losses, +this affects applications and services relying on the message routing layer. +One such service is the [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) protocol +that allows nodes to cache historical [14/WAKU2-MESSAGE](https://rfc.vac.dev/waku/standards/core/14/message)s from the routing layer, +and provision these to clients. +Using Waku Store Sync, +[13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) can remain synchronised +and reach eventual consistency despite occasional losses on the routing layer. + +## Scope: + +Waku Store Sync aims to provide a way for [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) nodes +to compare and retrieve differences with other [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) nodes, +in order to remedy messages that might have been missed or lost on the routing layer. + +It seeks to cover the following loss scenarios: +1. Short-term offline periods, for example due to a restart or short-term node maintenance +2. Occasional message losses that occur during normal operation, due to short-term instability, churn, etc. + +For the purposes of this document, +we define short-term offline periods as no more than `1` hour +and occasional message losses as no more than `20%` of total routed messages. + +It does not aim to address recovery after long-term offline periods, +or to address massive message losses due to extraordinary circumstances, +such as adversarial behaviour. 
+Although Store Sync might work in such cases,
+it is not designed or optimised for catastrophic loss recovery.
+Large-scale recovery falls beyond the scope of this document.
+We provide further recommendations for reasonable parameter defaults below.

 ## Theory / Semantics

 The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).

-Various protocols and features that help with message consistency are described below.
+A [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) node with Store Sync enabled:
+1. MAY use [Store Resume](#store-resume) to recover messages after detectable short-term offline periods
+2. MUST use [Waku Sync](#waku-sync) to maintain consistency with other nodes and recover occasional message losses

 ### Store Resume

-This feature allow a node to fill the gap in messages for the period it was last offline.
-At startup, a node SHOULD use the Store protocol to query a random node for
-the time interval since it was last online.
+Store Sync nodes MAY use Store Resume to fill the gap in messages for any short-term offline period.
+Such a node SHOULD keep track of its last online timestamp.
+It MAY do so by periodically storing the current timestamp on disk while online.
+After a detected offline period has been resolved,
+or at startup,
+a Store Sync node using Store Resume SHOULD select another [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) node using any available discovery mechanism.
+We RECOMMEND that this be a random node.
+Next, the Store Sync node SHOULD perform a [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store#content-filtered-queries) query
+to the selected node for the time interval since it was last online.
 Messages returned by the query are then added to the local node storage.
-It is RECOMMENDED to limit the time interval to a maximum of 6 hours.
+It is RECOMMENDED to limit the time interval to a maximum of `6` hours.

 ### Waku Sync

-Nodes that stay online can still miss messages.
-[Waku Sync](https://github.com/waku-org/specs/blob/master/standards/core/sync.md) consists of two libp2p protocols,
-respectively used to (i) find those messages and (ii) mend the differences by periodically syncing with random nodes.
-It is RECOMMENDED to trigger a sync with a random peer that supports the protocols every 5 minutes for a time range of the last hour.
+Even while online, Store Sync nodes may occasionally miss messages.
+To remedy any such losses and to achieve eventual consistency,
+Store Sync nodes MUST mount the [WAKU2-SYNC](./sync.md) protocol
+to detect and exchange differences with other Store Sync nodes.
+As described in that specification,
+[WAKU2-SYNC](./sync.md) consists of two sub-protocols.
+Both sub-protocols MUST be used by Store Sync nodes in the following way:
+1. `reconciliation` MUST be used to detect and exchange differences between [14/WAKU2-MESSAGE](https://rfc.vac.dev/waku/standards/core/14/message)s cached by the [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) node
+2. `transfer` MUST be used to transfer the actual content of such differences.
+Messages received via `transfer` MUST be cached in the same archive backend
+where the [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) node caches messages received via normal routing.
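+
+The non-normative sketch below illustrates how these pieces could fit together in a single sync round.
+The `archive`, `peers`, `reconciliation`, `transfer` and `window` objects are hypothetical stand-ins for implementation-specific components.
+
+```python
+import random
+
+def sync_once(archive, peers, reconciliation, transfer, window):
+    """One Store Sync round with a randomly selected peer (non-normative sketch)."""
+    peer = random.choice(peers)
+
+    # 1. `reconciliation` detects differences in SyncIDs over the chosen window.
+    missing_locally, missing_remotely = reconciliation.run(peer, window)
+
+    # 2. `transfer` exchanges the actual message content of those differences.
+    transfer.send(peer, [archive.get(sync_id) for sync_id in missing_remotely])
+    for msg in transfer.receive(peer, missing_locally):
+        # Messages received via transfer go into the same archive backend
+        # as messages received via normal routing.
+        archive.insert(msg)
+```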
+ +#### Periodic syncing + +Store Sync nodes SHOULD periodically trigger [WAKU2-SYNC](./sync.md). +We RECOMMEND syncing at least once every `5` minutes with `1` other Store Sync peer. +The node MAY choose to sync more often with more peers +to achieve faster consistency. +Any peer selected for Store Sync SHOULD be chosen at random. + +Discovery of other Store Sync peers falls outside the scope of this document. +For simplicity, a Store Sync node MAY assume that any other [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) peer +supports Store Sync and attempt to trigger a sync operation with that node. +If the sync operation then fails (due to unsupported protocol), +it could continue attempting to sync with other [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) peers on a trial-and-error basis +until it finds a suitable Store Sync peer. + +#### Sync window + +For every [WAKU2-SYNC](./sync.md) operation, +the Store Sync node SHOULD choose a reasonable window of time into the past +over which to sync cached messages. +We RECOMMEND a sync window of `1` hour into the past. +This means that the syncing peers will compare +and exchange differences in cached messages up to 1 hour into the past. +A Store Sync node MAY choose to sync over a shorter time window to save resources and sync faster. +A Store Sync node MAY choose to sync over a longer time window to remedy losses over a longer period. ## Copyright @@ -47,4 +120,8 @@ Copyright and related rights waived via [CC0](https://creativecommons.org/public ## References -- [Waku Sync](https://github.com/waku-org/specs/blob/master/standards/core/sync.md) \ No newline at end of file +- [10/WAKU2](https://rfc.vac.dev/waku/standards/core/10/waku2) +- [13/WAKU2-STORE](https://rfc.vac.dev/waku/standards/core/13/store) +- [14/WAKU2-MESSAGE](https://rfc.vac.dev/waku/standards/core/14/message) +- [WAKU-P2P-RELIABILITY](../application/p2p-reliability.md) +- [WAKU2-SYNC](./sync.md) \ No newline at end of file diff --git a/standards/core/sync.md b/standards/core/sync.md index 3fec88b..f5489f5 100644 --- a/standards/core/sync.md +++ b/standards/core/sync.md @@ -9,7 +9,7 @@ contributors: # Abstract This specification explains `WAKU-SYNC` -which enables the synchronization of messages between nodes storing sets of [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message) +which enables the synchronization of messages between nodes storing sets of [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message)s. # Specification @@ -26,121 +26,234 @@ The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL **Libp2p Protocol identifier**: `/vac/waku/reconciliation/1.0.0` -The protocol finds differences between two peers by -comparing _fingerprints_ of _ranges_ of message _IDs_. -_Ranges_ are encoded into payloads, exchanged between the peers and when the range _fingerprints_ are different, split into smaller (sub)ranges. -This process repeats until _ranges_ include a small number of messages. -At this point lists of message _IDs_ are sent for comparison instead of _fingerprints_ over entire ranges of messages. +The `reconciliation` protocol finds the differences between two sets of [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message) on different nodes. +It assumes that each [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message) maps to a uniquely identifying `SyncID`, +which is maintained in an ordered set within each node. 
+An ordered set of `SyncID`s is termed a `Range`.
+This implies that any contiguous subset of a `Range` is also a `Range`.
+In other words, the `reconciliation` protocol allows two nodes to find differences between `Range`s of `SyncID`s,
+which map to an equivalent difference in cached [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message)s.
+These terms, the wire protocol and message flows are explained below.

-#### Overview
-The `reconciliation` protocol follows the following heuristic:
-1. The requestor chooses a time range to sync.
-2. The range is encoded into a payload and sent.
-3. The requestee receives the payload and decodes it.
-4. The range is processed and, if a difference with the local range is detected, a set of subranges are produced.
-5. The new ranges are encoded and sent.
-6. This process repeats while differences found are sent to the `transfer` protocol.
-7. The synchronization ends when all ranges have been processed and no differences are left.
+### Wire protocol

-#### Message IDs
-Message _IDs_ MUST be composed of the timestamp and the hash of the [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message).
+#### Reconciliation payload

-The timestamp MUST be the time of creation and
-the hash MUST follow the
-[deterministic message hashing specification](https://rfc.vac.dev/waku/standards/core/14/message#deterministic-message-hashing)
+Nodes participating in the `reconciliation` protocol exchange encoded `RangesData` messages.

-> This way the message IDs can always be totally ordered, first chronologically according to the timestamp and then disambiguated based on the hash lexical order in cases where the timestamp is the same.
+The `RangesData` structure represents a complete `reconciliation` payload:

-#### Range Bounds
-A _range_ MUST consist of two _IDs_, the first bound is inclusive the second bound exclusive.
-The first bound MUST be strictly smaller than the second one.
+| Field | Type | Description |
+|---------------|--------------------------|------------------------------------------|
+| cluster | varint | Cluster ID of the sender |
+| shards | seq[varint] | Shards supported by the sender |
+| ranges | seq[Range] | Sequence of ranges |

-#### Range Fingerprinting
-The _fingerprint_ of a range MUST be the XOR operation applied to
-the hash of all message _IDs_ included in that _range_.
+The `cluster` and `shards` fields
+represent the sharding elements as defined in [RELAY-SHARDING](https://github.com/waku-org/specs/blob/master/standards/core/relay-sharding.md#static-sharding).
+The `ranges` field contains a sequence of ranges for reconciliation.

-#### Range Type
-Every _range_ MUST have one of the following types; _fingerprint_, _skip_ or _item set_.
+We identify the following subtypes:

-- _Fingerprint_ type contains a _fingerprint_.
-- _Skip_ type contains nothing and is used to signal already processed _ranges_.
-- _Item set_ type contains message _IDs_ and a _resolved_ boolean.
-> _Item sets_ are an optimization, sending multiple _IDs_ instead of recursing further reduce the number of round-trips.
+##### Range

-#### Range Processing
-_Ranges_ have to be processed differently according to their types.
+A `Range` is defined by its bounds, its type and, optionally, an encoded representation of its content.

-- _Fingerprint_ ranges MUST be compared.
-  - **Equal** ranges MUST become _skip ranges_.
- - **Unequal** ranges MUST be split into smaller _fingerprint_ or _item set_ ranges based on a implementation specific threshold. -- **Unresolved** _item set_ ranges MUST be compared, differences sent to the `transfer` protocol and marked resolved. -- **Resolved** _item set_ ranges MUST be compared, differences sent to the `transfer` protocol and become skip ranges. -- _Skip_ ranges MUST be merged with other consecutive _skip ranges_. +| Field | Type | Description | +|---------------|--------------------------|------------------------------------------| +| bounds | RangeBounds | The bounds of the range | +| type | RangeType | The type of the range | +| Option(content)| Fingerprint OR ItemSet | Optional field depending on `type`. If set, possible values are a `Fingerprint` of the range, or the complete `ItemSet` in the range | -In the case where only skip ranges remains, the synchronization is done. +##### RangeBounds -### Delta Encoding -_Ranges_ and timestamps MUST be delta encoded as follows for efficient transmission. +`RangeBounds` defines a `Range` by two bounding `SyncID` values, forming a time-hash interval: -All _ranges_ to be transmitted MUST be ordered and only upper bounds used. -> Inclusive lower bounds can be omitted because they are always -the same as the exclusive upper bounds of the previous range or zero. +| Field | Type | Description | +|---------|---------|--------------------------------------| +| a | SyncID | Lower bound (inclusive) | +| b | SyncID | Upper bound (exclusive) | -To achieve this, it MAY be needed to add _skip ranges_. -> For example, a _skip range_ can be added with -an exclusive upper bound equal to the first range lower bound. -This way the receiving peer knows to ignore the range from zero to the start of the sync time window. +The lower bound MUST be strictly smaller than the upper bound. -Every _ID_'s timestamps after the first MUST be noted as the difference from the previous one. -If the timestamp is the same, zero MUST be used and the hash MUST be added. -The added hash MUST be truncated up to and including the first differentiating byte. +##### SyncID -| Timestamp | Hash | Timestamp (encoded) | Hash (encoded) +A `SyncID` consists of a message timestamp and hash to uniquely identify a [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message). + +| Field | Type | Description | +|-----------|------------|-------------------------------------------| +| timestamp | varint | Timestamp of the message (nanoseconds) | +| hash | seq[Byte] | 32-byte message hash | + +The timestamp MUST be the time of message creation and +the hash MUST follow the [deterministic message hashing specification](https://rfc.vac.dev/waku/standards/core/14/message#deterministic-message-hashing). +SyncIDs MUST be totally ordered by timestamp first, then by hash to disambiguate messages with identical timestamps. + +##### RangeType + +A `RangeType` indicates the type of `content` encoding for the `Range`: + +| Type | Value | Description | +|-------------|-------|-------------------------------------------------| +| Skip | 0 | Range has been processed and `content` field is empty | +| Fingerprint | 1 | Range is encoded in `content` field as a 32-byte `Fingerprint` | +| ItemSet | 2 | Range is encoded in `content` field as an `ItemSet` of `SyncID`s | + +##### Fingerprint + +A `Fingerprint` is a compact 32-byte representation of all message hashes within a `Range`. 
+It MUST be implemented as an XOR of all contained hashes, +extracted from the `SyncID`s present in the `Range`. +Fingerprints allow for efficient difference detection without transmitting complete message sets. + +##### ItemSet + +An `ItemSet` contains a full representation of all `SyncID`s within a `Range` +and a reconciliation status: + +| Field | Type | Description | +|------------|------------|--------------------------------------| +| elements | seq[`SyncID`] | Sequence of `SyncID`s in the `Range` | +| reconciled | bool | Whether the `Range` has been reconciled | + +#### Payload encoding + +We can now define a heuristic for encoding the `RangesData` payload. +All varints MUST be encoded according to the [specified varint encoding](#varint-encoding) procedure. + +Concatenate each of the following, in order: +1. Encode the cluster ID as a varint +2. Encode the number of shards as a varint +3. For each shard, encode the shard as a varint and concatenate it to the previous +4. Encode the `ranges`, according to ["Delta encoding for Ranges"](#delta-encoding-for-ranges). + +##### Varint encoding + +All variable integers (varints) MUST be encoded as little-endian base 128 variable-length integers (LEB128) and MUST be minimally encoded. + +##### Delta encoding for Ranges + +The `ranges` field contain a sequence of `Ranges`. + +It can be delta encoded as follows: + +For each `Range` element, concatenate the following: + 1. From the `RangeBounds`, select only the `SyncID` representing the _upper bound_ of that range. + Inclusive lower bounds are omitted because they are always the same as the exclusive upper bounds of the previous range. + The first range is always assumed to have a lower bound `SyncID` of `0` (both the `timestamp` and `hash` are `0`). + 2. [Delta encode](#delta-encoding-for-sequential-syncids) the selected `SyncID` by comparing it to the previous ranges' `SyncID`s. + The first range's `SyncID` will be fully encoded, as described in that section. + 3. Encode the `RangeType` as a single byte and concatenate it to the delta encoded `SyncID`. + 4. If the `RangeType` is: + - _Skip_: encode nothing more + - _Fingerprint_: encode the 32-byte fingerprint + - _ItemSet_: + - Encode the number of elements as a varint + - Delta encode the item set, according to ["Delta encoding for ItemSets"](#delta-encoding-for-itemsets) + +##### Delta encoding for sequential SyncIDs + +Sequential `SyncID`s can be delta encoded to minimize payload size. + +Given an ordered sequence of `SyncID`s: +1. Encode the first timestamp in full +2. Delta encode subsequent timestamps, i.e., encode only the difference from the previous timestamp +3. If the timestamps are identical, encode a `SyncID` `hash` delta as follows: +3.1. Compared to the previous hash, truncate the hash up to and including the first differentiating byte +3.2. Encode the number of bytes in the truncated hash (as a single byte) +3.3. Encode the truncated hash + +See the table below as an example: + +| Timestamp | Hash | Encoded Timestamp (diff from previous) | Encoded Hash (length + all bytes up to first diff) | - | - | - | - | 1000 | 0x4a8a769a... | 1000 | - | 1002 | 0x351c5e86... | 2 | - -| 1002 | 0x3560d9c4... | 0 | 0x3560 +| 1002 | 0x3560d9c4... | 0 | 0x023560 | 1003 | 0xbeabef25... | 1 | - -#### Varints -All _varints_ MUST be little-endian base 128 variable length integers (LEB128) and minimally encoded. +##### Delta encoding for ItemSets -#### Payload encoding -The wire level payload MUST be encoded as follow. 
-> The & denote concatenation. +An `ItemSet` is delta encoded as follows: -> Refer to [RELAY-SHARDING](https://github.com/waku-org/specs/blob/master/standards/core/relay-sharding.md#static-sharding) -RFC for cluster and shard specification. +Concatenate each of the following, in order: +1. From the first `SyncID`, encode the timestamp in full +2. From the first `SyncID`, encode the hash in full +3. For each subsequent `SyncID`: + - Delta encode the timestamp, i.e., encode only the difference from the previous `SyncID`'s timestamp + - Append the encoded hash in full +4. Encode a single byte for the `reconciled` boolean (0 or 1) -1. _varint_ bytes of the node's cluster ID & -2. _varint_ bytes of the node's number of shard supported & -3. _varint_ bytes for each shard index supported & -4. _varint_ bytes of the delta encoded timestamp & -5. if timestamp is zero, 1 byte for the hash bytes length & the hash bytes & -6. 1 byte, the _range_ type & -7. either - - 32 bytes _fingerprint_ or - - _varint_ bytes of the item set length & bytes of every items or - - if _skip range_, do nothing +### Reconciliation Message Flow -8. repeat steps 4 to 7 for all ranges. +The `reconciliation` message flow is triggered by an `initiator` with an initial `RangesData` payload. +The selection of sync peers +and triggers for initiating a `reconciliation` is out of scope of this document. + +The response to a `RangesData` payload is another `RangesData` payload, according to the heuristic below. +The syncing peers SHOULD continue exchanging `RangesData` payloads until all ranges have been processed. + +The output of a `reconciliation` flow is a set of differences in `SyncID`s. +These could either be `SyncID`s present in the remote peer and missing locally, +or `SyncID`s present locally but missing in the remote peer. +Each sync peer MAY use the `transfer` protocol to exchange the full messages corresponding to these computed differences, +proactively transferring messages missing in the remote peer. + +#### Initial Message + +The initiator +1. Selects a time range to sync +2. Computes a fingerprint for the entire range +3. Constructs an initial `RangesData` payload with: + - The initiator's cluster ID + - The initiator's supported shards + - A single range of type `Fingerprint` covering the entire sync period + - The fingerprint for this range +4. [Delta encodes](#payload-encoding) the payload +5. Sends the payload using libp2p length-prefixed encoding + +#### Responding to a `RangesData` payload + +Each syncing peer performs the following in response to a `RangesData` payload: + +The responder: +1. Receives and decodes the payload +2. If shards and cluster match, processes each range: + - If the received range is of type `Skip`, ignores it. + - If the received range is of type `Fingerprint`, computes the fingerprint over the local matching range + - If the local fingerprint matches the received fingerprint, includes this range in the response as type `Skip` + - If the local fingerprint does _not_ match the received fingerprint: + - If the corresponding range is small enough, includes this range in the response as type `ItemSet` with all `SyncID`s + - If the corresponding range is too large, divide it into subranges. + For each subrange, if the range is small enough, includes it in the response as type `ItemSet` with all `SyncID`s. + If the subrange is too large, includes it in the response as type `Fingerprint`. 
+ - If the received range is of type `ItemSet`, compares it to the local items in the corresponding range
+ - If there are any differences between the local and remote items, adds these as part of the output of the `reconciliation` procedure.
+ At this point, the syncing peers MAY start to exchange the full messages corresponding to differences using the `transfer` protocol.
+ - If the received `ItemSet` is _not_ marked as `reconciled` by the remote peer, includes the corresponding local range in the response as type `ItemSet`.
+ - If the received `ItemSet` is marked as `reconciled` by the remote peer, includes the corresponding local range in the response as type `Skip`.
+3. If shards or cluster don't match:
+ - Responds with an empty payload
+4. Delta encodes the response
+5. Sends the response using libp2p length-prefixed encoding
+
+This process continues until a syncing peer responds with an empty payload,
+i.e., when all ranges have been processed and reconciled by both syncing peers and there are no differences left.

 ## Transfer Protocol

 **Libp2p Protocol identifier**: `/vac/waku/transfer/1.0.0`

-The transfer protocol SHOULD send messages as soon as
-a difference is found via reconciliation.
-It MUST only accept messages from peers the node is reconciliating with.
-New message IDs MUST be added to the reconciliation protocol.
-The payload sent MUST follow the wire specification below.
+Once the `reconciliation` protocol starts finding differences in `SyncID`s,
+the `transfer` protocol MAY be used to exchange actual message contents between peers.
+A node using `transfer` SHOULD proactively send [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message)s missing from the remote party.
+Nodes SHOULD only accept incoming transfers from peers for which they have an active reconciliation session.
+The `SyncID`s corresponding to messages received via `transfer`
+MUST be added to the corresponding `Range` tracked by the `reconciliation` protocol.
+The `transfer` payload MUST follow the wire specification below.

 ### Wire specification
+
 ```protobuf
 syntax = "proto3";

@@ -155,17 +268,20 @@ message WakuMessageAndTopic {
 }
 ```

-# Implementation
+## Implementation Suggestions
+
 The flexibility of the protocol implies that much is left to the implementers.
 What will follow is NOT part of the specification.
 This section was created to inform implementations.

-#### Cluster & Shards
+### Cluster & Shards
+
 To prevent nodes from synchronizing messages from shard they don't support,
 cluster and shards information has been added to each payload.
 On reception, if two peers don't share the same set of shards the sync is aborted.

-#### Parameters
+### Parameters
+
 Two useful parameters to add to your implementation are partitioning count and the item set threshold.

 The partitioning count is the number of time a range is split.
@@ -175,7 +291,8 @@ The item set threshold determines when item sets are sent instead of fingerprint
 A higher value sends more items which means higher chance of duplicates but
 reduces the amount of round trips overall.

-#### Storage
+### Storage
+
 The storage implementation should reflect the context.
 Most messages that will be added will be recent and
 removed messages will be older ones.
@@ -184,26 +301,30 @@ It is expected to be a less likely case than time based insertion and removal.
 Last but not least it must be optimized for fingerprinting as
 it is the most often used operation.
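+
+As a non-normative illustration of the [`Fingerprint`](#fingerprint) computation defined earlier in this specification,
+the sketch below XORs the hashes of an ordered, in-memory `SyncID` index over a single `Range`.
+The `sync_ids` list is a hypothetical stand-in for the storage backend;
+a real implementation would fingerprint directly against its archive.
+
+```python
+from bisect import bisect_left
+
+def range_fingerprint(sync_ids, lower, upper):
+    """XOR fingerprint over the Range [lower, upper) of an ordered SyncID index.
+
+    `sync_ids` is a list of (timestamp, hash) tuples kept sorted by
+    (timestamp, hash); `lower` is inclusive and `upper` exclusive.
+    """
+    fingerprint = bytes(32)
+    # Binary search keeps fingerprinting cheap on time-ordered storage.
+    start = bisect_left(sync_ids, lower)
+    end = bisect_left(sync_ids, upper)
+    for _, message_hash in sync_ids[start:end]:
+        fingerprint = bytes(a ^ b for a, b in zip(fingerprint, message_hash))
+    return fingerprint
+```
+
+Because XOR is commutative and associative, fingerprints of adjacent subranges can also be combined without revisiting individual hashes.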
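+
+Along the same lines, the following non-normative sketch implements the ["Delta encoding for sequential SyncIDs"](#delta-encoding-for-sequential-syncids) rules defined earlier
+and reproduces the worked example table from that section.
+The `leb128` helper and function names are illustrative only and not part of the specification.
+
+```python
+def leb128(value):
+    """Minimally encode an unsigned integer as a little-endian base 128 varint."""
+    out = bytearray()
+    while True:
+        byte = value & 0x7F
+        value >>= 7
+        if value:
+            out.append(byte | 0x80)
+        else:
+            out.append(byte)
+            return bytes(out)
+
+def delta_encode_sync_ids(sync_ids):
+    """Delta encode an ordered sequence of (timestamp, hash) SyncIDs."""
+    out = bytearray()
+    prev_ts, prev_hash = None, None
+    for ts, h in sync_ids:
+        if prev_ts is None:
+            out += leb128(ts)            # first timestamp is encoded in full
+        else:
+            out += leb128(ts - prev_ts)  # later timestamps as deltas
+            if ts == prev_ts:
+                # Identical timestamps: emit the hash, length-prefixed and
+                # truncated up to and including the first differentiating byte.
+                diff = next(i for i, (a, b) in enumerate(zip(h, prev_hash)) if a != b)
+                out.append(diff + 1)
+                out += h[: diff + 1]
+        prev_ts, prev_hash = ts, h
+    return bytes(out)
+```
+
+For the two `SyncID`s sharing timestamp `1002` in the example table, this yields `0x00` (zero timestamp delta) followed by `0x023560` (length-prefixed truncated hash), matching the table.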
-#### Sync Interval +### Sync Interval + Ad-hoc syncing can be useful in some cases but continuous periodic sync minimize the differences in messages stored across the network. Syncing early and often is the best strategy. The default used in Nwaku is 5 minutes interval between sync with a range of 1 hour. -#### Sync Window +### Sync Window + By default we offset the sync window by 20 seconds in the past. The actual start of the sync range is T-01:00:20 and the end T-00:00:20 in most cases. This is to handle the inherent jitters of GossipSub. In other words, it is the amount of time needed to confirm if a message is missing or not. -#### Peer Choice +### Peer Choice + Wrong peering strategies can lead to inadvertently segregating peers and reduce sampling diversity. Nwaku randomly select peers to sync with for simplicity and robustness. More sophisticated strategies may be implemented in future. -## Attack Vectors +## Security/Privacy Considerations + Nodes using `WAKU-SYNC` are fully trusted. Message hashes are assumed to be of valid messages received via Waku Relay or Light push.