From b6e5b7418a89909e1381c37c3adf9668c7ee191d Mon Sep 17 00:00:00 2001
From: SionoiS
Date: Mon, 3 Mar 2025 16:12:37 -0500
Subject: [PATCH 1/4] full topic support

---
 standards/core/sync.md | 66 +++++++++++++++++++++++++++++-------------
 1 file changed, 46 insertions(+), 20 deletions(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 3fec88b..eabf83a 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -28,13 +28,16 @@ The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL
 
 The protocol finds differences between two peers by comparing _fingerprints_ of _ranges_ of message _IDs_.
 
-_Ranges_ are encoded into payloads, exchanged between the peers and when the range _fingerprints_ are different, split into smaller (sub)ranges.
+_Ranges_ are encoded into payloads,
+exchanged between the peers and when
+the range _fingerprints_ are different, split into smaller (sub)ranges.
 This process repeats until _ranges_ include a small number of messages.
-At this point lists of message _IDs_ are sent for comparison instead of _fingerprints_ over entire ranges of messages.
+At this point lists of message _IDs_ are sent for comparison
+instead of _fingerprints_ over entire ranges of messages.
 
 #### Overview
-The `reconciliation` protocol follows the following heuristic:
-1. The requestor chooses a time range to sync.
+The `reconciliation` protocol operate on the following heuristic:
+1. The requestor chooses a sync range including time, cluster id and topics.
 2. The range is encoded into a payload and sent.
 3. The requestee receives the payload and decodes it.
 4. The range is processed and, if a difference with the local range is detected, a set of subranges are produced.
@@ -72,6 +75,12 @@ Every _range_ MUST have one of the following types; _fingerprint_, _skip_ or _it
 > _Item sets_ are an optimization, sending multiple _IDs_ instead of
 recursing further reduce the number of round-trips.
 
+#### Sync Scope
+On payload reception, the intersection of the two peers topic sets is used as the sync scope.
+If the intersection is empty the sync is aborted.
+If a peer does not specify a set of supported topics,
+the protocol assumes all topics are supproted for that peer.
+
 #### Range Processing
 _Ranges_ have to be processed differently according to their types.
 
@@ -112,23 +121,41 @@ All _varints_ MUST be little-endian base 128 variable length integers (LEB128) a
 
 #### Payload encoding
 The wire level payload MUST be encoded as follow.
-> The & denote concatenation.
 
 > Refer to [RELAY-SHARDING](https://github.com/waku-org/specs/blob/master/standards/core/relay-sharding.md#static-sharding)
 RFC for cluster and shard specification.
 
-1. _varint_ bytes of the node's cluster ID &
-2. _varint_ bytes of the node's number of shard supported &
-3. _varint_ bytes for each shard index supported &
-4. _varint_ bytes of the delta encoded timestamp &
-5. if timestamp is zero, 1 byte for the hash bytes length & the hash bytes &
-6. 1 byte, the _range_ type &
-7. either
-   - 32 bytes _fingerprint_ or
-   - _varint_ bytes of the item set length & bytes of every items or
-   - if _skip range_, do nothing
+> Please note that for each steps, bytes are concatenated.
 
-8. repeat steps 4 to 7 for all ranges.
+1. _Varint_ bytes of the node's cluster ID
+2. _Varint_ bytes of the number of pubsub topics
+3. For each pubsub topic, if any
+
+   a. _varint_ bytes of the pubsub topic length
+
+   b. bytes content of the pubsub topic
+
+4. _Varint_ bytes of the number of content topics
+5. For each content topic, if any
+
+   a. _varint_ bytes of the content topic length
+
+   b. bytes content of the content topic
+
+6. For each range
+
+   a. _varint_ bytes of the delta encoded timestamp
+
+   b. if timestamp is zero
+      - 1 byte for the partial hash length
+      - the hash bytes content
+
+   c. 1 byte, the _range_ type
+
+   d. either
+      - 32 bytes _fingerprint_ or
+      - _varint_ bytes of the item set length and the bytes of every items or
+      - if _skip range_, nothing
 
 ## Transfer Protocol
 
@@ -160,10 +187,9 @@ The flexibility of the protocol implies that much is left to the implementers.
 What will follow is NOT part of the specification.
 This section was created to inform implementations.
 
-#### Cluster & Shards
-To prevent nodes from synchronizing messages from shard they don't support,
-cluster and shards information has been added to each payload.
-On reception, if two peers don't share the same set of shards the sync is aborted.
+#### Cluster, Pubsub and Content Topics
+To prevent nodes from synchronizing messages from cluster and topics they don't support,
+cluster and topics information is added to each payload.
 
 #### Parameters
 Two useful parameters to add to your implementation are partitioning count and the item set threshold.
From a0d99497f22b82b5b4179e70e2ed52a0e9a10237 Mon Sep 17 00:00:00 2001
From: SionoiS
Date: Tue, 4 Mar 2025 11:35:48 -0500
Subject: [PATCH 2/4] explicit topic naming

---
 standards/core/sync.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index eabf83a..123695d 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -37,7 +37,7 @@ instead of _fingerprints_ over entire ranges of messages.
 
 #### Overview
 The `reconciliation` protocol operate on the following heuristic:
-1. The requestor chooses a sync range including time, cluster id and topics.
+1. The requestor chooses a sync range including time, cluster id, pubsub and content topics.
 2. The range is encoded into a payload and sent.
 3. The requestee receives the payload and decodes it.
 4. The range is processed and, if a difference with the local range is detected, a set of subranges are produced.
@@ -76,10 +76,10 @@ Every _range_ MUST have one of the following types; _fingerprint_, _skip_ or _it
 recursing further reduce the number of round-trips.
 
 #### Sync Scope
-On payload reception, the intersection of the two peers topic sets is used as the sync scope.
+On payload reception, the intersection of the two peers pubsub and content topic sets is used as the sync scope.
 If the intersection is empty the sync is aborted.
-If a peer does not specify a set of supported topics,
-the protocol assumes all topics are supproted for that peer.
+If a peer does not specify a set of supported pubsub or content topics,
+the protocol assumes all pubsub or content topics are supported for that peer.
 
 #### Range Processing
 _Ranges_ have to be processed differently according to their types.
@@ -188,8 +188,8 @@ What will follow is NOT part of the specification.
 This section was created to inform implementations.
 
 #### Cluster, Pubsub and Content Topics
-To prevent nodes from synchronizing messages from cluster and topics they don't support,
-cluster and topics information is added to each payload.
+To prevent nodes from synchronizing messages from cluster, pubsub and content topics they don't support,
+cluster and topic information is added to each payload.
 
 #### Parameters
 Two useful parameters to add to your implementation are partitioning count and the item set threshold.
From 484ccbb5ca27db5f0c218476be0c0d2fbfa48bca Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Mon, 31 Mar 2025 10:05:20 -0400
Subject: [PATCH 3/4] Update standards/core/sync.md

Co-authored-by: Ivan FB <128452529+Ivansete-status@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 123695d..59049d6 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -36,7 +36,7 @@ At this point lists of message _IDs_ are sent for comparison
 instead of _fingerprints_ over entire ranges of messages.
 
 #### Overview
-The `reconciliation` protocol operate on the following heuristic:
+The `reconciliation` protocol operates on the following heuristic:
 1. The requestor chooses a sync range including time, cluster id, pubsub and content topics.
 2. The range is encoded into a payload and sent.
 3. The requestee receives the payload and decodes it.
From 85cef2cdd16756785ebfcb2c5f875aeed9cf38e4 Mon Sep 17 00:00:00 2001
From: SionoiS
Date: Thu, 10 Apr 2025 10:19:12 -0400
Subject: [PATCH 4/4] remove cluster

---
 standards/core/sync.md | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 59049d6..b219fef 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -36,8 +36,8 @@ At this point lists of message _IDs_ are sent for comparison
 instead of _fingerprints_ over entire ranges of messages.
 
 #### Overview
-The `reconciliation` protocol operates on the following heuristic:
-1. The requestor chooses a sync range including time, cluster id, pubsub and content topics.
+The `reconciliation` protocol operate on the following heuristic:
+1. The requestor chooses a sync range including time, id, pubsub and content topics.
 2. The range is encoded into a payload and sent.
 3. The requestee receives the payload and decodes it.
 4. The range is processed and, if a difference with the local range is detected, a set of subranges are produced.
@@ -122,27 +122,23 @@ All _varints_ MUST be little-endian base 128 variable length integers (LEB128) a
 #### Payload encoding
 The wire level payload MUST be encoded as follow.
 
-> Refer to [RELAY-SHARDING](https://github.com/waku-org/specs/blob/master/standards/core/relay-sharding.md#static-sharding)
-RFC for cluster and shard specification.
-
 > Please note that for each steps, bytes are concatenated.
 
-1. _Varint_ bytes of the node's cluster ID
-2. _Varint_ bytes of the number of pubsub topics
-3. For each pubsub topic, if any
+1. _Varint_ bytes of the number of pubsub topics
+2. For each pubsub topic, if any
 
    a. _varint_ bytes of the pubsub topic length
 
    b. bytes content of the pubsub topic
 
-4. _Varint_ bytes of the number of content topics
-5. For each content topic, if any
+3. _Varint_ bytes of the number of content topics
+4. For each content topic, if any
 
    a. _varint_ bytes of the content topic length
 
    b. bytes content of the content topic
 
-6. For each range
+5. For each range
 
    a. _varint_ bytes of the delta encoded timestamp
 
@@ -187,9 +183,9 @@ The flexibility of the protocol implies that much is left to the implementers.
 What will follow is NOT part of the specification.
 This section was created to inform implementations.
 
-#### Cluster, Pubsub and Content Topics
-To prevent nodes from synchronizing messages from cluster, pubsub and content topics they don't support,
-cluster and topic information is added to each payload.
+#### Pubsub and Content Topics
+To prevent nodes from synchronizing messages from pubsub and content topics they don't support,
+topic information is added to each payload.
 
 #### Parameters
 Two useful parameters to add to your implementation are partitioning count and the item set threshold.
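After patch 4/4 the payload begins with two length-prefixed topic lists (steps 1 to 4) before the ranges. The sketch below encodes and decodes that header using unsigned LEB128 varints, which the spec requires for all _varints_. Encoding topics as UTF-8 is an assumption (the spec only says "bytes content"), the helper names are hypothetical, and the per-range fields of step 5 are deliberately left out.

```python
from typing import List, Tuple


def encode_varint(value: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint at `pos`; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_topics(topics: List[str]) -> bytes:
    """Topic count, then each topic as varint length followed by its bytes."""
    out = bytearray(encode_varint(len(topics)))
    for topic in topics:
        raw = topic.encode("utf-8")  # assumption: topics are UTF-8 strings
        out += encode_varint(len(raw))
        out += raw
    return bytes(out)


def decode_topics(data: bytes, pos: int) -> Tuple[List[str], int]:
    """Inverse of encode_topics, starting at `pos`; return (topics, next_pos)."""
    count, pos = decode_varint(data, pos)
    topics = []
    for _ in range(count):
        length, pos = decode_varint(data, pos)
        if pos + length > len(data):
            raise ValueError("truncated topic")
        topics.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    return topics, pos


def encode_header(pubsub: List[str], content: List[str]) -> bytes:
    """Steps 1 to 4: pubsub topic list, then content topic list."""
    return encode_topics(pubsub) + encode_topics(content)


def decode_header(data: bytes) -> Tuple[List[str], List[str], int]:
    """Return (pubsub, content, offset where the range section starts)."""
    pubsub, pos = decode_topics(data, 0)
    content, pos = decode_topics(data, pos)
    return pubsub, content, pos
```

Returning the offset from `decode_header` lets a range decoder continue from the first byte of step 5. An empty topic list costs a single `0x00` byte, which is how a peer can signal that it specifies no set.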