From cba8da96bfd26a54b75660b33603ef1fb9be4b46 Mon Sep 17 00:00:00 2001 From: SionoiS Date: Thu, 5 Dec 2024 11:34:36 -0500 Subject: [PATCH 01/27] waku sync 2.0 initial draft --- standards/core/sync.md | 193 +++++++++++++++++++++++------------------ 1 file changed, 108 insertions(+), 85 deletions(-) diff --git a/standards/core/sync.md b/standards/core/sync.md index 644df0c..74c366e 100644 --- a/standards/core/sync.md +++ b/standards/core/sync.md @@ -11,78 +11,131 @@ contributors: This specification explains the `WAKU-SYNC` protocol which enables the reconciliation of two sets of message hashes in the context of keeping multiple Store nodes synchronized. -Waku Sync is a wrapper around -[Negentropy](https://github.com/hoytech/negentropy) a [range-based set reconciliation protocol](https://logperiodic.com/rbsr.html). ## Specification -**Protocol identifier**: `/vac/waku/sync/1.0.0` +**Protocol identifier**: `/vac/waku/reconciliation/1.0.0` -### Terminology +#### Terminology The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119](https://www.ietf.org/rfc/rfc2119.txt). -The term Negentropy refers to the protocol of the same name. -Negentropy payload refers to -the messages created by the Negentropy protocol. -Client always refers to the initiator -and the server the receiver of the first payload. +#### Message Ids +Message Ids MUST be composed of the timestamp and the hash of the Waku messages. -### Design Requirements -Nodes enabling Waku Sync SHOULD -manage and keep message hashes in a local cache -for the range of time -during which synchronization is required. -Nodes SHOULD use the same time range, -for Waku we chose one hour as the global default. -Waku Relay or Light Push protocol MAY be enabled -and used in conjunction with Sync -as a source of new message hashes -for the cache. 
+The timestamp MUST be the time of creation and +the hash MUST follow the +[deterministic message hashing specification](https://rfc.vac.dev/waku/standards/core/14/message#deterministic-message-hashing) -Nodes MAY use the Store protocol -to request missing messages once reconciliation is complete -or to provide messages to requesting clients. +> This way the message Ids can always be totally ordered. +Chronologically according to the timestamp and +disambiguate based on the hash lexical order +in cases where the timestamp is the same. -### Payload +#### Range Bounds +A range MUST consists of 2 Id bounds, the first bound is +inclusive the second bound exclusive. +The first bound MUST be strictly smaller than the second one. -```protobuf -syntax = "proto3"; +#### Range Fingerprinting +The fingerprint of a range MUST be the XOR operation applied to +all the hashes of the Ids included in that range. -package waku.sync.v1; +#### Range Type +Every range MUST have one of the following types; skip, fingerprint or item set. -message SyncPayload { - optional bytes negentropy = 1; +- Skip type is used to signal already processed ranges that MUST be ignored. +- Fingerprint type signify that this range fingerprint MUST be compared when received. +- Item set type contain multiple message Ids that MUST all be compared when received. +> Item sets are an optimization, stopping the recursion early can +save many network roundtrips. - repeated bytes hashes = 20; -} -``` +#### Range Processing +Ranges have to be processed differently acording to their types. -### Session Flow -A client initiates a session with a server -by sending a `SyncPayload` with -only the `negentropy` field set. -This field MUST contain -the first negentropy payload -created by the client -for this session. +- Skip ranges MUST be merged with other consequtive ones if possible. +- Equal fingerprint ranges MUST become skip ranges. +- Unequal fingerprint ranges MUST be splitted into smaller ranges. 
The new type MAY be either fingerprint or item set. +- Unresolved item set ranges MUST be checked for differences and marked resolved. +- Resolved item set ranges MUST be checked for differences and become skip ranges. -The server receives a `SyncPayload`. -A new negentropy payload is computed from the received one. -The server sends back a `SyncPayload` to the client. +### Delta Encoding +For efficient transmition of timestamps, hashes and ranges. Payloads are delta encoded as follow. -The client receives a `SyncPayload`. -A new negentropy payload OR an empty one is computed. -If a new payload is computed then -the exchanges between client and server continues until -the client computes an empty payload. -This client computation also outputs any hash differences found, -those MUST be stored. -In the case of an empty payload, -the reconciliation is done, -the client MUST send back a `SyncPayload` -with all the missing server hashes in the `hashes` field and -an empty `nengentropy` field. +All ranges to be transmitted MUST be ordered and only upper bounds used. +> Inclusive lower bounds can be omitted because they are always +the same as the exclusive upper bounds of the previous range or zero. + +To achieve this, it MAY be needed to add skip ranges. +> For example, a skip range can be added with +an exclusive upper bound equal to the first range lower bound. +This way the receiving peer knows to ignore the range from zero to the start of the ranges + +Every timestamps after the first MUST be noted as the difference from the previous one. +If the timestamp is the same, zero MUST be used and the hash MUST be added. +The added hash MUST be trucated up to and including the first differetiating byte. + +| Timestamp | Hash | Timestamp (encoded) | Hash (encoded) +| - | - | - | - +| 1000 | 0x4a8a769a... | 1000 | - +| 1002 | 0x351c5e86... | 2 | - +| 1002 | 0x3560d9c4... | 0 | 0x3560 +| 1003 | 0xbeabef25... 
| 1 | - + +#### Varints +TODO + +## Implementation + +#### Parameters +#TODO fix copy pasta from research issue + +T -> Item set threshold. If a range length is <= than T, all items are sent. Higher T sends more items which means higher chance of duplicates but reduce the amount of round trips overall. + +B -> Partitioning count. When recursively splitting a range, it is split into B sub ranges. Higher B reduce round trips at the cost of computing more fingerprints. + +#### Storage +TODO + +A local cache of message Ids MUST be maintained when the node is online. +This storage MUST keep Ids ordered at all times. + +This storage is critical for the various function of the protocol and should as efficient as possible. +How this storage is implemented however, is outside the scope of this specification. + +TODO mention trees vs arrays??? + +The storage implementation should reflect the Waku context. +Most messages that will be added will be recent and +all removed messages will be older ones. +When differences are found some messages will have to be inserted randomly. +It is expected to be a less likely case than time based insertion and removal. +Last but not least it must be optimized for sequential read +as it is the most often used operation. + +#### Range +TODO + +We also offset the sync range by 20 seconds in the past. +The actual start of the sync range is T-01:00:20 and the end T-00:00:20 +This is to handle the inherent jitters of GossipSub. +In other words, it is the amount of time needed to confirm if a message is missing or not. + +#### Interval +TODO + +Ad-hoc syncing can be useful in some cases but continuous periodic sync +minimize the differences in messages stored across the network. +Syncing early and often is the best strategy. +The default used in nwaku is 5 minutes interval between sync with a range of 1 hour. + +#### Peer Choice +TODO + +Peering strategies can lead to inadvertently segregating peers and reduce sampling diversity. 
+We randomly select peers to sync with for simplicity and robustness. + +A good strategy can be devised but we chose not to. ## Attack Vectors Nodes using `WAKU-SYNC` are fully trusted. @@ -92,36 +145,6 @@ Further refinements to the protocol are planned to reduce the trust level required to operate. Notably by verifying messages RLN proof at reception. -## Implementation -The following is not part of the specifications but good to know implementation details. - -### Peer Choice -Peering strategies can lead to inadvertently segregating peers and reduce sampling diversity. -We randomly select peers to sync with for simplicity and robustness. - -A good strategy can be devised but we chose not to. - -### Interval -Ad-hoc syncing can be useful in some cases but continuous periodic sync -minimize the differences in messages stored across the network. -Syncing early and often is the best strategy. -The default used in nwaku is 5 minutes interval between sync with a range of 1 hour. - -### Range -We also offset the sync range by 20 seconds in the past. -The actual start of the sync range is T-01:00:20 and the end T-00:00:20 -This is to handle the inherent jitters of GossipSub. -In other words, it is the amount of time needed to confirm if a message is missing or not. - -### Storage -The storage implementation should reflect the Waku context. -Most messages that will be added will be recent and -all removed messages will be older ones. -When differences are found some messages will have to be inserted randomly. -It is expected to be a less likely case than time based insertion and removal. -Last but not least it must be optimized for sequential read -as it is the most often used operation. 
- ## Copyright Copyright and related rights waived via From 4732b141dce0f3d018d726fdcabb2c64deb1114d Mon Sep 17 00:00:00 2001 From: SionoiS Date: Fri, 6 Dec 2024 09:57:29 -0500 Subject: [PATCH 02/27] payload & transfer --- standards/core/sync.md | 111 ++++++++++++++++++++++++++++------------- 1 file changed, 77 insertions(+), 34 deletions(-) diff --git a/standards/core/sync.md b/standards/core/sync.md index 74c366e..ce6f658 100644 --- a/standards/core/sync.md +++ b/standards/core/sync.md @@ -7,19 +7,25 @@ contributors: - Hanno Cornelius --- -## Abstract -This specification explains the `WAKU-SYNC` protocol -which enables the reconciliation of two sets of message hashes -in the context of keeping multiple Store nodes synchronized. +# Abstract +This specification explains `WAKU-SYNC` +which enables the syncronization of messages between 2 Store nodes. -## Specification +# Specification -**Protocol identifier**: `/vac/waku/reconciliation/1.0.0` +Waku Sync consists of 2 protocols; reconciliation and transfer. +Reconciliation is the process of finding differences in 2 sets of message hashes. +Transfer is then used to bilateraly send messages to the other peer. +The end goal being that both peers have the same set of hashes and messages. #### Terminology The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119](https://www.ietf.org/rfc/rfc2119.txt). +## Reconciliation + +**Libp2p Protocol identifier**: `/vac/waku/reconciliation/1.0.0` + #### Message Ids Message Ids MUST be composed of the timestamp and the hash of the Waku messages. @@ -39,16 +45,16 @@ The first bound MUST be strictly smaller than the second one. #### Range Fingerprinting The fingerprint of a range MUST be the XOR operation applied to -all the hashes of the Ids included in that range. +all the hashes of the messages included in that range. 
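The Message Ids and Range Bounds rules can be illustrated with a minimal sketch. It assumes an ID is held as a `(timestamp, hash)` pair, with the timestamp as an integer and the hash as raw bytes. The class `MessageId` and the function `in_range` are hypothetical names, not part of the spec or of any implementation. Python compares tuples field by field and compares `bytes` lexicographically, which yields the same total order the spec describes: chronological first, then by hash when timestamps are equal.

```python
from typing import NamedTuple


class MessageId(NamedTuple):
    """Hypothetical in-memory form of a Waku Sync message ID."""
    timestamp: int   # message creation time (integer, unit assumed)
    msg_hash: bytes  # deterministic message hash (raw bytes)


# NamedTuple ordering compares timestamp first, then msg_hash bytes
# lexicographically, giving the total order required for message IDs.

def in_range(msg_id: MessageId, lower: MessageId, upper: MessageId) -> bool:
    """Return True if msg_id lies in the half-open range [lower, upper)."""
    if not lower < upper:
        raise ValueError("first bound must be strictly smaller than the second")
    return lower <= msg_id < upper
```

Because the second bound is exclusive, consecutive ranges can share a bound without double-counting an ID.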
#### Range Type Every range MUST have one of the following types; skip, fingerprint or item set. - Skip type is used to signal already processed ranges that MUST be ignored. -- Fingerprint type signify that this range fingerprint MUST be compared when received. +- Fingerprint type signify that fingerprints MUST be compared when received. - Item set type contain multiple message Ids that MUST all be compared when received. > Item sets are an optimization, stopping the recursion early can -save many network roundtrips. +save network roundtrips. #### Range Processing Ranges have to be processed differently acording to their types. @@ -69,7 +75,7 @@ the same as the exclusive upper bounds of the previous range or zero. To achieve this, it MAY be needed to add skip ranges. > For example, a skip range can be added with an exclusive upper bound equal to the first range lower bound. -This way the receiving peer knows to ignore the range from zero to the start of the ranges +This way the receiving peer knows to ignore the range from zero to the start of the sync window. Every timestamps after the first MUST be noted as the difference from the previous one. If the timestamp is the same, zero MUST be used and the hash MUST be added. @@ -85,7 +91,51 @@ The added hash MUST be trucated up to and including the first differetiating byt #### Varints TODO -## Implementation +#### Payload encoding +The wire level payload MUST be encoded as follow. +> The & denote concatenation + +1. varint bytes of the delta encoded timestamp & +2. if timestamp is zero, delta encoded hash bytes & +3. 1 byte, the range type & +4. either + - 32 bytes fingerprint & + - varint bytes of the item set length & bytes of every items & + - if skip range, nothing + +5. repeat 1 to 4 for all ranges + +## Transfer Protocol + +**Libp2p Protocol identifier**: `/vac/waku/transfer/1.0.0` + +TODO + +should not accept messages from peers not being syncing with. + +should send message as soon as a diff is found. 
+ +in the future verify RLN proof of messages. + +### Wire specification +```protobuf +syntax = "proto3"; + +package waku.sync.v2; + +import "waku/message/v1/message.proto"; + +message WakuMessageAndTopic { + // Full message content and associated pubsub_topic as value + optional waku.message.v1.WakuMessage message = 1; + optional string pubsub_topic = 2; +} +``` + +# Implementation +The flexibitity of the protocol implies that much is left to the implementers. +What will follow is NOT part of the specification. +This section was created to inform implementations. #### Parameters #TODO fix copy pasta from research issue @@ -95,34 +145,26 @@ T -> Item set threshold. If a range length is <= than T, all items are sent. Hig B -> Partitioning count. When recursively splitting a range, it is split into B sub ranges. Higher B reduce round trips at the cost of computing more fingerprints. #### Storage -TODO - -A local cache of message Ids MUST be maintained when the node is online. -This storage MUST keep Ids ordered at all times. - -This storage is critical for the various function of the protocol and should as efficient as possible. -How this storage is implemented however, is outside the scope of this specification. +The storage implementation should reflect the context. +Most messages that will be added will be recent and +removed messages will be older ones. +When differences are found some messages will have to be inserted randomly. +It is expected to be a less likely case than time based insertion and removal. +Last but not least it must be optimized for fingerprinting +as it is the most often used operation. TODO mention trees vs arrays??? -The storage implementation should reflect the Waku context. -Most messages that will be added will be recent and -all removed messages will be older ones. -When differences are found some messages will have to be inserted randomly. -It is expected to be a less likely case than time based insertion and removal. 
-Last but not least it must be optimized for sequential read -as it is the most often used operation. +#### Sync Window +TODO rephrase -#### Range -TODO - -We also offset the sync range by 20 seconds in the past. +We also offset the sync window by 20 seconds in the past. The actual start of the sync range is T-01:00:20 and the end T-00:00:20 This is to handle the inherent jitters of GossipSub. In other words, it is the amount of time needed to confirm if a message is missing or not. -#### Interval -TODO +#### Sync Interval +TODO rephrase Ad-hoc syncing can be useful in some cases but continuous periodic sync minimize the differences in messages stored across the network. @@ -130,7 +172,7 @@ Syncing early and often is the best strategy. The default used in nwaku is 5 minutes interval between sync with a range of 1 hour. #### Peer Choice -TODO +TODO rephrase Peering strategies can lead to inadvertently segregating peers and reduce sampling diversity. We randomly select peers to sync with for simplicity and robustness. @@ -151,5 +193,6 @@ Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). 
## References - - https://logperiodic.com/rbsr.html - - https://github.com/hoytech/negentropy \ No newline at end of file + - [RBSR](https://github.com/AljoschaMeyer/rbsr_short/blob/main/main.pdf) + - [Negentropy Explainer](https://logperiodic.com/rbsr.html) + - [Master Thesis](https://github.com/AljoschaMeyer/master_thesis/blob/main/main.pdf) \ No newline at end of file From 8eb1db73bbb40249a093cee29dbeca7f7adb85aa Mon Sep 17 00:00:00 2001 From: SionoiS Date: Mon, 9 Dec 2024 11:52:40 -0500 Subject: [PATCH 03/27] rephrasing and typos --- standards/core/sync.md | 64 +++++++++++++++++++----------------------- 1 file changed, 29 insertions(+), 35 deletions(-) diff --git a/standards/core/sync.md b/standards/core/sync.md index ce6f658..9b74049 100644 --- a/standards/core/sync.md +++ b/standards/core/sync.md @@ -9,13 +9,13 @@ contributors: # Abstract This specification explains `WAKU-SYNC` -which enables the syncronization of messages between 2 Store nodes. +which enables the synchronization of messages between 2 Store nodes. # Specification Waku Sync consists of 2 protocols; reconciliation and transfer. Reconciliation is the process of finding differences in 2 sets of message hashes. -Transfer is then used to bilateraly send messages to the other peer. +Transfer is then used to bilaterally send messages to the other peer. The end goal being that both peers have the same set of hashes and messages. #### Terminology @@ -57,16 +57,16 @@ Every range MUST have one of the following types; skip, fingerprint or item set. save network roundtrips. #### Range Processing -Ranges have to be processed differently acording to their types. +Ranges have to be processed differently according to their types. -- Skip ranges MUST be merged with other consequtive ones if possible. +- Skip ranges MUST be merged with other consecutive ones if possible. - Equal fingerprint ranges MUST become skip ranges. - Unequal fingerprint ranges MUST be splitted into smaller ranges. 
The new type MAY be either fingerprint or item set. - Unresolved item set ranges MUST be checked for differences and marked resolved. - Resolved item set ranges MUST be checked for differences and become skip ranges. ### Delta Encoding -For efficient transmition of timestamps, hashes and ranges. Payloads are delta encoded as follow. +For efficient transmission of timestamps, hashes and ranges. Payloads are delta encoded as follow. All ranges to be transmitted MUST be ordered and only upper bounds used. > Inclusive lower bounds can be omitted because they are always @@ -79,7 +79,7 @@ This way the receiving peer knows to ignore the range from zero to the start of Every timestamps after the first MUST be noted as the difference from the previous one. If the timestamp is the same, zero MUST be used and the hash MUST be added. -The added hash MUST be trucated up to and including the first differetiating byte. +The added hash MUST be truncated up to and including the first differentiating byte. | Timestamp | Hash | Timestamp (encoded) | Hash (encoded) | - | - | - | - @@ -109,13 +109,11 @@ The wire level payload MUST be encoded as follow. **Libp2p Protocol identifier**: `/vac/waku/transfer/1.0.0` -TODO - -should not accept messages from peers not being syncing with. - -should send message as soon as a diff is found. - -in the future verify RLN proof of messages. +The transfer protocol SHOULD send messages as soon as +a difference is found via reconciliation. +It MUST only accept messages from peers the node is reconciliating with. +New message Ids MUST be added to the reconciliation protocol. +The payload sent MUST follow the wire specification below. ### Wire specification ```protobuf @@ -133,16 +131,19 @@ message WakuMessageAndTopic { ``` # Implementation -The flexibitity of the protocol implies that much is left to the implementers. +The flexibility of the protocol implies that much is left to the implementers. What will follow is NOT part of the specification. 
This section was created to inform implementations. #### Parameters -#TODO fix copy pasta from research issue +Two useful parameters to add to your implementation are partitioning count and the item set threshold. -T -> Item set threshold. If a range length is <= than T, all items are sent. Higher T sends more items which means higher chance of duplicates but reduce the amount of round trips overall. +The partitioning count is the number of time a range is splitted. +Higher value reduce round trips at the cost of computing more fingerprints. -B -> Partitioning count. When recursively splitting a range, it is split into B sub ranges. Higher B reduce round trips at the cost of computing more fingerprints. +The threshold for which item sets are sent instead of fingerprints. +Higher value sends more items which means higher chance of duplicates but +reduce the amount of round trips overall. #### Storage The storage implementation should reflect the context. @@ -153,31 +154,24 @@ It is expected to be a less likely case than time based insertion and removal. Last but not least it must be optimized for fingerprinting as it is the most often used operation. -TODO mention trees vs arrays??? - -#### Sync Window -TODO rephrase - -We also offset the sync window by 20 seconds in the past. -The actual start of the sync range is T-01:00:20 and the end T-00:00:20 -This is to handle the inherent jitters of GossipSub. -In other words, it is the amount of time needed to confirm if a message is missing or not. - #### Sync Interval -TODO rephrase - Ad-hoc syncing can be useful in some cases but continuous periodic sync minimize the differences in messages stored across the network. Syncing early and often is the best strategy. -The default used in nwaku is 5 minutes interval between sync with a range of 1 hour. +The default used in Nwaku is 5 minutes interval between sync with a range of 1 hour. + +#### Sync Window +By default we offset the sync window by 20 seconds in the past. 
+The actual start of the sync range is T-01:00:20 and the end T-00:00:20 in most cases. +This is to handle the inherent jitters of GossipSub. +In other words, it is the amount of time needed to confirm if a message is missing or not. #### Peer Choice -TODO rephrase +Wrong peering strategies can lead to inadvertently segregating peers and +reduce sampling diversity. +Nwaku randomly select peers to sync with for simplicity and robustness. -Peering strategies can lead to inadvertently segregating peers and reduce sampling diversity. -We randomly select peers to sync with for simplicity and robustness. - -A good strategy can be devised but we chose not to. +Good strategies can be devised but we chose not to. ## Attack Vectors Nodes using `WAKU-SYNC` are fully trusted. From 86fa391a22023e175e3b58eee14929d176ccc0f0 Mon Sep 17 00:00:00 2001 From: SionoiS Date: Wed, 11 Dec 2024 16:44:01 -0500 Subject: [PATCH 04/27] rephrasing & varints --- standards/core/sync.md | 77 ++++++++++++++++++++++-------------------- 1 file changed, 41 insertions(+), 36 deletions(-) diff --git a/standards/core/sync.md b/standards/core/sync.md index 9b74049..4630126 100644 --- a/standards/core/sync.md +++ b/standards/core/sync.md @@ -9,14 +9,14 @@ contributors: # Abstract This specification explains `WAKU-SYNC` -which enables the synchronization of messages between 2 Store nodes. +which enables the synchronization of messages between nodes storing sets of [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message) # Specification -Waku Sync consists of 2 protocols; reconciliation and transfer. -Reconciliation is the process of finding differences in 2 sets of message hashes. -Transfer is then used to bilaterally send messages to the other peer. -The end goal being that both peers have the same set of hashes and messages. +Waku Sync consists of 2 libp2p protocols; reconciliation and transfer. +The Reconciliation protocol finds differences in sets of messages. 
+The Transfer protocol is used to exchange the differences found with other peers. +The end goal being that peers have the same set of messages. #### Terminology The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, @@ -26,58 +26,63 @@ The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL **Libp2p Protocol identifier**: `/vac/waku/reconciliation/1.0.0` +The protocol finds differences between 2 peers by comparing _fingerprints_ of _ranges_ of messages. +When the _fingerprints_ are different, _ranges_ are splitted into smaller ones. +This process repeats until _ranges_ include a small number of messages. +At this point, messages are individually compared. + #### Message Ids -Message Ids MUST be composed of the timestamp and the hash of the Waku messages. +Message _Ids_ MUST be composed of the timestamp and the hash of the Waku messages. The timestamp MUST be the time of creation and the hash MUST follow the [deterministic message hashing specification](https://rfc.vac.dev/waku/standards/core/14/message#deterministic-message-hashing) -> This way the message Ids can always be totally ordered. -Chronologically according to the timestamp and +> This way the message Ids can always be totally ordered, +first chronologically according to the timestamp and then disambiguate based on the hash lexical order in cases where the timestamp is the same. #### Range Bounds -A range MUST consists of 2 Id bounds, the first bound is +A _range_ MUST consists of 2 _Ids_, the first bound is inclusive the second bound exclusive. The first bound MUST be strictly smaller than the second one. #### Range Fingerprinting -The fingerprint of a range MUST be the XOR operation applied to -all the hashes of the messages included in that range. +The _fingerprint_ of a range MUST be the XOR operation applied to +the hash of all message _IDs_ included in that _range_. 
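The range fingerprint and the split-on-mismatch step described above can be sketched as follows. The sketch assumes 32-byte hashes, which matches the 32-byte fingerprint in the payload encoding. `PARTITION_COUNT` and `ITEM_SET_THRESHOLD` stand in for the "partitioning count" and "item set threshold" implementation parameters; their values, and the helper names, are hypothetical. For simplicity, the sketch splits a range by item count and uses the first ID of the next chunk as each sub-range's exclusive upper bound.

```python
from functools import reduce

HASH_LEN = 32            # assumed hash / fingerprint size in bytes
PARTITION_COUNT = 3      # hypothetical partitioning count
ITEM_SET_THRESHOLD = 2   # hypothetical item set threshold


def fingerprint(hashes: list[bytes]) -> bytes:
    """XOR of all hashes in a range; an empty range gives all zero bytes."""
    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))
    return reduce(xor, hashes, bytes(HASH_LEN))


def split_range(ids, upper, partitions=PARTITION_COUNT, threshold=ITEM_SET_THRESHOLD):
    """Split the sorted (timestamp, hash) IDs of a range whose fingerprints
    differed. Returns (exclusive_upper_bound, kind, payload) per sub-range."""
    if not ids:
        return [(upper, "item_set", [])]
    size = -(-len(ids) // partitions)  # ceiling division
    chunks = [ids[i:i + size] for i in range(0, len(ids), size)]
    out = []
    for i, chunk in enumerate(chunks):
        # Half-open ranges: a sub-range ends where the next one starts.
        bound = chunks[i + 1][0] if i + 1 < len(chunks) else upper
        if len(chunk) <= threshold:
            out.append((bound, "item_set", chunk))  # small enough: send IDs
        else:
            out.append((bound, "fingerprint", fingerprint([h for _, h in chunk])))
    return out
```

Because XOR is commutative and self-inverse, the fingerprint does not depend on the order of the hashes. Two peers holding the same set of hashes in a range therefore always compute the same fingerprint.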
#### Range Type -Every range MUST have one of the following types; skip, fingerprint or item set. +Every _range_ MUST have one of the following types; _fingerprint_, _skip_ or _item set_. -- Skip type is used to signal already processed ranges that MUST be ignored. -- Fingerprint type signify that fingerprints MUST be compared when received. -- Item set type contain multiple message Ids that MUST all be compared when received. -> Item sets are an optimization, stopping the recursion early can +- _Fingerprint_ type contains a _fingerprint_. +- _Skip_ type contains nothing and is used to signal already processed _ranges_. +- _Item set_ type contains message _Ids_ and a _resolved_ boolean. +> _Item sets_ are an optimization, stopping the recursion early can save network roundtrips. #### Range Processing -Ranges have to be processed differently according to their types. +_Ranges_ have to be processed differently according to their types and sent back. -- Skip ranges MUST be merged with other consecutive ones if possible. -- Equal fingerprint ranges MUST become skip ranges. -- Unequal fingerprint ranges MUST be splitted into smaller ranges. The new type MAY be either fingerprint or item set. -- Unresolved item set ranges MUST be checked for differences and marked resolved. -- Resolved item set ranges MUST be checked for differences and become skip ranges. +- _Skip_ ranges MUST be merged with other consecutive _skip ranges_. +- **Equal** _fingerprint ranges_ MUST become _skip ranges_. +- **Unequal** _fingerprint ranges_ MUST be splitted into smaller ranges. The new type MAY be either _fingerprint_ or _item set_. +- **Unresolved** _item set_ ranges MUST be checked for differences and marked resolved. +- **Resolved** _item set_ ranges MUST be checked for differences and become skip ranges. ### Delta Encoding -For efficient transmission of timestamps, hashes and ranges. Payloads are delta encoded as follow. 
+Payloads MUST be delta encoded as follows for efficient transmission of _IDs_ and _ranges_. -All ranges to be transmitted MUST be ordered and only upper bounds used. +All _ranges_ to be transmitted MUST be ordered and only upper bounds used. > Inclusive lower bounds can be omitted because they are always the same as the exclusive upper bounds of the previous range or zero. -To achieve this, it MAY be needed to add skip ranges. -> For example, a skip range can be added with +To achieve this, it MAY be needed to add _skip ranges_. +> For example, a _skip range_ can be added with an exclusive upper bound equal to the first range lower bound. -This way the receiving peer knows to ignore the range from zero to the start of the sync window. +This way the receiving peer knows to ignore the range from zero to the start of the sync time window. -Every timestamps after the first MUST be noted as the difference from the previous one. +Every _ID_'s timestamps after the first MUST be noted as the difference from the previous one. If the timestamp is the same, zero MUST be used and the hash MUST be added. The added hash MUST be truncated up to and including the first differentiating byte. @@ -89,19 +94,19 @@ The added hash MUST be truncated up to and including the first differentiating b | 1003 | 0xbeabef25... | 1 | - #### Varints -TODO +All _varints_ MUST be little-endian base 128 variable length integers (LEB128) and minimally encoded. #### Payload encoding The wire level payload MUST be encoded as follow. > The & denote concatenation -1. varint bytes of the delta encoded timestamp & -2. if timestamp is zero, delta encoded hash bytes & -3. 1 byte, the range type & +1. _varint_ bytes of the delta encoded timestamp & +2. if timestamp is zero, 1 byte for the hash bytes length & the hash bytes & +3. 1 byte, the _range_ type & 4. 
either - - 32 bytes fingerprint & - - varint bytes of the item set length & bytes of every items & - - if skip range, nothing + - 32 bytes _fingerprint_ & + - _varint_ bytes of the item set length & bytes of every items & + - if _skip range_, nothing 5. repeat 1 to 4 for all ranges @@ -171,7 +176,7 @@ Wrong peering strategies can lead to inadvertently segregating peers and reduce sampling diversity. Nwaku randomly select peers to sync with for simplicity and robustness. -Good strategies can be devised but we chose not to. +More sophisticated strategies may be implemented in future. ## Attack Vectors Nodes using `WAKU-SYNC` are fully trusted. From fdde545b3cab5595afdbd0671a3f5649bb338109 Mon Sep 17 00:00:00 2001 From: SionoiS Date: Tue, 7 Jan 2025 10:49:01 -0500 Subject: [PATCH 05/27] waku message and IDs clarification --- standards/core/sync.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/standards/core/sync.md b/standards/core/sync.md index 4630126..296db6a 100644 --- a/standards/core/sync.md +++ b/standards/core/sync.md @@ -26,13 +26,13 @@ The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL **Libp2p Protocol identifier**: `/vac/waku/reconciliation/1.0.0` -The protocol finds differences between 2 peers by comparing _fingerprints_ of _ranges_ of messages. +The protocol finds differences between 2 peers by comparing _fingerprints_ of _ranges_ of message _IDs_. When the _fingerprints_ are different, _ranges_ are splitted into smaller ones. -This process repeats until _ranges_ include a small number of messages. -At this point, messages are individually compared. +This process repeats until _ranges_ include a small number of message _IDs_. +At this point, message _IDs_ are individually compared. #### Message Ids -Message _Ids_ MUST be composed of the timestamp and the hash of the Waku messages. 
+Message _Ids_ MUST be composed of the timestamp and the hash of the [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message).

 The timestamp MUST be the time of creation and
 the hash MUST follow the
From f0f37b642dde41eb0e3c7cfd44dcf35f7a300ab6 Mon Sep 17 00:00:00 2001
From: SionoiS
Date: Wed, 8 Jan 2025 08:53:15 -0500
Subject: [PATCH 06/27] rephrasing

---
 standards/core/sync.md | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 296db6a..020a006 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -26,13 +26,15 @@ The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL

 **Libp2p Protocol identifier**: `/vac/waku/reconciliation/1.0.0`

-The protocol finds differences between 2 peers by comparing _fingerprints_ of _ranges_ of message _IDs_.
-When the _fingerprints_ are different, _ranges_ are splitted into smaller ones.
-This process repeats until _ranges_ include a small number of message _IDs_.
-At this point, message _IDs_ are individually compared.
+The protocol finds differences between 2 peers by
+comparing _fingerprints_ of _ranges_ of message _IDs_.
+_Ranges_ are encoded into payloads, exchanged between the peers and when the range _fingerprints_ are different, splitted into smaller ones.
+This process repeats until _ranges_ include a small number of messages
+at this point _IDs_ are sent instead of _fingerprints_.
+When received, _IDs_ are individually compared for differences.

 #### Message Ids
-Message _Ids_ MUST be composed of the timestamp and the hash of the [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message).
+Message _IDs_ MUST be composed of the timestamp and the hash of the [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message).
 The timestamp MUST be the time of creation and
 the hash MUST follow the
@@ -44,7 +46,7 @@ disambiguate based on the hash lexical order
 in cases where the timestamp is the same.

 #### Range Bounds
-A _range_ MUST consists of 2 _Ids_, the first bound is
+A _range_ MUST consists of 2 _IDs_, the first bound is
 inclusive the second bound exclusive.
 The first bound MUST be strictly smaller than the second one.

@@ -98,7 +100,7 @@ All _varints_ MUST be little-endian base 128 variable length integers (LEB128) a

 #### Payload encoding
 The wire level payload MUST be encoded as follow.
-> The & denote concatenation
+> The & denote concatenation.

 1. _varint_ bytes of the delta encoded timestamp &
 2. if timestamp is zero, 1 byte for the hash bytes length & the hash bytes &
@@ -106,9 +108,9 @@ The wire level payload MUST be encoded as follow.
 4. either
   - 32 bytes _fingerprint_ &
   - _varint_ bytes of the item set length & bytes of every items &
-  - if _skip range_, nothing
+  - if _skip range_, do nothing

-5. repeat 1 to 4 for all ranges
+5. repeat steps 1 to 4 for all ranges.

 ## Transfer Protocol

@@ -117,7 +119,7 @@ The wire level payload MUST be encoded as follow.
 The transfer protocol SHOULD send messages as soon as
 a difference is found via reconciliation.
 It MUST only accept messages from peers the node is reconciliating with.
-New message Ids MUST be added to the reconciliation protocol.
+New message IDs MUST be added to the reconciliation protocol.
 The payload sent MUST follow the wire specification below.
 ### Wire specification
From e91e99d251a530394e3f71b2efaef0ed5128dcd8 Mon Sep 17 00:00:00 2001
From: SionoiS
Date: Wed, 8 Jan 2025 09:44:36 -0500
Subject: [PATCH 07/27] overview

---
 standards/core/sync.md | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 020a006..35e3756 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -30,8 +30,17 @@ The protocol finds differences between 2 peers by
 comparing _fingerprints_ of _ranges_ of message _IDs_.
 _Ranges_ are encoded into payloads, exchanged between the peers and when the range _fingerprints_ are different, splitted into smaller ones.
 This process repeats until _ranges_ include a small number of messages
-at this point _IDs_ are sent instead of _fingerprints_.
-When received, _IDs_ are individually compared for differences.
+at this point _IDs_ are sent for comparison instead of _fingerprints_.
+
+#### Overview
+
+1. The requestor choose a time range to sync.
+2. The range is encoded into a payload and sent.
+3. The requestee receive the payload and decode it.
+4. The range is processed and more ranges produced.
+5. The new ranges are encoded and sent.
+6. Payloads are repeatedly exchanged and differences between the peers are discovered.
+7. The synchronization is done when no ranges are left to process.

 #### Message Ids
 Message _IDs_ MUST be composed of the timestamp and the hash of the [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message).
@@ -60,17 +69,19 @@ Every _range_ MUST have one of the following types; _fingerprint_, _skip_ or _it
 - _Fingerprint_ type contains a _fingerprint_.
 - _Skip_ type contains nothing and is used to signal already processed _ranges_.
 - _Item set_ type contains message _Ids_ and a _resolved_ boolean.
-> _Item sets_ are an optimization, stopping the recursion early can
-save network roundtrips.
+> _Item sets_ are an optimization, sending multiple _IDs_ instead of
+recursing further reduce the number of round-trips.

 #### Range Processing
-_Ranges_ have to be processed differently according to their types and sent back.
+_Ranges_ have to be processed differently according to their types.

-- _Skip_ ranges MUST be merged with other consecutive _skip ranges_.
 - **Equal** _fingerprint ranges_ MUST become _skip ranges_.
 - **Unequal** _fingerprint ranges_ MUST be splitted into smaller ranges. The new type MAY be either _fingerprint_ or _item set_.
 - **Unresolved** _item set_ ranges MUST be checked for differences and marked resolved.
 - **Resolved** _item set_ ranges MUST be checked for differences and become skip ranges.
+- _Skip_ ranges MUST be merged with other consecutive _skip ranges_.
+
+In the case where only skip ranges remains, the synchronization is done.

 ### Delta Encoding
 Payloads MUST be delta encoded as follows for efficient transmission of _IDs_ and _ranges_.
From 3a5598cb4d7b3759ffaa3aee6d837e880a7c4b12 Mon Sep 17 00:00:00 2001
From: SionoiS
Date: Thu, 9 Jan 2025 08:00:37 -0500
Subject: [PATCH 08/27] fix

---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 35e3756..f1fa65b 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -84,7 +84,7 @@ _Ranges_ have to be processed differently according to their types.
 In the case where only skip ranges remains, the synchronization is done.

 ### Delta Encoding
-Payloads MUST be delta encoded as follows for efficient transmission of _IDs_ and _ranges_.
+_Ranges_ and timestamps MUST be delta encoded as follows for efficient transmission.

 All _ranges_ to be transmitted MUST be ordered and only upper bounds used.
 > Inclusive lower bounds can be omitted because they are always
From 46acd038520435ebf6338e2a419df473f85934e5 Mon Sep 17 00:00:00 2001
From: SionoiS
Date: Mon, 20 Jan 2025 10:55:11 -0500
Subject: [PATCH 09/27] update proto package name

---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index f1fa65b..7586598 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -137,7 +137,7 @@ The payload sent MUST follow the wire specification below. ```protobuf syntax = "proto3"; -package waku.sync.v2; +package waku.sync.transfer.v1; import "waku/message/v1/message.proto";
From 7d730e0ba304a170cdfc945a2dafdbf29c617a17 Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:29:50 -0500
Subject: [PATCH 10/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 7586598..d5ae610 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -13,7 +13,7 @@ which enables the synchronization of messages between nodes storing sets of [`14

 # Specification

-Waku Sync consists of 2 libp2p protocols; reconciliation and transfer.
+Waku Sync consists of two libp2p protocols: `reconciliation` and `transfer`.
 The Reconciliation protocol finds differences in sets of messages.
 The Transfer protocol is used to exchange the differences found with other peers.
 The end goal being that peers have the same set of messages.
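PATCH 08/27 revises the delta encoding rule. Every _ID_ timestamp after the first is written as the difference from the previous one. A zero delta is followed by the hash, truncated up to and including its first byte that differs from the previous hash. The sketch below illustrates that rule in Python under stated assumptions; it is not the reference implementation. An _ID_ is modeled as a `(timestamp, hash)` tuple, the input is assumed sorted and free of duplicates, and `truncate_hash`, `delta_encode` and `delta_decode` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

# Hypothetical model of an ID: (timestamp, hash bytes), sorted, no duplicates.
MessageId = Tuple[int, bytes]
Encoded = Tuple[int, Optional[bytes]]


def truncate_hash(prev: bytes, cur: bytes) -> bytes:
    """Keep `cur` up to and including its first byte that differs from `prev`."""
    for i, (a, b) in enumerate(zip(prev, cur)):
        if a != b:
            return cur[: i + 1]
    return cur  # no differing byte in the shared prefix: keep the whole hash


def delta_encode(ids: List[MessageId]) -> List[Encoded]:
    """Encode sorted IDs as (timestamp delta, truncated hash or None) pairs."""
    encoded: List[Encoded] = []
    prev: Optional[MessageId] = None
    for ts, h in ids:
        if prev is None:
            encoded.append((ts, None))  # the first timestamp is written as-is
        elif ts == prev[0]:
            # Same timestamp: zero delta plus just enough hash bytes
            # to tell this ID apart from the previous one.
            encoded.append((0, truncate_hash(prev[1], h)))
        else:
            encoded.append((ts - prev[0], None))
        prev = (ts, h)
    return encoded


def delta_decode(encoded: List[Encoded]) -> List[Encoded]:
    """Restore absolute timestamps; hashes stay truncated."""
    decoded: List[Encoded] = []
    ts = 0
    for delta, h in encoded:
        ts += delta
        decoded.append((ts, h))
    return decoded


ids = [(1000, b"\x01\x02\x03"), (1002, b"\x05\x00\x00"),
       (1002, b"\x05\x07\x00"), (1003, b"\x00\x00\x01")]
print(delta_encode(ids))  # [(1000, None), (2, None), (0, b'\x05\x07'), (1, None)]
```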
From 2f8bd43da52ed8051b3891902fce2d91fabc518b Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:30:00 -0500
Subject: [PATCH 11/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index d5ae610..03ea315 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -26,7 +26,7 @@ The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL

 **Libp2p Protocol identifier**: `/vac/waku/reconciliation/1.0.0`

-The protocol finds differences between 2 peers by
+The protocol finds differences between two peers by
 comparing _fingerprints_ of _ranges_ of message _IDs_.
 _Ranges_ are encoded into payloads, exchanged between the peers and when the range _fingerprints_ are different, splitted into smaller ones.
From b0fc86e7ea97c24da18ade4f810b10a750c478f3 Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:31:10 -0500
Subject: [PATCH 12/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 03ea315..c7b95a8 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -28,7 +28,7 @@ The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL

 The protocol finds differences between two peers by
 comparing _fingerprints_ of _ranges_ of message _IDs_.
-_Ranges_ are encoded into payloads, exchanged between the peers and when the range _fingerprints_ are different, splitted into smaller ones.
+_Ranges_ are encoded into payloads, exchanged between the peers and when the range _fingerprints_ are different, split into smaller (sub)ranges.
 This process repeats until _ranges_ include a small number of messages
 at this point _IDs_ are sent for comparison instead of _fingerprints_.
From 3d33b5973ded21cae0f68d7239ad31bc6eb38676 Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:32:53 -0500
Subject: [PATCH 13/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index c7b95a8..84c9047 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -33,7 +33,7 @@ This process repeats until _ranges_ include a small number of messages
 at this point _IDs_ are sent for comparison instead of _fingerprints_.

 #### Overview
-
+The `reconciliation` protocol follows the following heuristic:
 1. The requestor choose a time range to sync.
 2. The range is encoded into a payload and sent.
 3. The requestee receive the payload and decode it.
From 83c5a535e8dc6a675e0111909cb8db82ba01b351 Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:34:47 -0500
Subject: [PATCH 14/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 84c9047..96b654b 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -36,7 +36,7 @@ at this point _IDs_ are sent for comparison instead of _fingerprints_.
 The `reconciliation` protocol follows the following heuristic:
 1. The requestor choose a time range to sync.
 2. The range is encoded into a payload and sent.
-3. The requestee receive the payload and decode it.
+3. The requestee receives the payload and decodes it.
 4. The range is processed and more ranges produced.
 5. The new ranges are encoded and sent.
 6. Payloads are repeatedly exchanged and differences between the peers are discovered.
From 0c7addadfbe98602228222f2d183f83d4d590223 Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:35:42 -0500
Subject: [PATCH 15/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 96b654b..c4e1939 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -37,7 +37,7 @@ The `reconciliation` protocol follows the following heuristic:
 1. The requestor choose a time range to sync.
 2. The range is encoded into a payload and sent.
 3. The requestee receives the payload and decodes it.
-4. The range is processed and more ranges produced.
+4. The range is processed and, if a difference with the local range is detected, a set of subranges are produced.
 5. The new ranges are encoded and sent.
 6. Payloads are repeatedly exchanged and differences between the peers are discovered.
 7. The synchronization is done when no ranges are left to process.
From b3730d0b7d502e00d31edca55f19101213d165b3 Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:42:05 -0500
Subject: [PATCH 16/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index c4e1939..adf1c51 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -42,7 +42,7 @@ The `reconciliation` protocol follows the following heuristic:
 6. Payloads are repeatedly exchanged and differences between the peers are discovered.
 7. The synchronization is done when no ranges are left to process.
-#### Message Ids
+#### Message IDs
 Message _IDs_ MUST be composed of the timestamp and the hash of the [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message).

 The timestamp MUST be the time of creation and
From 25e071da001b0fb7446295db890a4aec87068697 Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:42:25 -0500
Subject: [PATCH 17/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index adf1c51..0a86da3 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -49,7 +49,7 @@ The timestamp MUST be the time of creation and
 the hash MUST follow the
 [deterministic message hashing specification](https://rfc.vac.dev/waku/standards/core/14/message#deterministic-message-hashing)

-> This way the message Ids can always be totally ordered,
+> This way the message IDs can always be totally ordered,
 first chronologically according to the timestamp and then
 disambiguate based on the hash lexical order
 in cases where the timestamp is the same.
From 99039de7bee1d5f2edd168f69669420ac0519c4b Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:42:50 -0500
Subject: [PATCH 18/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 0a86da3..d1a3a5c 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -51,7 +51,7 @@ the hash MUST follow the

 > This way the message IDs can always be totally ordered,
 first chronologically according to the timestamp and then
-disambiguate based on the hash lexical order
+disambiguated based on the hash lexical order
 in cases where the timestamp is the same.
 #### Range Bounds
From 364b27cc91ceb12445f65aa5abb3fcb4cc3e6671 Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:43:00 -0500
Subject: [PATCH 19/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index d1a3a5c..6758b48 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -156,7 +156,7 @@ This section was created to inform implementations.

 #### Parameters
 Two useful parameters to add to your implementation are partitioning count and the item set threshold.
-The partitioning count is the number of time a range is splitted.
+The partitioning count is the number of time a range is split.
 Higher value reduce round trips at the cost of computing more fingerprints.

 The threshold for which item sets are sent instead of fingerprints.
From abcc128da0a1f57a1eb13df15efdb251c8c2feb2 Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:43:21 -0500
Subject: [PATCH 20/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 6758b48..aaad19c 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -157,7 +157,7 @@ This section was created to inform implementations.
 #### Parameters
 Two useful parameters to add to your implementation are partitioning count and the item set threshold.
 The partitioning count is the number of time a range is split.
-Higher value reduce round trips at the cost of computing more fingerprints.
+A higher value reduces round trips at the cost of computing more fingerprints.

 The threshold for which item sets are sent instead of fingerprints.
 Higher value sends more items which means higher chance of duplicates but
From 0ee6778beeca5391dfec1e9aafa9e3c0cdd6e168 Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:43:50 -0500
Subject: [PATCH 21/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index aaad19c..c6b9bb8 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -159,7 +159,7 @@ Two useful parameters to add to your implementation are partitioning count and t
 The partitioning count is the number of time a range is split.
 A higher value reduces round trips at the cost of computing more fingerprints.

-The threshold for which item sets are sent instead of fingerprints.
+The item set threshold determines when item sets are sent instead of fingerprints.
 Higher value sends more items which means higher chance of duplicates but
 reduce the amount of round trips overall.
 #### Storage
From 299d608b1b6db990c86d53d2cfa83cf8b8615e5f Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:44:04 -0500
Subject: [PATCH 22/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index c6b9bb8..793b8c3 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -160,7 +160,7 @@ The partitioning count is the number of time a range is split.
 A higher value reduces round trips at the cost of computing more fingerprints.

 The item set threshold determines when item sets are sent instead of fingerprints.
-Higher value sends more items which means higher chance of duplicates but
+A higher value sends more items which means higher chance of duplicates but
 reduce the amount of round trips overall.
 #### Storage
 The storage implementation should reflect the context.
From dcbb1d44a01e3ccb434f36853220f56c8792525b Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:44:18 -0500
Subject: [PATCH 23/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 793b8c3..b428d82 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -161,7 +161,7 @@ A higher value reduces round trips at the cost of computing more fingerprints.

 The item set threshold determines when item sets are sent instead of fingerprints.
 A higher value sends more items which means higher chance of duplicates but
-reduce the amount of round trips overall.
+reduces the amount of round trips overall.
 #### Storage
 The storage implementation should reflect the context.
From da79ddb1bd9bae08ab2f8c4ff8d0683e598f7e7a Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:44:40 -0500
Subject: [PATCH 24/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index b428d82..3fb7465 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -55,7 +55,7 @@ disambiguated based on the hash lexical order
 in cases where the timestamp is the same.

 #### Range Bounds
-A _range_ MUST consists of 2 _IDs_, the first bound is
+A _range_ MUST consist of two _IDs_, the first bound is
 inclusive the second bound exclusive.
 The first bound MUST be strictly smaller than the second one.
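PATCH 24/27 finalizes the total order on _IDs_ and the half-open _range_ bounds, and both map naturally onto tuple comparison. The sketch below is illustrative only. It models an _ID_ as a `(timestamp, hash)` tuple, which Python compares by timestamp first and then by the hash bytes lexicographically. The `Range` class is a hypothetical helper, not part of the specification.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical model of an ID: (timestamp, hash). Tuples compare element by
# element and bytes compare lexicographically, giving the total order the
# spec describes (timestamp first, hash as tie-breaker).
MessageId = Tuple[int, bytes]


@dataclass(frozen=True)
class Range:
    lower: MessageId  # inclusive bound
    upper: MessageId  # exclusive bound

    def __post_init__(self) -> None:
        # The first bound MUST be strictly smaller than the second one.
        if not self.lower < self.upper:
            raise ValueError("lower bound must be strictly smaller than upper bound")

    def contains(self, mid: MessageId) -> bool:
        return self.lower <= mid < self.upper


ids = [(1001, b"\x02"), (1000, b"\xff"), (1001, b"\x01")]
print(sorted(ids))  # [(1000, b'\xff'), (1001, b'\x01'), (1001, b'\x02')]

# An empty hash acts as the smallest possible ID for a given timestamp.
r = Range((1000, b""), (1001, b"\x02"))
print([r.contains(i) for i in sorted(ids)])  # [True, True, False]
```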
From 0d655f7023fd429df47db257ba80a79249644acd Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 09:46:59 -0500
Subject: [PATCH 25/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 3fb7465..ceaeade 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -117,8 +117,8 @@ The wire level payload MUST be encoded as follow.
 2. if timestamp is zero, 1 byte for the hash bytes length & the hash bytes &
 3. 1 byte, the _range_ type &
 4. either
-  - 32 bytes _fingerprint_ &
-  - _varint_ bytes of the item set length & bytes of every items &
+  - 32 bytes _fingerprint_ or
+  - _varint_ bytes of the item set length & bytes of every items or
   - if _skip range_, do nothing

 5. repeat steps 1 to 4 for all ranges.
From ebb90d32bb9e8b01cdbcba90d9eec93697fd893d Mon Sep 17 00:00:00 2001
From: Simon-Pierre Vivier
Date: Wed, 22 Jan 2025 10:12:36 -0500
Subject: [PATCH 26/27] Update standards/core/sync.md

Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
---
 standards/core/sync.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index ceaeade..53c5750 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -68,7 +68,7 @@ Every _range_ MUST have one of the following types; _fingerprint_, _skip_ or _it

 - _Fingerprint_ type contains a _fingerprint_.
 - _Skip_ type contains nothing and is used to signal already processed _ranges_.
-- _Item set_ type contains message _Ids_ and a _resolved_ boolean.
+- _Item set_ type contains message _IDs_ and a _resolved_ boolean.
 > _Item sets_ are an optimization, sending multiple _IDs_ instead of
 recursing further reduce the number of round-trips.
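The _varint_ rule (minimally encoded unsigned LEB128) and the per-range layout listed under payload encoding can be sketched as follows. This is a hedged illustration, not a reference implementation of the wire format, and it rests on three assumptions. First, the numeric type codes `SKIP`, `FINGERPRINT` and `ITEM_SET` are hypothetical, because these patches assign no byte values. Second, item set entries are passed in already serialized, since their byte layout is not spelled out here. Third, only steps 1 to 4 for a single range are shown; a payload repeats them for every range.

```python
from typing import Sequence, Tuple

# Hypothetical range type codes: the patches do not assign byte values.
SKIP, FINGERPRINT, ITEM_SET = 0, 1, 2


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, least significant group first."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)  # stopping at the last group keeps the encoding minimal
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, offset after the varint); non-minimal input is not rejected."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_range(delta_ts: int, hash_prefix: bytes, range_type: int,
                 fingerprint: bytes = b"", items: Sequence[bytes] = ()) -> bytes:
    """Encode steps 1 to 4 for one range; a payload concatenates all ranges."""
    out = bytearray(encode_varint(delta_ts))  # 1. delta encoded timestamp
    if delta_ts == 0:  # 2. hash length byte, then the (truncated) hash
        if len(hash_prefix) > 255:
            raise ValueError("hash length must fit in one byte")
        out.append(len(hash_prefix))
        out += hash_prefix
    out.append(range_type)  # 3. one byte for the range type
    if range_type == FINGERPRINT:  # 4. a 32-byte fingerprint, or ...
        if len(fingerprint) != 32:
            raise ValueError("fingerprints are 32 bytes")
        out += fingerprint
    elif range_type == ITEM_SET:  # ... the item count followed by every item, or ...
        out += encode_varint(len(items))
        for item in items:
            out += item  # assumed pre-serialized: item layout not specified here
    return bytes(out)  # ... nothing at all for a skip range


print(encode_varint(300).hex())  # ac02
print(decode_varint(bytes.fromhex("ac02")))  # (300, 2)
print(encode_range(2, b"", SKIP).hex())  # 0200
```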
From 27aa8bc9c22c85797342aafb193c0100b8bcb7f1 Mon Sep 17 00:00:00 2001
From: SionoiS
Date: Wed, 22 Jan 2025 10:55:33 -0500
Subject: [PATCH 27/27] rephrasing

---
 standards/core/sync.md | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/standards/core/sync.md b/standards/core/sync.md
index 53c5750..fd6afdf 100644
--- a/standards/core/sync.md
+++ b/standards/core/sync.md
@@ -29,18 +29,18 @@ The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL
 The protocol finds differences between two peers by
 comparing _fingerprints_ of _ranges_ of message _IDs_.
 _Ranges_ are encoded into payloads, exchanged between the peers and when the range _fingerprints_ are different, split into smaller (sub)ranges.
-This process repeats until _ranges_ include a small number of messages
-at this point _IDs_ are sent for comparison instead of _fingerprints_.
+This process repeats until _ranges_ include a small number of messages.
+At this point lists of message _IDs_ are sent for comparison instead of _fingerprints_ over entire ranges of messages.

 #### Overview
 The `reconciliation` protocol follows the following heuristic:
-1. The requestor choose a time range to sync.
+1. The requestor chooses a time range to sync.
 2. The range is encoded into a payload and sent.
 3. The requestee receives the payload and decodes it.
 4. The range is processed and, if a difference with the local range is detected, a set of subranges are produced.
 5. The new ranges are encoded and sent.
-6. Payloads are repeatedly exchanged and differences between the peers are discovered.
-7. The synchronization is done when no ranges are left to process.
+6. This process repeats while differences found are sent to the `transfer` protocol.
+7. The synchronization ends when all ranges have been processed and no differences are left.
 #### Message IDs
 Message _IDs_ MUST be composed of the timestamp and the hash of the [`14/WAKU2-MESSAGE`](https://rfc.vac.dev/waku/standards/core/14/message).
@@ -75,10 +75,11 @@ recursing further reduce the number of round-trips.
 #### Range Processing
 _Ranges_ have to be processed differently according to their types.

-- **Equal** _fingerprint ranges_ MUST become _skip ranges_.
-- **Unequal** _fingerprint ranges_ MUST be splitted into smaller ranges. The new type MAY be either _fingerprint_ or _item set_.
-- **Unresolved** _item set_ ranges MUST be checked for differences and marked resolved.
-- **Resolved** _item set_ ranges MUST be checked for differences and become skip ranges.
+- _Fingerprint_ ranges MUST be compared.
+  - **Equal** ranges MUST become _skip ranges_.
+  - **Unequal** ranges MUST be split into smaller _fingerprint_ or _item set_ ranges based on a implementation specific threshold.
+- **Unresolved** _item set_ ranges MUST be compared, differences sent to the `transfer` protocol and marked resolved.
+- **Resolved** _item set_ ranges MUST be compared, differences sent to the `transfer` protocol and become skip ranges.
 - _Skip_ ranges MUST be merged with other consecutive _skip ranges_.

 In the case where only skip ranges remains, the synchronization is done.
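The range processing rules of PATCH 27/27 can be sketched as one processing step run by the peer that receives a payload. This is a simplified model under explicit assumptions, not the nwaku implementation. It makes four assumptions:

- A _fingerprint_ is the XOR of the 32-byte hashes in a range, as in the first draft of this series.
- `partitions` and `threshold` stand in for the partitioning count and item set threshold from the implementation parameters.
- Ranges are plain `(kind, lower, upper, payload)` tuples.
- Replying to an unresolved item set with the local _IDs_ marked resolved is an assumption; the patches do not spell it out.

```python
from typing import List, Tuple

MessageId = Tuple[int, bytes]  # hypothetical model: (timestamp, 32-byte hash)
# Hypothetical range model: (kind, lower bound, upper bound, payload).
RangeT = Tuple[str, MessageId, MessageId, object]


def fingerprint(ids: List[MessageId]) -> bytes:
    """Assumed fingerprint: XOR of all hashes in the range (zeros when empty)."""
    acc = bytes(32)
    for _, h in ids:
        acc = bytes(a ^ b for a, b in zip(acc, h))
    return acc


def in_range(ids: List[MessageId], lower: MessageId, upper: MessageId) -> List[MessageId]:
    return sorted(i for i in ids if lower <= i < upper)


def process_fingerprint(lower, upper, remote_fp, local_ids, partitions=2, threshold=4):
    """Equal fingerprints become a skip range; unequal ones are split."""
    ids = in_range(local_ids, lower, upper)
    if fingerprint(ids) == remote_fp:
        return [("skip", lower, upper, None)]
    step = max(1, -(-len(ids) // partitions))  # ceil(len / partitions), at least 1
    bounds = [lower] + ids[step::step] + [upper]
    out: List[RangeT] = []
    for lo, hi in zip(bounds, bounds[1:]):
        sub = in_range(ids, lo, hi)
        if len(sub) <= threshold:
            out.append(("item_set", lo, hi, (sub, False)))  # (IDs, resolved flag)
        else:
            out.append(("fingerprint", lo, hi, fingerprint(sub)))
    return out


def process_item_set(lower, upper, remote_ids, resolved, local_ids):
    """Return (IDs to hand to the transfer protocol, reply range)."""
    local = in_range(local_ids, lower, upper)
    to_send = sorted(set(local) - set(remote_ids))  # the remote peer lacks these
    if resolved:
        return to_send, ("skip", lower, upper, None)
    # Assumption: the reply carries the local IDs, now marked resolved.
    return to_send, ("item_set", lower, upper, (local, True))


def merge_skips(ranges: List[RangeT]) -> List[RangeT]:
    """Merge consecutive skip ranges whose bounds touch."""
    merged: List[RangeT] = []
    for r in ranges:
        prev = merged[-1] if merged else None
        if prev and prev[0] == "skip" and r[0] == "skip" and prev[2] == r[1]:
            merged[-1] = ("skip", prev[1], r[2], None)
        else:
            merged.append(r)
    return merged


local = [(t, bytes([t]) * 32) for t in range(1, 11)]
lo, hi = (0, b""), (100, b"")
for kind, start, end, _ in process_fingerprint(lo, hi, bytes(32), local):
    print(kind, start[0], end[0])  # fingerprint 0 6, then fingerprint 6 100
```

Raising `partitions` yields more, smaller subranges per round trip, while raising `threshold` switches to item sets earlier. This mirrors the trade-offs described for the two parameters.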