---
title: WAKU-SYNC
name: Waku Sync
editor: Simon-Pierre Vivier <simvivier@status.im>
contributors:
- Prem Chaitanya Prathi <prem@status.im>
- Hanno Cornelius <hanno@status.im>
---

# Abstract

This specification explains `WAKU-SYNC`,
which enables the synchronization of messages between 2 Store nodes.

# Specification

Waku Sync consists of 2 protocols: reconciliation and transfer.
Reconciliation is the process of finding differences between 2 sets of message hashes.
Transfer is then used to bilaterally send the missing messages to the other peer.
The end goal is that both peers have the same set of hashes and messages.

#### Terminology

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”,
“RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119](https://www.ietf.org/rfc/rfc2119.txt).

## Reconciliation

**Libp2p Protocol identifier**: `/vac/waku/reconciliation/1.0.0`

#### Message Ids

Message Ids MUST be composed of the timestamp and the hash of the Waku message.

The timestamp MUST be the time of creation and
the hash MUST follow the
[deterministic message hashing specification](https://rfc.vac.dev/waku/standards/core/14/message#deterministic-message-hashing).

> This way, message Ids can always be totally ordered:
> chronologically according to the timestamp, with ties
> broken by the lexical order of the hashes
> in cases where the timestamp is the same.

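The ordering rule above can be sketched in a few lines. This is only an illustration: the `MessageId` type and its field names are hypothetical, and the short hash values are stand-ins for full message hashes.

```python
from typing import NamedTuple


class MessageId(NamedTuple):
    """Hypothetical message Id: creation timestamp plus message hash."""
    timestamp: int  # time of creation
    digest: bytes   # deterministic message hash (shortened here for readability)


# Tuples compare field by field, so Ids sort by timestamp first and then by
# the lexical (byte-wise) order of the hash when timestamps are equal.
ids = [
    MessageId(1002, bytes.fromhex("3560d9c4")),
    MessageId(1000, bytes.fromhex("4a8a769a")),
    MessageId(1002, bytes.fromhex("351c5e86")),
]
ordered = sorted(ids)
```

Using a plain tuple comparison keeps the total order explicit and avoids a custom comparator.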
#### Range Bounds

A range MUST consist of 2 Id bounds: the first bound is
inclusive and the second bound is exclusive.
The first bound MUST be strictly smaller than the second one.

#### Range Fingerprinting

The fingerprint of a range MUST be the XOR operation applied to
all the hashes of the messages included in that range.

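A minimal sketch of range fingerprinting follows. The function name is hypothetical, the 32-byte length matches the fingerprint size used in the payload encoding, and returning all zero bytes for an empty range is an assumption (zero is the identity of XOR), not something this specification states.

```python
HASH_LEN = 32  # fingerprint size in bytes, as used in the payload encoding


def fingerprint(hashes):
    """XOR all message hashes of a range into a single fingerprint.

    Assumption: an empty range yields all zero bytes.
    """
    acc = 0
    for h in hashes:
        acc ^= int.from_bytes(h, "big")
    return acc.to_bytes(HASH_LEN, "big")


a = bytes([0x11] * HASH_LEN)
b = bytes([0x0F] * HASH_LEN)
fp = fingerprint([a, b])
```

Because XOR is commutative and associative, the fingerprint does not depend on the order in which hashes are combined, which is what allows two peers to compare ranges without exchanging the items themselves.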
#### Range Type

Every range MUST have one of the following types: skip, fingerprint or item set.

- Skip type is used to signal already processed ranges that MUST be ignored.
- Fingerprint type signifies that fingerprints MUST be compared when received.
- Item set type contains multiple message Ids that MUST all be compared when received.

> Item sets are an optimization: stopping the recursion early can
> save network round trips.

#### Range Processing

Ranges have to be processed differently according to their types.

- Skip ranges MUST be merged with other consecutive ones if possible.
- Equal fingerprint ranges MUST become skip ranges.
- Unequal fingerprint ranges MUST be split into smaller ranges. The new type MAY be either fingerprint or item set.
- Unresolved item set ranges MUST be checked for differences and marked resolved.
- Resolved item set ranges MUST be checked for differences and become skip ranges.

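Some of the processing rules above can be sketched as follows. This is an illustrative outline under stated assumptions: the `RangeType` enum, the function names, and the representation of a range as an `(upper_bound, type)` pair are all hypothetical.

```python
from enum import Enum


class RangeType(Enum):
    SKIP = "skip"
    FINGERPRINT = "fingerprint"
    ITEM_SET = "item_set"


def merge_skips(ranges):
    """Merge consecutive skip ranges; each range is (upper_bound, RangeType)."""
    merged = []
    for upper, rtype in ranges:
        if merged and rtype is RangeType.SKIP and merged[-1][1] is RangeType.SKIP:
            merged[-1] = (upper, RangeType.SKIP)  # extend the previous skip range
        else:
            merged.append((upper, rtype))
    return merged


def fingerprints_match(local_fp, remote_fp):
    """Equal fingerprints mean the range can become a skip range."""
    return local_fp == remote_fp


def item_set_differences(local_ids, remote_ids):
    """Return (Ids only we hold, Ids only the remote peer holds)."""
    local, remote = set(local_ids), set(remote_ids)
    return local - remote, remote - local
```

The two sets returned by `item_set_differences` correspond to the messages to send and the messages to expect through the transfer protocol.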
### Delta Encoding

For efficient transmission of timestamps, hashes and ranges, payloads are delta encoded as follows.

All ranges to be transmitted MUST be ordered and only upper bounds used.

> Inclusive lower bounds can be omitted because they are always
> the same as the exclusive upper bound of the previous range, or zero.

To achieve this, it MAY be necessary to add skip ranges.

> For example, a skip range can be added with
> an exclusive upper bound equal to the lower bound of the first range.
> This way the receiving peer knows to ignore the range from zero to the start of the sync window.

Every timestamp after the first MUST be noted as the difference from the previous one.
If the timestamp is the same, zero MUST be used and the hash MUST be added.
The added hash MUST be truncated up to and including the first differentiating byte.

| Timestamp | Hash | Timestamp (encoded) | Hash (encoded) |
| - | - | - | - |
| 1000 | 0x4a8a769a... | 1000 | - |
| 1002 | 0x351c5e86... | 2 | - |
| 1002 | 0x3560d9c4... | 0 | 0x3560 |
| 1003 | 0xbeabef25... | 1 | - |

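The delta encoding of the table above can be reproduced with a short sketch. The function name and the `(timestamp, hash)` tuple representation are hypothetical, and only the visible 4-byte hash prefixes from the table are used as stand-ins for full hashes.

```python
def delta_encode(ids):
    """Delta encode an ordered list of (timestamp, hash) pairs.

    Returns (encoded_timestamp, hash_prefix_or_None) pairs. Ids are assumed
    to be strictly ordered, so two equal timestamps never share the same hash.
    """
    encoded = []
    prev_ts, prev_hash = None, None
    for ts, h in ids:
        if prev_ts is None:
            encoded.append((ts, None))            # first timestamp is sent as-is
        elif ts != prev_ts:
            encoded.append((ts - prev_ts, None))  # difference from the previous one
        else:
            # Same timestamp: keep the hash up to and including the first
            # byte that differs from the previous hash.
            n = next(i for i, (x, y) in enumerate(zip(h, prev_hash)) if x != y) + 1
            encoded.append((0, h[:n]))
        prev_ts, prev_hash = ts, h
    return encoded


table = [
    (1000, bytes.fromhex("4a8a769a")),
    (1002, bytes.fromhex("351c5e86")),
    (1002, bytes.fromhex("3560d9c4")),
    (1003, bytes.fromhex("beabef25")),
]
encoded = delta_encode(table)
```

The third row is the interesting one: both hashes start with `0x35` and first differ at the second byte, so the encoded hash is the 2-byte prefix `0x3560`.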
#### Varints

TODO

#### Payload encoding

The wire level payload MUST be encoded as follows.

> The `&` denotes concatenation.

1. varint bytes of the delta encoded timestamp &
2. if the delta encoded timestamp is zero, delta encoded hash bytes &
3. 1 byte, the range type &
4. either
   - 32 bytes fingerprint &
   - varint bytes of the item set length & bytes of every item &
   - if skip range, nothing
5. repeat 1 to 4 for all ranges

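The concatenation steps above might look like the sketch below. Several details are assumptions made only for illustration and are NOT defined by this specification: the varint format (unsigned LEB128 is used here because the Varints section is still TODO), the numeric values of the range type byte, and the treatment of item set items as already encoded byte strings.

```python
# Hypothetical type byte values; the specification does not assign them.
SKIP, FINGERPRINT, ITEM_SET = 0, 1, 2


def varint(n):
    """Unsigned LEB128 encoding (assumed format, see the note above)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_range(ts_delta, hash_prefix, range_type, fingerprint=b"", items=()):
    out = varint(ts_delta)              # step 1: delta encoded timestamp
    if ts_delta == 0:
        out += hash_prefix              # step 2: truncated hash bytes
    out += bytes([range_type])          # step 3: range type byte
    if range_type == FINGERPRINT:       # step 4: type specific body
        if len(fingerprint) != 32:
            raise ValueError("fingerprint must be 32 bytes")
        out += fingerprint
    elif range_type == ITEM_SET:
        out += varint(len(items)) + b"".join(items)
    return out                          # skip ranges carry no body


def encode_payload(ranges):
    """Step 5: repeat steps 1 to 4 for every range, in order."""
    return b"".join(encode_range(*r) for r in ranges)
```

For example, `encode_payload([(2, b"", SKIP)])` produces the two bytes `0x02 0x00` under these assumptions.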
## Transfer Protocol

**Libp2p Protocol identifier**: `/vac/waku/transfer/1.0.0`

The transfer protocol SHOULD send messages as soon as
a difference is found via reconciliation.
It MUST only accept messages from peers the node is reconciling with.
The Ids of newly received messages MUST be added to the reconciliation protocol.
The payload sent MUST follow the wire specification below.

### Wire specification

```protobuf
syntax = "proto3";

package waku.sync.v2;

import "waku/message/v1/message.proto";

message WakuMessageAndTopic {
  // Full message content and associated pubsub_topic as value
  optional waku.message.v1.WakuMessage message = 1;
  optional string pubsub_topic = 2;
}
```

# Implementation

The flexibility of the protocol implies that much is left to the implementers.

What follows is NOT part of the specification.
This section was created to inform implementations.

#### Parameters

Two useful parameters to add to your implementation are the partitioning count and the item set threshold.

The partitioning count is the number of sub-ranges a range is split into.
Higher values reduce round trips at the cost of computing more fingerprints.

The item set threshold is the range size below which item sets are sent instead of fingerprints.
Higher values send more items, which means a higher chance of duplicates but
fewer round trips overall.

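One way these two parameters could interact when an unequal fingerprint range is split is sketched below. The function name, the string labels, the even split by item count, and the strict `<` comparison against the threshold are all assumptions for illustration; implementations are free to choose differently.

```python
def split_range(ids, partition_count, item_set_threshold):
    """Split an ordered list of Ids into typed sub-ranges.

    Returns a list of (kind, ids) pairs where kind is "item_set" or
    "fingerprint". Small ranges are sent as item sets to stop recursing early.
    """
    if len(ids) < item_set_threshold:
        return [("item_set", ids)]
    size = -(-len(ids) // partition_count)  # ceiling division
    parts = [ids[i:i + size] for i in range(0, len(ids), size)]
    return [
        ("item_set" if len(p) < item_set_threshold else "fingerprint", p)
        for p in parts
    ]


# 10 Ids, 4 partitions, threshold of 3: three fingerprint ranges of 3 Ids
# each and one trailing item set holding the last Id.
result = split_range(list(range(10)), 4, 3)
```

A larger `partition_count` produces more, smaller sub-ranges per round, and a larger `item_set_threshold` turns more of them into item sets.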
#### Storage

The storage implementation should reflect the context.
Most messages that will be added will be recent and
removed messages will be older ones.
When differences are found, some messages will have to be inserted at arbitrary positions.
This is expected to be less common than time-based insertion and removal.
Last but not least, storage must be optimized for fingerprinting,
as it is the most frequently used operation.

#### Sync Interval

Ad-hoc syncing can be useful in some cases, but continuous periodic sync
minimizes the differences in messages stored across the network.
Syncing early and often is the best strategy.
The default used in Nwaku is a 5-minute interval between syncs, with a range of 1 hour.

#### Sync Window

By default, the sync window is offset by 20 seconds into the past.
In most cases, the actual start of the sync range is T-01:00:20 and the end is T-00:00:20.
This is to handle the inherent jitter of GossipSub.
In other words, it is the amount of time needed to confirm whether a message is missing or not.

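The window arithmetic above can be written out as a small sketch. The constant and function names are hypothetical; the 1 hour range and 20 second offset are the defaults described in this section.

```python
from datetime import datetime, timedelta, timezone

SYNC_RANGE = timedelta(hours=1)      # default range
SYNC_OFFSET = timedelta(seconds=20)  # default offset into the past


def sync_window(now):
    """Return the (start, end) of the sync range for the current time T."""
    end = now - SYNC_OFFSET   # T-00:00:20
    start = end - SYNC_RANGE  # T-01:00:20
    return start, end


now = datetime(2024, 12, 9, 12, 0, 0, tzinfo=timezone.utc)
start, end = sync_window(now)
```

With `T` at 12:00:00, the window spans 10:59:40 to 11:59:40, so messages from the last 20 seconds are left out until they have had time to propagate.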
#### Peer Choice

Poor peering strategies can inadvertently segregate peers and
reduce sampling diversity.
Nwaku randomly selects peers to sync with for simplicity and robustness.

More elaborate strategies can be devised, but we chose not to.

## Attack Vectors

Nodes using `WAKU-SYNC` are fully trusted.
Message hashes are assumed to belong to valid messages received via Waku Relay or Light Push.

Further refinements to the protocol are planned
to reduce the trust level required to operate,
notably by verifying the RLN proof of messages at reception.

## Copyright

Copyright and related rights waived via
[CC0](https://creativecommons.org/publicdomain/zero/1.0/).

## References

- [RBSR](https://github.com/AljoschaMeyer/rbsr_short/blob/main/main.pdf)
- [Negentropy Explainer](https://logperiodic.com/rbsr.html)
- [Master Thesis](https://github.com/AljoschaMeyer/master_thesis/blob/main/main.pdf)