mirror of https://github.com/logos-messaging/specs.git, synced 2026-01-02 14:13:06 +00:00

waku sync 2.0 initial draft

parent b38f248b4f
commit cba8da96bf
@ -11,78 +11,131 @@ contributors:
This specification explains the `WAKU-SYNC` protocol
which enables the reconciliation of two sets of message hashes
in the context of keeping multiple Store nodes synchronized.
Waku Sync is a wrapper around
[Negentropy](https://github.com/hoytech/negentropy), a [range-based set reconciliation protocol](https://logperiodic.com/rbsr.html).

## Specification

**Protocol identifier**: `/vac/waku/sync/1.0.0`
**Protocol identifier**: `/vac/waku/reconciliation/1.0.0`

### Terminology
#### Terminology
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”,
“RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119](https://www.ietf.org/rfc/rfc2119.txt).

The term Negentropy refers to the protocol of the same name.
Negentropy payload refers to
the messages created by the Negentropy protocol.
Client always refers to the initiator
and server to the receiver of the first payload.

### Design Requirements
Nodes enabling Waku Sync SHOULD
manage and keep message hashes in a local cache
for the range of time
during which synchronization is required.
Nodes SHOULD use the same time range;
for Waku we chose one hour as the global default.
Waku Relay or Light Push protocol MAY be enabled
and used in conjunction with Sync
as a source of new message hashes
for the cache.

Nodes MAY use the Store protocol
to request missing messages once reconciliation is complete
or to provide messages to requesting clients.

#### Message Ids
Message Ids MUST be composed of the timestamp and the hash of the Waku messages.

The timestamp MUST be the time of creation and
the hash MUST follow the
[deterministic message hashing specification](https://rfc.vac.dev/waku/standards/core/14/message#deterministic-message-hashing).

> This way the message Ids can always be totally ordered:
> chronologically according to the timestamp,
> with ties broken by the lexical order of the hashes
> in cases where the timestamp is the same.

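The total order on message Ids can be sketched as follows. This is a minimal illustration and not part of the specification: the `MessageId` class and its field names are hypothetical, and the hashes are shortened to 4 bytes for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class MessageId:
    """Hypothetical message Id: compared by timestamp first, then by hash bytes."""

    timestamp: int  # creation time of the Waku message
    hash: bytes     # deterministic message hash


ids = [
    MessageId(1002, bytes.fromhex("3560d9c4")),
    MessageId(1000, bytes.fromhex("4a8a769a")),
    MessageId(1002, bytes.fromhex("351c5e86")),
]

# dataclass ordering compares fields in declaration order, and bytes compare
# lexicographically, which matches the ordering rule quoted above.
ordered = sorted(ids)
```

Because the fields are declared in the order `timestamp`, `hash`, the generated comparison methods give the chronological order with a lexical tie-break without any custom code.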
### Payload
#### Range Bounds
A range MUST consist of 2 Id bounds: the first bound is
inclusive, the second bound exclusive.
The first bound MUST be strictly smaller than the second one.

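A half-open range over ordered Ids could be modeled as below. This is only a sketch: the helper names `in_range` and `items_in_range` are assumptions made for illustration, and Ids are represented as `(timestamp, hash)` tuples.

```python
from bisect import bisect_left


def in_range(item, lower, upper):
    """Return True if lower <= item < upper (inclusive lower, exclusive upper)."""
    if not lower < upper:
        raise ValueError("the first bound must be strictly smaller than the second")
    return lower <= item < upper


def items_in_range(sorted_ids, lower, upper):
    """Slice an already sorted list of Ids down to the half-open range."""
    if not lower < upper:
        raise ValueError("the first bound must be strictly smaller than the second")
    start = bisect_left(sorted_ids, lower)
    end = bisect_left(sorted_ids, upper)
    return sorted_ids[start:end]


example_ids = [(1, b"a"), (2, b"a"), (3, b"a")]
selected = items_in_range(example_ids, (1, b"a"), (3, b"a"))
```

With these bounds the Id equal to the lower bound is selected and the Id equal to the upper bound is not.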
```protobuf
syntax = "proto3";

package waku.sync.v1;

message SyncPayload {
  optional bytes negentropy = 1;

  repeated bytes hashes = 20;
}
```

#### Range Fingerprinting
The fingerprint of a range MUST be the XOR operation applied to
all the hashes of the Ids included in that range.

#### Range Type
Every range MUST have one of the following types: skip, fingerprint or item set.

- Skip type is used to signal already processed ranges that MUST be ignored.
- Fingerprint type signifies that this range fingerprint MUST be compared when received.
- Item set type contains multiple message Ids that MUST all be compared when received.
> Item sets are an optimization: stopping the recursion early can
> save many network roundtrips.

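The XOR fingerprint described under Range Fingerprinting can be sketched as follows. The function name `fingerprint` and the 32-byte hash length are assumptions made for this example; the sample hashes are SHA-256 digests of arbitrary byte strings, not real message hashes.

```python
import hashlib

HASH_LEN = 32  # assumed hash length in bytes


def fingerprint(hashes):
    """XOR all hashes of a range together, byte by byte."""
    acc = bytearray(HASH_LEN)
    for h in hashes:
        if len(h) != HASH_LEN:
            raise ValueError("unexpected hash length")
        for i, byte in enumerate(h):
            acc[i] ^= byte
    return bytes(acc)


hash_a = hashlib.sha256(b"message a").digest()
hash_b = hashlib.sha256(b"message b").digest()

# XOR is commutative, so the fingerprint does not depend on item order.
same = fingerprint([hash_a, hash_b]) == fingerprint([hash_b, hash_a])
```

Two peers holding the same set of Ids in a range therefore compute the same fingerprint, and an empty range yields an all-zero fingerprint under this sketch.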
#### Range Processing
Ranges have to be processed differently according to their types.

- Skip ranges MUST be merged with other consecutive ones if possible.
- Equal fingerprint ranges MUST become skip ranges.
- Unequal fingerprint ranges MUST be split into smaller ranges. The new type MAY be either fingerprint or item set.
- Unresolved item set ranges MUST be checked for differences and marked resolved.
- Resolved item set ranges MUST be checked for differences and become skip ranges.

### Session Flow
A client initiates a session with a server
by sending a `SyncPayload` with
only the `negentropy` field set.
This field MUST contain
the first negentropy payload
created by the client
for this session.

The server receives a `SyncPayload`.
A new negentropy payload is computed from the received one.
The server sends back a `SyncPayload` to the client.

The client receives a `SyncPayload`.
A new negentropy payload OR an empty one is computed.
If a new payload is computed then
the exchanges between client and server continue until
the client computes an empty payload.
This client computation also outputs any hash differences found;
those MUST be stored.
In the case of an empty payload,
the reconciliation is done and
the client MUST send back a `SyncPayload`
with all the missing server hashes in the `hashes` field and
an empty `negentropy` field.

### Delta Encoding
For efficient transmission of timestamps, hashes and ranges, payloads are delta encoded as follows.

All ranges to be transmitted MUST be ordered and only upper bounds used.
> Inclusive lower bounds can be omitted because they are always
> the same as the exclusive upper bounds of the previous range or zero.

To achieve this, it MAY be necessary to add skip ranges.
> For example, a skip range can be added with
> an exclusive upper bound equal to the first range lower bound.
> This way the receiving peer knows to ignore the range from zero to the start of the ranges.

Every timestamp after the first MUST be noted as the difference from the previous one.
If the timestamp is the same, zero MUST be used and the hash MUST be added.
The added hash MUST be truncated up to and including the first differentiating byte.

| Timestamp | Hash | Timestamp (encoded) | Hash (encoded) |
| - | - | - | - |
| 1000 | 0x4a8a769a... | 1000 | - |
| 1002 | 0x351c5e86... | 2 | - |
| 1002 | 0x3560d9c4... | 0 | 0x3560 |
| 1003 | 0xbeabef25... | 1 | - |

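The timestamp and hash rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the wire format: `delta_encode` and `truncate_hash` are hypothetical names, the output is a plain list of tuples, and the hashes use only the 4-byte prefixes visible in the table.

```python
def truncate_hash(previous, current):
    """Keep bytes of `current` up to and including the first byte that differs from `previous`."""
    for i, (a, b) in enumerate(zip(previous, current)):
        if a != b:
            return current[: i + 1]
    return current


def delta_encode(ids):
    """Encode sorted (timestamp, hash) pairs as (timestamp delta, optional truncated hash)."""
    encoded = []
    prev_ts, prev_hash = None, b""
    for ts, h in ids:
        if prev_ts is None:
            encoded.append((ts, None))            # first timestamp is sent as is
        elif ts == prev_ts:
            encoded.append((0, truncate_hash(prev_hash, h)))  # same timestamp: add hash
        else:
            encoded.append((ts - prev_ts, None))  # difference from the previous one
        prev_ts, prev_hash = ts, h
    return encoded


table_ids = [
    (1000, bytes.fromhex("4a8a769a")),
    (1002, bytes.fromhex("351c5e86")),
    (1002, bytes.fromhex("3560d9c4")),
    (1003, bytes.fromhex("beabef25")),
]
encoded = delta_encode(table_ids)
```

Run on the table rows, the third entry keeps only the bytes `0x3560`, because the second byte is the first one that differs from the previous hash.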
#### Varints
TODO

## Implementation

#### Parameters
#TODO fix copy pasta from research issue

T -> Item set threshold. If a range length is <= T, all items are sent. A higher T sends more items, which means a higher chance of duplicates but reduces the number of round trips overall.

B -> Partitioning count. When recursively splitting a range, it is split into B sub ranges. A higher B reduces round trips at the cost of computing more fingerprints.

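How T and B could interact when a fingerprint mismatch is found is sketched below. The function `split_range`, the even partitioning and the string labels are assumptions made for this example; the text above does not prescribe how sub ranges are sized.

```python
def split_range(ids, t, b):
    """Split the sorted Ids of a mismatching range into at most `b` sub ranges.

    Sub ranges with at most `t` items are labelled "item_set" (all items sent),
    larger ones "fingerprint" (only the fingerprint sent).
    """
    if len(ids) <= t:
        return [("item_set", ids)]
    size, extra = divmod(len(ids), b)
    parts = []
    start = 0
    for i in range(b):
        end = start + size + (1 if i < extra else 0)  # spread the remainder evenly
        chunk = ids[start:end]
        if chunk:
            kind = "item_set" if len(chunk) <= t else "fingerprint"
            parts.append((kind, chunk))
        start = end
    return parts


large = split_range(list(range(10)), t=3, b=2)
small = split_range(list(range(7)), t=3, b=4)
```

In this sketch ten items with T=3 and B=2 produce two fingerprint sub ranges of five items, which would need another round, while seven items with B=4 are small enough to be sent directly as item sets.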
#### Storage
TODO

A local cache of message Ids MUST be maintained when the node is online.
This storage MUST keep Ids ordered at all times.

This storage is critical for the various functions of the protocol and should be as efficient as possible.
How this storage is implemented, however, is outside the scope of this specification.

TODO mention trees vs arrays???

The storage implementation should reflect the Waku context.
Most messages that will be added will be recent and
all removed messages will be older ones.
When differences are found some messages will have to be inserted at arbitrary positions.
It is expected to be a less likely case than time based insertion and removal.
Last but not least it must be optimized for sequential read
as it is the most often used operation.

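One possible shape for such a cache is a sorted array, sketched below. The class `IdCache` and its methods are hypothetical, Ids are `(timestamp, hash)` tuples, and this is not how any particular implementation stores them.

```python
import bisect


class IdCache:
    """Keeps (timestamp, hash) Ids sorted at all times in a plain list."""

    def __init__(self):
        self._ids = []

    def insert(self, timestamp, msg_hash):
        item = (timestamp, msg_hash)
        i = bisect.bisect_left(self._ids, item)
        # Skip duplicates; recent items land at the end, which is cheap for a list.
        if i == len(self._ids) or self._ids[i] != item:
            self._ids.insert(i, item)

    def prune(self, cutoff_timestamp):
        """Drop every Id strictly older than the cutoff."""
        i = bisect.bisect_left(self._ids, (cutoff_timestamp, b""))
        del self._ids[:i]

    def __iter__(self):
        return iter(self._ids)  # sequential read in Id order


cache = IdCache()
for ts, h in [(1003, b"c"), (1000, b"a"), (1002, b"b"), (1000, b"a")]:
    cache.insert(ts, h)
```

An array favors the common cases named above (appending recent Ids, dropping the oldest, sequential reads) and pays a linear cost for the rarer insertions in the middle; a tree would trade those costs the other way.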
#### Range
TODO

We also offset the sync range by 20 seconds in the past.
The actual start of the sync range is T-01:00:20 and the end T-00:00:20.
This is to handle the inherent jitter of GossipSub.
In other words, it is the amount of time needed to confirm if a message is missing or not.

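The window arithmetic can be checked with a short sketch. The names `SYNC_RANGE`, `SYNC_OFFSET` and `sync_window` are made up for this example; the one hour and 20 second values are the ones given in the text.

```python
from datetime import datetime, timedelta, timezone

SYNC_RANGE = timedelta(hours=1)      # length of the synchronized time range
SYNC_OFFSET = timedelta(seconds=20)  # margin for GossipSub jitter


def sync_window(now):
    """Return (start, end) of the sync range for the current time T."""
    end = now - SYNC_OFFSET
    start = end - SYNC_RANGE
    return start, end


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
start, end = sync_window(now)
```

For T = 12:00:00 this gives a window from 10:59:40 to 11:59:40, that is T-01:00:20 to T-00:00:20.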
#### Interval
TODO

Ad-hoc syncing can be useful in some cases but continuous periodic sync
minimizes the differences in messages stored across the network.
Syncing early and often is the best strategy.
The default used in nwaku is a 5-minute interval between syncs with a range of 1 hour.

#### Peer Choice
TODO

Peering strategies can lead to inadvertently segregating peers and reduce sampling diversity.
We randomly select peers to sync with for simplicity and robustness.

A good strategy can be devised but we chose not to.

## Attack Vectors
Nodes using `WAKU-SYNC` are fully trusted.

@ -92,36 +145,6 @@ Further refinements to the protocol are planned
to reduce the trust level required to operate.
Notably by verifying the RLN proofs of messages at reception.

## Implementation
The following is not part of the specifications but good to know implementation details.

### Peer Choice
Peering strategies can lead to inadvertently segregating peers and reduce sampling diversity.
We randomly select peers to sync with for simplicity and robustness.

A good strategy can be devised but we chose not to.

### Interval
Ad-hoc syncing can be useful in some cases but continuous periodic sync
minimizes the differences in messages stored across the network.
Syncing early and often is the best strategy.
The default used in nwaku is a 5-minute interval between syncs with a range of 1 hour.

### Range
We also offset the sync range by 20 seconds in the past.
The actual start of the sync range is T-01:00:20 and the end T-00:00:20.
This is to handle the inherent jitter of GossipSub.
In other words, it is the amount of time needed to confirm if a message is missing or not.

### Storage
The storage implementation should reflect the Waku context.
Most messages that will be added will be recent and
all removed messages will be older ones.
When differences are found some messages will have to be inserted at arbitrary positions.
It is expected to be a less likely case than time based insertion and removal.
Last but not least it must be optimized for sequential read
as it is the most often used operation.

## Copyright

Copyright and related rights waived via