diff --git a/waku/rfcs/standards/core/13/README.md b/waku/rfcs/standards/core/13/README.md new file mode 100644 index 0000000..aa1cbbc --- /dev/null +++ b/waku/rfcs/standards/core/13/README.md @@ -0,0 +1,259 @@ +--- +slug: 13 +title: 13/WAKU2-STORE +name: Waku v2 Store +status: draft +tags: waku-core +editor: Sanaz Taheri +contributors: + - Dean Eigenmann + - Oskar Thorén + - Aaryamann Challani +--- + +# Abstract +This specification explains the `13/WAKU2-STORE` protocol which enables querying of messages received through the relay protocol and +stored by other nodes. +It also supports pagination for more efficient querying of historical messages. + +**Protocol identifier***: `/vac/waku/store/2.0.0-beta4` + +## Terminology +The term PII, Personally Identifiable Information, +refers to any piece of data that can be used to uniquely identify a user. +For example, the signature verification key, and +the hash of one's static IP address are unique for each user and hence count as PII. + +# Design Requirements +The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, +“RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119](https://www.ietf.org/rfc/rfc2119.txt). + +Nodes willing to provide the storage service using `13/WAKU2-STORE` protocol, +SHOULD provide a complete and full view of message history. +As such, they are required to be *highly available* and +specifically have a *high uptime* to consistently receive and store network messages. +The high uptime requirement makes sure that no message is missed out hence a complete and +intact view of the message history is delivered to the querying nodes. +Nevertheless, in case storage provider nodes cannot afford high availability, +the querying nodes may retrieve the historical messages from multiple sources to achieve a full and intact view of the past. + +The concept of `ephemeral` messages introduced in [`14/WAKU2-MESSAGE`](/spec/14) affects `13/WAKU2-STORE` as well. +Nodes running `13/WAKU2-STORE` SHOULD support `ephemeral` messages as specified in [14/WAKU2-MESSAGE](/spec/14). +Nodes running `13/WAKU2-STORE` SHOULD NOT store messages with the `ephemeral` flag set to `true`. + +# Adversarial Model +Any peer running the `13/WAKU2-STORE` protocol, i.e. +both the querying node and the queried node, are considered as an adversary. +Furthermore, +we currently consider the adversary as a passive entity that attempts to collect information from other peers to conduct an attack but +it does so without violating protocol definitions and instructions. +As we evolve the protocol, +further adversarial models will be considered. +For example, under the passive adversarial model, +no malicious node hides or +lies about the history of messages as it is against the description of the `13/WAKU2-STORE` protocol. + +The following are not considered as part of the adversarial model: +- An adversary with a global view of all the peers and their connections. +- An adversary that can eavesdrop on communication links between arbitrary pairs of peers (unless the adversary is one end of the communication). +In specific, the communication channels are assumed to be secure. + +# Wire Specification +Peers communicate with each other using a request / response API. +The messages sent are Protobuf RPC messages which are implemented using [protocol buffers v3](https://developers.google.com/protocol-buffers/). +The following are the specifications of the Protobuf messages. + +## Payloads + +```protobuf +syntax = "proto3"; + +message Index { + bytes digest = 1; + sint64 receiverTime = 2; + sint64 senderTime = 3; + string pubsubTopic = 4; +} + +message PagingInfo { + uint64 pageSize = 1; + Index cursor = 2; + enum Direction { + BACKWARD = 0; + FORWARD = 1; + } + Direction direction = 3; +} + +message ContentFilter { + string contentTopic = 1; +} + +message HistoryQuery { + // the first field is reserved for future use + string pubsubtopic = 2; + repeated ContentFilter contentFilters = 3; + PagingInfo pagingInfo = 4; +} + +message HistoryResponse { + // the first field is reserved for future use + repeated WakuMessage messages = 2; + PagingInfo pagingInfo = 3; + enum Error { + NONE = 0; + INVALID_CURSOR = 1; + } + Error error = 4; +} + +message HistoryRPC { + string request_id = 1; + HistoryQuery query = 2; + HistoryResponse response = 3; +} +``` + +### Index + +To perform pagination, +each `WakuMessage` stored at a node running the `13/WAKU2-STORE` protocol is associated with a unique `Index` that encapsulates the following parts. +- `digest`: a sequence of bytes representing the SHA256 hash of a `WakuMessage`. + The hash is computed over the concatenation of `contentTopic` and `payload` fields of a `WakuMessage` (see [14/WAKU2-MESSAGE](/spec/14)). +- `receiverTime`: the UNIX time in nanoseconds at which the `WakuMessage` is received by the receiving node. +- `senderTime`: the UNIX time in nanoseconds at which the `WakuMessage` is generated by its sender. +- `pubsubTopic`: the pubsub topic on which the `WakuMessage` is received. + +### PagingInfo + +`PagingInfo` holds the information required for pagination. It consists of the following components. +- `pageSize`: A positive integer indicating the number of queried `WakuMessage`s in a `HistoryQuery` +(or retrieved `WakuMessage`s in a `HistoryResponse`). +- `cursor`: holds the `Index` of a `WakuMessage`. +- `direction`: indicates the direction of paging which can be either `FORWARD` or `BACKWARD`. + +### ContentFilter +`ContentFilter` carries the information required for filtering historical messages. +- `contentTopic` represents the content topic of the queried historical `WakuMessage`. + This field maps to the `contentTopic` field of the [14/WAKU2-MESSAGE](/spec/14). + +### HistoryQuery + +RPC call to query historical messages. + +- The `pubsubTopic` field MUST indicate the pubsub topic of the historical messages to be retrieved. + This field denotes the pubsub topic on which `WakuMessage`s are published. + This field maps to `topicIDs` field of `Message` in [`11/WAKU2-RELAY`](/spec/11). + Leaving this field empty means no filter on the pubsub topic of message history is requested. + This field SHOULD be left empty in order to retrieve the historical `WakuMessage` regardless of the pubsub topics on which they are published. +- The `contentFilters` field MUST indicate the list of content filters based on which the historical messages are to be retrieved. + Leaving this field empty means no filter on the content topic of message history is required. + This field SHOULD be left empty in order to retrieve historical `WakuMessage` regardless of their content topics. +- `PagingInfo` holds the information required for pagination. + Its `pageSize` field indicates the number of `WakuMessage`s to be included in the corresponding `HistoryResponse`. + It is RECOMMENDED that the queried node defines a maximum page size internally. + If the querying node leaves the `pageSize` unspecified, + or if the `pageSize` exceeds the maximum page size, + the queried node SHOULD auto-paginate the `HistoryResponse` to no more than the configured maximum page size. + This allows mitigation of long response time for `HistoryQuery`. + In the forward pagination request, + the `messages` field of the `HistoryResponse` SHALL contain, at maximum, + the `pageSize` amount of `WakuMessage` whose `Index` values are larger than the given `cursor` + (and vise versa for the backward pagination). + Note that the `cursor` of a `HistoryQuery` MAY be empty (e.g., for the initial query), as such, and + depending on whether the `direction` is `BACKWARD` or `FORWARD` the last or the first `pageSize` `WakuMessage` SHALL be returned, respectively. + +### Sorting Messages +The queried node MUST sort the `WakuMessage` based on their `Index`, +where the `senderTime` constitutes the most significant part and the `digest` comes next, and +then perform pagination on the sorted result. +As such, the retrieved page contains an ordered list of `WakuMessage` from the oldest messages to the most recent one. +Alternatively, the `receiverTime` (instead of `senderTime` ) MAY be used to sort messages during the paging process. +However, it is RECOMMENDED the use of the `senderTime` for sorting as it is invariant and +consistent across all the nodes. +This has the benefit of `cursor` reusability i.e., +a `cursor` obtained from one node can be consistently used to query from another node. +However, this `cursor` reusability does not hold when the `receiverTime` is utilized as the receiver time is affected by the network delay and +nodes' clock asynchrony. + +### HistoryResponse + +RPC call to respond to a HistoryQuery call. +- The `messages` field MUST contain the messages found, +these are [14/WAKU2-MESSAGE](/spec/14) types. +- `PagingInfo` holds the paging information based on which the querying node can resume its further history queries. + The `pageSize` indicates the number of returned Waku messages (i.e., the number of messages included in the `messages` field of `HistoryResponse`). + The `direction` is the same direction as in the corresponding `HistoryQuery`. + In the forward pagination, the `cursor` holds the `Index` of the last message in the `HistoryResponse` `messages` (and the first message in the backward paging). + Regardless of the paging direction, the retrieved `messages` are always sorted in ascending order based on their timestamp as explained in the [sorting messages](#sorting-messages) section, that is, from the oldest to the most recent. + The requester SHALL embed the returned `cursor` inside its next `HistoryQuery` to retrieve the next page of the [14/WAKU2-MESSAGE](/spec/14). + The `cursor` obtained from one node SHOULD NOT be used in a request to another node because the result may be different. +- The `error` field contains information about any error that has occurred while processing the corresponding `HistoryQuery`. + `NONE` stands for no error. + This is also the default value. + `INVALID_CURSOR` means that the `cursor` field of `HistoryQuery` does not match with the `Index` of any of the `WakuMessage` persisted by the queried node. + +# Security Consideration + +The main security consideration to take into account while using this protocol is that a querying node have to reveal their content filters of interest to the queried node, hence potentially compromising their privacy. + +# Future Work + +- **Anonymous query**: This feature guarantees that nodes can anonymously query historical messages from other nodes i.e., +without disclosing the exact topics of [14/WAKU2-MESSAGE](/spec/14) they are interested in. +As such, no adversary in the `13/WAKU2-STORE` protocol would be able to learn which peer is interested in which content filters i.e., +content topics of [14/WAKU2-MESSAGE](/spec/14). +The current version of the `13/WAKU2-STORE` protocol does not provide anonymity for historical queries, +as the querying node needs to directly connect to another node in the `13/WAKU2-STORE` protocol and +explicitly disclose the content filters of its interest to retrieve the corresponding messages. +However, one can consider preserving anonymity through one of the following ways: + - By hiding the source of the request i.e., anonymous communication. + That is the querying node shall hide all its PII in its history request e.g., its IP address. + This can happen by the utilization of a proxy server or by using Tor. + Note that the current structure of historical requests does not embody any piece of PII, otherwise, + such data fields must be treated carefully to achieve query anonymity. + + - By deploying secure 2-party computations in which the querying node obtains the historical messages of a certain topic, + the queried node learns nothing about the query. + Examples of such 2PC protocols are secure one-way Private Set Intersections (PSI). + + + +- **Robust and verifiable timestamps**: Messages timestamp is a way to show that the message existed prior to some point in time. +However, the lack of timestamp verifiability can create room for a range of attacks, +including injecting messages with invalid timestamps pointing to the far future. +To better understand the attack, +consider a store node whose current clock shows `2021-01-01 00:00:30` (and assume all the other nodes have a synchronized clocks +-20seconds). +The store node already has a list of messages, + `(m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10:2021-01-01 00:00:20)`, +that are sorted based on their timestamp. +An attacker sends a message with an arbitrary large timestamp e.g., +10 hours ahead of the correct clock `(m',2021-01-01 10:00:30)`. +The store node places `m'` at the end of the list, +`(m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10:2021-01-01 00:00:20), (m',2021-01-01 10:00:30)`. +Now another message arrives with a valid timestamp e.g., +`(m11, 2021-01-01 00:00:45)`. +However, since its timestamp precedes the malicious message `m'`, +it gets placed before `m'` in the list i.e., +`(m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10:2021-01-01 00:00:20), (m11, 2021-01-01 00:00:45), (m',2021-01-01 10:00:30)`. +In fact, for the next 10 hours, +`m'` will always be considered as the most recent message and +served as the last message to the querying nodes irrespective of how many other messages arrive afterward. + +A robust and verifiable timestamp allows the receiver of a message to verify that a message has been generated prior to the claimed timestamp. +One solution is the use of [open timestamps](https://opentimestamps.org/) e.g., +block height in Blockchain-based timestamps. +That is, messages contain the most recent block height perceived by their senders at the time of message generation. +This proves accuracy within a range of minutes (e.g., in Bitcoin blockchain) or +seconds (e.g., in Ethereum 2.0) from the time of origination. + +# Copyright + +Copyright and related rights waived via +[CC0](https://creativecommons.org/publicdomain/zero/1.0/). + +# References +1. [14/WAKU2-MESSAGE](/spec/14) +2. [protocol buffers v3](https://developers.google.com/protocol-buffers/) +3. [11/WAKU2-RELAY](/spec/11) +4. [Open timestamps](https://opentimestamps.org/)