---
layout: post
name: 'Waku Privacy and Anonymity Analysis Part I: Definitions and Waku Relay'
title: 'Waku Privacy and Anonymity Analysis Part I: Definitions and Waku Relay'
date: 2022-07-22 10:00:00
authors: kaiserd
published: true
slug: wakuv2-relay-anon
categories: research
image: /img/anonymity_trilemma.svg
discuss: https://forum.vac.dev/t/discussion-waku-privacy-and-anonymity-analysis/149
_includes: [math]
toc_min_heading_level: 2
toc_max_heading_level: 5
---
Introducing a basic threat model and privacy/anonymity analysis for the Waku v2 relay protocol.
<!--truncate-->
[Waku v2](https://rfc.vac.dev/spec/10/) enables secure, privacy-preserving communication using a set of modular P2P protocols.
Waku v2 also aims at protecting the user's anonymity.
This post is the first in a series about Waku v2 security, privacy, and anonymity.
The goal is to eventually have a full privacy and anonymity analysis for each of the Waku v2 protocols, as well as covering the interactions of various Waku v2 protocols.
This provides transparency with respect to Waku's current privacy and anonymity guarantees, and also identifies weak points that we have to address.
In this post, we first give an informal description of security, privacy and anonymity in the context of Waku v2.
For each definition, we summarize Waku's current guarantees regarding the respective property.
We also provide attacker models, an attack-based threat model, and a first anonymity analysis of [Waku v2 relay](https://rfc.vac.dev/spec/11/) within the respective models.
Waku comprises many protocols that can be combined in a modular way.
For our privacy and anonymity analysis, we start with the relay protocol because it is at the core of Waku v2, enabling Waku's publish-subscribe approach to P2P messaging.
In its current form, Waku relay is a minor extension of [libp2p GossipSub](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/README.md).
![Figure 1: The Waku v2 relay mesh is based on the [GossipSub mesh](https://docs.libp2p.io/concepts/publish-subscribe#types-of-peering)](/img/libp2p_gossipsub_types_of_peering.png)
## Informal Definitions: Security, Privacy, and Anonymity
The concepts of security, privacy, and anonymity are linked and have quite a bit of overlap.
### Security
Of the three, [security](https://en.wikipedia.org/wiki/Information_security) has the clearest agreed-upon definition,
at least regarding its key concepts: _confidentiality_, _integrity_, and _availability_.
- confidentiality: data is not disclosed to unauthorized entities.
- integrity: data is not modified by unauthorized entities.
- availability: data is available, i.e. accessible by authorized entities.
While these are the key concepts, the definition of information security has been extended over time including further concepts,
e.g. [authentication](https://en.wikipedia.org/wiki/Authentication) and [non-repudiation](https://en.wikipedia.org/wiki/Non-repudiation).
We might cover these in future posts.
### Privacy
Privacy allows users to choose which data and information
- they want to share
- and with whom they want to share it.
This includes data and information that is associated with and/or generated by users.
Protected data also comprises metadata that might be generated without users being aware of it.
This means, no further information about the sender or the message is leaked.
Metadata that is protected as part of the privacy-preserving property does not cover protecting the identities of sender and receiver.
Identities are protected by the [anonymity property](#anonymity).
Often privacy is realized by the confidentiality property of security.
This makes privacy and security neither the same, nor one a subcategory of the other.
While security is abstract itself (its properties can be realized in various ways), privacy lives on a more abstract level using security properties.
Privacy typically does not use integrity and availability.
An adversary who has no access to the private data, because the message has been encrypted, could still alter the message.
Waku offers confidentiality via secure channels set up with the help of the [Noise Protocol Framework](https://noiseprotocol.org/).
Using these secure channels, message content is only disclosed to the intended receivers.
They also provide good metadata protection properties.
However, we do not have a metadata protection analysis as of yet,
which is part of our privacy/anonymity roadmap.
### Anonymity
Privacy and anonymity are closely linked.
Both the identity of a user and data that allows inferring a user's identity should be part of the privacy policy.
For the purpose of analysis, we want to have a clearer separation between these concepts.
We define anonymity as _unlinkability of users' identities and their shared data and/or actions_.
We subdivide anonymity into _receiver anonymity_ and _sender anonymity_.
#### Receiver Anonymity
We define receiver anonymity as _unlinkability of users' identities and the data they receive and/or related actions_.
The data transmitted via Waku relay must be a [Waku message](https://rfc.vac.dev/spec/14/), which contains a content topic field.
Because each message is associated with a content topic, and each receiver is interested in messages with specific content topics,
receiver anonymity in the context of Waku corresponds to _subscriber-topic unlinkability_.
An example of the "action" part of our receiver anonymity definition is subscribing to a specific topic.
The Waku message's content topic is not related to the libp2p pubsub topic.
For now, Waku uses a single libp2p pubsub topic, which means messages are propagated via a single mesh of peers.
With this, the receiver discloses its participation in Waku on the gossipsub layer.
We will leave the analysis of libp2p gossipsub to a future article within this series, and only provide a few hints and pointers here.
Waku offers k-anonymity regarding content topic interest in the global adversary model.
[K-anonymity](https://en.wikipedia.org/wiki/K-anonymity) in the context of Waku means an attacker can link receivers to content topics with a maximum certainty of $1/k$.
The larger $k$, the less certainty the attacker gains.
Receivers basically hide in a pool of $k$ content topics, any subset of which could be topics they subscribed to.
The attacker does not know which of those the receiver actually subscribed to,
and the receiver enjoys [plausible deniability](https://en.wikipedia.org/wiki/Plausible_deniability#Use_in_cryptography) regarding content topic subscription.
Assuming there are $n$ Waku content topics, a receiver has $n$-anonymity with respect to association to a specific content topic.
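
The $1/k$ bound above can be made concrete with a small worked sketch. The function name `max_linking_certainty` is hypothetical and only illustrates the definition; it is not part of any Waku implementation.

```python
from fractions import Fraction


def max_linking_certainty(k: int) -> Fraction:
    """Upper bound on the attacker's certainty when linking a receiver
    to one specific content topic out of a pool of k content topics."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return Fraction(1, k)


# The larger the pool, the less certainty the attacker gains.
for k in (1, 10, 100):
    print(f"k = {k:3d}: certainty <= {max_linking_certainty(k)}")
```

With a single content topic ($k = 1$) the attacker is certain; with $n$ content topics sharing one pubsub topic, the bound drops to $1/n$.
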
Technically, Waku allows distributing messages over several libp2p pubsub topics.
This yields $k$-anonymity, assuming $k$ content topics share the same pubsub topic.
However, if done wrongly, such sharding of pubsub topics can breach anonymity.
A formal specification of anonymity-preserving topic sharding building on the concepts of [partitioned topics](https://specs.status.im/spec/10#partitioned-topic) is part of our roadmap.
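
As a rough illustration of how sharding affects $k$, the following sketch counts how many content topics share each pubsub topic. The topic names and the `assignment` mapping are made up for the example, and the helper is an assumption for illustration, not a specified Waku mechanism.

```python
from collections import defaultdict


def anonymity_set_sizes(assignment):
    """Return, for each pubsub topic, the number k of content topics
    that share it. `assignment` maps content topic -> pubsub topic."""
    shards = defaultdict(set)
    for content_topic, pubsub_topic in assignment.items():
        shards[pubsub_topic].add(content_topic)
    return {pubsub: len(topics) for pubsub, topics in shards.items()}


# Hypothetical assignment: three content topics share one shard,
# while a fourth content topic sits alone on its own shard.
assignment = {
    "/app-a/1/chat/proto": "/waku/2/shard-0",
    "/app-b/1/news/proto": "/waku/2/shard-0",
    "/app-c/1/game/proto": "/waku/2/shard-0",
    "/app-d/1/vote/proto": "/waku/2/shard-1",
}
sizes = anonymity_set_sizes(assignment)
```

In this made-up assignment, subscribers of `shard-0` hide among three content topics, while any subscriber of `shard-1` is linked to its only content topic with certainty. This is one way in which sharding done wrongly can breach anonymity.
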
Also, Waku is not directly concerned with 1:1 communication, so for this post, 1:1 communication is out of scope.
Channels for 1:1 communication can be implemented on top of Waku relay.
In the future, a 1:1 communication protocol might be added to Waku.
Similar to topic sharding, it would maintain receiver anonymity leveraging [partitioned topics](https://specs.status.im/spec/10#partitioned-topic).
#### Sender Anonymity
We define sender anonymity as _unlinkability of users' identities and the data they send and/or related actions_.
Because the data in the context of Waku is Waku messages, sender anonymity corresponds to _sender-message unlinkability_.
In summary, Waku offers weak sender anonymity because of [Waku's strict no sign policy](https://rfc.vac.dev/spec/11/#signature-policy),
which has its origins in the [Ethereum consensus specs](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md#why-are-we-using-the-strictnosign-signature-policy).
[17/WAKU-RLN-RELAY](https://rfc.vac.dev/spec/17/) and [18/WAKU2-SWAP](https://rfc.vac.dev/spec/18/) mitigate replay and injection attacks.
Waku currently does not offer sender anonymity in stronger attacker models, and it cannot protect against targeted attacks even in weaker attacker models such as the single node or multi node attacker.
We will cover this in more detail in later sections.
### Anonymity Trilemma
[The Anonymity trilemma](https://freedom.cs.purdue.edu/projects/trilemma.html) states that only two out of _strong anonymity_, _low bandwidth_, and _low latency_ can be guaranteed in the global on-net attacker model.
Waku's goal, being a modular set of protocols, is to offer any combination of two out of these three properties, as well as blends.
An example of blending is an adjustable number of pubsub topics and peers in the respective pubsub topic mesh; this allows tuning the trade-off between anonymity and bandwidth.
![Figure 2: Anonymity Trilemma: pick two. ](/img/anonymity_trilemma.svg)
A fourth factor that influences [the anonymity trilemma](https://freedom.cs.purdue.edu/projects/trilemma.html) is _frequency and patterns_ of messages.
The more messages there are, and the more randomly distributed they are, the better the anonymity protection offered by a given anonymous communication protocol.
So, incentivising users to use the protocol, for instance by lowering entry barriers, helps protect the anonymity of all users.
The frequency/patterns factor is also related to the k-anonymity described above.
### Censorship Resistance
Another security related property that Waku aims to offer is censorship resistance.
Censorship resistance guarantees that users can participate even if an attacker tries to deny them access.
So, censorship resistance ties into the availability aspect of security.
In the context of Waku that means users should be able to send messages as well as receive all messages they are interested in,
even if an attacker tries to prevent them from disseminating messages or tries to deny them access to messages.
Currently, Waku only guarantees censorship resistance in the weak single node attacker model.
While currently employed secure channels mitigate targeted censorship, e.g. blocking specific content topics,
general censorship resistance in strong attacker models is part of our roadmap.
Among other options, we will investigate [Pluggable Transports](https://www.pluggabletransports.info/about/) in future articles.
## Attacker Types
The following lists various attacker types with varying degrees of power.
The more power an attacker has, the more difficult it is to gain the respective attacker position.
Each attacker type comes in a passive and an active variant.
While a passive attacker can stay hidden and is not suspicious,
the respective active attacker has more (or at least the same) deanonymization power.
We also distinguish between internal and external attackers.
### Internal
With respect to Waku relay, an internal attacker participates in the same pubsub topic as its victims.
Without additional measures on higher layer protocols, access to an internal position is easy to get.
#### Single Node
This attacker controls a single node.
Because this position corresponds to normal usage of Waku relay, it is trivial to obtain.
#### Multi Node
This attacker controls several nodes. We assume a smaller static number of controlled nodes.
The multi node position can be achieved relatively easily by setting up multiple nodes.
Botnets might be leveraged to increase the number of available hosts.
Multi node attackers could use [Sybil attacks](https://en.wikipedia.org/wiki/Sybil_attack) to increase the number of controlled nodes.
A countermeasure is for nodes to only accept libp2p gossipsub graft requests from peers with different IP addresses, or even different subnets.
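
The subnet-based countermeasure could look roughly like the following sketch. The class `GraftFilter` and the default `/24` prefix length are assumptions made for this example (IPv4 only); neither libp2p gossipsub nor Waku is claimed to expose such an interface.

```python
import ipaddress


class GraftFilter:
    """Accept at most one grafting peer per subnet (illustrative sketch)."""

    def __init__(self, prefix_len: int = 24):
        self.prefix_len = prefix_len  # assumed IPv4 prefix length
        self.seen_subnets = set()

    def accept(self, peer_ip: str) -> bool:
        # strict=False masks the host bits, e.g. 192.0.2.10/24 -> 192.0.2.0/24
        subnet = ipaddress.ip_network(f"{peer_ip}/{self.prefix_len}", strict=False)
        if subnet in self.seen_subnets:
            return False  # another peer from this subnet was already accepted
        self.seen_subnets.add(subnet)
        return True


graft_filter = GraftFilter()
first = graft_filter.accept("192.0.2.10")        # new subnet
second = graft_filter.accept("192.0.2.77")       # same /24 as the first peer
third = graft_filter.accept("198.51.100.5")      # different subnet
```

Such a filter raises the cost of a Sybil attack, because the attacker then needs addresses in many distinct subnets rather than many nodes on one host.
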
#### Linearly Scaling Nodes
This attacker controls a number of nodes that scales linearly with the number of nodes in the network.
This attacker is especially interesting to investigate in the context of DHT security,
which Waku uses for ambient peer discovery.
### External
An external attacker can only see encrypted traffic (protected by a secure channel set up with [Noise](https://rfc.vac.dev/spec/35/)).
Because an internal position can be easily obtained,
in practice external attackers would mount combined attacks that leverage both internal and external attacks.
We cover this more below when describing attacks.
#### Local
A local attacker has access to communication links in a local network segment.
This could be a rogue access point (with routing capability).
#### AS
An AS attacker controls a single AS (autonomous system).
A passive AS attacker can listen to traffic on arbitrary links within the AS.
An active AS attacker can drop, inject, and alter traffic on arbitrary links within the AS.
In practice, a malicious ISP would be considered as an AS attacker.
A malicious ISP could also easily set up a set of nodes at specific points in the network,
gaining internal attack power similar to a strong multi node attacker.
#### Global On-Net
A global on-net attacker has a complete overview of the whole network.
A passive global attacker can listen to traffic on all links,
while the active global attacker basically carries the traffic: it can freely drop, inject, and alter traffic at all positions in the network.
This basically corresponds to the [Dolev-Yao model](https://en.wikipedia.org/wiki/Dolev%E2%80%93Yao_model).
An entity with this power would, in practice, also have the power of the internal linearly scaling nodes attacker.
## Attack-based Threat Analysis
The following lists various attacks including the weakest attacker model in which the attack can be successfully performed.
The respective attack can be performed in all stronger attacker models as well.
An attack is considered more powerful if it can be successfully performed in a weaker attacker model.
If not stated otherwise, we look at these attacks with respect to their capability to deanonymize the message sender.
### Scope
In this post, we introduce a simple, tightly scoped threat model for Waku v2 Relay, which will be extended in the course of this article series.
In this first post, we will look at the relay protocol in isolation.
Even though many threats arise from layers Waku relay is based on, and layers that in turn live on top of relay,
we want to first look at relay in isolation because it is at the core of Waku v2.
Addressing and trying to solve all security issues of a complex system at once is an overwhelming task, which is why we focus on the soundness of relay first.
This also goes well with the modular design philosophy of Waku v2, as layers of varying levels of security guarantees can be built on top of relay, all of which can rely on the guarantees that Waku provides.
Instead of looking at a multiplicative explosion of possible interactions, we look at the core in this article, and cover the most relevant combinations in future posts.
Further restricting the scope, we will look at the data field of a relay message as a black box.
In a second article on Waku v2 relay, we will look into the data field, which according to the [specification of Waku v2 relay](https://rfc.vac.dev/spec/11/#message-fields) must be a [Waku v2 message](https://rfc.vac.dev/spec/14/).
We only consider messages with version field `2`, which indicates that the payload has to be encoded using [35/WAKU2-NOISE](https://rfc.vac.dev/spec/35/).
### Prerequisite: Get a Specific Position in the Network
Some attacks require the attacker node(s) to be in a specific position in the network.
In most cases, this corresponds to trying to get into the mesh peer list for the desired pubsub topic of the victim node.
In libp2p gossipsub, and by extension Waku v2 relay, nodes can simply send a graft message for the desired topic to the victim node.
If the victim node still has open slots, the attacker gets the desired position.
This only requires the attacker to know the gossipsub multiaddress of the victim node.
A linearly scaling nodes attacker can leverage DHT based discovery systems to boost the probability of malicious nodes being returned, which in turn significantly increases the probability of attacker nodes ending up in the peer lists of victim nodes.
[Waku v2 discv5](https://vac.dev/wakuv2-apd) will employ countermeasures that mitigate the amplifying effect this attacker type can achieve.
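
The graft step described in this section can be sketched as follows. This is a deliberately simplified model: `TopicMesh`, `max_peers`, and `handle_graft` are hypothetical names, and real gossipsub implementations apply further checks that are ignored here.

```python
class TopicMesh:
    """Simplified mesh peer list of a victim node for one pubsub topic."""

    def __init__(self, max_peers: int):
        self.max_peers = max_peers  # assumed upper bound on mesh slots
        self.peers = set()

    def handle_graft(self, peer_id: str) -> bool:
        """Return True if the grafting peer ends up in the mesh."""
        if peer_id in self.peers:
            return True  # already a mesh peer
        if len(self.peers) >= self.max_peers:
            return False  # no open slot left
        self.peers.add(peer_id)
        return True


mesh = TopicMesh(max_peers=2)
mesh.handle_graft("honest-peer")
attacker_in_mesh = mesh.handle_graft("attacker")    # an open slot remains
late_peer_in_mesh = mesh.handle_graft("late-peer")  # mesh is already full
```

In this model, the attacker only needs to reach the victim while a slot is open; nothing about the graft itself distinguishes it from an honest peer.
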
### Replay Attack
In the scope we defined above, Waku v2 is resilient against replay attacks.
GossipSub nodes, and by extension Waku relay nodes, feature a `seen` cache, and only relay messages they have not seen before.
Further, replay attacks will be punished by [RLN](https://rfc.vac.dev/spec/17/) and [SWAP](https://rfc.vac.dev/spec/18/).
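
A minimal sketch of the `seen` cache idea is shown below. It assumes that the message ID is a SHA-256 hash of the message bytes and that the cache is bounded by entry count; actual GossipSub implementations may derive message IDs differently and typically expire entries by time, so treat the details as assumptions.

```python
import hashlib
from collections import OrderedDict


class SeenCache:
    """Remember recently seen message IDs and reject duplicates."""

    def __init__(self, max_entries: int = 1024):
        self.max_entries = max_entries
        self._ids = OrderedDict()  # insertion-ordered set of message IDs

    def first_time(self, message: bytes) -> bool:
        """Return True if the message should be relayed (not seen before)."""
        msg_id = hashlib.sha256(message).digest()  # assumed ID function
        if msg_id in self._ids:
            return False  # replayed or duplicate message: do not relay
        self._ids[msg_id] = None
        if len(self._ids) > self.max_entries:
            self._ids.popitem(last=False)  # evict the oldest entry
        return True


cache = SeenCache()
relay_first = cache.first_time(b"hello waku")
relay_replay = cache.first_time(b"hello waku")
```

The eviction step also shows the limit of this protection: once an entry has left the cache, a replayed message is relayed again, which is where additional measures come in.
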
### Neighbourhood Surveillance
This attack can be performed by a single node attacker that is connected to all peers of the victim node $v$ with respect to a specific topic mesh.
The attacker also has to be connected to $v$.
In this position, the attacker will receive messages $m_v$ sent by $v$ both on the direct path from $v$, and on indirect paths relayed by peers of $v$.
It will also receive messages $m_x$ that are not sent by $v$. These messages $m_x$ are relayed by both $v$ and the peers of $v$.
Messages that are received (significantly) faster from $v$ than from any other of $v$'s peers are very likely messages that $v$ sent,
because for these messages the attacker is one hop closer to the source.
The attacker can (periodically) measure latency between itself and $v$, and between itself and the peers of $v$ to get more accurate estimates for the expected timings.
An AS attacker (and if the topology allows, even a local attacker) could also learn the latency between $v$ and its well-behaving peers.
An active AS attacker could also increase the latency between $v$ and its peers to make the timing differences more prominent.
This, however, might lead to $v$ switching to other peers.
This attack cannot (reliably) distinguish messages $m_v$ sent by $v$ from messages $m_y$ relayed by peers of $v$ the attacker is not connected to.
Still, there are hop-count variations that might be leveraged.
Messages $m_v$ always have a hop-count of 1 on the path from $v$ to the attacker, while all other paths are longer.
Messages $m_y$ might have the same hop-count on the path from $v$ as well as on other paths.
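
The timing heuristic of this attack can be sketched as follows. The function name, the peer identifiers, and the `margin` threshold are assumptions for illustration; a real attacker would calibrate the margin from the latency measurements mentioned above.

```python
def likely_sent_by_victim(arrivals, victim, margin):
    """Decide whether one message was likely originated by `victim`.

    `arrivals` maps a peer ID to the time (in seconds) at which the attacker
    received this message from that peer. The message is attributed to the
    victim if the victim's copy arrived more than `margin` seconds before
    the earliest copy from any other peer."""
    if victim not in arrivals:
        return False
    others = [t for peer, t in arrivals.items() if peer != victim]
    if not others:
        return False  # nothing to compare against
    return min(others) - arrivals[victim] > margin


# Copy from v arrives 50 ms before the earliest copy from v's peers.
sent = likely_sent_by_victim({"v": 0.010, "p1": 0.060, "p2": 0.075}, "v", margin=0.020)
# Copy from a peer arrives first, so v most likely only relayed it.
relayed = likely_sent_by_victim({"v": 0.050, "p1": 0.020}, "v", margin=0.020)
```

As noted above, this heuristic still misattributes messages $m_y$ that reach $v$ from peers the attacker is not connected to.
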
### Controlled Neighbourhood
If a multi node attacker manages to control all peers of the victim node, it can trivially tell which messages originated from $v$.
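
The reason this is trivial: every message $v$ merely relays must first have been delivered to $v$ by one of the attacker's nodes. A minimal sketch, with hypothetical message IDs and a hypothetical helper name:

```python
def originated_from_victim(sent_to_victim, received_from_victim):
    """Messages the victim emitted that no controlled peer delivered to it.

    Both arguments are sets of message IDs collected across all
    attacker-controlled peers of the victim."""
    return received_from_victim - sent_to_victim


sent_to_v = {"m1", "m2", "m3"}            # relayed to v by attacker nodes
received_from_v = {"m1", "m2", "m3", "m7"}  # everything v forwarded
origin_v = originated_from_victim(sent_to_v, received_from_v)
```

Here `m7` is attributed to $v$, because none of the controlled peers ever delivered it to $v$.
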
### Observing Messages
If Waku relay was not protected with Noise, the AS attacker could simply check for messages leaving $v$ which have not been relayed to $v$.
These are the messages sent by $v$.
Waku relay protects against this attack by employing secure channels set up using Noise.
### Correlation
Monitoring all traffic (in an AS or globally) allows the attacker to identify traffic correlated with messages originating from $v$.
This (alone) does not allow an external attacker to learn which message $v$ sent, but it allows identifying the respective traffic propagating through the network.
The more traffic in the network, the lower the success rate of this attack.
Combined with just a few nodes controlled by the attacker, the actual message associated with the correlated traffic can eventually be identified.
### DoS
An active single node attacker could run a disruption attack by
- (1) dropping messages that should be relayed
- (2) flooding neighbours with bogus messages
While (1) has a negative effect on availability, the impact is not significant.
A linearly scaling botnet attacker, however, could significantly disrupt the network with such an attack.
(2) is thwarted by [RLN](https://rfc.vac.dev/spec/17/).
[SWAP](https://rfc.vac.dev/spec/18/) also helps mitigate DoS attacks.
A local attacker can DoS Waku by dropping all Waku traffic within its controlled network segment.
An AS attacker can DoS Waku within its authority, while a global attacker can DoS the whole network.
Countermeasures include censorship resistance techniques like [Pluggable Transports](https://www.pluggabletransports.info/about/).
## Summary and Future Work
Currently, Waku v2 relay offers k-anonymity with respect to receiver anonymity.
This also includes k-anonymity towards legitimate members of the same topic.
Waku v2 relay offers sender anonymity in the single node attacker model with its [strict no sign policy](https://rfc.vac.dev/spec/11/#signature-policy).
Currently, Waku v2 does not guarantee sender anonymity in the multi node and stronger attacker models.
However, we are working on modular anonymity-preserving protocols and building blocks as part of our privacy/anonymity roadmap.
The goal is to allow tunable anonymity with respect to trade offs between _strong anonymity_, _low bandwidth_, and _low latency_.
Not all of these can be fully guaranteed at the same time, as [the anonymity trilemma](https://freedom.cs.purdue.edu/projects/trilemma.html) states.
Some applications have specific requirements, e.g. low latency, which require a compromise on anonymity.
Anonymity-preserving mechanisms we plan to investigate and eventually specify as pluggable anonymity protocols for Waku comprise
- [Dandelion++](https://arxiv.org/abs/1805.11060) for lightweight anonymity;
- [onion routing](https://en.wikipedia.org/wiki/Onion_routing) as a building block adding a low latency anonymization layer;
- [a mix network](https://en.wikipedia.org/wiki/Mix_network) for providing strong anonymity (on top of onion routing) even in the strongest attacker model at the cost of higher latency.
These pluggable anonymity-preserving protocols will form a sub-set of the Waku v2 protocol set.
As an intermediate step, we might directly employ Tor for onion-routing, and [Nym](https://nymtech.net/) as a mix-net layer.
In future research log posts, we will cover further Waku v2 protocols and identify anonymity problems that will be added to our roadmap.
These protocols comprise
- [13/WAKU2-STORE](https://rfc.vac.dev/spec/13/), which can violate receiver anonymity as it allows filtering by content topic
  (a countermeasure is using the content topic exclusively for local filters);
- [12/WAKU2-FILTER](https://rfc.vac.dev/spec/12/), which discloses nodes' interest in topics;
- [19/WAKU2-LIGHTPUSH](https://rfc.vac.dev/spec/19/), which also discloses nodes' interest in topics and links the lightpush client as the sender of a message to the lightpush service node;
- [21/WAKU2-FTSTORE](https://rfc.vac.dev/spec/21/), which discloses nodes' interest in specific time ranges, allowing an observer to infer information such as online times.
While these protocols are not necessary for the operation of Waku v2, and can be seen as pluggable features,
we aim to provide alternatives without the cost of lowering the anonymity level.
## References
- [10/WAKU2](https://rfc.vac.dev/spec/10/)
- [11/WAKU2-RELAY](https://rfc.vac.dev/spec/11/)
- [libp2p GossipSub](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/README.md)
- [Security](https://en.wikipedia.org/wiki/Information_security)
- [Authentication](https://en.wikipedia.org/wiki/Authentication)
- [Non-repudiation](https://en.wikipedia.org/wiki/Non-repudiation)
- [Noise Protocol Framework](https://noiseprotocol.org/)
- [plausible deniability](https://en.wikipedia.org/wiki/Plausible_deniability#Use_in_cryptography)
- [Waku v2 message](https://rfc.vac.dev/spec/14/)
- [partitioned topics](https://specs.status.im/spec/10#partitioned-topic)
- [Sybil attack](https://en.wikipedia.org/wiki/Sybil_attack)
- [Dolev-Yao model](https://en.wikipedia.org/wiki/Dolev%E2%80%93Yao_model)
- [35/WAKU2-NOISE](https://rfc.vac.dev/spec/35/)
- [33/WAKU2-DISCV5](https://vac.dev/wakuv2-apd)
- [strict no sign policy](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md#why-are-we-using-the-strictnosign-signature-policy)
- [Waku v2 strict no sign policy](https://rfc.vac.dev/spec/11/#signature-policy)
- [17/WAKU-RLN-RELAY](https://rfc.vac.dev/spec/17/)
- [anonymity trilemma](https://freedom.cs.purdue.edu/projects/trilemma.html)
- [18/WAKU2-SWAP](https://rfc.vac.dev/spec/18/)
- [Pluggable Transports](https://www.pluggabletransports.info/about/)
- [Nym](https://nymtech.net/)
- [Dandelion++](https://arxiv.org/abs/1805.11060)
- [13/WAKU2-STORE](https://rfc.vac.dev/spec/13/)
- [12/WAKU2-FILTER](https://rfc.vac.dev/spec/12/)
- [19/WAKU2-LIGHTPUSH](https://rfc.vac.dev/spec/19/)
- [21/WAKU2-FTSTORE](https://rfc.vac.dev/spec/21/)