Add section research log

This commit is contained in:
Maria Rushkova 2021-08-02 18:50:16 +02:00
parent 1e922be110
commit 07c558f2ca
18 changed files with 2467 additions and 1 deletion


@ -89,7 +89,7 @@
<li class="hover:opacity-50">
<a
class="nav__link-internal"
href="{{site.url}}{{ site.baseurl }}/research-log/"
href="{{site.url}}{{ site.baseurl }}/#research-log"
>Research log</a
>
</li>


@ -0,0 +1,165 @@
---
layout: post
name: "P2P Data Sync for Mobile"
title: "P2P Data Sync for Mobile"
date: 2019-07-19 12:00:00 +0800
author: oskarth
published: true
permalink: /p2p-data-sync-for-mobile
categories: research
summary: A research log. Reliable and decentralized, pick two.
image: /assets/img/mvds_interactive.png
---
Together with decanus, I've been working on the problem of data sync lately.
In building p2p messaging systems, one problem you quickly come across is reliably transmitting data. If there's no central server with high availability guarantees, you can't meaningfully guarantee that data has been transmitted. One way of solving this problem is through a synchronization protocol.
There are many synchronization protocols out there, and I won't go into detail here on how they differ from our approach. Some common examples are Git and Bittorrent, but there are also projects like IPFS, Swarm, Dispersy, Matrix, Briar, SSB, etc.
## Problem motivation
Why do we want to do p2p sync for mobile phones in the first place? There are three components to that question: the value of decentralization and peer-to-peer, why we'd want to reliably sync data at all, and why mobile phones and other resource-restricted devices.
### Why p2p?
For decentralization and p2p, there are both technical and social/philosophical reasons. Technically, having a user-run network means it can scale with the number of users. Data locality is also improved if you query data that's close to you, similar to distributed CDNs. The throughput is also improved if there are more places to get data from.
Socially and philosophically, there are several ways to think about it. Open and decentralized networks also relate to the idea of open standards: compare the longevity of AOL with IRC or Bittorrent. One is run by a company and is shut down as soon as it stops being profitable; the others live on. Additionally, centralized control of data and infrastructure is increasingly becoming a liability. By having a network with no one in control, everyone is. It's ultimately a form of democratization, more similar to organic social structures before the Big Internet companies. This leads to properties such as censorship resistance and coercion resistance, where we limit the impact a 3rd party might have on a voluntary interaction between individuals or a group of people. Examples of this are plentiful in the world of Facebook, Youtube, Twitter and WeChat.
### Why reliably sync data?
At the risk of stating the obvious, reliably syncing data is a requirement for many problem domains. You don't get this by default in a p2p world, which is unreliable, with nodes permissionlessly joining and leaving the network. In some cases you can get away with only ephemeral data, but usually you want some kind of guarantees. This is a must for a reliable group chat experience, for example, where messages are expected to arrive in a timely fashion and in some reasonable order. The same is true for messages that represent financial transactions, and so on.
### Why mobilephones?
Most devices people use daily are mobile phones. It's important to provide the same, or at least similar, guarantees as for more traditional p2p nodes that might run on a desktop computer or server. The alternative is to rely on gateways, which share many of the drawbacks of centralized control and are prone to censorship, control and surveillance.
More generally, resource-restricted devices differ in their capabilities. Smartphones are one example, but others include desktops, routers, Raspberry Pis, POS systems, and so on. The number and diversity of devices are exploding, and it's useful to be able to leverage this for various types of infrastructure. The alternative is to centralize on big cloud providers, which also lends itself to lack of democratization, censorship, etc.
## Minimal Requirements
For requirements or design goals for a solution, here's what we came up with.
1. MUST sync data reliably between devices. By reliably we mean having the ability to deal with messages being out of order, dropped, duplicated, or delayed.
2. MUST NOT rely on any centralized services for reliability. By centralized services we mean any single point of failure that isn't one of the endpoint devices.
3. MUST allow for mobile-friendly usage. By mobile-friendly we mean devices that are resource restricted, mostly-offline and often changing network.
4. MAY use helper services in order to be more mobile-friendly. Examples of helper services are decentralized file storage solutions such as IPFS and Swarm. These help with availability and latency of data for mostly-offline devices.
5. MUST have the ability to provide causal consistency. By causal consistency we mean the commonly accepted definition in distributed systems literature. This means messages that are causally related can achieve a partial ordering.
6. MUST support ephemeral messages that don't need replication. That is, allow for messages that don't need to be reliably transmitted but still need to be transmitted between devices.
7. MUST allow for privacy-preserving messages and extreme data loss. By privacy-preserving we mean things such as exploding messages (self-destructing messages). By extreme data loss we mean the ability for two trusted devices to recover from a deliberate or accidental removal of data.
8. MUST be agnostic to whatever transport it is running on. It should not rely on specific semantics of the transport it is running on, nor be tightly coupled with it. This means a transport can be swapped out without loss of reliability between devices.
## MVDS - a minimum viable version
The first minimum viable version is in an alpha stage, and it has a [specification](https://github.com/vacp2p/specs/blob/master/mvds.md), [implementation](https://github.com/vacp2p/mvds) and we have deployed it in a [console client](https://github.com/status-im/status-console-client) for end to end functionality. It's heavily inspired by [Bramble Sync Protocol](https://code.briarproject.org/briar/briar-spec/blob/master/protocols/BSP.md).
The spec is fairly minimal. You have nodes that exchange records over some secure transport. These records are of different types, such as `OFFER`, `MESSAGE`, `REQUEST`, and `ACK`. A peer keeps track of the message state for each node it is interacting with. There's also logic for message retransmission with exponential delay. The positive-ACK and retransmission model is quite similar to how TCP is designed.
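To make the retransmission bookkeeping concrete, here is a minimal Python sketch of the per-peer state a node might keep. This is an illustration, not the actual Go implementation; names like `PeerState` and `schedule_retransmit` are made up for this example.

```
from dataclasses import dataclass, field
from enum import Enum

class RecordType(Enum):
    OFFER = 0
    REQUEST = 1
    MESSAGE = 2
    ACK = 3

@dataclass
class PeerState:
    # message id -> (epoch of next send attempt, number of retries so far)
    outstanding: dict = field(default_factory=dict)

    def schedule_retransmit(self, message_id, current_epoch, base_delay=1):
        # Exponential delay: next attempt after base_delay * 2^retries epochs.
        _, retries = self.outstanding.get(message_id, (current_epoch, 0))
        self.outstanding[message_id] = (current_epoch + base_delay * 2 ** retries,
                                        retries + 1)

    def ack_received(self, message_id):
        # A positive ACK stops retransmission, much like TCP.
        self.outstanding.pop(message_id, None)
```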
There are two different modes of syncing, interactive and batch mode. See sequence diagrams below.
Interactive mode:
<img src="{{site.baseurl}}/assets/img/mvds_interactive.png">
Batch mode:
<img src="{{site.baseurl}}/assets/img/mvds_batch.png">
Which mode should you choose? It's a trade-off between latency and bandwidth. If you want to minimize latency, batch mode is better. If you care about preserving bandwidth, interactive mode is better. The choice is up to each node.
### Basic simulation
Initial ad hoc bandwidth and latency testing shows some issues with a naive approach. Running with the [default simulation settings](https://github.com/vacp2p/mvds/):
- communicating nodes: 2
- nodes using interactive mode: 2
- interval between messages: 5s
- time node is offline: 90%
- nodes each node is sharing with: 2
we notice a [huge overhead](https://notes.status.im/7QYa4b6bTH2wMk3HfAaU0w#). More specifically, we see a ~5 minute latency overhead and a bandwidth multiplier of x100-1000, i.e. 2-3 orders of magnitude just for receiving a message with interactive mode, without acks.
Now, that seems terrible. A moment of reflection reveals why. If each node is offline uniformly 90% of the time, each record will be lost 90% of the time. Since interactive mode requires offer, request and payload (and then ack), that's three links just for Bob to receive the actual message.
Each failed attempt implies another retransmission. That means we have `(1/0.1)^3 = 1000` expected overhead to receive a message in interactive mode. The latency follows naturally from that, with the retransmission logic.
### Mostly-offline devices
The problem above hints at requirements 3 and 4 above. While we did get reliable syncing (requirement 1), it came at a big cost.
There are a few ways of getting around this issue. One is having a *store and forward* model, where some intermediary node picks up (encrypted) messages and forwards them to the recipient. This is what we have in production right now at Status.
Another, arguably more pure and robust, way is having a *remote log*, where the actual data is spread over some decentralized storage layer, and you have a mutable reference to find the latest messages, similar to DNS.
What they both have in common is that they act as a sort of highly-available cache to smooth over the non-overlapping connection windows between two endpoints. Neither of them are *required* to get reliable data transmission.
### Basic calculations for bandwidth multiplier
While we do want better simulations, and this is a work in progress, we can also look at the above scenarios using some basic calculations. This allows us to build a better intuition and reason about the problem without having to write code. Let's start with some assumptions:
- two nodes exchanging a single message in batch mode
- 10% uniformly random uptime for each node
- in HA cache case, 100% uptime of a piece of infrastructure C
- retransmission every epoch (with constant or exponential backoff)
- only looking at average (p50) case
#### First case, no helper services
A sends a message to B, and B acks it.
```
A message -> B (10% chance of arrival)
A <- ack B (10% chance of arrival)
```
With a constant backoff, A will send messages at epoch `1, 2, 3, ...`. With exponential backoff and a multiplier of 2, this would be `1, 2, 4, 8, ...`. Let's assume constant backoff for now, as this is what will influence the success rate and thus the bandwidth multiplier.
There's a difference between *time to receive* and *time to stop sending*. Assuming each send attempt is independent, it takes on average 10 epochs for A's message to arrive at B. Furthermore:
1. A will send messages until it receives an ACK.
2. B will send ACK if it receives a message.
To get an average of one ack through, A needs to send 100 messages, and B sends on average 10 acks. That's a multiplier of roughly 100, which is roughly what we saw with the simulation above for receiving a message in interactive mode.
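This expectation is easy to sanity-check with a quick Monte Carlo sketch, assuming each epoch's delivery succeeds independently with probability 0.1:

```
import random

def direct_exchange(p=0.1, trials=2000):
    """A resends every epoch until an ACK arrives; each link succeeds with probability p."""
    total_sends = total_acks = 0
    for _ in range(trials):
        while True:
            total_sends += 1                # A sends (or resends) the message
            if random.random() < p:         # message reaches B
                total_acks += 1             # B replies with an ACK
                if random.random() < p:     # ACK reaches A; A stops resending
                    break
    return total_sends / trials, total_acks / trials

sends, acks = direct_exchange()
print(f"A sends ~{sends:.0f} messages, B sends ~{acks:.0f} acks")  # ~100 and ~10
```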
#### Second case, high-availability caching layer
Let's introduce a helper node or piece of infrastructure, C. Whenever A or B sends a message, it also sends it to C. Whenever A or B comes online, it queries for messages with C.
```
A message -> B (10% chance of arrival)
A message -> C (100% chance of arrival)
B <- req/res -> C (100% chance of arrival)
A <- ack B (10% chance of arrival)
C <- ack B (100% chance of arrival)
A <- req/res -> C (100% chance of arrival)
```
What's the probability that A's messages will arrive at B? Directly, it's still 10%. But we can assume it's 100% that C picks up the message. (Giving C a 90% success rate doesn't materially change the numbers.)
B will pick up A's message from C after an average of 10 epochs. Then B will send ack to A, which will also be picked up by C 100% of the time. Once A comes online again, it'll query C and receive B's ack.
Assuming we use exponential backoff with a multiplier of 2, A will send a message directly to B at epoch `1, 2, 4, 8` (assuming it is online). At this point, epoch `10`, B will be online in the average case. These direct sends will likely fail, but B will pick the message up from C and send one ack, both directly to A and to be picked up by C. Once A comes online, it'll query C and receive the ack from B, which means it won't do any more retransmits.
How many messages have been sent? Not counting interactions with C, A sends 4 (at most) and B 1. Depending on whether the interaction with C is direct or indirect (i.e. multicast), the factor for interacting with C will be ~2. This means the total bandwidth multiplier is likely to be `<10`, which is a lot more acceptable.
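As a rough tally of the above, under the stated assumptions:

```
direct_sends_by_A = 4       # epochs 1, 2, 4, 8 with exponential backoff
acks_by_B = 1               # B acks once, after picking the message up from C
cache_factor = 2            # rough factor for pushing to / pulling from C

multiplier = (direct_sends_by_A + acks_by_B) * cache_factor
print(multiplier)           # ~10, in line with the `<10` estimate above, vs ~100 before
```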
Since the syncing semantics are end-to-end, this comes without relying on the reliability of C.
#### Caveat
Note that both of these are probabilistic arguments. They are also based on heuristics. More formal analysis would be desirable, as well as better simulations to experimentally verify them. In fact, the calculations could very well be wrong!
## Future work
There are many enhancements that can be made and are desirable. Let's outline a few.
1. Data sync clients. Examples of actual usage of data sync, with more interesting domain semantics. This also includes usage of sequence numbers and DAGs to know what content is missing and ought to be synced.
2. Remote log. As alluded to above, this is necessary. It needs a more clear specification and solid proof of concepts.
3. More efficient ways of syncing with large number of nodes. When the number of nodes goes up, the algorithmic complexity doesn't look great. This also touches on things such as ambient content discovery.
4. More robust simulations and real-world deployments. The existing simulation is ad hoc, and there are many improvements that can be made to gain more confidence and identify issues. Additionally, better formal analysis.
5. Example usage over multiple transports. Including things like sneakernet and meshnets. The described protocol is designed to work over unstructured, structured and private p2p networks. In some cases it can leverage differences in topology, such as multicast, or direct connections.


@ -0,0 +1,89 @@
---
layout: post
name: "Vac - A Rough Overview"
title: "Vac - A Rough Overview"
date: 2019-08-02 12:00:00 +0800
author: oskarth
published: true
permalink: /vac-overview
categories: research
summary: Vac is a modular peer-to-peer messaging stack, with a focus on secure messaging. Overview of terms, stack and open problems.
---
Vac is a **modular peer-to-peer messaging stack, with a focus on secure messaging**. What does that mean? Let's unpack it a bit.
## Basic terms
*messaging stack*. While the initial focus is on [data sync](https://vac.dev/p2p-data-sync-for-mobile), we are concerned with all layers in the stack. That means all the way from underlying transports, p2p overlays and routing, to initial trust establishment and semantics for things like group chat. The ultimate goal is to give application developers the tools they need to provide secure messaging for their users, so they can focus on their domain expertise.
*modular*. Unlike many other secure messaging applications, our goal is not to have a tightly coupled set of protocols, nor is it to reinvent the wheel. Instead, we aim to provide options at each layer in the stack, and build on the shoulders of giants, putting a premium on interoperability. It's similar in philosophy to projects such as [libp2p](https://libp2p.io/) or [Substrate](https://www.parity.io/substrate/) in that regard. Each choice comes with different trade-offs, and these look different for different applications.
*peer-to-peer*. The protocols we work on are pure p2p, and aim to minimize centralization. This too is in opposition to many initiatives in the secure messaging space.
*messaging*. By messaging we mean messaging in a generalized sense. This includes both human to human communication, as well as machine to machine communication. By messaging we also mean something more fundamental than text messages; we also include things like transactions (state channels, etc) under this moniker.
*secure messaging*. Outside of traditional notions of secure messaging, such as ensuring end to end encryption, forward secrecy, avoiding MITM-attacks, etc, we are also concerned with two other forms of secure messaging. We call these *private messaging* and *censorship-resistance*. Private messaging means viewing privacy as a security property, with all that entails. Censorship resistance ties into being p2p, but also in terms of allowing for transports and overlays that can't easily be censored by port blocking, traffic analysis, and similar.
*Vāc* is a Vedic goddess of speech. The name also hints at being a vaccine.
## Protocol stack
What does this stack look like? We take inspiration from [core](https://tools.ietf.org/html/rfc793) [internet architecture](https://www.ietf.org/rfc/rfc1122.txt), existing [survey work](http://cacr.uwaterloo.ca/techreports/2015/cacr2015-02.pdf) and other [efforts](https://code.briarproject.org/briar/briar/wikis/A-Quick-Overview-of-the-Protocol-Stack) that have been done to decompose the problem into orthogonal pieces. Each layer provides its own set of properties and only interacts with the layers adjacent to it. Note that this is a rough sketch.
| Layer / Protocol | Purpose | Examples |
|-------------------|-----------------------------------|----------------------|
| Application layer | End user semantics | 1:1 chat, group chat |
| Data Sync | Data consistency | MVDS, BSP |
| Secure Transport | Confidentiality, PFS, etc | Double Ratchet, MLS |
| Transport Privacy | Transport and metadata protection | Whisper, Tor, Mixnet |
| P2P Overlay | Overlay routing, NAT traversal | devp2p, libp2p |
| | | |
| Trust Establishment | Establishing end-to-end trust | TOFU, web of trust |
As an example, end user semantics such as group chat or moderation capabilities can largely work regardless of specific choices further down the stack. Similarly, using a mesh network or Tor doesn't impact the use of Double Ratchet at the Secure Transport layer.
Data Sync plays a similar role to what TCP does at the transport layer in a traditional Internet architecture, and for some applications something more like UDP is likely to be desirable.
In terms of specific properties and trade-offs at each layer, we'll go deeper down into them as we study them. For now, this is best treated as a rough sketch or mental map.
## Problems and rough priorities
With all the pieces involved, this is quite an undertaking. Luckily, a lot of pieces are already in place and can be either incorporated as-is or iterated on. In terms of medium and long term, here's a rough sketch of priorities and open problems.
1. **Better data sync.** While the current [MVDS](https://specs.vac.dev/specs/mvds.html) works, it is lacking in a few areas:
- Lack of a remote log for mostly-offline devices
- Better scalability for multi-user chat contexts
- Better usability in terms of application layer usage and supporting more transports
2. **Better transport layer support.** Currently MVDS runs primarily over Whisper, which has a few issues:
- scalability, being able to run with many nodes
- spam-resistance, proof of work is a poor mechanism for heterogeneous devices
- no incentivized infrastructure, leading to centralized choke points
In addition to these most immediate concerns, there are other open problems. Some of these are overlapping with the above.
3. **Adaptive nodes.** Better support for resource restricted devices and nodes of varying capabilities. Light connection strategy for resources and guarantees. Security games to outsource processing with guarantees.
4. **Incentivized and spam-resistant messaging.** Reasons to run infrastructure and not rely on altruistic nodes. For spam resistance, in p2p multicast spam is a big attack vector due to amplification. There are a few interesting directions here, such as EigenTrust, proof of burn with micropayments, and leveraging zero-knowledge proofs.
5. **Strong privacy guarantees at transport privacy layer**. More rigorous privacy guarantees and explicit trade-offs for metadata protection. Includes Mixnet.
6. **Censorship-resistant and robust P2P overlay**. NAT traversal; running in the browser; mesh networks; pluggable transports for traffic obfuscation.
7. **Scalable and decentralized conversational security.** Strong security guarantees, such as forward secrecy and post-compromise security, for large group chats. Includes projects such as MLS and extending Double Ratchet.
8. **Better trust establishment and key handling**. Avoiding MITM attacks while still enabling a good user experience. Protecting against ghost users in group chat and providing better ways to do key handling.
There is also a set of more general problems, that touch multiple layers:
9. **Ensuring modularity and interoperability**. Providing interfaces that allow for existing and new protocols to be at each layer of the stack.
10. **Better specifications**. Machine-readable and formally verified specifications. More rigorous analysis of exact guarantees and behaviors. Exposing work in such a way that it can be analyzed by academics.
11. **Better simulations**. Providing infrastructure and tooling to be able to test protocols in adverse environments and at scale.
12. **Enabling excellent user experience**. A big reason for the lack of widespread adoption of secure messaging is the fact that more centralized, insecure methods provide a better user experience. Given that incentives can align better for users interested in secure messaging, providing an even better user experience should be doable.
---
We've got some work to do. Come help us if you want. See you in the next update!


@ -0,0 +1,115 @@
---
layout: post
name: "P2P Data Sync with a Remote Log"
title: "P2P Data Sync with a Remote Log"
date: 2019-10-04 12:00:00 +0800
author: oskarth
published: true
permalink: /remote-log
categories: research
summary: A research log. Asynchronous P2P messaging? Remote logs to the rescue!
image: /assets/img/remote-log.png
---
A big problem when doing end-to-end data sync between mobile nodes is that most devices are offline most of the time. With a naive approach, you quickly run into issues of 'ping-pong' behavior, where messages have to be constantly retransmitted. We saw some basic calculations of what this bandwidth multiplier looks like in a [previous post](https://vac.dev/p2p-data-sync-for-mobile).
While you could do some background processing, this is really battery-draining, and on iOS these capabilities are limited. A better approach is to loosen the constraint that two nodes need to be online at the same time. How do we do this? There are two main approaches: one is the *store and forward* model, and the other is a *remote log*.
In the *store and forward* model, we use an intermediate node that forwards messages on behalf of the recipient. In the *remote log* model, you instead replicate the data onto some decentralized storage, and have a mutable reference to the latest state, similar to DNS. While both work, the latter is somewhat more elegant and "pure", as it has less strict requirements on an individual node's uptime. Both act as a highly-available cache to smooth over non-overlapping connection windows between endpoints.
In this post we are going to describe how such a remote log scheme could work. Specifically, how it enhances p2p data sync and takes care of the [following requirements](https://vac.dev/p2p-data-sync-for-mobile):
> 3. MUST allow for mobile-friendly usage. By mobile-friendly we mean devices
> that are resource restricted, mostly-offline and often changing network.
> 4. MAY use helper services in order to be more mobile-friendly. Examples of
> helper services are decentralized file storage solutions such as IPFS and
> Swarm. These help with availability and latency of data for mostly-offline
> devices.
## Remote log
A remote log is a replication of a local log. This means a node can read data from another node, even when that node is offline.
The spec is in an early draft stage and can be found [here](https://github.com/vacp2p/specs/pull/16). A very basic [spike](https://en.wikipedia.org/wiki/Spike_(software_development)) / proof-of-concept can be found [here](https://github.com/vacp2p/research/tree/master/remote_log).
### Definitions
| Term | Definition |
| ----------- | -------------------------------------------------------------------------------------- |
| CAS | Content-addressed storage. Stores data that can be addressed by its hash. |
| NS | Name system. Associates mutable data to a name. |
| Remote log | Replication of a local log at a different location. |
### Roles
There are four fundamental roles:
1. Alice
2. Bob
3. Name system (NS)
4. Content-addressed storage (CAS)
The *remote log* is the data format of what is stored in the name system.
"Bob" can represent anything from 0 to N participants. Unlike Alice, Bob only needs read-only access to NS and CAS.
### Flow
<!-- diagram -->
<p align="center">
<img src="{{site.baseurl}}/assets/img/remote-log.png">
<br />
Figure 1: Remote log data synchronization.
</p>
### Data format
The remote log lets receiving nodes know what data they are missing. Depending on the specific requirements and capabilities of the nodes and name system, the information can be referred to differently. We distinguish between three rough modes:
1. Fully replicated log
2. Normal sized page with CAS mapping
3. "Linked list" mode - minimally sized page with CAS mapping
A remote log is simply a mapping from message identifiers to their corresponding address in a CAS:
| Message Identifier (H1) | CAS Hash (H2) |
| ---------------- |---------------|
| H1_3 | H2_3 |
| H1_2 | H2_2 |
| H1_1 | H2_1 |
| *address to next page* | |
The numbers here correspond to messages. Optionally, the content itself can be included, just like it normally would be sent over the wire. This bypasses the need for a dedicated CAS and additional round-trips, with a trade-off in bandwidth usage.
| Message Identifier (H1) | Content |
| ---------------- |---------------|
| H1_3 | C3 |
| H1_2 | C2 |
| H1_1 | C1 |
| *address to next page* | |
Both patterns can be used in parallel, e.g. by storing the last `k` messages directly and using CAS pointers for the rest. Together with the `next_page` semantics, this gives users flexibility in terms of bandwidth and latency/indirection, all the way from a simple linked list to a fully replicated log. The latter is useful for things like backups on durable storage.
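As a rough illustration of this hybrid page layout, here is a minimal Python sketch. The names `LogEntry`, `RemoteLogPage` and `cas_put` are hypothetical, not from the spec:

```
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogEntry:
    message_id: str                  # H1_n in the tables above
    cas_hash: Optional[str] = None   # H2_n, a pointer into the CAS
    content: Optional[bytes] = None  # inline payload, bypassing the CAS

@dataclass
class RemoteLogPage:
    entries: list                    # newest first, as in the tables above
    next_page: Optional[str] = None  # address of the next page, linked-list style

def make_page(messages, k, cas_put, next_page=None):
    """Store the last k messages inline; use CAS pointers for the rest."""
    entries = []
    for i, (message_id, payload) in enumerate(messages):
        if i < k:
            entries.append(LogEntry(message_id, content=payload))
        else:
            entries.append(LogEntry(message_id, cas_hash=cas_put(payload)))
    return RemoteLogPage(entries, next_page)
```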
### Interaction with MVDS
[vac.mvds.Message](https://specs.vac.dev/specs/mvds.html#payloads) payloads are the only payloads that MUST be uploaded. Other messages types MAY be uploaded, depending on the implementation.
## Future work
The spec is still in an early draft stage, so it is expected to change. Same with the proof of concept. More work is needed on getting a fully featured proof of concept with specific CAS and NS instances, e.g. Swarm and Swarm Feeds, or IPFS and IPNS, or something else.
For data sync in general:
- Make consistency guarantees more explicit for app developers with support for sequence numbers and DAGs, as well as the ability to send non-synced messages. E.g. ephemeral typing notifications, linear/sequential history and causal consistency/DAG history
- Better semantics and scalability for multi-user sync contexts, e.g. CRDTs and joining multiple logs together
- Better usability in terms of application layer usage (data sync clients) and supporting more transports
---
PS. Thanks to everyone who submitted great [logo proposals](https://explorer.bounties.network/bounty/3389) for Vac!
PPS. Next week, on October 10th, decanus and I will be presenting Vac at [Devcon](https://devcon.org/agenda), come say hi :)


@ -0,0 +1,152 @@
---
layout: post
name: "Feasibility Study: Semaphore rate limiting through zkSNARKs"
title: "Feasibility Study: Semaphore rate limiting through zkSNARKs"
date: 2019-11-08 12:00:00 +0800
author: oskarth
published: true
permalink: /feasibility-semaphore-rate-limiting-zksnarks
categories: research
summary: A research log. Zero knowledge signaling as a rate limiting mechanism to prevent spam in p2p networks.
image: /assets/img/peacock-signaling.jpg
discuss: https://forum.vac.dev/t/discussion-feasibility-study-semaphore-rate-limiting-through-zksnarks/21
---
**tldr: Moon math promising for solving spam in Whisper, but to get there we need to invest more in performance work and technical upskilling.**
## Motivating problem
In open p2p networks for messaging, one big problem is spam-resistance. Existing solutions, such as Whisper's proof of work, are insufficient, especially for heterogeneous nodes. Other reputation-based approaches might not be desirable, due to issues around arbitrary exclusion and privacy.
One possible solution is to use a staking-based right-to-access method, where a node is only able to send a message (signal) at a certain rate, and can otherwise be slashed. One problem with this is privacy preservation: we specifically don't want a user to be tied to a specific payment or unique fingerprint.
### Related problems
In addition to the above, there are a lot of related problems that share similarities in terms of their structure and proposed solution.
- Private transactions ([Zcash](https://z.cash/), [AZTEC](https://www.aztecprotocol.com/))
- Private voting ([Semaphore](https://github.com/kobigurk/semaphore))
- Private group membership (Semaphore)
- Layer 2 scaling, possibly layer 1 ([ZK Rollup](https://ethresear.ch/t/on-chain-scaling-to-potentially-500-tx-sec-through-mass-tx-validation/3477); StarkWare/Eth2-3)
## Overview
### Basic terminology
A *zero-knowledge proof* allows a *prover* to show a *verifier* that they know something, without revealing what that something is. This means you can do trust-minimized computation that is also privacy-preserving. As a basic example, instead of showing your ID when going to a bar, you simply give the doorman a proof that you are over 18.
*zkSNARKs* are a form of zero-knowledge proof. There are many types of zero-knowledge proofs, and the field is evolving rapidly. They come with various trade-offs in terms of trusted setup, cryptographic assumptions, proof/verification key size, proof/verification time, proof size, etc. See the section below for more.
*Semaphore* is a framework/library/construct on top of zkSNARKs. It allows for zero-knowledge signaling, specifically on top of Ethereum. This means an approved user can broadcast some arbitrary string without revealing their identity, given some specific constraints. An approved user is someone who has been added to a certain merkle tree. See the [current Github home](https://github.com/kobigurk/semaphore) for more.
*Circom* is a DSL for writing arithmetic circuits that can be used in zkSNARKs, similar to how you might write a NAND gate. See [Github](https://github.com/iden3/circom) for more.
## Basic flow
We start with a private voting example, and then extend it to the slashable rate limiting example.
1. A user registers an identity (arbitrary keypair), along with a small fee, to a smart contract. This adds them to a merkle tree and allows them to prove that they are member of that group, without revealing who they are.
2. When a user wants to send a message, they compute a zero-knowledge proof. This proof ensures certain invariants hold, has some *public outputs*, and can be verified by anyone (including a smart contract).
3. Any node can verify the proof, including smart contracts on chain (as of the Byzantium hard fork). Additionally, a node can have rules for the public output. In the case of voting, one such rule is that a specific output hash has to be equal to some predefined value, such as "2020-01-01 vote on Foo Bar for president".
4. Because of how the proof is constructed, and the rules around output values, this ensures that a user is part of the approved set of voters and that a user can only vote once.
5. As a consequence of the above, we have a system where registered users can vote only once, no one can see who voted for what, and all of this can be proven and verified.
### Rate limiting example
In the case of rate limiting, we do want nodes to send multiple messages. This changes steps 3-5 above somewhat.
*NOTE: It is a bit more involved than this, and if we precompute proofs the flow might look a bit different. But the general idea is the same*.
1. Instead of having a rule that you can only vote once, we have a rule that you can only send one message per epoch. An epoch here can be every second, as defined by UTC date time ±20s.
2. Additionally, if a user sends more than one message per epoch, one of the public outputs is a random share of their private key. Using Shamir's Secret Sharing (similar to a multisig) with a 2-of-3 key-share threshold as an example: in the normal case only one of three shares is revealed, which is insufficient to recover the key. If two messages are sent in the same epoch, two shares are revealed, which is probabilistically sufficient to recover the key (unless both messages happen to reveal the same share). See the sketch after this list.
3. This means any untrusted user who detects a spamming user can recover that user's private key, which corresponds to funds in the contract, and thus slash them.
4. As a consequence of the above, we have a system where registered users can only send X messages per epoch, and no one can see who is sending what messages. Additionally, if a user violates the rate limit, they can be punished, and any user can profit from punishing them.
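Here is a minimal Python sketch of the key-share mechanics referenced in step 2, using a degree-1 polynomial over a prime field (any 2 distinct shares recover the secret). This mirrors the out-of-band Shamir logic in the spike, not the in-SNARK version; the field modulus and the way `x` is chosen are simplified stand-ins (in the real scheme `x` would be derived from the message):

```
P = 2**61 - 1  # a Mersenne prime as a stand-in field modulus

def make_share(secret_key, slope, x):
    # f(x) = secret_key + slope * x mod P, so f(0) is the private key.
    return (x, (secret_key + slope * x) % P)

def recover_key(share1, share2):
    # Lagrange interpolation at x=0 from two distinct shares.
    (x1, y1), (x2, y2) = share1, share2
    slope = ((y2 - y1) * pow(x2 - x1, -1, P)) % P
    return (y1 - slope * x1) % P

sk, slope = 123456789, 987654321   # per-epoch slope in the real scheme
s1 = make_share(sk, slope, x=111)  # revealed by the first message in an epoch
s2 = make_share(sk, slope, x=222)  # a second message in the same epoch leaks this
assert recover_key(s1, s2) == sk   # two shares are enough to slash
```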
### Briefly on scope of 'approved users'
In the case of an application like Status, this construct can either be a global StatusNetwork group, or one per chat, or network, etc. It can be applied both at the network and user level. There are no specific limitations on where or who deploys this, and it is thus more of a UX consideration.
## Technical details
For a fairly self-contained set of examples of the above, see the exploration in the [Vac research repo](https://github.com/vacp2p/research/blob/master/zksnarks/semaphore/src/hello.js). Note that the Shamir secret sharing is not inside the SNARK, but out-of-band for now.
The [current version](https://github.com/kobigurk/semaphore) of Semaphore uses NodeJS and [Circom](https://github.com/iden3/circom) from Iden3 for SNARKs.
For more on the rate limiting idea, see the [ethresearch post](https://ethresear.ch/t/semaphore-rln-rate-limiting-nullifier-for-spam-prevention-in-anonymous-p2p-setting/5009/).
## Feasibility
The above repo was used to exercise the basic paths and to gain an intuition about feasibility. Based on it and related reading, we outline a few blockers and things that require further study.
### Technical feasibility
#### Proof time
Proof generation time for Semaphore (<https://github.com/kobigurk/semaphore>) zkSNARKs using circom, groth and snarkjs is currently way too long: on the order of ~10 minutes to generate a proof. With Websnark, it is likely to take ~30s, which might still be too long. We should experiment with native code on mobile here.
See [details](https://github.com/vacp2p/research/issues/7).
#### Proving key size
The prover key size is ~110mb for Semaphore. Assuming this is embedded on a mobile device, it bloats the APK a lot. The current APK size is ~30mb, and even that might be high for people with limited bandwidth.
See [details](https://github.com/vacp2p/research/issues/8).
#### Trusted setup
zkSNARKs require a trusted setup to generate prover and verifier keys. As part of this setup, a toxic parameter lambda is generated. If a party gets access to this lambda, they can prove anything. This is why people using zkSNARKs usually run an elaborate MPC ceremony to ensure this parameter doesn't get discovered.
See [details](https://github.com/vacp2p/research/issues/9).
#### Shamir logic in SNARK
For [Semaphore RLN](https://ethresear.ch/t/semaphore-rln-rate-limiting-nullifier-for-spam-prevention-in-anonymous-p2p-setting/5009) we need to embed the Shamir logic inside the SNARK in order to do slashing for spam. Currently the [implementation](https://github.com/vacp2p/research/blob/master/zksnarks/semaphore/src/hello.js#L450) is trusted and very hacky.
See [details](https://github.com/vacp2p/research/issues/10).
#### End to end integration
The [current exploration](https://github.com/vacp2p/research/blob/master/zksnarks/semaphore/src/hello.js) is standalone and doesn't touch multiple users, a deployed contract with merkle tree and verification, actual transactions, a mocked network, adding/removing members, etc. There are bound to be edge cases and unknown unknowns here.
See [details](https://github.com/vacp2p/research/issues/11).
#### Licensing issues
Currently Circom [uses a GPL license](https://github.com/iden3/circom/blob/master/COPYING), which can get tricky when it comes to the App Store etc.
See [details](https://github.com/vacp2p/research/issues/12).
#### Alternative ZKPs?
Some of the isolated blockers for zkSNARKs ([#7](https://github.com/vacp2p/research/issues/7), [#8](https://github.com/vacp2p/research/issues/8), [#9](https://github.com/vacp2p/research/issues/9)) might be mitigated by the use of other ZKP technology. However, those likely have their own issues.
See [details](https://github.com/vacp2p/research/issues/13).
### Social feasibility
#### Technical skill
zkSNARKs and related technologies are quite new. Learning how they work and getting an intuition for them requires individuals to dedicate a lot of time to studying them. This means we must make building competence in these technologies a priority if we wish to use them to our advantage.
#### Time and resources
In order for this and related projects (such as private transactions) to get anywhere, it must be made an explicit area of focus for an extended period of time.
## General thoughts
Similar to Whisper, and in line with moving towards protocol and infrastructure, we need to upskill and invest resources into this. This doesn't mean developing all of the technologies ourselves, but gaining enough competence to leverage and extend existing solutions by the growing ZKP community.
For example, this might also include leveraging largely ready-made solutions such as AZTEC for private transactions; more fundamental research into ZK Rollup and similar; using Semaphore for private group membership and private voting; a Nim-based wrapper around Bellman, etc.
## Acknowledgement
Thanks to Barry Whitehat for patient explanation and pointers. Thanks to WJ for helping with runtime issues.
*Peacock header image from [Tonos](https://en.wikipedia.org/wiki/File:Flickr_-_lo.tangelini_-_Tonos_(1).jpg).*


@ -0,0 +1,294 @@
---
layout: post
name: "Fixing Whisper with Waku"
title: "Fixing Whisper with Waku"
date: 2019-12-03 12:00:00 +0800
author: oskarth
published: true
permalink: /fixing-whisper-with-waku
categories: research
summary: A research log. Why Whisper doesn't scale and how to fix it.
image: /assets/img/whisper_scalability.png
discuss: https://forum.vac.dev/t/discussion-fixing-whisper-with-waku/27
---
This post will introduce Waku, a fork of Whisper that attempts to address some of Whisper's shortcomings in an iterative fashion. We will also introduce a theoretical scaling model for Whisper that shows why it doesn't scale, and what can be done about it.
## Introduction
Whisper is a gossip-based communication protocol or an ephemeral key-value store
depending on which way you look at it. Historically speaking, it is the
messaging pillar of [Web3](http://gavwood.com/dappsweb3.html), together with
Ethereum for consensus and Swarm for storage.
Whisper, being a somewhat esoteric protocol and with some fundamental issues,
hasn't seen a lot of usage. However, applications such as Status are using it,
and have been making minor ad hoc modifications to it to make it run on mobile
devices.
What are these fundamental issues? In short:
1. scalability, most immediately when it comes to bandwidth usage
2. spam-resistance, proof of work is a poor mechanism for heterogeneous nodes
3. no incentivized infrastructure, leading to centralized choke points
4. lack of formal and unambiguous specification makes it hard to analyze and implement
5. running over devp2p, which limits where it can run and how
In this post, we'll focus on the first problem, which is scalability through bandwidth usage.
## Whisper theoretical scalability model
*(Feel free to skip this section if you want to get right to the results).*
There's widespread implicit knowledge that Whisper "doesn't scale", but it is less well understood exactly why. This theoretical model attempts to encode some of its characteristics, specifically for a use case such as the one by Status (see [Status Whisper usage spec](https://github.com/status-im/specs/blob/master/status-whisper-usage-spec.md)).
### Caveats
First, some caveats: this model likely contains bugs, has wrong assumptions, or completely misses certain dimensions. However, it acts as a form of existence proof for unscalability, with clear reasons.
If certain assumptions are wrong, then we can challenge them and reason about them in isolation. It doesn't mean things will definitely work as the model predicts, or that there aren't unknown unknowns.
The model also only deals with receiving bandwidth for end nodes, uses mostly static assumptions of averages, and doesn't deal with spam resistance, privacy guarantees, accounting, or intermediate-node and network-wide failures.
### Goals
1. Ensure network scales by being user or usage bound, as opposed to bandwidth growing in proportion to network size.
2. Staying within a reasonable bandwidth limit for limited data plans.
3. Do the above without materially impacting existing nodes.
It proceeds through various cases with clear assumptions behind them, starting from the most naive. It shows results for 100 users, 10k users and 1m users.
### Model
```
Case 1. Only receiving messages meant for you [naive case]
Assumptions:
- A1. Envelope size (static): 1024 bytes
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A4. Only receiving messages meant for you.
For 100 users, receiving bandwidth is 1000.0KB/day
For 10k users, receiving bandwidth is 1000.0KB/day
For 1m users, receiving bandwidth is 1000.0KB/day
------------------------------------------------------------
Case 2. Receiving messages for everyone [naive case]
Assumptions:
- A1. Envelope size (static): 1024 bytes
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A5. Received messages for everyone.
For 100 users, receiving bandwidth is 97.7MB/day
For 10k users, receiving bandwidth is 9.5GB/day
For 1m users, receiving bandwidth is 953.7GB/day
------------------------------------------------------------
Case 3. All private messages go over one discovery topic [naive case]
Assumptions:
- A1. Envelope size (static): 1024 bytes
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A6. Proportion of private messages (static): 0.5
- A7. Public messages only received by relevant recipients (static).
- A8. All private messages are received by everyone (same topic) (static).
For 100 users, receiving bandwidth is 49.3MB/day
For 10k users, receiving bandwidth is 4.8GB/day
For 1m users, receiving bandwidth is 476.8GB/day
------------------------------------------------------------
Case 4. All private messages are partitioned into shards [naive case]
Assumptions:
- A1. Envelope size (static): 1024 bytes
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A6. Proportion of private messages (static): 0.5
- A7. Public messages only received by relevant recipients (static).
- A9. Private messages partitioned across partition shards (static), n=5000
For 100 users, receiving bandwidth is 1000.0KB/day
For 10k users, receiving bandwidth is 1.5MB/day
For 1m users, receiving bandwidth is 98.1MB/day
------------------------------------------------------------
Case 5. 4 + Bloom filter with false positive rate
Assumptions:
- A1. Envelope size (static): 1024 bytes
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A6. Proportion of private messages (static): 0.5
- A7. Public messages only received by relevant recipients (static).
- A9. Private messages partitioned across partition shards (static), n=5000
- A10. Bloom filter size (m) (static): 512
- A11. Bloom filter hash functions (k) (static): 3
- A12. Bloom filter elements, i.e. topics, (n) (static): 100
- A13. Bloom filter assuming optimal k choice (sensitive to m, n).
- A14. Bloom filter false positive proportion of full traffic, p=0.1
For 100 users, receiving bandwidth is 10.7MB/day
For 10k users, receiving bandwidth is 978.0MB/day
For 1m users, receiving bandwidth is 95.5GB/day
NOTE: Traffic extremely sensitive to bloom false positives
This completely dominates network traffic at scale.
With p=1% we get 10k users ~100MB/day and 1m users ~10GB/day
------------------------------------------------------------
Case 6. Case 5 + Benign duplicate receives
Assumptions:
- A1. Envelope size (static): 1024 bytes
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A6. Proportion of private messages (static): 0.5
- A7. Public messages only received by relevant recipients (static).
- A9. Private messages partitioned across partition shards (static), n=5000
- A10. Bloom filter size (m) (static): 512
- A11. Bloom filter hash functions (k) (static): 3
- A12. Bloom filter elements, i.e. topics, (n) (static): 100
- A13. Bloom filter assuming optimal k choice (sensitive to m, n).
- A14. Bloom filter false positive proportion of full traffic, p=0.1
- A15. Benign duplicate receives factor (static): 2
- A16. No bad envelopes, bad PoW, expired, etc (static).
For 100 users, receiving bandwidth is 21.5MB/day
For 10k users, receiving bandwidth is 1.9GB/day
For 1m users, receiving bandwidth is 190.9GB/day
------------------------------------------------------------
Case 7. 6 + Mailserver under good conditions; small bloom fp; mostly offline
Assumptions:
- A1. Envelope size (static): 1024 bytes
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A6. Proportion of private messages (static): 0.5
- A7. Public messages only received by relevant recipients (static).
- A9. Private messages partitioned across partition shards (static), n=5000
- A10. Bloom filter size (m) (static): 512
- A11. Bloom filter hash functions (k) (static): 3
- A12. Bloom filter elements, i.e. topics, (n) (static): 100
- A13. Bloom filter assuming optimal k choice (sensitive to m, n).
- A14. Bloom filter false positive proportion of full traffic, p=0.1
- A15. Benign duplicate receives factor (static): 2
- A16. No bad envelopes, bad PoW, expired, etc (static).
- A17. User is offline p% of the time (static) p=0.9
- A18. No bad request, dup messages for mailservers; overlap perfect (static).
- A19. Mailserver requests can change false positive rate to be p=0.01
For 100 users, receiving bandwidth is 3.9MB/day
For 10k users, receiving bandwidth is 284.8MB/day
For 1m users, receiving bandwidth is 27.8GB/day
------------------------------------------------------------
Case 8. No metadata protection w bloom filter; 1 node connected; static shard
Aka waku mode.
Next step up is to either only use contact code, or shard more aggressively.
Note that this requires change of other nodes behavior, not just local node.
Assumptions:
- A1. Envelope size (static): 1024 bytes
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A6. Proportion of private messages (static): 0.5
- A7. Public messages only received by relevant recipients (static).
- A9. Private messages partitioned across partition shards (static), n=5000
For 100 users, receiving bandwidth is 1000.0KB/day
For 10k users, receiving bandwidth is 1.5MB/day
For 1m users, receiving bandwidth is 98.1MB/day
------------------------------------------------------------
```
See [source](https://github.com/vacp2p/research/tree/master/whisper_scalability)
for more detail on the model and its assumptions.
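As a sanity check on the numbers, the naive Case 2 arithmetic can be reproduced in a few lines, assuming 1024-byte envelopes, 10 envelopes per message and 100 messages per user per day, per the assumptions above:

```
ENVELOPE_BYTES = 1024        # A1
ENVELOPES_PER_MSG = 10       # A2
MSGS_PER_USER_PER_DAY = 100  # A3

def case2_receive_bytes_per_day(users):
    # Case 2: every node receives every envelope in the network.
    return users * MSGS_PER_USER_PER_DAY * ENVELOPES_PER_MSG * ENVELOPE_BYTES

for users in (100, 10_000, 1_000_000):
    print(f"{users} users: {case2_receive_bytes_per_day(users) / 2**20:,.1f} MB/day")
# 100 users: ~97.7 MB/day; 10k users: ~9.5 GB/day; 1m users: ~953.7 GB/day
```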
### Takeaways
1. Whisper as it currently works doesn't scale, and we quickly run into unacceptable bandwidth usage.
2. There are a few factors behind this, but largely it boils down to noisy topic usage and the use of bloom filters. Duplicates (e.g. see [Whisper vs PSS](https://our.status.im/whisper-pss-comparison/)) and bad envelopes are also factors, but these depend more on specific deployment configurations.
3. Waku mode (case 8) is an additional capability that doesn't require other nodes to change, for nodes that put a premium on performance.
4. The next bottleneck after this is the partitioned topics (app/network specific), which either need to gracefully (and potentially quickly) grow, or an alternative way of consuming those messages needs to be devised.
![]({{site.baseurl}}/assets/img/whisper_scalability.png)
The results are summarized in the graph above. Notice the log-log scale. The
colored backgrounds correspond to the following bandwidth usage:
- Blue: <10mb/d (<~300mb/month)
- Green: <30mb/d (<~1gb/month)
- Yellow: <100mb/d (<~3gb/month)
- Red: >100mb/d (>3gb/month)
These ranges are somewhat arbitrary, but are based on [user
requirements](https://github.com/status-im/status-react/issues/9081) for users
on a limited data plan, with comparable usage for other messaging apps.
## Introducing Waku
### Motivation for a new protocol
Apps such as Status will likely use something like Whisper for the foreseeable future, and we want to enable them to use it with more users on mobile devices, without bandwidth exploding, with minimal changes.
Additionally, there's not a clear cut alternative that maps cleanly to the
desired use cases (p2p, multicast, privacy-preserving, open, etc).
We are actively researching, developing and collaborating with more greenfield
approaches. It is likely that Waku will either converge to those, or Waku will
lay the groundwork (clear specs, common issues/components) necessary to make
switching to another protocol easier. In this project we want to emphasize
iterative work with results on the order of weeks.
### Briefly on Waku mode
- Doesn't impact existing clients; it's just a separate node and capability.
- Other nodes can still use Whisper as is, like a full node.
- Sacrifices metadata protection and incurs higher connectivity/availability requirements for scalability
**Requirements:**
- Exposes an API to get messages from a list of topics (no bloom filter)
- Way of being identified as a Waku node (e.g. through version string)
- Option to statically encode this node in app, e.g. similar to custom bootnodes/mailserver
- Only node that needs to be connected to, possibly as Whisper relay / mailserver hybrid
**Provides:**
- likely provides scalability of up to 10k users and beyond
- with some enhancements to partition topic logic, can possibly scale up to 1m users (app/network specific)
**Caveats:**
- hasn't been tested in a large-scale simulation
- other network and intermediate node bottlenecks might become apparent (e.g. full bloom filter and private cluster capacity; can likely be dealt with in isolation using known techniques, e.g. load balancing) (deployment specific)
### Progress so far
In short, we have a [Waku version 0 spec up](https://specs.vac.dev/specs/waku.html) as well as a [PoC](https://github.com/status-im/nim-eth/pull/120) for backwards compatibility. In the coming weeks, we are going to solidify the specs, get a more fully featured PoC for [Waku mode](https://github.com/status-im/nim-eth/pull/114). See [rough roadmap](https://github.com/vacp2p/pm/issues/5), [project board](https://github.com/orgs/vacp2p/projects/2) and progress thread on the [Vac forum](https://forum.vac.dev/t/waku-project-and-progress/24).
The spec has been rewritten for clarity, with ABNF grammar and less ambiguous language. The spec also incorporates several previously [ad hoc implemented features](https://specs.vac.dev/specs/waku.html#additional-capabilities), such as light nodes and mailserver/client support. This has already caught a few incompatibilities between the `geth` (Go), `status/whisper` (Go) and `nim-eth` (Nim) versions, specifically around light node usage and the handshake.
If you are interested in this effort, please check out [our forum](https://forum.vac.dev/) for questions, comments and proposals. We already have some discussion on better [spam protection](https://forum.vac.dev/t/stake-priority-based-queuing/26) (see the [previous post](https://vac.dev/feasibility-semaphore-rate-limiting-zksnarks) for a more complex but privacy-preserving proposal), something that is likely going to be addressed in future versions of Waku, along with many other fixes and enhancements.


@ -0,0 +1,144 @@
---
layout: post
name: "Waku Update"
title: "Waku Update"
date: 2020-02-14 12:00:00 +0800
author: oskarth
published: true
permalink: /waku-update
categories: research
summary: A research log. What's the current state of Waku? How many users does it support? What are the bottlenecks? What's next?
image: /assets/img/waku_infrastructure_sky.jpg
discuss: https://forum.vac.dev/t/waku-update-where-are-we-at/34
---
Waku is our fork of Whisper, where we address the shortcomings of Whisper in an iterative manner. We saw in a [previous post](https://vac.dev/fixing-whisper-with-waku) that Whisper doesn't scale, and why. In this post we'll talk about the current state of Waku, how many users it can support, and future plans.
## Current state
**Specs:**
We released [Waku spec v0.3](https://specs.vac.dev/specs/waku/waku.html) this week! You can see the full changelog [here](https://specs.vac.dev/specs/waku/waku.html#changelog).
The main change from 0.2 is making the handshake more flexible. This enables us to communicate topic interest immediately, without ambiguity. We also did the following:
- added recommendation for DNS based discovery
- added an upgradability and compatibility policy
- cut the spec up into several components
We cut the spec up into several components to make Vac as modular as possible. The components right now are:
- Waku (main spec), currently in [version 0.3.0](https://specs.vac.dev/specs/waku/waku.html)
- Waku envelope data field, currently in [version 0.1.0](https://specs.vac.dev/specs/waku/envelope-data-format.html)
- Waku mailserver, currently in [version 0.2.0](https://specs.vac.dev/specs/waku/mailserver.html)
We can probably factor these out further as the main spec is getting quite big, but this is good enough for now.
**Clients:**
There are currently two clients that implement Waku v0.3, these are [Nimbus](https://github.com/status-im/nimbus/tree/master/waku) in Nim and [status-go](https://github.com/status-im/status-go) in Go.
For more details on what each client supports and doesn't, you can follow the [work in progress checklist](https://github.com/vacp2p/pm/issues/7).
Work is currently in progress to integrate it into the [Status core app](https://github.com/status-im/status-react/pull/9949). Waku is expected to be part of their upcoming 1.1 release (see [Status app roadmap](https://trello.com/b/DkxQd1ww/status-app-roadmap)).
**Simulation:**
We have a [simulation](https://github.com/status-im/nimbus/tree/master/waku#testing-waku-protocol) that verifies - or rather, fails to falsify - our [scalability model](https://vac.dev/fixing-whisper-with-waku). More on the simulation and what it shows below.
## How many users does Waku support?
This is our current understanding of how many users a network running Waku can support. Specifically in the context of the Status chat app, since that's the most immediate consumer of Waku. It should generalize fairly well to most deployments.
**tl;dr (for Status app):**
- beta: 100 DAU
- v1: 1k DAU
- v1.1 (waku only): 10k DAU (up to x10 with deployment hotfixes)
- v1.2 (waku+dns): 100k DAU (can optionally be folded into v1.1)
*Assuming 10 concurrent users = 100 DAU. Estimate uncertainty increases for each order of magnitude until real-world data is observed.*
As far as we know right now, these are the bottlenecks we have:
- Immediate bottleneck - Receive bandwidth for end user clients (aka Fixing Whisper with Waku)
- Very likely bottleneck - Nodes and cluster capacity (aka DNS based node discovery)
- Conjecture, but not unlikely to appear - Full node traffic (aka the routing / partition problem)
We've already seen the first bottleneck being discussed in the initial post. Dean wrote a post on [DNS based discovery](https://vac.dev/dns-based-discovery) which explains how we will address the likely second bottleneck. More on the third one in future posts.
For more details on these bottlenecks, see [Scalability estimate: How many users can Waku and the Status app support?](https://discuss.status.im/t/scalability-estimate-how-many-users-can-waku-and-the-status-app-support/1514).
## Simulation
The ultimate test is real-world usage. Until then, we have a simulation thanks to Kim De Mey from the Nimbus team!
![](/assets/img/waku_simulation.jpeg)
We have two network topologies: star and full mesh. Both networks have 6 full nodes, one traditional light node with a bloom filter, and one Waku light node.
One of the full nodes sends 1 envelope over 1 of the 100 topics that the two light nodes subscribe to. After that, it sends 10000 envelopes over random topics.
For the traditional light node, the bloom filter is set to roughly 10% false positives (n=100, k=3, m=512). The tables below show the number of valid and invalid envelopes received by the different nodes.
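As a sanity check on that figure: the expected false positive rate of a bloom filter with `n` items, `k` hash functions and `m` bits is approximately `(1 - e^(-kn/m))^k`. A few lines of Python confirm the parameters above land near 10%:

```python
from math import exp

n, k, m = 100, 3, 512  # items, hash functions, bits
fp = (1 - exp(-k * n / m)) ** k
print(round(fp, 3))  # 0.087, i.e. close to the ~10% quoted above
```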
**Star network:**
| Description | Peers | Valid | Invalid |
|-----------------|-------|-------|---------|
| Master node | 7 | 10001 | 0 |
| Full node 1 | 3 | 10001 | 0 |
| Full node 2 | 1 | 10001 | 0 |
| Full node 3 | 1 | 10001 | 0 |
| Full node 4 | 1 | 10001 | 0 |
| Full node 5 | 1 | 10001 | 0 |
| Light node | 2 | 815 | 0 |
| Waku light node | 2 | 1 | 0 |
**Full mesh:**
| Description | Peers | Valid | Invalid |
|-----------------|-------|-------|---------|
| Full node 0 | 7 | 10001 | 20676 |
| Full node 1 | 7 | 10001 | 9554 |
| Full node 2 | 5 | 10001 | 23304 |
| Full node 3 | 5 | 10001 | 11983 |
| Full node 4 | 5 | 10001 | 24425 |
| Full node 5 | 5 | 10001 | 23472 |
| Light node | 2 | 803 | 803 |
| Waku light node | 2 | 1 | 1 |
Things to note:
- The Whisper light node with a ~10% false positive rate gets ~10% of total traffic
- The Waku light node receives ~1000x fewer envelopes than the Whisper light node
- Full mesh results in a lot more duplicate messages, except for the Waku light node
Run the simulation yourself [here](https://github.com/status-im/nimbus/tree/master/waku#testing-waku-protocol). The parameters are configurable, and it is integrated with Prometheus and Grafana.
## Difference between Waku and Whisper
Summary of main differences between Waku v0 spec and Whisper v6, as described in [EIP-627](https://eips.ethereum.org/EIPS/eip-627):
- Handshake/Status message is not compatible with shh/6 nodes; options are specified as an association list
- Include topic-interest in Status handshake
- Upgradability policy
- `topic-interest` packet code
- RLPx subprotocol is changed from shh/6 to waku/0.
- Light node capability is added.
- Optional rate limiting is added.
- Status packet has the following additional parameters: light-node, confirmations-enabled and rate-limits
- Mail Server and Mail Client functionality is now part of the specification.
- P2P Message packet contains a list of envelopes instead of a single envelope.
## Next steps and future plans
Several challenges remain to make Waku a robust and suitable base
communication protocol. Here we outline a few challenges that we are addressing and will continue to work on:
- scalability of the network
- incentivized infrastructure and spam resistance
- building with resource-restricted devices in mind, including nodes being mostly offline
For the third bottleneck, a likely candidate for a fix is Kademlia routing. This is similar to what is done in [Swarm's](https://swarm.ethereum.org/) PSS. We are in the early stages of experimenting with this over libp2p in [nim-libp2p](https://github.com/status-im/nim-libp2p). More on this in a future post!
## Acknowledgements
*Image from "caged sky" by mh.xbhd.org is licensed under CC BY 2.0 (https://ccsearch.creativecommons.org/photos/a9168311-78de-4cb7-a6ad-f92be8361d0e)*
@ -0,0 +1,91 @@
---
layout: post
name: "DNS Based Discovery"
title: "DNS Based Discovery"
date: 2020-02-7 12:00:00 +0100
author: dean
published: true
permalink: /dns-based-discovery
categories: research
summary: A look at EIP-1459 and the benefits of DNS based discovery.
---
Discovery in p2p networks is the process of how nodes find each other and specific resources they are looking for. Popular discovery protocols, such as [Kademlia](https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf) which utilizes a [distributed hash table](https://en.wikipedia.org/wiki/Distributed_hash_table) or DHT, are highly inefficient for resource restricted devices. These methods use short connection windows, and it is quite battery intensive to keep establishing connections. Additionally, we cannot expect a mobile phone for example to synchronize an entire DHT using cellular data.
Another issue is how we do the initial bootstrapping. In other words, how does a client find its first node to then discover the rest of the network? In most applications, including Status right now, this is done with a [static list of nodes](https://github.com/status-im/specs/blob/master/status-client-spec.md#bootstrapping) that a client can connect to.
In summary, we have a static list that provides us with nodes we can connect to, which then allows us to discover the rest of the network using something like Kademlia. But what we need is something that can easily be mutated, guarantees a certain amount of security, and is efficient for resource restricted devices. Ideally our solution would also be robust and scalable.
How do we do this?
The answer is [EIP 1459: Node Discovery via DNS](https://eips.ethereum.org/EIPS/eip-1459), one of the strategies we are using for discovering Waku nodes. [EIP-1459](https://eips.ethereum.org/EIPS/eip-1459) is a DNS-based discovery protocol that stores [merkle trees](https://en.wikipedia.org/wiki/Merkle_tree) in DNS records, which contain connection information for nodes.
*Waku is our fork of Whisper. Oskar recently wrote an [entire post](https://vac.dev/fixing-whisper-with-waku) explaining it. In short, Waku is our method of fixing the shortcomings of Whisper in a more iterative fashion. You can find the specification [here](https://specs.vac.dev/specs/waku/waku.html)*
DNS-based methods for bootstrapping p2p networks are quite popular. Even Bitcoin uses one, through a concept called DNS seeds: DNS servers configured to return a list of randomly selected nodes from the network upon being queried. This means that although these seeds are hardcoded in the client, the IP addresses of actual nodes do not have to be.
```console
> dig dnsseed.bluematt.me +short
129.226.73.12
107.180.78.111
169.255.56.123
91.216.149.28
85.209.240.91
66.232.124.232
207.55.53.96
86.149.241.168
193.219.38.57
190.198.210.139
74.213.232.234
158.181.226.33
176.99.2.207
202.55.87.45
37.205.10.3
90.133.4.73
176.191.182.3
109.207.166.232
45.5.117.59
178.211.170.2
160.16.0.30
```
The above displays the result of querying one of these DNS seeds. All the nodes are stored as [`A` records](https://simpledns.plus/help/a-records) for the given domain name. This is quite a simple solution, which Bitcoin has almost solely relied on since removing the [IRC bootstrapping method in v0.8.2](https://en.bitcoin.it/wiki/Network#IRC).
What makes this DNS based discovery useful? It allows us to have a mutable list of bootstrap nodes without needing to ship a new version of the client every time the list changes. It also allows for a more lightweight method of discovering nodes, something very important for resource restricted devices.
Additionally, DNS provides us with robust and scalable infrastructure thanks to its hierarchical architecture. That architecture also makes it distributed, so the failure of one DNS server does not prevent us from resolving our name.
As with every solution though, there is a trade-off. By storing the list under a DNS name, we make it possible for an adversary to censor the DNS records for that name, preventing any new client trying to join the network from doing so.
One thing you notice when looking at [EIP-1459](https://eips.ethereum.org/EIPS/eip-1459) is that it is a lot more technically complex than Bitcoin's way of doing this. So if Bitcoin uses this simple method and has proven that it works, why did we need a new method?
There are multiple reasons, but the main one is **security**. In the Bitcoin example, an attacker could create a new list and no one querying would be able to tell. This is however mitigated in [EIP-1459](https://eips.ethereum.org/EIPS/eip-1459) where we can verify the integrity of the entire returned list by storing an entire merkle tree in the DNS records.
Let's dive into this. Firstly, a client that is using these DNS records for discovery must know the public key corresponding to the private key controlled by the entity creating the list. This is because the entire list is signed using a secp256k1 private key, giving the client the ability to authenticate the list and know that it has not been tampered with by some external party.
So that already makes this a lot safer than the method Bitcoin uses. But how are these lists even stored? As previously stated they are stored using **merkle trees** as follows:
- The root of the tree is stored in a [`TXT` record](https://simpledns.plus/help/txt-records). This record contains the tree's root hash, a sequence number which is incremented every time the tree is updated, and a signature as stated above.
Additionally, there is also a root hash for a second tree called a **link tree**, which contains links to other lists. This link tree allows us to delegate trust and build a graph of multiple merkle trees stored across multiple DNS names.
The sequence number ensures that an attacker cannot replace a tree with an older version: when a client reads the tree, it should check that the sequence number is greater than that of the last synchronized version.
- Using the root hash for the tree, we can find the merkle tree's first branch; the branch is also stored in a `TXT` record. The branch record contains all the hashes of the branch's leaves.
- Once a client starts reading all the leaves, they can find one of two things: either a new branch record leading them further down the tree, or an Ethereum Node Record (ENR), which means they now have the address of a node to connect to! To learn more about Ethereum Node Records you can have a look at [EIP-778](https://eips.ethereum.org/EIPS/eip-778), or read a short blog post I wrote explaining them [here](https://dean.eigenmann.me/blog/2020/01/21/network-addresses-in-ethereum/#enr).
Below is the zone file taken from [EIP-1459](https://eips.ethereum.org/EIPS/eip-1459), displaying how this looks in practice.
```
; name ttl class type content
@ 60 IN TXT enrtree-root:v1 e=JWXYDBPXYWG6FX3GMDIBFA6CJ4 l=C7HRFPF3BLGF3YR4DY5KX3SMBE seq=1 sig=o908WmNp7LibOfPsr4btQwatZJ5URBr2ZAuxvK4UWHlsB9sUOTJQaGAlLPVAhM__XJesCHxLISo94z5Z2a463gA
C7HRFPF3BLGF3YR4DY5KX3SMBE 86900 IN TXT enrtree://AM5FCQLWIZX2QFPNJAP7VUERCCRNGRHWZG3YYHIUV7BVDQ5FDPRT2@morenodes.example.org
JWXYDBPXYWG6FX3GMDIBFA6CJ4 86900 IN TXT enrtree-branch:2XS2367YHAXJFGLZHVAWLQD4ZY,H4FHT4B454P6UXFD7JCYQ5PWDY,MHTDO6TMUBRIA2XWG5LUDACK24
2XS2367YHAXJFGLZHVAWLQD4ZY 86900 IN TXT enr:-HW4QOFzoVLaFJnNhbgMoDXPnOvcdVuj7pDpqRvh6BRDO68aVi5ZcjB3vzQRZH2IcLBGHzo8uUN3snqmgTiE56CH3AMBgmlkgnY0iXNlY3AyNTZrMaECC2_24YYkYHEgdzxlSNKQEnHhuNAbNlMlWJxrJxbAFvA
H4FHT4B454P6UXFD7JCYQ5PWDY 86900 IN TXT enr:-HW4QAggRauloj2SDLtIHN1XBkvhFZ1vtf1raYQp9TBW2RD5EEawDzbtSmlXUfnaHcvwOizhVYLtr7e6vw7NAf6mTuoCgmlkgnY0iXNlY3AyNTZrMaECjrXI8TLNXU0f8cthpAMxEshUyQlK-AM0PW2wfrnacNI
MHTDO6TMUBRIA2XWG5LUDACK24 86900 IN TXT enr:-HW4QLAYqmrwllBEnzWWs7I5Ev2IAs7x_dZlbYdRdMUx5EyKHDXp7AV5CkuPGUPdvbv1_Ms1CPfhcGCvSElSosZmyoqAgmlkgnY0iXNlY3AyNTZrMaECriawHKWdDRk2xeZkrOXBQ0dfMFLHY4eENZwdufn1S1o
```
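To make the traversal concrete, here is a minimal sketch of a tree walk in Python using the dnspython package. It skips the signature and hash verification steps that a real client must perform, and the domain name is a hypothetical stand-in:

```python
# Minimal EIP-1459 tree walk (pip install dnspython). Signature and
# hash verification, required by a real client, are omitted here.
import dns.resolver

def txt(name):
    answer = dns.resolver.resolve(name, "TXT")
    return b"".join(answer[0].strings).decode()

def walk(domain, subdomain=None):
    name = f"{subdomain}.{domain}" if subdomain else domain
    record = txt(name)
    if record.startswith("enrtree-root:"):
        fields = dict(kv.split("=", 1) for kv in record.split()[1:])
        yield from walk(domain, fields["e"])  # descend into the ENR tree
    elif record.startswith("enrtree-branch:"):
        for child in record.split(":", 1)[1].split(","):
            yield from walk(domain, child)
    elif record.startswith("enr:"):
        yield record  # a leaf: one node's connection info

for enr in walk("nodes.example.org"):  # hypothetical tree name
    print(enr)
```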
All of this has already been introduced into go-ethereum with the pull request [#20094](https://github.com/ethereum/go-ethereum/pull/20094), created by Felix Lange. A lot of tooling around it already exists too, which is really cool. So if your project is written in Go and wants to use this, it's relatively simple! Additionally, here's a proof of concept on [github](https://github.com/decanus/dns-discovery) that shows what this might look like with libp2p.
I hope this was a helpful explainer on DNS-based discovery, and that it shows [EIP-1459](https://eips.ethereum.org/EIPS/eip-1459)'s benefits over more traditional DNS-based discovery schemes.
@ -0,0 +1,264 @@
---
layout: post
name: "What Would a WeChat Replacement Need?"
title: "What Would a WeChat Replacement Need?"
date: 2020-04-16 12:00:00 +0800
author: oskarth
published: true
permalink: /wechat-replacement-need
categories: research
summary: What would a self-sovereign, private, censorship-resistant and open alternative to WeChat look like?
image: /assets/img/tianstatue.jpg
discuss: https://forum.vac.dev/t/discussion-what-would-a-wechat-replacement-need/42
---
What would it take to replace WeChat? More specifically, what would a self-sovereign, private, censorship-resistant and open alternative look like? One that allows people to communicate, coordinate and transact freely.
## Background
### What WeChat provides to the end-user
Let's first look at some of the things that WeChat provides. It is a lot:
- **Messaging:** 1:1 and group chat. Text, as well as voice and video. Post gifs. Share location.
- **Group chat:** Limited to 500 people; above 100 people, members need to verify with a bank account. Also has group video chat and QR codes to join a group.
- **Timeline/Moments:** Post comments with attachments and have people like/comment on it.
- **Location Discovery:** See WeChat users that are nearby.
- **Profile:** Nickname and profile picture; can alias people.
- **"Broadcast" messages:** Send one message to many contacts, up to 200 people (spam limited).
- **Contacts:** Max 5000 contacts (people get around it with multiple accounts and sim cards).
- **App reach:** Many different web apps, extensions, native apps, etc. Scan a QR code to access the web app from your phone.
- **Selective posting:** Decide who can view your posts and who can view your comments on other people's post.
- **Transact:** Send money gifts through red envelopes.
- **Transact:** Use WeChat pay to transfer money to friends and businesses; linked account with Alipay that is connected to your bank account.
- **Services:** Find taxis and get notifications; book flights, train tickets, hotels etc.
- **Mini apps:** API for all kinds of apps that allow you to provide services etc.
- **Picture in picture:** allowing you to have a video call while using the app.
And much more. I'm not going to go through it all in detail, and there are probably many things I don't know about WeChat since I'm not a heavy user living in mainland China.
### How WeChat works - a toy model
This is an overly simplistic model of how WeChat works, but it is sufficient for our purposes. This general design applies to most traditional client-server apps today.
To sign up for an account you need a phone number or equivalent. To get access to some features you need to verify your identity further, for example with official ID and/or a bank account.
When you sign up, this creates an entry on the WeChat server, from now on treated as a black box. You authenticate with that box, and that's where you get your messages from. If you go online, the app asks that box for messages you received while you were offline. If you log in from a different app, your contacts and conversations are synced from that box.
The box gives you an account, it deals with routing to your contacts, it stores messages and attachments and gives access to mini apps that people have uploaded. For transacting money, there is a partnership with a different company that has a different box which talks to your bank account.
This is done in such a way that they can support a billion users with the features above, no sweat.
Whoever controls that box can see who you are talking with and what the content of those messages is. There is no end-to-end encryption. If WeChat/Tencent disagrees with you for some reason, they can ban you. This means you can't interact with the box under that name anymore.
## What do we want?
We want something that is self-sovereign, private, censorship-resistant and open that allows individuals and groups of people to communicate and transact freely. To explore what this means in more detail, without getting lost in the weeds, we provide the following list of properties. A lot of these are tied together, and some fall out of the other requirements. Some of them stand in slight opposition to each other.
**Self-sovereign identity**. You exercise authority within your own sphere. If you aren't harming anyone, you should be able to have an account and communicate with other people.
**Pseudonymity, and ideally total anonymity**. Not having your identity tied to your real name (e.g. through phone number, bank account, ID, etc). This allows people to act more freely without being overly worried about censorship and coercion in the real world. While total anonymity is even more desirable - especially to break multiple hops to a true-name action - real-world constraints sometimes makes this more challenging.
**Private and secure communication**. Your communication and who you transact with should be for your eyes only. This includes transactions (transfer of value) as a form of communication.
**Censorship-resistance**. Not being able to easily censor individuals on the platform. Both at an individual, group and collective level. Not having single points of failure that allow service to be disrupted.
**Decentralization**. Partly falls out of censorship-resistance and other properties. If infrastructure isn't decentralized it means there's a single point of failure that can be disrupted. This is more of a tool than a goal on its own, but it is an important tool.
**Built for mass adoption**. Includes scalability, UX (latency, reliability, bandwidth consumption, UI etc), and allowing for people to stick around. One way of doing this is to allow users to discover people they want to talk to.
**Scalability**. Infrastructure needs to support a lot of users to be a viable alternative. Like, a billion of them (eventually).
**Fundamentals in place to support great user experience**. To be a viable alternative, aside from good UI and distribution, fundamentals such as latency, bandwidth usage and consistency must support a great user experience.
**Works for resource restricted devices, including smartphones**. Most people will use a smartphone to use this. This means it has to work well on them and similar devices, without becoming a second-class citizen where we ignore properties such as censorship-resistance and privacy. Some concession to reality will be necessary due to additional constraints, which leads us to...
**Adaptive nodes**. Nodes will have different capabilities, and perhaps at different times. To maintain a lot of the properties described here it is desirable if as many participants as possible are first-class citizens. If a phone is switching from a limited data plan to a WiFi network or from battery to AC power it can do more useful work, and so on. Likewise for a laptop with a lot of free disk space and spare compute power, etc.
**Sustainable**. If there's no centralized, top-down ad-driven model, all the infrastructure has to be sustainable somehow. Since these are individual entities, this means it has to be paid for. While altruistic modes and similar can be used, this likely requires some form of incentivization scheme for useful services provided in the network. Related: the free rider problem.
**Spam resistant**. Relates to sustainability, scalability and built for mass adoption. Made more difficult by pseudonymous identity due to whitewashing attacks.
**Trust-minimized**. To know that properties are provided for and aren't compromised, various ways of minimizing trust requirements are useful. This also relates to mass adoption and social cohesion. Examples include: open and audited protocols, open source, reproducible builds, etc. It also bears on how mini apps are provided, since we may not know their source but want to be able to use them anyway.
**Open source**. Related to above, where we must be able to inspect the software to know that it functions as advertised and hasn't been compromised, e.g. by uploading private data to a third party.
Some of these are graded and a bit subtle, i.e.:
- Censorship resistance would ideally be able to absorb Internet shutdowns. This would require an extensive MANET/meshnet infrastructure, which while desirable, requires a lot of challenges to be overcome to be feasible.
- Privacy would ideally make all actions (optionally) totally anonymous, though this may incur undue costs on bandwidth and latency, which impacts user experience.
- Decentralization: certain topologies, such as DHTs, are efficient and quite decentralized but still have some centralized aspects, which makes them attackable in various ways. Ditto for blockchains, which require some coordinating infrastructure, compared with bearer instruments such as naturally occurring precious metals.
- "Discover people" and striving for "total anonymity" might initially seem incompatible. The idea is to provide for sane defaults, and then allow people to decide how much information they want to disclose. This is the essence of privacy.
- Users often want *some* form of moderation to get a good user experience, which can be seen as a form of censorship. The idea is to raise the bar on the basics, the fundamental infrastructure. If individuals or specific communities want certain moderation mechanisms, that is still a compatible requirement.
### Counterpoint 1
We could refute the above by saying that the design goals are undesirable. We want a system where people can censor others, and where everyone is tied to their real identity. Or we could say something like: freedom of speech is a general concept, and it doesn't apply to Internet companies, even if they provide a vital service. You can survive without it, and you should've read the terms of service. This roughly characterizes the mainstream view.
An additional factor here is the idea that a group of people knows more about what's good for you than you do, so they are protecting you.
### Counterpoint 2
We could agree with all these design goals, but think they are too extreme in terms of their requirements. For example, we could operate as a non-profit, take donations and volunteers, and then host the whole infrastructure ourselves. We could say we are in a friendly jurisdiction, so we won't be a single point of failure. Since we are working on this, and maybe even our designs are open, you can trust us, and we'll provide service and infrastructure that gives you what you want without having to pay for it or solve all these complex decentralized-computation problems. If you don't trust us for some reason, you shouldn't use us regardless. Also, this is better than the status quo. And we are more likely to survive by doing this, either by taking shortcuts or by being less ambitious in terms of scope.
## Principal components
There are many ways to skin a cat, but this is one way of breaking down the problem. We have a general direction with the properties listed above, together with some understanding of how WeChat works for the everyday user. Now the question is: what infrastructure do we need to support this? How do we achieve the above properties, or at least get closer to them? We want to figure out the necessary building blocks, and one way of doing this is to map out the likely necessary components.
### Background: Ethereum and Web3 stack
It is worth noting that a lot of the required infrastructure has been developed, at least as concepts, in the original Ethereum / Web3 vision. In it there is Ethereum for consensus/compute/transactions, storage through Swarm, and communication through Whisper. That said, the main focus has been on the Ethereum blockchain itself, and a lot has happened in the last 5+ years with respect to technology around privacy and scalability. It is worth revisiting things from a fresh point of view, with the WeChat alternative in mind as a clear use case.
### Account - self-sovereign identity and the perils of phone numbers
Starting from the most basic: what is an account and how do you get one? With most internet services today, WeChat and almost all popular messaging apps included, you need to signup with some centralized authority. Usually you also have to verify this with some data that ties this account to you as an individual. E.g. by requiring a phone number, which in most jurisdictions [^1] means giving out your real ID. This also means you can be banned from using the service by a somewhat arbitrary process, with no due process.
Now, we could argue these app providers can do what they want. And they are right, in a very narrow sense. But as apps like WeChat (and Google) become general-purpose platforms, they become more and more ingrained in our everyday lives. They start to provide utilities that we absolutely need to go about our day, such as paying for food or transportation. This means we need to hold them to a higher standard.
Justifications for requiring phone numbers are usually centered around three claims:
1) Avoiding spam
2) Tying your account to your real name, for various reasons
3) Using it as a commonly shared identifier for social network discovery
Of course, many services require more than phone numbers. E.g. email, other forms of personal data such as voice recording, linking a bank account, and so on.
In contrast, a self-sovereign system would allow you to "create an account" completely on your own. This can easily be done with public key cryptography, and it also paves the way for end-to-end encryption to make your messages private.
The main issue with this is that you need to get more creative about avoiding spam (e.g. defending against whitewashing attacks), and ideally there is some other form of social discovery mechanism.
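As a minimal sketch of what "creating an account on your own" amounts to, here is the idea in Python using the `cryptography` package. Ed25519 is used purely for brevity; Status and Ethereum use secp256k1.

```python
# An account is just a keypair: the public key is the identity, the
# private key proves control of it. No server, no signup.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()  # generated offline
public_key = private_key.public_key()       # share this as your "account"

message = b"hello from a self-sovereign account"
signature = private_key.sign(message)
public_key.verify(signature, message)       # raises if forged or tampered
```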
Just having a public key as an account isn't enough though. If it goes through a central server, then nothing is stopping that server from arbitrarily blocking requests related to that public key. Of course, this also depends on how transparent such requests are. Fundamentally, lest we rely completely on goodwill, there need to be multiple actors through which you can use the service. This naturally points to decentralization as a requirement. See counterpoint.
Even so, if the system is closed source we don't know what it is doing. Perhaps the app communicating is also uploading data to another place, or somehow making it possible to see who is who and act accordingly.
You might notice that just one simple property, self-sovereign identity, leads to a slew of other requirements and properties. You might also notice that WeChat is far from alone in this, even if their identity requirements might be a bit more stringent than, say, Telegram's. Their control aspects are also a bit more extreme, at least for someone with western sensibilities [^2].
Most user facing applications have similar issues: Google Apps/FB/Twitter etc. For popular tools that have this built in, we can look at git, which is truly decentralized and has keypairs at the bottom. It is for a very specific technical domain, and even then people rely on Github. Key management is fairly difficult even for technical people, and for normal people even more so. Banks are generally far behind on this tech, relying on arcane procedures and special purpose hardware for 2FA. That's another big issue.
Let's shift gears a bit and talk about some other functional requirements.
### Routing - packets from A to B
In order to get a lot of the features WeChat provides, we need the ability to do three things: communicate, store data, and transact with people. We need a bit more than that, but let's focus on this for now.
To communicate with people, in the base case, we need to go from one phone to another phone separated by a large distance. This requires some form of routing. The most natural platform to build this on is the existing Internet, though not the only one. Most phones are resource restricted, and are only "on" for brief periods of time. This is needed to preserve battery and bandwidth. Additionally, the Internet uses IPs as endpoints, which change as phones move through space. NAT punching etc isn't always perfect either. This means we need a way to get a message from one public key to another, through some intermediate nodes. We can think of these nodes as a form of service network, similar to how a power grid works, or phone lines, or a collection of ISPs.
One important property here is to ensure we don't end up in a situation like the centralized capture scenario above, something we've seen with centralized ISPs [^3] [^4] where they can choose which traffic is good and which is bad. We want to allow the use of different service nodes, just like if a restaurant gives you food poisoning you can go to the one next door, and the first one goes out of business after a while. And the circle of life continues.
We shouldn't be naive though, and think that this is something nodes are likely to do for free. They need to be adequately compensated for their services, through some form of incentivization scheme. That can either be monetary, or, as in the case of Bittorrent, more of a barter situation where you use game theory to coordinate with strangers [^5], with some form of reputation attached to it (for private trackers).
There are many ways of doing routing, and we won't go into too much technical detail here. Suffice it to say that you likely want both a structured and an unstructured alternative, and that these come with several trade-offs when it comes to efficiency, metadata protection, ability to incentivize, compatibility with existing topologies, and suitability for mobile phones (mostly offline, bandwidth restricted, not directly connectable). Expect more on this in a future article.
Some of these considerations naturally leads us into the storage and transaction components.
### Storage - available and persistent for later
If mobile phones are mostly offline, we need some way to store messages so they can be retrieved when the phone is online again. The same goes for various kinds of attachments, and for when people are switching devices. A user might control their timeline, but in the WeChat case that timeline is stored on Tencent's servers, and queried from there as well. In our case this naturally needs to happen via other service nodes. In the WeChat case, and for most IMs, the way these servers are paid for is through some indirect ad mechanism. The entity controlling these ads is the same one operating the servers for storage. A more direct model with different entities would see these services being compensated for their work.
We also need storage for attachments and mini-apps, as well as a way of understanding the current state of consensus when it comes to the compute/transact module. In the WeChat case, this state is completely handled by the bank institution or one of their partners, such as Alibaba. When it comes to bearer instruments like cash, no state needs to be kept, as that's a direct exchange in the physical world. This isn't directly compatible with transferring value over a distance.
All of this state requires availability and persistence. It should be done in a trust-minimized and decentralized fashion, which requires some form of incentivization for keeping data around. If it isn't, you are relying on social cohesion, which breaks down at very large scales.
Since data will be spread out across multiple nodes, you need a way to sync data and transfer it in the network, as well as being able to add data and query it. All of this requires a routing component.
To make it more censorship resistant it might be better to keep it as a general-purpose store, i.e. individuals don't need to know what they are storing. Otherwise, you naturally end up in a situation where individual nodes can be pressured into not storing certain content.
### Messaging - from me to you to all of us (not them)
This builds on top of routing, but it has a slightly different focus. The goal is to allow for individuals and groups to communicate in a private, secure and censorship-resistant manner.
It also needs to provide a decent interface to the end user, in terms of dealing seamlessly with offline messages, providing reliable and timely messaging.
In order to get closer to the ideal of total anonymity, it is useful to be able to hide the metadata of who is talking to whom. This applies to normal communication as well as transactions. Ideally, no one but the parties involved can see who is taking part in a conversation. This can be achieved through various techniques such as mixnets, anonymous credentials, private information retrieval, and so on. Many of these techniques have a fundamental trade-off with latency and bandwidth, something that is a big concern for mobile phones. Being able to do some form of tuning, in an adaptive node manner, depending on your threat model and current capabilities is useful here.
The baseline here is pseudonymity, and having tools that allow individuals to "cut off" ties to their real world identity and transactions. People act differently in different circles in the real world, and this should be mimicked online as well. Your company, family or government shouldn't be able to know exactly what you use your paycheck for, and who you are talking to.
### Compute - transact, contract and settle
The most immediate need here is transaction from A to B. Direct exchange. There is also a more indirect need for private lawmaking and contracting.
We talked about routing and storage and how they likely need to be incentivized to work properly. How are they going to be compensated? While this could in theory work via existing banking system and so on, this would be rather heavy. It'd also very likely require tying your identifier to your legal name, something that goes against what we want to achieve. What we want is something that acts more as right-to-access, similar to the way cash functions in a society [^6]. I pay for a fruit with something that is valuable to you and then I'm on my way.
While there might be other candidates, such as pre-paid debit cards and so on, this transaction mode pretty much requires a cryptocurrency component. The alternative is to do it on a reputation basis, which might work for small communities, due to social cohesion, but quickly deteriorates for large ones [^7]. Ad hoc models like private Bittorrent trackers are centralized and easy to censor.
Now, none of the existing cryptocurrency models are ideal. They also all suffer from lack of widespread use, and it is difficult to get onboarded to them in the first place. Transactions in Bitcoin are slow. Ethereum is faster and has more capabilities, but it still suffers from linking payments over time, which makes the privacy part of this more difficult. Zcash, Monero and similar are interesting, but also require more use. For Zcash, shielded transactions appear to only account for less than 2% of all transactions in 2019 [^8] [^9].
Another dimension is what sets general purpose cryptocurrencies like Ethereum apart. Aside from just paying from A to B, you can encode rules about when something should be paid out and not. This is very useful for doing a form of private lawmaking, contracting, for setting up service agreements with these nodes. If there's no trivial recourse as in the meatspace world, where you know someone's name and you can sue them, you need a different kind of model.
What makes something like Zcash interesting is that it works more like digital cash. Instead of leaving a public trail for everyone, where someone can see where you got the initial money from and then trace you across various usage, for Zcash every hop is privacy preserving.
To fulfill the general goals of being censorship resistant and secure, it is also vital that the system being used stays online and can't be easily disrupted. That points to disintermediation, as opposed to using gateways and exchanges. This is a case where something like cash, or gold, is more direct, since no one can censor the transaction without being physically present where the direct exchange is taking place. However, like before, this doesn't work over distance.
### Secure chat - just our business
Similar to the messaging module above. The distinction here is that we assume the network part has already taken place. Here we are interested in keeping the contents of messages private, so that means confidentiality/end-to-end encryption, integrity, authentication, as well as forward secrecy and plausible deniability. This means that even if some actor gets hold of private key material, or confiscates your phone, there is some level of... ephemerality to your conversations. Another open issue here is scalable private group chat.
### Extensible mini apps
This relates to the compute and storage modules above. Essentially we want to provide mini apps as in WeChat, but to do so in a way that is compatible with what we want to achieve more generally. This allows individuals and small businesses to create small tools for various purposes, and to coordinate with strangers. E.g. booking a cab or getting insurance, and so on.
This has a higher dependency on the contracting/general computation aspect. I.e. often it isn't only a transaction, but you might want to encode some specific rules here that strangers can abide by without having too high trust requirements. As a simple example: escrows.
This also needs an open API that anyone can use. It should be properly secured, so using one doesn't compromise the rest of the system it is operating in. To be censorship resistant it requires the routing and storage component to work properly.
## Where are we now?
Let's look back at some of the desirable properties we set out in the beginning and see how close we are to building out the necessary components. Is it realistic at all or just a pipe dream? We'll see that there are many building blocks in place, and there's reason for hope.
**Self-sovereign identity**. Public key crypto and web-of-trust-like constructs make this possible.
**Pseudonymity, and ideally total anonymity**. Pseudonymity can largely be achieved with public key crypto and open systems that allow for permissionless participation. For transactions, pseudonymity exists in most cryptocurrencies. The challenge is linkage across time, especially when interfacing with other "legacy" system. There are stronger constructs that are actively being worked on and are promising here, such as mixnets (Nym), mixers (Wasabi Wallet, Tornado.Cash) and zero knowledge proofs (Zcash, Ethereum, Starkware). This area of applied research has exploded over the last few years.
**Private and secure communication**. Signal has pioneered a lot of this, following OTR. Double Ratchet, X3DH. E2EE is the minimum these days, and properties like PFS and PD are getting better. For metadata protection, you have Tor, with its faults, and more active research on mixnets, private information retrieval, etc.
**Censorship-resistance**. This covers a lot of ground across the spectrum. Technologies like Bittorrent, Bitcoin/Ethereum, Tor obfuscated transports, E2EE by default, partial mesh networks in production, and the ability to move/replicate host machines more quickly have all made this more of a reality than it used to be. Of course, techniques such as deep packet inspection and internet shutdowns have increased as well.
**Decentralization**. Cryptocurrencies, projects like libp2p and IPFS. Need to be mindful here of the many projects that claim decentralization but are still vulnerable to single points of failure, such as relying on gateways.
**Built for mass adoption**. This one is more subjective. There's definitely a lot of work to be done here, both when it comes to fundamental performance, key management and things like social discoverability. Directionally these things are improving and becoming easier for the average person.
**Scalability**. With projects like Ethereum 2.0 and IPFS, more and more resources are being put into this, both at the consensus/compute layer as well as the networking (gossip, scalable Kademlia) layer. Also various layer 2 solutions for transactions.
**Fundamentals in place to support great user experience**. Similar to built for mass adoption. As scalability becomes more important, more applied research is being done in the p2p area to improve things like latency and bandwidth.
**Works for resource restricted devices, including smartphones**. Work in progress and not enough focus here; it's generally an afterthought. There are also developments such as stateless clients.
**Adaptive nodes**. See above. With subprotocols and capabilities in Ethereum and libp2p, this is getting easier.
**Sustainable**. Token economics is a thing. While a lot of it won't stay around, there are many more projects working on making themselves dispensable: being open source, having an engaged community and enabling users to run their own infrastructure. Users as stakeholders.
**Spam resistant**. Tricky problem if you want to be pseudonymous, but some signs of hope with incentivization mechanisms, zero knowledge based signaling, etc. Together with various forms of rate limiting and better control of topology and network amplification. And just generally being battle-tested by real world attacks, such as historical Ethereum DDoS attacks.
**Trust minimized**. Bitcoin. Zero knowledge provable computation. Open source. Reproducible builds. Signed binaries. Incentive compatible structures. Independent audits. Still a lot of work, but getting better.
**Open source**. Big and only getting bigger. Including mainstream companies.
## What's next?
We've looked at what WeChat provides and what we'd like an alternative to look like. We've also seen a few principal modules that are necessary to achieve those goals. To achieve all of this is a daunting task, and one might call it overly ambitious. We've also seen how far we've come with some of the goals, and how a lot of the pieces are there, in one form or another. Then it is a question of putting them all together in the right mix.
The good news is that a lot of people are working on all these building blocks and thinking about these problems. Compared to a few years ago we've come quite far when it comes to p2p infrastructure, privacy, security, scalability, and general developer mass and mindshare. If you want to join us in building some of these building blocks, and assembling them, check out our forum.
PS. We are [hiring protocol engineers](https://status.im/our_team/open_positions.html). DS
## Acknowledgements
Corey, Dean, Jacek.
## References
[^1]: Mandatory SIM card registration laws: https://privacyinternational.org/long-read/3018/timeline-sim-card-registration-laws
[^2]: On WeChat keyword censorship: https://citizenlab.ca/2016/11/wechat-china-censorship-one-app-two-systems/
[^3]: Net Neutrality: https://www.eff.org/issues/net-neutrality
[^4]: ISP centralization: https://ilsr.org/repealing-net-neutrality-puts-177-million-americans-at-risk/
[^5]: Incentives Build Robustness in BitTorrent bittorrent.org/bittorrentecon.pdf
[^6]: The Case for Electronic Cash: https://coincenter.org/files/2019-02/the-case-for-electronic-cash-coin-center.pdf
[^7]: Money, blockchains, and social scalability: http://unenumerated.blogspot.com/2017/02/money-blockchains-and-social-scalability.html
[^8]: Zcash private transactions (partial paywall): https://www.theblockcrypto.com/genesis/48413/an-analysis-of-zcashs-private-transactions
[^9]: Shielded transactions usage (stats page 404s): https://z.cash/support/faq/
@ -0,0 +1,82 @@
---
layout: post
name: "Feasibility Study: Discv5"
title: "Feasibility Study: Discv5"
date: 2020-04-27 12:00:00 +0800
author: dean
published: true
permalink: /feasibility-discv5
categories: research
summary: Looking at discv5 and the theoretical numbers behind finding peers.
discuss: https://discuss.status.im/t/discv5-feasibility-study/1632
---
> Disclaimer: some of the numbers found in this write-up could be inaccurate. They are based on the author's current understanding of the theoretical parts of the protocol and are meant to provide a rough overview rather than binding numbers.
This post serves as a more authoritative overview of the discv5 study; for a discussion post providing more context, make sure to check out the corresponding [discuss post](https://discuss.status.im/t/discv5-feasibility-study/1632). Additionally, if you are unfamiliar with discv5, check out my previous write-up: ["From Kademlia to Discv5"](https://vac.dev/kademlia-to-discv5).
## Motivating Problem
The discovery method currently used by [Status](https://status.im) is made up of various components and grew over time to solve a mix of problems. We want to simplify this while maintaining some of the properties we currently have.
Namely, we want to ensure censorship resistance to state-level adversaries. One of the issues that caused Status to add to their discovery method was the fact that addresses from providers like AWS and GCP were blocked in both Russia and China. Additionally, one of the main factors required is the ability to function on resource restricted devices.
Considering we are talking about resource restricted devices, let's look at the implications and what we need to consider:
- **Battery consumption** - constant connections like websockets consume a lot of battery life.
- **CPU usage** - certain discovery methods may be CPU intensive, slowing an app down and making it unusable.
- **Bandwidth consumption** - a lot of users will be using data plans, the discovery method needs to be efficient in order to accommodate those users without using up significant portions of their data plans.
- **Short connection windows** - the discovery algorithm needs to be low latency; that means it needs to return results fast. This is because many users will only have the app open for a short amount of time.
- **Not publicly connectable** - There is a good chance that most resource restricted devices are not publicly connectable.
For a node to participate as both a provider and a consumer in the discovery method (meaning it both reads from other nodes' stored DHTs and hosts the DHT for other nodes to read from), it needs to be publicly connectable. This means another node must be able to connect to some public IP of the given node.
With devices that are behind a NAT, this is easier said than done. Especially mobile devices, which when connected to 4G LTE networks are often stuck behind a symmetric NAT, drastically reducing the success rate of NAT traversal. Keeping this in mind, it becomes obvious that most resource restricted devices will be consumers rather than providers due to this technical limitation.
In order to answer our questions, we distilled the problem into a simple method for testing: the "needle in a haystack" problem, which asks how easily a specific node can be found within a given network. This issue was fully formulated in [vacp2p/research#15](https://github.com/vacp2p/research/issues/15).
## Overview
The main thing we wanted to investigate was the overhead of finding a peer. This means we wanted to look at the bandwidth, latency and effectiveness of doing so. There are 2 methods we can use to find a peer:
- We can find a peer with a specific ID, using normal lookup methods as documented by Kademlia.
- We can find a peer that advertises a capability, this is possible using either capabilities advertised in the ENR or through [topic tables](https://github.com/ethereum/devp2p/blob/master/discv5/discv5-theory.md#topic-advertisement).
## Feasibility
To be able to investigate the feasibility of discv5, we used various methods, including rough calculations which can be found in the [notebook](https://vac.dev/discv5-notebook/), and a simulation isolated in [vacp2p/research#19](https://github.com/vacp2p/research/pull/19).
### CPU & Memory Usage
The experimental discv5 has already been used within Status; however, what was noticed was that the CPU and memory usage was rather high. It should therefore be investigated whether this is still the case, and if it is, where it stems from should be isolated. Additionally, it is worth looking at whether or not this is the case with both the Go and Nim implementations.
See details: [vacp2p/research#31](https://github.com/vacp2p/research/issues/31)
### NAT on Cellular Data
If a peer is not publicly connectable, it cannot participate in the DHT both ways. A lot of mobile phones are behind symmetric NATs, which make UDP hole-punching close to impossible. It should be investigated whether or not mobile phones will be able to participate both ways and if there are good methods for doing hole-punching.
See details: [vacp2p/research#29](https://github.com/vacp2p/research/issues/29)
### Topic Tables
Topic tables give us the ability to efficiently find nodes given a specific topic. However, they are not implemented in the [status-im/nim-eth](https://github.com/status-im/nim-eth/) implementation, nor are they fully finalized in the spec. They are important if the network grows past a size where the concentration of specific nodes is relatively low, making them hard to find.
See details: [vacp2p/research#26](https://github.com/vacp2p/research/issues/26)
### Finding a node
It is important to note that if a network is relatively small, e.g. 100-500 nodes, then finding a node given a specific address is relatively manageable. Additionally, if the concentration of a specific capability in a network is reasonable, then finding a node advertising its capabilities using an ENR rather than the topic table is also manageable. A reasonable concentration would for example be 10%, which would give us an 80% chance of getting a node with that capability in the first lookup request. This can be explored more using our [discv5 notebook](https://vac.dev/discv5-notebook/#Needle-in-a-haystack-with-ENR-records-indicating-capabilities).
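A quick check of that 80% figure, assuming a lookup response carries the k = 16 closest nodes (the usual Kademlia bucket size; an assumption on my part) and that each returned node independently has the capability with probability 10%:

```python
p, k = 0.10, 16          # capability concentration, nodes per response
print(1 - (1 - p) ** k)  # ≈ 0.815, the ~80% chance quoted above
```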
## Results
Research has shown that finding a node in the DHT has a relatively low effect on bandwidth, both inbound and outbound. For example, when trying to find a node in a network of 100 nodes, it takes roughly 5668 bytes in total. Additionally, if we assume 100ms of latency per request, a lookup comes in at ≈ 300ms, translating to 3 requests to find a specific node.
## General Thoughts
One of the main blockers right now is figuring out what the CPU and memory usage of discv5 is on mobile phones; this is a large blocker as it affects one of the core problems for us. We need to consider whether discv5 is an upgrade, as it allows us to simplify our current discovery process, or if it is too much of an overhead for resource restricted devices. The topic table feature could largely enhance discovery, however it is not yet implemented. Provided CPU and memory usage isn't too high, discv5 could probably be used, as the other issues are more "features" than large scale issues. Implementing it would already reduce the ability of state-level adversaries to censor our nodes.
## Acknowledgements
- Oskar Thoren
- Dmitry Shmatko
- Kim De Mey
- Corey Petty
@ -0,0 +1,120 @@
---
layout: post
name: "From Kademlia to Discv5"
title: "From Kademlia to Discv5"
date: 2020-04-9 16:00:00 +0100
author: dean
published: true
permalink: /kademlia-to-discv5
categories: research
summary: A quick history of discovery in peer-to-peer networks, along with a look into discv4 and discv5, detailing what they are, how they work and where they differ.
---
If you've been working on Ethereum or adjacent technologies you've probably heard of [discv4](https://github.com/ethereum/devp2p/blob/master/discv4.md) or [discv5](https://github.com/ethereum/devp2p/blob/master/discv5/discv5.md). But what are they actually? How do they work and what makes them different? To answer these questions, we need to start at the beginning. This post assumes little prior knowledge on the subject, so it should be accessible to anyone.
## The Beginning
Let's start right at the beginning: the problem of discovery and organization of nodes in peer-to-peer networks.
Early P2P file sharing technologies, such as Napster, would share information about who holds what file using a single server. A node would connect to the central server and give it a list of the files it owns. Another node would then connect to that central server, find a node that has the file it is looking for and contact that node. This however was a flawed system -- it was vulnerable to attacks and left a single party open to lawsuits.
It became clear that another solution was needed, and after years of research and experimentation, we were given the distributed hash table or DHT.
## Distributed Hash Tables
In 2001, four new protocols for such DHTs were conceived: [Tapestry](https://pdos.csail.mit.edu/~strib/docs/tapestry/tapestry_jsac03.pdf), [Chord](https://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf), [CAN](https://people.eecs.berkeley.edu/~sylvia/papers/cans.pdf) and [Pastry](http://rowstron.azurewebsites.net/PAST/pastry.pdf), all of which made various trade-offs and changes in their core functionality, giving them unique characteristics.
But as said, they're all DHTs. So what is a DHT?
A distributed hash table (DHT) is essentially a distributed key-value list. Nodes participating in the DHT can easily retrieve the value for a key.
If we have a network with 9 key-value pairs and 3 nodes, ideally each node would store 3 (optimally 6, for redundancy) of those key-value pairs, meaning that if a key-value pair were to be updated, only part of the network would be responsible for ensuring that it is. The idea is that any node in the network knows where to find the specific key-value pair it is looking for, based on how things are distributed amongst the nodes.
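As a toy illustration of that distribution (the node names are hypothetical, and plain numeric distance between hashes stands in for a real DHT's distance metric):

```python
# Each key-value pair lives on the node whose hashed ID is closest to
# the hash of the key; any node can compute the owner locally.
import hashlib

def h(s):
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:4], "big")

nodes = ["node-a", "node-b", "node-c"]
for i in range(9):
    pair_key = f"key-{i}"
    owner = min(nodes, key=lambda n: abs(h(n) - h(pair_key)))
    print(pair_key, "->", owner)
```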
## Kademlia
So now that we know what DHTs are, let's get to Kademlia, the predecessor of discv4. Kademlia was created by Petar Maymounkov and David Mazières in 2002. I will naively say that this is probably one of the most popular and most used DHT protocols. It's quite simple in how it works, so let's look at it.
In Kademlia, nodes and values are arranged by distance (in a very mathematical sense). This distance is not a geographical one, but rather based on identifiers: how far two identifiers are from each other is calculated using some distance function.
Kademlia uses `XOR` as its distance function. `XOR` is a function that outputs `true` only when its inputs differ. Here is an example with some binary identifiers:
```
    10011001
XOR 00110010
    --------
    10101011
```
In decimal, this means that the distance between `153` and `50` is `171`.
There are several reasons why `XOR` was chosen:
1. The distance from one ID to itself will be `0`.
2. Distance is symmetric, A to B is the same as B to A.
3. It follows the triangle inequality: if `A`, `B` and `C` are points on a triangle, then the distance from `A` to `B` is less than or equal to the distance from `A` to `C` plus the distance from `C` to `B`.
In summary, this distance function allows a node to decide what is "close" to it and make decisions based on that "closeness".
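In code, the metric is a one-liner. A small sketch reproducing the worked example above:

```python
def distance(a, b):
    return a ^ b

print(distance(0b10011001, 0b00110010))        # 171, i.e. 0b10101011
print(distance(153, 153))                      # 0: distance to itself
print(distance(153, 50) == distance(50, 153))  # True: symmetric
```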
Kademlia nodes store a routing table. This table contains multiple lists. Each subsequent list contains nodes which are a little further distanced than the ones included in the previous list. Nodes maintain detailed knowledge about nodes closest to them, and the further away a node is, the less knowledge the node maintains about it.
So let's say I want to find a specific node. What I would do is go to any node which I already know and ask them for all their neighbours closest to my target. I repeat this process for the returned neighbours until I find my target.
The same thing happens for values. Values have a certain distance from nodes and their IDs are structured the same way so we can calculate this distance. If I want to find a value, I simply look for the neighbours closest to that value's key until I find the one storing said value.
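A naive sketch of that iterative lookup, with node IDs as integers and `find_node(peer, target)` as a hypothetical helper that asks `peer` for the closest neighbours it knows to `target`:

```python
def lookup(start_peer, target, find_node):
    # Iteratively query the closest known peers until the target is
    # found or no new peers turn up.
    seen = {start_peer}
    candidates = [start_peer]
    while candidates:
        candidates.sort(key=lambda p: p ^ target)  # closest first
        peer = candidates.pop(0)
        if peer == target:
            return peer
        for neighbour in find_node(peer, target):
            if neighbour not in seen:
                seen.add(neighbour)
                candidates.append(neighbour)
    return None
```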
For Kademlia nodes to support these functions, there are several messages with which the protocol communicates.
- `PING` - Used to check whether a node is still running.
- `STORE` - Stores a value with a given key on a node.
- `FINDNODE` - Returns the closest nodes requested to a given ID.
- `FINDVALUE` - The same as `FINDNODE`, except if a node stores the specific value it will return it directly.
*This is a **very** simplified explanation of Kademlia and skips various important details. For the full description, make sure to check out the [paper](https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf) or a more in-depth [design specification](http://xlattice.sourceforge.net/components/protocol/kademlia/specs.html)*
## Discv4
Now, after that history lesson, we finally get to discv4 (which stands for discovery v4), Ethereum's current node discovery protocol. The protocol is essentially based on Kademlia, but it drops certain aspects of it; most notably, it does away with any usage of the value part of the DHT.
Kademlia is used mainly for the organisation of the network: the routing table only serves to locate other nodes. Because discv4 doesn't use the value portion of the DHT at all, we can throw away the `FINDVALUE` and `STORE` commands described by Kademlia.
Nodes get their peers using the lookup method described earlier: a node contacts some node it already knows and asks it for the nodes closest to itself, repeating this process with the returned neighbours until it can no longer find any new nodes.
Additionally, discv4 adds mutual endpoint verification. This is meant to ensure that a peer calling `FINDNODE` also participates in the discovery protocol.
Finally, all discv4 nodes are expected to maintain up-to-date ENR records. These contain information about a node. They can be requested from any node using a discv4-specific packet called `ENRRequest`.
*If you want some more details on ENRs, check out one of my posts ["Network Addresses in Ethereum"](https://dean.eigenmann.me/blog/2020/01/21/network-addresses-in-ethereum/)*
Discv4 comes with its own range of problems however. Let's look at a few of them.
Firstly, the way discv4 works right now, there is no way to differentiate between node sub-protocols. This means, for example, that an Ethereum node could add an Ethereum Classic, Swarm or Whisper node to its DHT without realizing it is invalid until more communication has happened. This inability to differentiate sub-protocols makes it harder to find specific nodes, such as Ethereum nodes with light-client support.
Next, in order to prevent replay attacks, discv4 uses timestamps. This however can lead to various issues when a host's clock is wrong. For more details, see the ["Known Issues"](https://github.com/ethereum/devp2p/blob/master/discv4.md#known-issues-in-the-current-version) section of the discv4 specification.
Finally, we have an issue with the way mutual endpoint verification works. Messages can get dropped, and there is no way to tell whether both peers have verified each other. This means we could consider our peer verified while it does not consider us verified, causing it to drop our `FINDNODE` packets.
## Discv5
Finally, let's look at discv5, the next iteration of the discovery protocol and the one that will be used by Eth 2.0. It aims to fix various issues present in discv4.
The first change is the way `FINDNODE` works. In traditional Kademlia, as well as in discv4, we pass an identifier. In discv5, we instead pass a logarithmic distance, meaning that a `FINDNODE` request gets a response containing all nodes at the specified logarithmic distance from the called node.
Logarithmic distance means we first calculate the distance and then run it through our log base 2 function. See:
```
log2(A xor B)
```
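A convenient way to compute this for integer node IDs (a sketch, not the actual implementation) is the bit length of the XOR, which equals `floor(log2(a xor b)) + 1` and is `0` only when the IDs are equal:

```
# Logarithmic distance, i.e. the position of the highest differing bit.
def log_distance(a: int, b: int) -> int:
    return (a ^ b).bit_length()
```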
And the second, more important change is that discv5 aims to solve one of the biggest issues of discv4: the differentiation of sub-protocols. It does this by adding topic tables. Topic tables are [first in first out](https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)) lists that contain nodes which have advertised that they provide a specific service. Nodes get themselves added to such a list by registering `ads` on their peers.
As of writing, there is still an issue with this proposal: there is no efficient way for a node to place `ads` on multiple peers, since doing so would require a separate request for every peer, which is inefficient in a large-scale network.
Additionally, it is unclear how many peers a node should place these `ads` on and exactly which peers to place them on. For more details, check out the issue [devp2p#136](https://github.com/ethereum/devp2p/issues/136).
There are a bunch of smaller changes to the protocol as well, but they are less important and hence omitted from this summary.
Nevertheless, discv5 still does not resolve a couple of issues present in discv4, such as unreliable endpoint verification. As of writing this post, there is no new method in discv5 to improve the endpoint verification process.
As you can see, discv5 is still a work in progress and has a few large challenges to overcome. If it overcomes them, however, it will most likely be a large improvement over more naive Kademlia implementations.
---
Hopefully this article helped explain what these discovery protocols are and how they work. If you're interested in their full specifications you can find them on [github](https://github.com/ethereum/devp2p).

---
layout: post
name: "What's the Plan for Waku v2?"
title: "What's the Plan for Waku v2?"
date: 2020-07-01 12:00:00 +0800
author: oskarth
published: true
permalink: /waku-v2-plan
categories: research
summary: Read about our plans for Waku v2, moving to libp2p, better routing, adaptive nodes and accounting!
image: /assets/img/status_scaling_model_fig4.png
discuss: https://forum.vac.dev/t/waku-version-2-pitch/52
---
**tldr: The Waku network is fragile and doesn't scale. Here's how to solve it.**
*NOTE: This post was originally written with Status as a primary use case in mind, which reflects how we talk about some problems here. However, Waku v2 is a general-purpose private p2p messaging protocol, especially for people running in resource restricted environments.*
# Problem
The Waku network is fragile and doesn't scale.
As [Status](https://status.im) is moving into a user-acquisition phase and improving retention rates, it needs infrastructure that can keep up, specifically when it comes to messaging.
Based on user acquisition models, the initial goal is to support 100k DAU in September, with demand growing from there.
With the Status Scaling Model we have studied the current bottlenecks as a function of concurrent users (CCU) and daily active users (DAU). Here are the conclusions.
**1. Connection limits**. With 100 full nodes we reach ~10k CCU based on connection limits. This can primarily be addressed by increasing the number of nodes (cluster or user operated). This assumes node discovery works. It is also worth investigating the limitations of the maximum number of connections, though this is likely to be less relevant for user-operated nodes. For a user-operated network, this means 1% of users have to run a full node. See Fig 1-2.
**2. Bandwidth as a bottleneck**. We notice that memory usage appears not to be the primary bottleneck for full nodes; the bottleneck is still bandwidth. To support 10k DAU with full nodes at an amplification factor of 25, the required Internet speed is ~50 Mbps, which is a fast home Internet connection. For ~100k DAU, only cloud-operated nodes can keep up (500 Mbps). See Fig 3-5.
**3. Amplification factors**. Reducing amplification factors with better routing would have a high impact, but it is likely we'd need additional measures as well, such as topic sharding or similar. See Fig 8-13.
Figure 1-5:
![](assets/img/status_scaling_model_fig1.png)
![](assets/img/status_scaling_model_fig2.png)
![](assets/img/status_scaling_model_fig3.png)
![](assets/img/status_scaling_model_fig4.png)
![](assets/img/status_scaling_model_fig5.png)
See <https://colab.research.google.com/drive/1Fz-oxRxxAFPpM1Cowpnb0nT52V1-yeRu#scrollTo=Yc3417FUJJ_0> for the full report.
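As a back-of-envelope illustration of how these quantities interact, here is a sketch in Python. The parameter values below are hypothetical, chosen only to land near the ~50 Mbps figure quoted above; the linked model contains the real assumptions:

```
# Rough full-node bandwidth estimate (all parameters hypothetical).
def required_mbps(dau, msgs_per_user_per_day, envelope_kb, amplification):
    bytes_per_day = dau * msgs_per_user_per_day * envelope_kb * 1024 * amplification
    return bytes_per_day * 8 / (24 * 3600) / 1e6  # megabits per second

print(required_mbps(10_000, 500, 4, 25))  # ~47 Mbps
```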
What we need to do is:
1. Reduce amplification factors
2. Get more user-run full nodes
Doing this means the Waku network will be able to scale, and to do so in the right way, in a robust fashion. What would a fragile way of scaling be? Increasing our reliance on a Status Pte Ltd operated cluster, which would paint us into a corner where we:
- keep increasing requirements for Internet speed for full nodes
- are vulnerable to censorship and attacks
- have to control the topology in an artificial manner to keep up with load
- basically re-invent a traditional centralized client-server app with extra steps
- deliberately ignore most of our principles
- risk the network being shut down when we run out of cash
# Appetite
Our initial risk appetite for this is 6 weeks for a small team.
The idea is that we want to make tangible progress towards the goal in a limited period of time, as opposed to getting bogged down in trying to find a theoretically perfect generalized solution. Fixed time, variable scope.
It is likely some elements of a complete solution will be done separately. See later sections for that.
# Solution
There are two main parts to the solution. One is to reduce amplification factors, and the other is incentivization to get more user-run full nodes with desktop, etc.
What does a full node provide? It provides connectivity to the network, can act as a bandwidth "barrier", and can offer high or reasonably high availability. What this means right now is essentially topic interest and storing historical messages.
The goal here is to improve the status quo, not to get a perfect solution from the get go. All of this can be iterated on further for stronger guarantees, as well as replaced by other new modules.
Let's first look at the baseline, and then go into some of the tracks and their phases. Track 1 is best done first, after which track 2 and 3 can be executed in parallel. Track 1 gives us more options for track 2 and 3. The work in track 1 is currently more well-defined, so it is likely the specifics of track 2 and 3 will get refined at a later stage.
## Baseline
Here's where we are at now. In reality, the amplification factors are likely even worse than this (15 in the graph below), up to 20-30, especially in an open network where we can't easily control the connectivity and availability of nodes. Left unchecked, with a full mesh, it could even go as high as x100, though this is likely excessive and can be dialed down. See the scaling model for more details.
![](assets/img/waku_v1_routing_small.png)
## Track 1 - Move to libp2p
Moving to PubSub over libp2p wouldn't improve amplification per se, but it would be a stepping stone. Why? It paves the way for GossipSub, and would be a checkpoint on this journey. Additionally, FloodSub and GossipSub are compatible, and very likely other future forms of PubSub as well, such as GossipSub 1.1 (hardened/more secure), EpiSub, forwarding Kademlia / PubSub over Kademlia, etc. Not to mention security. This would also give us access to the larger libp2p ecosystem (multiple protocols, better encryption, QUIC, running in the browser, security audits, etc.), as well as be a joint piece of infrastructure used for Eth2 in Nimbus. More wood behind fewer arrows.
See more on libp2p PubSub here: <https://docs.libp2p.io/concepts/publish-subscribe/>
As part of this move, there are a few individual pieces that are needed.
### 1. FloodSub
This is essentially what Waku over libp2p would look like in its most basic form.
One difference that is worth noting is that the app topics would **not** be the same as Waku topics. Why? In Waku we currently don't use topics for routing between full nodes, but only for edge/light nodes in the form of topic interest. In FloodSub, these topics are used for routing.
Why can't we use Waku topics for routing directly? PubSub over libp2p isn't built for rare and ephemeral topics, and nodes have to explicitly subscribe to a topic. See topic sharding section for more on this.
![](assets/img/waku_v2_routing_flood_small.png)
Moving to FloodSub over libp2p would also be an opportunity to clean up and simplify some components that are no longer needed in the Waku v1 protocol, see point below.
Very experimental and incomplete libp2p support can be found in the nim-waku repo under v2: <https://github.com/status-im/nim-waku>
### 2. Simplify the protocol
Due to Waku's origins in Whisper and devp2p and as a standalone protocol, a lot of stuff has accumulated (<https://specs.vac.dev/specs/waku/waku.html>). Not all of it serves its purpose anymore. For example, do we still need RLP when we have Protobuf messages? What about extremely low PoW when we have peer scoring? What about key management / encryption when we have encryption at the libp2p and Status protocol levels?
Not everything has to be done in one go, but being minimalist at this stage will keep the protocol lean and make us more adaptable.
The essential characteristic that has to be maintained is that we don't need to change the upper layers, i.e. we still deal with (Waku) topics and some envelope like data unit.
### 3. Core integration
As early as possible we want to integrate with Core via Stimbus in order to mitigate risk and catch integration issues early in the process. What this looks like in practice is some set of APIs, similar to how Whisper and Waku were run in parallel, and an experimental feature behind a toggle in core/desktop.
### 4. Topic interest behavior
While we target full node traffic here, we want to make sure we maintain the existing bandwidth requirements for light nodes that Waku v1 addressed (<https://vac.dev/fixing-whisper-with-waku>). This means implementing topic interest in the form of Waku topics. Note that this would be separate from the app topics noted above.
### 5. Historical message caching
Basically what mailservers are currently doing. This likely looks slightly different in a libp2p world. It is another opportunity to simplify things with a basic REQ-RESP architecture, as opposed to the roundabout way things are done now. Again, not everything has to be done in one go, but there's no reason to reimplement a poor API if we don't have to.
Also see section below on adaptive nodes and capabilities.
### 6. Waku v1 <> Libp2p bridge
To make the transition complete, there has to be a bridge mode between current Waku and libp2p. This is similar to what was done for Whisper and Waku, and allows any node in the network to upgrade to Waku v2 at its leisure. For example, this would likely look different for Core, Desktop, Research and developers.
## Track 2 - Better routing
This is where we improve the amplification factors.
### 1. GossipSub
This is an extension of FloodSub in the libp2p world. Moving to GossipSub would allow traffic between full nodes to go from an amplification factor of ~25 to ~6. It basically creates a mesh of stable bidirectional connections, together with some gossiping capabilities outside of this view.
Explaining how GossipSub works is out of scope of this document. It is implemented in nim-libp2p and used by Nimbus as part of Eth2. You can read the specs here in more detail if you are interested: <https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.0.md> and <https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.1.md>
![](assets/img/waku_v2_routing_gossip_small.png)
![](assets/img/status_scaling_model_fig8.png)
![](assets/img/status_scaling_model_fig9.png)
![](assets/img/status_scaling_model_fig10.png)
![](assets/img/status_scaling_model_fig11.png)
While we technically could implement this over existing Waku, we'd have to re-implement it, and we'd lose out on all the other benefits libp2p would provide, as well as the ecosystem of people and projects working on improving the scalability and security of these protocols.
### 2. Topic sharding
This one is slightly more speculative in terms of its ultimate impact. The basic idea is to split the application topic into N shards, say 10, and then each full node can choose which shards to listen to. This can reduce amplification factors by another factor of 10.
![](assets/img/waku_v2_routing_sharding_small.png)
![](assets/img/status_scaling_model_fig12.png)
![](assets/img/status_scaling_model_fig13.png)
Note that this means a light node that listens to several topics would have to be connected to more full nodes to get connectivity. For a more exotic version of this, see <https://forum.vac.dev/t/rfc-topic-propagation-extension-to-libp2p-pubsub/47>
This is orthogonal to the choice of FloodSub or GossipSub, but due to GossipSub's more dynamic nature it is likely best combined with the latter.
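As a minimal sketch of the idea (the hashing scheme below is illustrative, not the actual design):

```
import hashlib

N_SHARDS = 10  # hypothetical shard count

# Illustrative only: deterministically map an app topic to one of N shards.
def shard_for_topic(topic: str) -> int:
    digest = hashlib.sha256(topic.encode()).digest()
    return int.from_bytes(digest[:4], "big") % N_SHARDS

# A full node then subscribes to a subset of shards, e.g. {2, 5},
# and only relays messages whose topic maps into that subset.
```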
### 3. Other factors
Not a primary focus, but worth a look. Looking at the scaling model, there might be other easy wins to improve overall bandwidth consumption between full nodes. For example, can we reduce envelope size by a significant factor?
## Track 3 - Accounting and user-run nodes
This is where we make sure the network isn't fragile, become a true p2p app, get our users excited and engaged, and allow us to scale the network without creating an even bigger cluster.
To work in practice, this has a soft dependency on node discovery such as DNS based discovery (<https://eips.ethereum.org/EIPS/eip-1459>) or Discovery v5 (<https://vac.dev/feasibility-discv5>).
### 1. Adaptive nodes and capabilities
We want to make the gradation between light nodes, full nodes, storing (partial set of) historical messages, only acting for a specific shard, etc more flexible and explicit. This is required to identify and discover the nodes you want. See <https://github.com/vacp2p/specs/issues/87>
Depending on how the other tracks come together, this design should allow a desktop node to identify as a full relaying node for some app topic shard, while also expressing Waku topic interest and retrieving historical messages itself.
E.g. Disc v5 can be used to supply node properties through ENR.
### 2. Accounting
This is based on a few principles:
1. Some nodes contribute a lot more than other nodes in the network
2. We can account for the difference in contribution in some fashion
3. We want nodes to be incentivized to tell the truth, and disincentivized to lie
Accounting here is a stepping stone, where accounting is the raw data upon which some settlement later occurs. It can have various forms of granularity. See <https://forum.vac.dev/t/accounting-for-resources-in-waku-and-beyond/31> for discussion.
We also note that in GossipSub, the mesh is bidirectional. Additionally, nodes misreporting doesn't appear to be a high priority issue. What is an issue is getting people to run full nodes in the first place. There are a few points to that: it has to be possible in the end-user UX, nodes have to be discoverable, and it has to be profitable/visible that you are contributing. UX and discovery are out of scope for this work, whereas visibility/accounting is part of this scope. Settlement is a stretch goal here.
The general shape of the solution is inspired by the Swarm model, where we do accounting separate from settlement. It doesn't require any specific proofs, but nodes are incentivized to tell the truth in the following way:
1. Both full node and light node do accounting in a pairwise, local fashion
2. If a light node doesn't ultimately pay, or lies in its reporting, it gets disconnected (for example)
3. If a full node doesn't provide its service, the light node may pick another full node (for example)
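To illustrate what pairwise, local accounting could look like, here is a sketch (all names and units are made up; this is not a settled design):

```
from collections import defaultdict

# A sketch of a local, pairwise accounting ledger.
class PairwiseLedger:
    def __init__(self):
        self.balance = defaultdict(int)  # peer_id -> net bytes we served them

    def record_served(self, peer_id: str, n_bytes: int):
        self.balance[peer_id] += n_bytes  # we did work for them

    def record_received(self, peer_id: str, n_bytes: int):
        self.balance[peer_id] -= n_bytes  # they did work for us

    def over_threshold(self, peer_id: str, threshold: int) -> bool:
        # A peer that owes too much and won't settle may get disconnected.
        return self.balance[peer_id] > threshold
```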
While accounting for individual resource usage is useful, for the ultimate end user experience we can ideally account for other things such as:
- end to end delivery
- online time
- completeness of storage
This can be gradually enhanced and strengthened, for example with proofs, consistency checks, Quality of Service, reputation systems. See <https://discuss.status.im/t/network-incentivisation-first-draft/1037> for one attempt to provide stronger guarantees with periodic consistency checks and a shared fund mechanism. And <https://forum.vac.dev/t/incentivized-messaging-using-validity-proofs/51> for using validity proofs and removing liveness requirement for settlement.
All of this is optional at this stage, because our goal here is to improve the status quo for user run nodes. Accounting at this stage should be visible and correspond to the net benefit a node provides to another.
As a concrete example: a light node has some topic interest and cares about historical messages on some topic. A full node communicates envelopes as they come in, communicates its high availability (online time), and stores/forwards stored messages. Both nodes have this information, and if they agree, settlement (initially just a mock message) can be a payment sent to an address at some time interval / over some defined volume. See future sections for how this can be improved upon.
Also see below in section 4, using constructs such as eigentrust as a local reputation mechanism.
### 3. Relax high availability requirement
If we want desktop nodes to participate in the storing of historical messages, high availability is a problem. It is a problem for any node, especially if they lie about it, but assuming they are honest it is still an issue.
By being connected to multiple nodes, we can get overlapping online windows. These can then be combined to get consistency. This is obviously experimental and would need to be tested before being deployed, but if it works it'd be very useful.
Additionally or alternatively, instead of putting a high requirement on message availability, focus on detection of missing information. This likely requires re-thinking how we do data sync / replication.
### 4. Incentivize light and full nodes to tell the truth (policy, etc)
In the accounting phase it is largely assumed that nodes are honest. What happens when they lie, and how do we incentivize them to be honest? In the case of Bittorrent this is done with tit-for-tat, but this is a different kind of relationship. What follows are some examples of how this can be done.
For light nodes:
- if they don't, they get disconnected
- prepayment (especially to "high value" nodes)
For full nodes:
- multiple nodes reporting and having to agree, so that the truth becomes a Schelling point
- use eigentrust
- staking for discovery visibility with slashing
### 5. Settlement PoC
Can be done after phase 2 if so desired. Basically integrate payments based on accounting and policy.
# Out of scope
1. We assume the Status Base model requirements are accurate.
2. We assume Core will improve retention rates.
3. We assume the Stimbus production team will enable integration of nim-waku.
4. We assume Discovery mechanisms such as DNS and Discovery v5 will be worked on separately.
5. We assume Core will, at some point, provide a UX for integrating payment of services.
6. We assume the desktop client is sufficiently usable.
7. We assume Core and Infra will investigate ways of improving MaxPeers.

---
layout: post
name: "Waku v2 Update"
title: "Waku v2 Update"
date: 2020-09-28 12:00:00 +0800
author: oskarth
published: true
permalink: /waku-v2-update
categories: research
summary: A research log. Read on to find out what is going on with Waku v2, a messaging protocol. What has been happening? What is coming up next?
image: /assets/img/vac.png
discuss: https://forum.vac.dev/t/discussion-waku-v2-update/56
---
It has been a while since the last post. It is time for an update on Waku v2. Aside from getting more familiar with libp2p (specifically nim-libp2p) and some vacation, what have we been up to? In this post we'll talk about what we've gotten done since last time, and briefly talk about immediate next steps and future. But first, a recap.
# Recap
In the last post ([Waku v2 plan](https://vac.dev/waku-v2-plan)) we explained the rationale of Waku v2 - the current Waku network is fragile and doesn't scale. To solve this, Waku v2 aims to reduce amplification factors and get more user run nodes. We broke the work down into three separate tracks.
1. Track 1 - Move to libp2p
2. Track 2 - Better routing
3. Track 3 - Accounting and user-run nodes
As well as various rough components for each track. The primary initial focus is track 1. This means things like: moving to FloodSub, simplifying the protocol, core integration, topic interest behavior, historical message caching, and a Waku v1<>v2 bridge.
# Current state
Let's talk about the state of specs and our main implementation nim-waku. Then we'll go over our recent testnet, Nangang, and finish off with a Web PoC.
## Specs
After some back and forth on how to best structure things, we ended up breaking down the specs into a few pieces. While Waku v2 is best thought of as a cohesive whole in terms of its capabilities, it is made up of several protocols. Here's a list of the current specs and their status:
- [Main spec](https://specs.vac.dev/specs/waku/v2/waku-v2.html) (draft)
- [Relay protocol spec](https://specs.vac.dev/specs/waku/v2/waku-relay.html) (draft)
- [Filter protocol spec](https://specs.vac.dev/specs/waku/v2/waku-filter.html) (raw)
- [Store protocol spec](https://specs.vac.dev/specs/waku/v2/waku-store.html) (raw)
- [Bridge spec](https://specs.vac.dev/specs/waku/v2/waku-bridge.html) (raw)
Raw means there is not yet an implementation that corresponds fully to the spec, and draft means there is an implementation that corresponds to the spec. In the interest of space, we won't go into too much detail on the specs here except to note a few things:
- The relay spec is essentially a thin wrapper on top of PubSub/FloodSub/GossipSub
- The filter protocol corresponds to previous light client mode in Waku v1
- The store protocol corresponds to the previous mailserver construct in Waku v1
The filter and store protocol allow for adaptive nodes, i.e. nodes that have various capabilities. For example, a node being mostly offline, or having limited bandwidth capacity. The bridge spec outlines how to bridge the Waku v1 and v2 networks.
## Implementation
The main implementation we are working on is [nim-waku](https://github.com/status-im/nim-waku/). This builds on top of libraries such as [nim-libp2p](https://github.com/status-im/nim-libp2p) and others that the [Nimbus team](https://nimbus.team/) have been working on as part of their Ethereum 2.0 client.
Currently nim-waku implements the relay protocol, and is close to implementing filter and store protocol. It also exposes a [Nim Node API](https://github.com/status-im/nim-waku/blob/master/docs/api/v2/node.md) that allows libraries such as [nim-status](https://github.com/status-im/status-nim) to use it. Additionally, there is also a rudimentary JSON RPC API for command line scripting.
## Nangang testnet
Last week we launched a very rudimentary internal testnet called Nangang. The goal was to test basic connectivity and make sure things work end to end. It didn't have things like: client integration, encryption, bridging, multiple clients, store/filter protocol, or even a real interface. What it did do is allow Waku developers to "chat" via RPC calls and looking in the log output. Doing this meant we exposed and fixed a few blockers, such as connection issues, deployment, topic subscription management, protocol and node integration, and basic scripting/API usage. After this, we felt confident enough to upgrade the main and relay spec to "draft" status.
## Waku Web PoC
As a bonus, we wanted to see what it'd take to get Waku running in a browser. This is a very powerful capability that enables a lot of use cases, and something that libp2p enables with its multiple transport support.
Using the current stack, with nim-waku, would require quite a lot of ground work with WASM, WebRTC, Websockets support etc. Instead, we decided to take a shortcut and hack together a JS implementation called [Waku Web Chat](https://github.com/vacp2p/waku-web-chat/). This quick hack wouldn't be possible without the people behind [js-libp2p-examples](https://github.com/libp2p/js-libp2p-examples/) and [js-libp2p](https://github.com/libp2p/js-libp2p) and all its libraries. These are people like Jacob Heun, Vasco Santos, and Cayman Nava. Thanks!
It consists of a browser implementation, a NodeJS implementation and a bootstrap server that acts as a signaling server for WebRTC. It is largely a bastardized version of GossipSub, and while it isn't completely to spec, it does allow messages originating from a browser to eventually end up at a nim-waku node, and vice versa. Which is pretty cool.
# Coming up
Now that we know what the current state is, what is still missing? What are the next steps?
## Things that are missing
While we are getting closer to closing out work for track 1, there are still a few things missing from the initial scope:
1) The store and filter protocols need to be finished. This means a basic spec, implementation, API integration, and being proven to work in a testnet. All of these are works in progress and expected to be done very soon. Once the store protocol is done in a basic form, it needs further improvements to make it production-ready, at least on a spec/basic implementation level.
2) Core integration was mentioned as in scope for track 1 initially. This work has stalled a bit, largely due to organizational bandwidth and priorities. While there is a Nim Node API that in theory is ready to be used, having it used in e.g. the Status desktop or mobile app is a different matter. The team responsible for this at Status ([status-nim](github.com/status-im/status-nim)) has been making progress on getting nim-waku v1 integrated, and is expected to look into nim-waku v2 integration soon. One thing that makes this especially tricky is the difference in interface between Waku v1 and v2, which brings us to...
3) Companion spec for encryption. As part of simplifying the protocol, the routing is decoupled from the encryption in v2 ([1](https://github.com/vacp2p/specs/issues/158), [2](https://github.com/vacp2p/specs/issues/181)). There are multiple layers of encryption at play here, and we need to figure out a design that makes sense for various use cases (dapps using Waku on their own, Status app, etc).
4) Bridge implementation. The spec is done and we know how it should work, but it needs to be implemented.
5) General tightening up of specs and implementation.
While this might seem like a lot, a lot has been done already, and the majority of the remaining tasks are more amenable to being pursued in parallel with other efforts. It is also worth mentioning that parts of track 2 and 3 have been started, in the form of moving to GossipSub (amplification factors) and the basics of adaptive nodes (multiple protocols). This is in addition to things like Waku Web, which was not part of the initial scope.
## Upcoming
Aside from the things mentioned above, what is coming up next? There are a few areas of interest, mentioned in no particular order. For track 2 and 3, see previous post for more details.
1) Better routing (track 2). While we are already building on top of GossipSub, we still need to explore things like topic sharding in more detail to further reduce amplification factors.
2) Accounting and user-run nodes (track 3). With store and filter protocol getting ready, we can start to implement accounting and light connection game for incentivization in a bottom up and iterative manner.
3) Privacy research. Study better and more rigorous privacy guarantees, e.g. how FloodSub/GossipSub behaves under common threat models, and how a custom packet format can improve things like unlinkability.
4) zkSnarks RLN for spam protection and incentivization. We studied this [last year](https://vac.dev/feasibility-semaphore-rate-limiting-zksnarks) and recent developments have made this relevant to study again. Create an [experimental spec/PoC](https://github.com/vacp2p/specs/issues/189) as an extension to the relay protocol. Kudos to Barry Whitehat and others like Kobi Gurkan and Koh Wei Jie for pushing this!
5) Ethereum M2M messaging. Being able to run in the browser opens up a lot of doors, and there is an opportunity here to enable things like a decentralized WalletConnect, multi-sig transactions, voting and similar use cases. This was the original goal of Whisper, and we'd like to deliver on that.
As you can tell, quite a lot of things! Luckily, we have two people joining as protocol engineers soon, which will bring much needed support to the current team of ~2-2.5 people. More details to come in further updates.
---
If you are feeling adventurous and want to use early stage alpha software, check out the [docs](https://github.com/status-im/nim-waku/tree/master/docs). If you want to read the specs, head over to [Waku spec](https://specs.vac.dev/specs/waku/). If you want to talk with us, join us on [Status](https://get.status.im/chat/public/vac) or on [Telegram](https://t.me/vacp2p) (they are bridged).

---
layout: post
name: "[Talk] Vac, Waku v2 and Ethereum Messaging"
title: "[Talk] Vac, Waku v2 and Ethereum Messaging"
date: 2020-11-10 12:00:00 +0800
author: oskarth
published: true
permalink: /waku-v2-ethereum-messaging
categories: research
summary: Talk from Taipei Ethereum Meetup. Read on to find out about our journey from Whisper to Waku v2, as well as how Waku v2 can be useful for Etherum Messaging.
image: /assets/img/taipei_ethereum_meetup_slide.png
discuss: https://forum.vac.dev/t/discussion-talk-vac-waku-v2-and-ethereum-messaging/60
---
*The following post is a transcript of the talk given at the [Taipei Ethereum meetup, November 5](https://www.meetup.com/Taipei-Ethereum-Meetup/events/274033344/). There is also a [video recording]( https://www.youtube.com/watch?v=lUDy1MoeYnI).*
---
## 0. Introduction
Hi! My name is Oskar and I'm the protocol research lead at Vac. This talk will be divided into two parts. First I'll talk about the journey from Whisper, to Waku v1 and now to Waku v2. Then I'll talk about messaging in Ethereum. After this talk, you should have an idea of what Waku v2 is, the problems it is trying to solve, as well as where it can be useful for messaging in Ethereum.
## PART 1 - VAC AND THE JOURNEY FROM WHISPER TO WAKU V1 TO WAKU V2
## 1. Vac intro
First, what is Vac? Vac grew out of our efforts at Status to create a window onto Ethereum and a secure messenger. Vac is a modular protocol stack for p2p secure messaging, paying special attention to resource restricted devices, privacy and censorship resistance.
Today we are going to talk mainly about Waku v2, which is the transport privacy / routing aspect of the Vac protocol stack. It sits "above" the p2p overlay, such as libp2p dealing with transports etc, and below a conversational security layer dealing with messaging encryption, such as using Double Ratchet etc.
## 2. Whisper to Waku v1
In the beginning, there was Whisper. Whisper was part of the holy trinity of Ethereum. You had Ethereum for consensus/computation, Whisper for messaging, and Swarm for storage.
However, for various reasons, Whisper didn't get the attention it deserved. Development dwindled, it promised too much, and it suffered from many issues, such as being extremely inefficient and not being suitable for running on e.g. mobile phones. Despite this, Status used it in its app from around 2017 to 2019. As far as I know, it was one of very few, if not the only, production uses of Whisper.
In an effort to solve some of its immediate problems, we forked Whisper into Waku and formalized it with a proper specification. This solved immediate bandwidth issues for light nodes, introduced rate limiting for better spam protection, improved historical message support, etc.
If you are interested in this journey, check out the [EthCC talk Dean and I gave in Paris earlier this year](https://www.youtube.com/watch?v=6lLT33tsJjs).
Status upgraded to Waku v1 early 2020. What next?
## 3. Waku v1 to v2
We were far from done. The changes we had made were quite incremental and done in order to get tangible improvements as quickly as possible. This meant we couldn't address more fundamental issues related to full node routing scalability, running with libp2p for more transports, better security, better spam protection and incentivization.
This kickstarted the Waku v2 effort, which is what we've been working on since July. This work was and is initially centered around a few pieces:
(a) Moving to libp2p
(b) Better routing
(c) Accounting and user-run nodes
The general theme was: making the Waku network more scalable and robust.
We also did a scalability study to show at what point the network would run into issues, due to the inherent lack of routing that Whisper and Waku v1 provided.
You can read more about this [here](https://vac.dev/waku-v2-plan).
## 3.5 Waku v2 - Design goals
Taking a step back, what problem does Waku v2 attempt to solve compared to all the other solutions that exists out there? What type of applications should use it and why? We have the following design goals:
1. **Generalized messaging**. Many applications require some form of messaging protocol to communicate between different subsystems or different nodes. This messaging can be human-to-human, machine-to-machine, or a mix.
2. **Peer-to-peer**. These applications sometimes have requirements that make them suitable for peer-to-peer solutions.
3. **Resource restricted**. These applications often run in constrained environments, where resources or the environment is restricted in some fashion. E.g.:
- limited bandwidth, CPU, memory, disk, battery, etc
- not being publicly connectable
- only being intermittently connected; mostly-offline
4. **Privacy**. These applications have a desire for some privacy guarantees, such as pseudonymity, metadata protection in transit, etc.
We also aim to do this in a modular fashion, meaning you can find a reasonable trade-off depending on your exact requirements. For example, you usually have to trade off some bandwidth to get metadata protection, and vice versa.
The concept of designing for resource restricted devices also leads to the concept of adaptive nodes, where you have more of a continuum between full nodes and light nodes. For example, if you switch your phone from mobile data to WiFi you might be able to handle more bandwidth, and so on.
## 4. Waku v2 - Breakdown
Where is Waku v2 at now, and how is it structured?
It is running over libp2p and we had our second internal testnet last week or so. As a side note, we name our testnets after subway stations in Taipei, the first one being Nangang, and the most recent one being Dingpu.
The main implementation is written in Nim using nim-libp2p, which is also powering Nimbus, an Ethereum 2 client. There is also a PoC for running Waku v2 in the browser. On a spec level, we have the following specifications that corresponds to the components that make up Waku v2:
- Waku v2 - this is the main spec that explains the goals of providing generalized messaging, in a p2p context, with a focus on privacy and running on resources restricted devices.
- Relay - this is the main PubSub spec that provides better routing. It builds on top of GossipSub, which is what Eth2 heavily relies on as well.
- Store - this is a 1-1 protocol for light nodes to get historical messages, if they are mostly-offline.
- Filter - this is a 1-1 protocol for light nodes that are bandwidth restricted to only (or mostly) get messages they care about.
- Message - this explains the payload, to get some basic encryption and content topics. It corresponds roughly to envelopes in Whisper/Waku v1.
- Bridge - this explains how to do bridging between Waku v1 and Waku v2 for compatibility.
Right now, all protocols, with the exception of bridge, are in draft mode, meaning they have been implemented but are not yet being relied upon in production.
You can read more about the breakdown in this [update](https://vac.dev/waku-v2-update) though some progress has been made since then, as well was in the [main Waku v2 spec](https://specs.vac.dev/specs/waku/v2/waku-v2.html).
## 5. Waku v2 - Upcoming
What's coming up next? There are a few things.
For Status to use it in production, it needs to be integrated into the main app using the Nim Node API. The bridge also needs to be implemented and tested.
For other users, we are currently overhauling the API to allow usage from a browser, for example. To make that experience great, there are also a few underlying infrastructure pieces that we need in nim-libp2p, such as a more secure HTTP server in Nim, Websockets and WebRTC support.
There are also some changes we made to the level at which content encryption happens, and this needs to be made easier to use in the API. This means you can use a node without giving your keys to it, which is useful in some environments.
More generally, beyond getting to production-ready use, there are a few bigger pieces that we are working on or will work on soon. These are things like:
- Better scaling, by using topic sharding.
- Accounting and user-run nodes, to account for and incentivize full nodes.
- Stronger and more rigorous privacy guarantees, e.g. through study of GossipSub, unlinkable packet formats, etc.
- Rate Limit Nullifier for privacy preserving spam protection, a la what Barry Whitehat has presented before.
As well as better support for Ethereum M2M Messaging. Which is what I'll talk about next.
## PART 2 - ETHEREUM MESSAGING
A lot of what follows is inspired by exploratory work that John Lea has done at Status, previously Head of UX Architecture at Ubuntu.
## 6. Ethereum Messaging - Why?
It is easy to think that Waku v2 is only for human to human messaging, since that's how Waku is currently primarily used in the Status app. However, the goal is to be useful for generalized messaging, which includes other type of information as well as machine to machine messaging.
What is Ethereum M2M messaging? Going back to the Holy Trinity of Ethereum/Whisper/Swarm, the messaging component was seen as something that could facilitate messages between dapps and act as a building block. This can help with things such as:
- Reducing on-chain transactions
- Reducing latency for operations
- Decentralizing centrally coordinated services (like WalletConnect)
- Improving the UX of dapps
- Broadcasting live information
- Providing a message transport layer for state channels
And so on.
## 7. Ethereum Messaging - Why? (Cont)
What are some examples of practical things Waku as used for Ethereum Messaging could solve?
- Multisig transfers only needing one on-chain transaction
- DAO votes only needing one on-chain transaction
- Giving dapps the ability to push notifications directly to users
- Giving users the ability to respond directly to requests from dapps
- Decentralized Wallet Connect
Etc.
## 8. What's needed to deliver this?
We can break it down into our actors:
- Decentralized M2M messaging system (Waku)
- Native wallets (Argent, Metamask, Status, etc)
- Dapps that benefit from M2M messaging
- Users whose problems are being solved
Each of these has a bunch of requirements in turn. The messaging system needs to be decentralized, scalable, robust, etc. Wallets need support for messaging layer, dapps need to integrate this, etc.
This is a lot! Growing adoption is a challenge. There is a catch 22 in terms of justifying development efforts for wallets, when no dapps need it, and likewise for dapps when no wallets support Waku. In addition to this, there must be proven usage of Waku before it can be relied on, etc. How can we break this up into smaller pieces of work?
## 9. Breaking up the problem and a high level roadmap
We can start small. It doesn't need to be used for critical features first; a more hybrid approach can be taken where it acts more as a nice-to-have.
1. Forking Whisper and solving scalability, spam and other issues with it.
This is a work in progress. What we talked about in part 1.
2. Expose messaging API for Dapp developers.
3. Implement decentralized version of WalletConnect.
Currently wallets connect to dapps through a centralized service. Great UX.
4. Solve DAO/Multi-Sig coordination problem.
E.g. send message to wallet-derived key when it is time to sign a transaction.
5. Extend dapp-to-user and user-to-dapp communication to more dapps.
Use lessons learned and examples to drive adoption by wallets/dapps.
And then build up from there.
## 10. We are hiring!
A lot of this will happen in JavaScript and browsers, since that's the primary environment for a lot of wallets and dapps. We are currently hiring for a Waku JS Wallet integration lead to help push this effort further.
Come talk to me after or [apply here](https://status.im/our_team/open_positions.html?gh_jid=2385338).
That's it! You can find us on Status, Telegram, vac.dev. I'm on twitter [here](https://twitter.com/oskarth).
Questions?
---

---
layout: post
name: "Privacy-preserving p2p economic spam protection in Waku v2"
title: "Privacy-preserving p2p economic spam protection in Waku v2"
date: 2021-03-05 12:00:00 +0800
author: sanaztaheri
published: true
permalink: /rln-relay
categories: research
summary: This post is going to give you an overview of how spam protection can be achieved in Waku Relay through rate-limiting nullifiers. We will cover a summary of spam-protection methods in centralized and p2p systems, and the solution overview and details of the economic spam-protection method. The open issues and future steps are discussed in the end.
image: /assets/img/rain.png
discuss: https://forum.vac.dev/t/privacy-preserving-p2p-economic-spam-protection-in-waku-v2-with-rate-limiting-nullfiers/66
---
# Introduction
This post is going to give you an overview of how spam protection can be achieved in the Waku Relay protocol[^2] through Rate-Limiting Nullifiers[^3] [^4], or RLN for short.
Let me give a little background about Waku (v2)[^1]. Waku is a privacy-preserving peer-to-peer (p2p) messaging protocol for resource-restricted devices. Being p2p means that Waku relies on **no** central server. Instead, peers collaboratively deliver messages in the network. Waku uses GossipSub[^16] as the underlying routing protocol (as of the writing of this post). At a high level, GossipSub is based on a publisher-subscriber architecture: *peers congregate around topics they are interested in and can send messages to topics; each message gets delivered to all peers subscribed to the topic*. In GossipSub, a peer has a constant number of direct connections/neighbors. In order to publish a message, the author forwards it to a subset of its neighbors. The neighbors proceed similarly until the message gets propagated through the network of subscribed peers. The message publishing and routing procedures are part of the Waku Relay[^17] protocol.
<p align="center">
<img src="../assets/img/rln-relay/rln-relay-overview.png" width="200%" />
<br />
Figure 1: An overview of privacy-preserving p2p economic spam protection in Waku v2 RLN-Relay protocol.
</p>
## What do we mean by spamming?
In centralized messaging systems, a spammer usually denotes an entity that uses the messaging system to send unsolicited messages (spam) to large numbers of recipients. However, in Waku, with its p2p architecture, spam messages affect not only the recipients but also all the other peers involved in the routing process, as they have to spend their computational power/bandwidth/storage capacity on processing spam messages. As such, we define a spammer as an entity that uses the messaging system to publish a large number of messages in a short amount of time. The messages issued this way are called spam. In this definition, we disregard the intention of the spammer as well as the content of the message and the number of recipients.
## Possible Solutions
Has the spamming issue been addressed before? Of course! Here is an overview of spam protection techniques with their trade-offs and use-cases. In this overview, we distinguish between protection techniques targeted at centralized messaging systems and those aimed at p2p architectures.
### Centralized Messaging Systems
In traditional centralized messaging systems, spam usually signifies unsolicited messages sent in bulk or messages with malicious content like malware. Protection mechanisms include
- authentication through some piece of personally identifiable information e.g., phone number
- checksum-based filtering to protect against messages sent in bulk
- challenge-response systems
- content filtering on the server or via a proxy application
These methods exploit the fact that the messaging system is centralized, so a global view of users' activities is available, from which spamming patterns can be extracted and defeated accordingly. Moreover, users are associated with an identifier, e.g., a username, which enables the server to profile each user, e.g., to detect suspicious behavior like spamming. Such a profiling possibility works against the user's anonymity and privacy.
Among the techniques enumerated above, authentication through phone numbers is a somewhat economic-incentive measure, as providing multiple valid phone numbers is expensive for the attacker. Notice that while an expensive authentication method can reduce the number of accounts owned by a single spammer, it cannot address the spam issue entirely. This is because the spammer can still send bulk messages through one single account. For this approach to be effective, a centralized mediator is essential. That is why such a solution would not fit p2p environments where no centralized control exists.
### P2P Systems
What about spam prevention in p2p messaging platforms? There are two main techniques, namely *Proof of Work*[^8], deployed by Whisper[^9], and the *peer scoring*[^6] method (a reputation-based approach) adopted by libp2p. However, each of these solutions has its own shortcomings for real-life use-cases, as explained below.
#### Proof of work
The idea behind Proof of Work, i.e., POW[^8], is to make messaging a computationally costly operation, hence lowering the messaging rate of **all** the peers, including the spammers. Specifically, the message publisher has to solve a puzzle, and the puzzle is to find a nonce such that the hash of the message concatenated with the nonce has at least z leading zeros, where z is known as the difficulty of the puzzle. Since the hash function is one-way, peers have to brute-force to find a nonce. Hashing is a computationally heavy operation, and so is the brute-force search. While solving the puzzle is computationally expensive, it is comparatively cheap to verify the solution.
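As a small illustration, here is a sketch of such a puzzle, counting leading zero bits of SHA-256 (Whisper's actual scheme differs in its details):

```
import hashlib
from itertools import count

# Find a nonce such that H(message || nonce) has at least z leading zero bits.
def solve(message: bytes, z: int) -> int:
    for nonce in count():
        digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") >> (256 - z) == 0:
            return nonce  # expensive to find...

def verify(message: bytes, nonce: int, z: int) -> bool:
    digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - z) == 0  # ...cheap to check
```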
POW is also used as the underlying mining algorithm in Ethereum and Bitcoin blockchain. There, the goal is to contain the mining speed and allow the decentralized network to come to a consensus, or agree on things like account balances and the order of transactions.
While the use of POW makes perfect sense in the Ethereum / Bitcoin blockchain, it shows practical issues in heterogeneous p2p messaging systems with resource-restricted peers. Some peers won't be able to carry out the designated computation and will effectively be excluded. Such exclusion proved to be a practical issue in applications like Status, which used to rely on POW for spam protection, to the extent that the difficulty level had to be set close to zero.
#### Peer Scoring
The peer scoring method[^6] utilized by libp2p limits the number of messages issued by a peer in connection to another peer. That is, each peer monitors all the peers to which it is directly connected and adjusts their messaging quota, i.e., routes or does not route their messages, depending on their past activities. For example, if a peer detects that a neighbor is sending more than x messages per month, it can drop that neighbor's quota to z·x, where z is less than one. The shortcoming of this solution is that scoring is based on peers' local observations, and the concept of the score is defined in relation to one single peer. This leaves room for an attack where a spammer makes connections to k peers in the system and publishes k·(x-1) messages by exploiting all of its k connections. Another attack scenario is through botnets consisting of a large number of, e.g., a million bots. The attacker rents a botnet, inserts each bot as a legitimate peer into the network, and each can publish x-1 messages per month[^7].
#### Economic-Incentive Spam protection
Is this the end of our spam-protection journey? Shall we simply give up and leave spammers be? Certainly not!
Waku RLN-Relay gives us a p2p spam-protection method which:
- suits **p2p** systems and does not rely on any central entity.
- is **efficient**, i.e., has no unreasonable computational, storage, memory, or bandwidth requirements! As such, it fits a network of **heterogeneous** peers.
- respects user **privacy**, unlike reputation-based and centralized methods.
- deploys **economic incentives** to contain spammers' activity. Namely, there is a financial sacrifice for those who want to spam the system. How? Follow along ...
We devise a general rule to save everyone's life and that is
**No one can publish more than M messages per epoch without being financially charged!**
We set M to 1 for now, but this can be any arbitrary value. You may be thinking, "This is too restrictive! Only one per epoch?" Don't worry, we set the epoch to a reasonable value so that it does not slow down the communication of innocent users but makes the life of spammers harder! The epoch here can be every second, as defined by UTC date-time ±20s.
The remainder of this post is all about the story of how to enforce this limit on each user's messaging rate as well as how to impose the financial cost when the limit gets violated. This brings us to the Rate Limiting Nullifiers and how we integrate this technique into Waku v2 (in specific the Waku Relay protocol) to protect our valuable users against spammers.
# Technical Terms
**Zero-knowledge proof**: A zero-knowledge proof (ZKP)[^14] allows a *prover* to show a *verifier* that they know something, without revealing what that something is. This means you can do trust-minimized computation that is also privacy-preserving. As a basic example, instead of showing your ID when going to a bar, you simply give the doorman a proof that you are over 18, without showing your ID itself. In this write-up, by ZKP we essentially mean zkSNARKs[^15], which are one of the many types of ZKPs.
**Threshold Secret Sharing Scheme**: (m,n) threshold secret sharing is a method by which you can split a secret value s into n pieces such that s can be reconstructed from any m of those pieces (m <= n). The economic-incentive spam protection utilizes a (2,n) secret sharing realized by the Shamir Secret Sharing Scheme[^13].
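Here is a toy (2,n) example over a prime field (purely illustrative; the real scheme operates in the field of the ZKP circuit):

```
# Toy (2,n) Shamir secret sharing: shares are points on a line,
# and any two points reveal f(0), the secret.
p = 2**61 - 1  # a Mersenne prime, standing in for the real field modulus

def make_share(secret: int, slope: int, x: int) -> tuple:
    return (x, (secret + slope * x) % p)  # a point on f(x) = secret + slope*x

def recover(share1, share2) -> int:
    (x1, y1), (x2, y2) = share1, share2
    slope = (y2 - y1) * pow(x2 - x1, -1, p) % p
    return (y1 - slope * x1) % p  # f(0)

secret, slope = 42, 7
assert recover(make_share(secret, slope, 3), make_share(secret, slope, 11)) == 42
```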
# Overview: Economic-Incentive Spam protection through Rate Limiting Nullifiers
**Context**: We started the idea of economic-incentive spam protection more than a year ago and conducted a feasibility study to identify blockers and unknowns. The results are published in our prior [post](https://vac.dev/feasibility-semaphore-rate-limiting-zksnarks). Since then major progress has been made and the prior identified blockers that are listed below are now addressed. Kudos to [Barry WhiteHat](https://github.com/barryWhiteHat), [Onur Kilic](https://github.com/kilic), [Koh Wei Jie](https://github.com/weijiekoh/perpetualpowersoftau) for all of their hard work, research, and development which made this progress possible.
- the proof time[^22], which was initially on the order of minutes (~10 min) and is now roughly 0.5 seconds
- the prover key size[^21], which was initially ~110MB and is now ~3.9MB
- the lack of Shamir logic[^19], which is now implemented and part of the RLN repository[^4]
- the concern regarding the potential multi-party computation for the trusted setup of zkSNARKs, which got resolved[^20]
- the lack of end-to-end integration, which we have now made possible and implemented, and which we present in this post. New blockers were also sorted out during the e2e integration, which we discuss in the [Feasibility and Open Issues](#feasibility-and-open-issues) section.
Now that you have more context, let's see how the final solution works. The fundamental point is to make it economically costly to send more than your share of messages and to do so in a privacy-preserving and e2e fashion. To do that we have the following components:
- **Group**: We manage all the peers inside one large group (later we may split peers into smaller groups, but for now consider only one). Group management is done via a smart contract devised for this purpose and deployed on the Ethereum blockchain.
- **Membership**: To be able to send messages, and specifically for published messages to get routed by all the peers, publishing peers have to register in the group. Membership involves setting up a public and private key pair (think of it as a username and password). The private key remains on the user's side, but the public key becomes part of the group information on the contract (publicly available) and everyone has access to it. Public keys are not human-generated (like usernames); instead, they are random numbers, and as such they do not reveal any information about the owner (think of public keys as pseudonyms). Registration is mandatory for users who want to publish messages; however, users who only want to listen to messages are more than welcome and do not have to register in the group.
- **Membership fee**: Membership is not for free! each peer has to lock a certain amount of funds during the registration (this means peers have to have an Ethereum account with sufficient balance for this sake). This fund is safely stored on the contract and remains intact unless the peer attempts to break the rules and publish more than one message per epoch.
- **Zero-knowledge Proof of membership**: Do you want your message to get routed to its destination, fine, but you have to prove that you are a member of the group (sorry, no one can escape the registration phase!). Now, you may be thinking that should I attach my public key to my message to prove my membership? Absolutely Not! we said that our solution respects privacy! membership proofs are done in a zero-knowledge manner that is each message will carry cryptographic proof asserting that "the message is generated by one of the current members of the group", so your identity remains private and your anonymity is preserved!
- **Slashing through secret sharing**: Till now it does not seem like we can catch spammers, right? yes, you are right! now comes the exciting part, detecting spammers and slashing them. The core idea behind the slashing is that each publishing peer (not routing peers!) has to integrate a secret share of its private key inside the message. The secret share is deterministically computed over the private key and the current epoch. The content of this share is harmless for the peer's privacy (it looks random) unless the peer attempts to publish more than one message in the same epoch hence disclosing more than one secret share of its private key. Indeed two distinct shares of the private key under the same epoch are enough to reconstruct the entire private key. Then what should you do with the recovered private key? hurry up! go to the contract and withdraw the private key and claim its fund and get rich!! Are you thinking what if spammers attach junk values instead of valid secret shares? Of course, that wouldn't be cool! so, there is a zero-knowledge proof for this sake as well where the publishing peer has to prove that the secret shares are generated correctly.
A high-level overview of the economic spam protection is shown in Figure 1.
# Flow
In this section, we describe the flow of the economic-incentive spam detection mechanism from the viewpoint of a single peer. An overview of this flow is provided in Figure 3.
## Setup and Registration
A peer willing to publish a message is required to register. Registration is moderated through a smart contract deployed on the Ethereum blockchain. The state of the contract contains the list of registered members' public keys. An overview of registration is illustrated in Figure 2.
For registration, a peer creates a transaction that sends x amount of Ether to the contract. The peer who holds the private key `sk` associated with that deposit is able to withdraw the x Ether by providing valid proof. Note that `sk` is initially known only by the owning peer; however, it may get exposed to other peers if the owner attempts to spam the system, i.e., sends more than one message per epoch.
The following relation holds between `sk` and `pk`: `pk = H(sk)`, where `H` denotes a hash function.
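As a sketch, key setup on the peer's side could look like the following, with SHA-256 standing in for the hash actually used (the real scheme uses a zkSNARK-friendly hash; these names are illustrative):

```javascript
import { createHash, randomBytes } from 'crypto';

// sk stays local; pk = H(sk) is what gets registered on the contract.
// SHA-256 is a stand-in for the circuit-friendly hash of the real scheme.
const sk = randomBytes(32);
const pk = createHash('sha256').update(sk).digest('hex');
```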
<p align="center">
<img src="../assets/img/rln-relay/rln-relay.png" />
<br />
Figure 2: Registration.
</p>
## Maintaining the membership Merkle Tree
The ZKP of membership that we mentioned before relies on representing the entire group as a Merkle tree. Tree construction and maintenance are delegated to the peers (the initial idea was to keep the tree on-chain as part of the contract; however, the cost associated with member deletion and insertion was unreasonably high, please see [Feasibility and Open Issues](#feasibility-and-open-issues) for more details). As such, each peer needs to build the tree locally and sync itself with the contract updates (peer insertion and deletion) to mirror them on its tree.
Two pieces of information about the tree are important, as they enable peers to generate zero-knowledge proofs. One is the root of the tree and the other is the membership proof (or authentication path). The tree root is public information, whereas the membership proof (or, more precisely, the peer's index in the tree together with its authentication path) is private data.
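The sketch below shows the core of this local bookkeeping for a power-of-two number of leaves: building the tree, extracting a peer's authentication path, and recomputing the root from a leaf and its path. SHA-256 again stands in for the circuit-friendly hash, and a real implementation also has to handle insertions, deletions, and partially filled levels.

```javascript
import { createHash } from 'crypto';

const h = (a, b) => createHash('sha256').update(a).update(b).digest();

// Build all levels of a binary Merkle tree; levels[0][0] is the root.
function buildTree(leaves) {
  const levels = [leaves];
  while (levels[0].length > 1) {
    const prev = levels[0], next = [];
    for (let i = 0; i < prev.length; i += 2) next.push(h(prev[i], prev[i + 1]));
    levels.unshift(next);
  }
  return levels;
}

// Authentication path for a leaf: one sibling hash per level, bottom-up.
function authPath(levels, index) {
  const path = [];
  for (let lvl = levels.length - 1; lvl > 0; lvl--) {
    path.push(levels[lvl][index ^ 1]); // sibling at this level
    index >>= 1;
  }
  return path;
}

// Recompute the root from a leaf, its index, and its authentication path;
// membership checks out if this equals the published tree root.
function rootFromPath(leaf, index, path) {
  let acc = leaf;
  for (const sibling of path) {
    acc = index & 1 ? h(sibling, acc) : h(acc, sibling);
    index >>= 1;
  }
  return acc;
}
```

A peer keeps its own index and authentication path private and only uses them as witnesses inside the zero-knowledge proof; the root is the shared public value.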
## Publishing
In order to publish at a given epoch, each message must carry a zero-knowledge proof signifying that the publishing peer is a registered member and has not exceeded the messaging rate at the given epoch.
Recall that the messaging rate is enforced by embedding a secret-shared version of the peer's `sk` in the message, together with a ZKP that the secret shares are constructed correctly. For the secret-sharing part, the peer generates the following data:
1. `shareX`
2. `shareY`
3. `nullifier`
The pair (`shareX`, `shareY`) is a secret share of `sk`, generated using the Shamir secret sharing scheme. Having two such pairs for an identical `nullifier` results in full disclosure of the peer's `sk` and hence the burning of the associated deposit. Note that the `nullifier` is a deterministic value derived from `sk` and `epoch`; therefore, any two messages issued by the same peer (i.e., using the same `sk`) for the same `epoch` are guaranteed to have identical `nullifier`s.
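A sketch of this per-message derivation, reusing `p` and `mod` from the Shamir sketch earlier. The exact derivations and hash are specified in the RLN documentation[^3]; SHA-256 and the encodings here are illustrative stand-ins.

```javascript
import { createHash } from 'crypto';

// Helpers: hash arbitrary inputs and move between digests and field elements.
const digest = (...parts) =>
  parts.reduce((hs, part) => hs.update(part), createHash('sha256')).digest();
const toField = (buf) => BigInt('0x' + buf.toString('hex')) % p;
const toBuf = (x) => Buffer.from(x.toString(16).padStart(64, '0'), 'hex');

// sk is treated as a field element (BigInt), epoch and message as strings.
// The slope a1 is fixed per (sk, epoch), so all messages of one epoch expose
// points on the same line; two distinct points leak sk via `recover` above.
function messageShares(sk, epoch, message) {
  const a1 = toField(digest(toBuf(sk), Buffer.from(epoch)));
  const nullifier = digest(toBuf(a1)).toString('hex'); // same for all messages this epoch
  const shareX = toField(digest(Buffer.from(message))); // message-dependent x-coordinate
  const shareY = mod(sk + a1 * shareX);                 // y = A(shareX), a point on the line
  return { shareX, shareY, nullifier };
}
```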
Finally, the peer generates a zero-knowledge proof `zkProof` asserting the peer's membership in the group and the correctness of the attached secret share (`shareX`, `shareY`) and the `nullifier`. In order to generate a valid proof, the peer needs two private inputs: its `sk` and its authentication path. The other inputs are the tree root, the epoch, and the content of the message.
**Privacy hint:** Note that the authentication path of each peer depends on the current list of members (hence it changes when peers register or leave). As such, it is recommended (and necessary for privacy/anonymity) that the publisher update her authentication path based on the latest state of the group and attempt the proof using the updated version.
An overview of the publishing procedure is provided in Figure 3.
## Routing
Upon receipt of a message, the routing peer needs to decide whether or not to route it. This decision relies on the following factors:
1) If the epoch value attached to the message has an unreasonable gap with the routing peer's current epoch, then the message must be dropped (this prevents a newly registered peer from spamming the system by publishing messages for all past epochs).
2) The message MUST carry a valid proof, which the routing peer verifies.
If the preceding checks pass, the message is relayed. In case of an invalid proof, the message is dropped. If spamming is detected, the publishing peer gets slashed (see [Spam Detection and Slashing](#spam-detection-and-slashing)).
An overview of the routing procedure is provided in Figure 3.
### Spam Detection and Slashing
In order to enable local spam detection and slashing, routing peers MUST record the `nullifier`, `shareX`, and `shareY` of every incoming message, provided that it is not spam and carries a valid proof. To do so, the peer follows the steps below (a minimal sketch of this logic follows the list).
1. The routing peer first verifies the `zkProof` and drops the message if verification fails.
2. Otherwise, it checks whether a message with an identical `nullifier` has already been relayed.
+ a) If such a message exists and its `shareX` and `shareY` components differ from those of the incoming message, then slashing takes place (if the `shareX` and `shareY` fields of the previously relayed message are identical to those of the incoming message, then the message is a duplicate and shall be dropped).
+ b) If no such message is found, then the incoming message gets relayed.
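Here is a minimal sketch of these steps combined with the epoch check from the routing section. `verifyProof` is an assumed stand-in for the zkSNARK verifier, `recover` is the two-share reconstruction from the Shamir sketch earlier, and the constants are illustrative.

```javascript
const MAX_EPOCH_GAP = 1; // tolerated epoch drift, an illustrative value
const seen = new Map();  // nullifier -> { shareX, shareY }

function handleMessage(msg, currentEpoch) {
  // Routing check 1: drop messages from epochs too far from ours.
  if (Math.abs(msg.epoch - currentEpoch) > MAX_EPOCH_GAP) return { action: 'drop' };
  // Routing check 2 / slashing step 1: the proof must verify.
  if (!verifyProof(msg)) return { action: 'drop' };

  const prev = seen.get(msg.nullifier);
  if (!prev) {
    // Step 2b: first message with this nullifier -> record and relay.
    seen.set(msg.nullifier, { shareX: msg.shareX, shareY: msg.shareY });
    return { action: 'relay' };
  }
  if (prev.shareX === msg.shareX && prev.shareY === msg.shareY) {
    return { action: 'drop' }; // step 2a: exact duplicate
  }
  // Step 2a: two distinct shares under one nullifier -> reconstruct sk.
  const sk = recover([prev.shareX, prev.shareY], [msg.shareX, msg.shareY]);
  return { action: 'slash', sk }; // present sk to the contract, claim the deposit
}
```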
An overview of the slashing procedure is provided in Figure 3.
<p align="center">
<img src="../assets/img/rln-relay/rln-message-verification.png" />
<br />
Figure 3: Publishing, Routing and Slashing workflow.
</p>
# Feasibility and Open Issues
We've come a long way since a year ago: the blockers are resolved, and we now have an end-to-end implementation. We learned a lot and identified further issues and unknowns, some of which block getting to production. A summary of the identified issues is presented below.
## Storage overhead per peer
Currently, peers are supposed to maintain the entire tree locally, which imposes a storage overhead linear in the size of the group (see this [issue](https://github.com/vacp2p/research/issues/57)[^11] for more details). One way to cope with this is to use the light-node/full-node paradigm, in which only a subset of more resourceful peers retain the tree, whereas light nodes obtain the necessary information by interacting with full nodes. Another way to approach this problem is a more storage-efficient method (as described in this research issue[^12]) where peers store a partial view of the tree instead of the entire tree. Keeping a partial view lowers the storage complexity to O(log(N)), where N is the size of the group; as a rough example, with N = 2^20 members a full binary tree holds about 2^21 32-byte nodes (roughly 64 MB), whereas a logarithmic view is only a few dozen hashes. There are still unknown unknowns to this solution; as such, it must be studied further before it becomes fully functional.
## Cost-effective way of member insertion and deletion
Currently, the cost associated with RLN-Relay membership is around 30 USD[^10]. We aim to find a more cost-effective approach. Please feel free to share your solution ideas in this [issue](https://github.com/vacp2p/research/issues/56).
## Exceeding the messaging rate via multiple registrations
While the economic-incentive solution discourages spamming, we should note that there are still **expensive attacks**[^23] a spammer can launch to break the messaging rate limit. That is, an attacker can pay for multiple legitimate registrations, say k of them, and thereby publish k messages per epoch. We believe that the higher the membership fee, the less probable such an attack becomes, and hence the stronger the spam protection. Following this argument, the high fee associated with membership (which we listed above as an open problem) can indeed contribute to a better protection level.
# Conclusion and Future Steps
As discussed in this post, Waku RLN-Relay achieves privacy-preserving economic spam protection through rate-limiting nullifiers. The idea is to financially discourage peers from publishing more than one message per epoch. Specifically, exceeding the messaging rate results in a financial charge. Those who violate this rule are called spammers, and their messages are spam. The identification of spammers does not rely on any central entity, and the financial punishment of spammers is cryptographically guaranteed.
In this solution, privacy is guaranteed since: 1) peers do not have to disclose any piece of personally identifiable information in any phase, i.e., neither in the registration nor in the messaging phase; 2) peers can prove that they have not exceeded the messaging rate in a zero-knowledge manner, without leaving any trace linking back to their membership accounts.
Furthermore, all the computations are light, hence this solution fits heterogeneous p2p messaging systems. Note that the zero-knowledge proof parts are handled through zkSNARKs, and the benchmarking results can be found in the RLN benchmark report[^5].
**Future steps**: We are still at the PoC level, and development is in progress. As our future steps:
- We would like to evaluate the running time associated with the Merkle tree operations. Indeed, the need to locally store the Merkle tree on each peer was one of the unknowns discovered during this PoC, and concrete benchmarking results in this regard are not yet available.
- We would also like to pursue our storage-efficient Merkle tree maintenance solution in order to lower the storage overhead on peers.
- In line with the storage optimization, the full-node/light-node structure is another path to follow.
- Another possible improvement is to replace the membership contract with a distributed group management scheme, e.g., through distributed hash tables. This would address possible performance issues that interaction with the Ethereum blockchain may cause. For example, registration transactions are subject to delay, as they have to be mined before becoming visible in the state of the membership contract. This means peers have to wait for some time before being able to publish any message.
# Acknowledgement
Thanks to Onur Kılıç for his explanations and pointers and for assisting with development and runtime issues. Also thanks to Barry Whitehat for his time and insightful comments. Special thanks to Oskar Thoren for his constructive comments and his guidance during the development of this PoC and the write-up of this post.
# References
[^1]: Waku v2: [https://github.com/vacp2p/specs/blob/master/specs/waku/v2/waku-v2.md](https://github.com/vacp2p/specs/blob/master/specs/waku/v2/waku-v2.md)
[^2]: RLN-Relay specifications: [https://github.com/vacp2p/specs/blob/master/specs/waku/v2/waku-rln-relay.md](https://github.com/vacp2p/specs/blob/master/specs/waku/v2/waku-rln-relay.md)
[^3]: RLN documentation: [https://hackmd.io/tMTLMYmTR5eynw2lwK9n1w?both](https://hackmd.io/tMTLMYmTR5eynw2lwK9n1w?both)
[^4]: RLN repositories: [https://github.com/kilic/RLN](https://github.com/kilic/RLN) and [https://github.com/kilic/rlnapp](https://github.com/kilic/rlnapp)
[^5]: RLN Benchmark: [https://hackmd.io/tMTLMYmTR5eynw2lwK9n1w?view#Benchmarks](https://hackmd.io/tMTLMYmTR5eynw2lwK9n1w?view#Benchmarks)
[^6]: Peer Scoring: [https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.1.md#peer-scoring](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.1.md#peer-scoring)
[^7]: Peer scoring security issues: [https://github.com/vacp2p/research/issues/44](https://github.com/vacp2p/research/issues/44)
[^8]: Proof of work: [http://www.infosecon.net/workshop/downloads/2004/pdf/clayton.pdf](http://www.infosecon.net/workshop/downloads/2004/pdf/clayton.pdf) and [https://link.springer.com/content/pdf/10.1007/3-540-48071-4_10.pdf](https://link.springer.com/content/pdf/10.1007/3-540-48071-4_10.pdf)
[^9]: Whisper: [https://eips.ethereum.org/EIPS/eip-627](https://eips.ethereum.org/EIPS/eip-627)
[^10]: Cost-effective way of member insertion and deletion: [https://github.com/vacp2p/research/issues/56](https://github.com/vacp2p/research/issues/56)
[^11]: Storage overhead per peer: [https://github.com/vacp2p/research/issues/57](https://github.com/vacp2p/research/issues/57)
[^12]: Storage-efficient Merkle Tree maintenance: [https://github.com/vacp2p/research/pull/54](https://github.com/vacp2p/research/pull/54)
[^13]: Shamir Secret Sharing Scheme: [https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing](https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing)
[^14]: Zero Knowledge Proof: [https://dl.acm.org/doi/abs/10.1145/3335741.3335750](https://dl.acm.org/doi/abs/10.1145/3335741.3335750) and [https://en.wikipedia.org/wiki/Zero-knowledge_proof](https://en.wikipedia.org/wiki/Zero-knowledge_proof)
[^15]: zkSNARKs: [https://link.springer.com/chapter/10.1007/978-3-662-49896-5_11](https://link.springer.com/chapter/10.1007/978-3-662-49896-5_11) and [https://coinpare.io/whitepaper/zcash.pdf](https://coinpare.io/whitepaper/zcash.pdf)
[^16]: GossipSub: [https://docs.libp2p.io/concepts/publish-subscribe/](https://docs.libp2p.io/concepts/publish-subscribe/)
[^17]: Waku Relay: [https://github.com/vacp2p/specs/blob/master/specs/waku/v2/waku-relay.md](https://github.com/vacp2p/specs/blob/master/specs/waku/v2/waku-relay.md)
[^18]: Prior blockers of RLN-Relay: [https://vac.dev/feasibility-semaphore-rate-limiting-zksnarks](https://vac.dev/feasibility-semaphore-rate-limiting-zksnarks)
[^19]: The lack of Shamir secret sharing in zkSNARKs: [https://github.com/vacp2p/research/issues/10](https://github.com/vacp2p/research/issues/10)
[^20]: The MPC required for zkSNARKs trusted setup: [https://github.com/vacp2p/research/issues/9](https://github.com/vacp2p/research/issues/9)
[^21]: Prover key size: [https://github.com/vacp2p/research/issues/8](https://github.com/vacp2p/research/issues/8)
[^22]: zkSNARKs proof time: [https://github.com/vacp2p/research/issues/7](https://github.com/vacp2p/research/issues/7)
[^23]: Attack on the messaging rate: [https://github.com/vacp2p/specs/issues/251](https://github.com/vacp2p/specs/issues/251)

View File

@ -0,0 +1,175 @@
---
layout: post
name: "Presenting JS-Waku: Waku v2 in the Browser"
title: "Presenting JS-Waku: Waku v2 in the Browser"
date: 2021-06-04 12:00:00 +0800
author: franck
published: true
permalink: /presenting-js-waku
categories: platform
summary: "JS-Waku is bringing Waku v2 to the browser. Learn what we achieved so far and what is next in our pipeline!"
image: /assets/img/js-waku-gist.png
discuss: https://forum.vac.dev/t/discussion-presenting-js-waku-waku-v2-in-the-browser/81
---
For the past 3 months, we have been working on bringing Waku v2 to the browser.
Our aim is to empower dApps with Waku v2, and this led to the creation of a new library.
We believe now is a good time to introduce it!
## Waku v2
First, let's review what Waku v2 is and what problem it is trying to solve.
Waku v2 comes from the need for a more scalable, better optimised solution for the Status app to achieve decentralised
communications on resource restricted devices (i.e., mobile phones).
The Status chat feature was initially built over Whisper.
However, Whisper has a number of caveats which make it inefficient for mobile phones.
For example, with Whisper, all devices receive all messages, which is not ideal for limited data plans.
To remedy this, a Waku mode (then Waku v1), based on devp2p, was introduced.
To further enable web and other resource restricted environments, Waku v2 was created based on libp2p.
The migration of the Status chat feature to Waku v2 is currently in progress.
We see the need for such a solution in the broader Ethereum ecosystem, beyond Status.
This is why we are building Waku v2 as a decentralised communication platform for all to use and build on.
If you want to read more about Waku v2 and what it aims to achieve,
checkout [What's the Plan for Waku v2?](/waku-v2-plan).
Since last year, we have been busy defining and implementing Waku v2 protocols in [nim-waku](https://github.com/status-im/nim-waku),
from which you can build [wakunode2](https://github.com/status-im/nim-waku#wakunode).
Wakunode2 is an adaptive and modular Waku v2 node;
it allows users to run their own node and use the Waku v2 protocols they need.
The nim-waku project doubles as a library that can be used to add Waku v2 support to native applications.
## Waku v2 in the browser
We believe that dApps and wallets can benefit from the Waku network in several ways.
For some dApps, it makes sense to enable peer-to-peer communications.
For others, machine-to-machine communications would be a great asset.
For example, in the case of a DAO,
Waku could be used for gas-less voting:
enabling the DAO to notify their users of a new vote,
and users to vote without interacting with the blockchain and spending gas.
[Murmur](https://github.com/status-im/murmur) was the first attempt to bring Whisper to the browser,
acting as a bridge between devp2p and libp2p.
Once Waku v2 was started and there was a native implementation on top of libp2p,
a [chat POC](https://github.com/vacp2p/waku-web-chat) was created to demonstrate the potential of Waku v2
in a web environment.
It showed how using js-libp2p with a few modifications enabled access to the Waku v2 network.
There were still some unresolved challenges.
For example, nim-waku only supports TCP connections, which browser applications cannot open.
Hence, to connect to other nodes, the POC connected to a NodeJS proxy application using websockets,
which in turn could connect to wakunode2 via TCP.
However, to enable dApp and Wallet developers to easily integrate Waku in their product,
we need to give them a library that is easy to use and works out of the box:
introducing [JS-Waku](https://github.com/status-im/js-waku).
JS-Waku is a JavaScript library that allows your dApp, wallet or other web app to interact with the Waku v2 network.
It is available right now on [npm](https://www.npmjs.com/package/js-waku):
`npm install js-waku`.
As it is written in TypeScript, types are included in the npm package to allow easy integration with TypeScript, ClojureScript and other typed languages that compile to JavaScript.
Key Waku v2 protocols are already available:
[message](https://rfc.vac.dev/spec/14/), [store](https://rfc.vac.dev/spec/13/), [relay](https://rfc.vac.dev/spec/11/) and [light push](https://rfc.vac.dev/spec/19/),
enabling your dApp to:
- Send and receive near-instant messages on the Waku network (relay),
- Query nodes for messages that may have been missed, e.g., due to a poor cellular connection (store),
- Send messages with confirmations (light push).
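For instance, receiving messages over relay looks roughly like the sketch below. It reflects our understanding of the API at the time of writing; treat the method and property names (`addObserver`, `payloadAsUtf8`) as assumptions and check the npm documentation.

```javascript
import { Waku } from 'js-waku';

const waku = await Waku.create({});

// Log the payload of every relay message seen on our content topic.
// (Sketch: names are assumptions, verify against the js-waku docs.)
waku.relay.addObserver(
  (msg) => console.log(msg.payloadAsUtf8),
  ['/my-cool-app/1/my-use-case/proto']
);
```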
JS-Waku needs to operate in the same context from which Waku v2 was born:
a restricted environment where connectivity and uptime are not guaranteed;
JS-Waku brings Waku v2 to the browser.
## Achievements so far
We focused the past month on developing a [ReactJS Chat App](https://status-im.github.io/js-waku/).
The aim was to create enough building blocks in JS-Waku to enable this showcase web app that
we now [use for dogfooding](https://github.com/status-im/nim-waku/issues/399) purposes.
Most of the effort was on getting familiar with the [js-libp2p](https://github.com/libp2p/js-libp2p) library
that we heavily rely on.
JS-Waku is the second implementation of the Waku v2 protocol,
so a lot of effort on interoperability was needed.
For example, to ensure compatibility with the nim-waku reference implementation,
we run our [tests against wakunode2](https://github.com/status-im/js-waku/blob/90c90dea11dfd1277f530cf5d683fb92992fe141/src/lib/waku_relay/index.spec.ts#L137) as part of the CI.
This interoperability effort helped solidify the current Waku v2 specifications:
by clarifying the usage of topics
([#327](https://github.com/vacp2p/rfc/issues/327), [#383](https://github.com/vacp2p/rfc/pull/383)),
fixing discrepancies between the specs and nim-waku
([#418](https://github.com/status-im/nim-waku/issues/418), [#419](https://github.com/status-im/nim-waku/issues/419)),
and fixing small nim-waku & nim-libp2p bugs
([#411](https://github.com/status-im/nim-waku/issues/411), [#439](https://github.com/status-im/nim-waku/issues/439)).
To fully access the Waku network, JS-Waku needs to enable web apps to connect to nim-waku nodes.
A standard way to do so is via secure websockets, as it is not possible to connect directly to a TCP port from the browser.
Unfortunately, websocket support is not yet available in [nim-libp2p](https://github.com/status-im/nim-libp2p/issues/407), so
we ended up deploying [websockify](https://github.com/novnc/websockify) alongside wakunode2 instances.
As we built the [web chat app](https://github.com/status-im/js-waku/tree/main/examples/web-chat),
we were able to fine-tune the API to provide a simple and succinct interface.
You can start a node, connect to other nodes, and send a message in fewer than ten lines of code:
```javascript
import { getStatusFleetNodes, Waku, WakuMessage } from 'js-waku';

// Start a Waku node and dial known fleet nodes.
const waku = await Waku.create({});
const nodes = await getStatusFleetNodes();
await Promise.all(nodes.map((addr) => waku.dial(addr)));

// Broadcast a message on the relay protocol.
const msg = WakuMessage.fromUtf8String(
  'Here is a message!',
  '/my-cool-app/1/my-use-case/proto'
);
await waku.relay.send(msg);
```
We also put up a bounty at [0xHack](https://0xhack.dev/) for using JS-Waku
and ran a [workshop](https://vimeo.com/551509621).
We were thrilled to have a couple of hackers create new software using our library.
One of the projects aimed to create a decentralised, end-to-end encrypted messenger app,
similar to what the [ETH-DM](https://rfc.vac.dev/spec/20/) protocol aims to achieve.
Another project was a decentralised Twitter platform.
Such projects allow us to prioritize the work on JS-Waku and understand how the DevEx can be improved.
As more developers use JS-Waku, we will evolve the API to allow more custom and fine-tuned usage of the network
while preserving this out-of-the-box experience.
## What's next?
Next, we are directing our attention towards [Developer Experience](https://github.com/status-im/js-waku/issues/68).
We already have [documentation](https://www.npmjs.com/package/js-waku) available, but we want to provide more:
[tutorials](https://github.com/status-im/js-waku/issues/56), various examples,
and a guide showing how [JS-Waku can be used with Web3](https://github.com/status-im/js-waku/issues/72).
By prioritizing DevEx, we aim to enable JS-Waku integration in dApps and wallets.
We think JS-Waku builds a strong case for machine-to-machine (M2M) communications.
The first use cases we are looking into are dApp notifications:
enabling dApps to notify their users directly in their wallets!
This leverages Waku as decentralised infrastructure and a standard, so that users do not have to open their dApp to be notified
of events such as DAO votes.
We already have some POCs in the pipeline to enable voting and polling over the Waku network,
allowing users to save gas by **not** broadcasting each individual vote on the blockchain.
To facilitate such applications, we are looking at improving integration with Web3 providers by providing examples
of signing, validating, encrypting, and decrypting messages using Web3.
Waku is privacy conscious, so we will also provide signature and encryption examples decoupled from users' Ethereum identity.
As you can read, we have grand plans for JS-Waku and Waku v2.
There is a lot to do, and we would love some help, so feel free to
check out the new role in our team:
[js-waku: Wallet & Dapp Integration Developer](https://status.im/our_team/jobs.html?gh_jid=3157894).
We also have a number of [positions](https://status.im/our_team/jobs.html) open to work on Waku protocol and nim-waku.
If you are as excited as us about JS-Waku, why not build a dApp with it?
You can find documentation on the [npmjs page](https://www.npmjs.com/package/js-waku).
Whether or not you are a developer, you can come chat with us using [WakuJS Web Chat](https://status-im.github.io/js-waku/)
or [chat2](https://github.com/status-im/nim-waku/blob/master/docs/tutorial/chat2.md).
You can get support on Discord in [#waku-support (dev support)](https://discord.gg/VChNsDdj).
If you have any ideas on how Waku could enable a specific dApp or use case, do share; we are always keen to hear them.

View File

@ -467,3 +467,42 @@ layout: default
</div>
</div>
</section>
<section
id="research-log"
class="log container max-w-screen-xl flex flex-col sm:flex-row py-10 border-b"
>
<div class="heading-block w-full sm:w-2/12 lg:w-3/12">
<h2 class="text-sm font-semibold mb-5 s:text-center lg:text-center">
Research log
</h2>
</div>
<div class="info-block w-full sm:w-10/12 lg:w-9/12">
<div class="flex flex-col">
<ul class="mb-5 container s:mx-auto flex flex-col flex-wrap sm:flex-row">
{% for post in site.posts limit:8 %}
<li
class="flex flex-col w-full sm:w-1/2 lm:max-w-screen-xs mb-5 lg:mb-8"
>
<time class="text-xxs opacity-75"
>{{ post.date | date: '%B %d, %Y' }}</time
><a
href="{{ site.baseurl }}{{ post.url }}"
class="text-xs lg:text-base font-semibold"
>{{ post.title }}</a
>
</li>
{% endfor %}
</ul>
<div class="log__link container flex justify-start">
<a
class="link link--external"
href="{{site.url}}{{ site.baseurl }}/research-log/"
>View all posts</a
>
</div>
</div>
</div>
</section>

View File

@ -35,6 +35,7 @@ module.exports = {
sans: ["Open Sans", "sans-serif"],
},
fontSize: {
xxs: ["10px", "22px"],
xs: ["12px", "16px"],
s: ["12px", "20px"],
sm: ["14px", "22px"],