vac.dev/_drafts/fixing-whisper-with-waku.md

---
layout: post
name:  "Waku - Fixing Whisper"
title:  "Waku - Fixing Whisper"
date:   2019-11-28 12:00:00 +0800
author: oskarth
published: true
permalink: /fixing-whisper-with-waku
categories: research
summary: A research log. Why Whisper can't scale and how to fix it.
image: /assets/img/whisper_scalability.png
---

This post will introduce Waku. Waku is a fork of Whisper that addresses some of
its shortcomings in an iterative way. It will also show a theoretical scaling
model for Status.

- Description of Whisper and recap of its issues (gossip, 'darkness', pow, incentive, spec etc)
- Introduce model
- Motivation for a new protocol
- Progress so far

## Whisper theoretical model

Whisper theoretical model. Attempts to encode characteristics of it. Specifically for use case such as one by Status (see [Status Whisper usage spec](https://github.com/status-im/specs/blob/master/status-whisper-usage-spec.md)).


### Goals
1. Ensure network scales by being user or usage bound, as opposed to bandwidth growing in proportion to network size.
2. Staying with in a reasonable bandwidth limit for limited data plans.
3. Do the above without materially impacting existing nodes.


```
Case 1. Only receiving messages meant for you [naive case]

Assumptions:
- A1. Envelope size (static): 1024kb
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A4. Only receiving messages meant for you.

For 100 users, receiving bandwidth is 1000.0KB/day
For 10k users, receiving bandwidth is 1000.0KB/day
For  1m users, receiving bandwidth is 1000.0KB/day

------------------------------------------------------------
Case 2. Receiving messages for everyone [naive case]

Assumptions:
- A1. Envelope size (static): 1024kb
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A5. Received messages for everyone.

For 100 users, receiving bandwidth is   97.7MB/day
For 10k users, receiving bandwidth is    9.5GB/day
For  1m users, receiving bandwidth is  953.7GB/day

------------------------------------------------------------
Case 3. All private messages go over one discovery topic

Assumptions:
- A1. Envelope size (static): 1024kb
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A6. Proportion of private messages (static): 0.5
- A7. Public messages only received by relevant recipients (static).
- A8. All private messages are received by everyone (same topic) (static).

For 100 users, receiving bandwidth is   49.3MB/day
For 10k users, receiving bandwidth is    4.8GB/day
For  1m users, receiving bandwidth is  476.8GB/day

------------------------------------------------------------
Case 4. All private messages are partitioned into shards [naive case]

Assumptions:
- A1. Envelope size (static): 1024kb
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A6. Proportion of private messages (static): 0.5
- A7. Public messages only received by relevant recipients (static).
- A9. Private messages partitioned across partition shards (static), n=5000

For 100 users, receiving bandwidth is 1000.0KB/day
For 10k users, receiving bandwidth is    1.5MB/day
For  1m users, receiving bandwidth is   98.1MB/day

------------------------------------------------------------

Case 5. 4 + Bloom filter with false positive rate

Assumptions:
- A1. Envelope size (static): 1024kb
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A6. Proportion of private messages (static): 0.5
- A7. Public messages only received by relevant recipients (static).
- A9. Private messages partitioned across partition shards (static), n=5000
- A10. Bloom filter size (m) (static): 512
- A11. Bloom filter hash functions (k) (static): 3
- A12. Bloom filter elements, i.e. topics, (n) (static): 100
- A13. Bloom filter assuming optimal k choice (sensitive to m, n).
- A14. Bloom filter false positive proportion of full traffic, p=0.1

For 100 users, receiving bandwidth is   10.7MB/day
For 10k users, receiving bandwidth is  978.0MB/day
For  1m users, receiving bandwidth is   95.5GB/day

NOTE: Traffic extremely sensitive to bloom false positives
This completely dominates network traffic at scale.
With p=1% we get 10k users ~100MB/day and 1m users ~10gb/day)
------------------------------------------------------------
Case 6. Case 5 + Benign duplicate receives

Assumptions:
- A1. Envelope size (static): 1024kb
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A6. Proportion of private messages (static): 0.5
- A7. Public messages only received by relevant recipients (static).
- A9. Private messages partitioned across partition shards (static), n=5000
- A10. Bloom filter size (m) (static): 512
- A11. Bloom filter hash functions (k) (static): 3
- A12. Bloom filter elements, i.e. topics, (n) (static): 100
- A13. Bloom filter assuming optimal k choice (sensitive to m, n).
- A14. Bloom filter false positive proportion of full traffic, p=0.1
- A15. Benign duplicate receives factor (static): 2
- A16. No bad envelopes, bad PoW, expired, etc (static).

For 100 users, receiving bandwidth is   21.5MB/day
For 10k users, receiving bandwidth is    1.9GB/day
For  1m users, receiving bandwidth is  190.9GB/day

------------------------------------------------------------

Case 7. 6 + Mailserver under good conditions; small bloom fp; mostly offline

Assumptions:
- A1. Envelope size (static): 1024kb
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A6. Proportion of private messages (static): 0.5
- A7. Public messages only received by relevant recipients (static).
- A9. Private messages partitioned across partition shards (static), n=5000
- A10. Bloom filter size (m) (static): 512
- A11. Bloom filter hash functions (k) (static): 3
- A12. Bloom filter elements, i.e. topics, (n) (static): 100
- A13. Bloom filter assuming optimal k choice (sensitive to m, n).
- A14. Bloom filter false positive proportion of full traffic, p=0.1
- A15. Benign duplicate receives factor (static): 2
- A16. No bad envelopes, bad PoW, expired, etc (static).
- A17. User is offline p% of the time (static) p=0.9
- A18. No bad request, dup messages for mailservers; overlap perfect (static).
- A19. Mailserver requests can change false positive rate to be p=0.01

For 100 users, receiving bandwidth is    3.9MB/day
For 10k users, receiving bandwidth is  284.8MB/day
For  1m users, receiving bandwidth is   27.8GB/day

------------------------------------------------------------

Case 8. No metadata protection w bloom filter; 1 node connected; static shard

Next step up is to either only use contact code, or shard more aggressively.
Note that this requires change of other nodes behavior, not just local node.

Assumptions:
- A1. Envelope size (static): 1024kb
- A2. Envelopes / message (static): 10
- A3. Received messages / day (static): 100
- A6. Proportion of private messages (static): 0.5
- A7. Public messages only received by relevant recipients (static).
- A9. Private messages partitioned across partition shards (static), n=5000

For 100 users, receiving bandwidth is 1000.0KB/day
For 10k users, receiving bandwidth is    1.5MB/day
For  1m users, receiving bandwidth is   98.1MB/day

------------------------------------------------------------
```

See [source](https://github.com/vacp2p/research/tree/master/whisper_scalability)
for more detail on the model and its assumptions.

### Takeaways

The results are summed up in the following graph. Notice the log-log scale. The
colored backgrounds correspond to the following bandwidth usage:

- Blue: <10mb/d (<~300mb/month)
- Green: <30mb/d (<~1gb/month)
- Yellow: <100mb/d (<~3gb/month)
- Red: >100mb/d(>3gb/month)

![](assets/img/whisper_scalability.png)

## Progress so far

,,,
fresh attempt 2019-11-28 05:19:56 +00:00			`---`
			`layout: post`
			`name: "Waku - Fixing Whisper"`
			`title: "Waku - Fixing Whisper"`
			`date: 2019-11-28 12:00:00 +0800`
			`author: oskarth`
			`published: true`
			`permalink: /fixing-whisper-with-waku`
			`categories: research`
			`summary: A research log. Why Whisper can't scale and how to fix it.`
			`image: /assets/img/whisper_scalability.png`
			`---`

			`This post will introduce Waku. Waku is a fork of Whisper that addresses some of`
			`its shortcomings in an iterative way. It will also show a theoretical scaling`
			`model for Status.`

			`- Description of Whisper and recap of its issues (gossip, 'darkness', pow, incentive, spec etc)`
			`- Introduce model`
			`- Motivation for a new protocol`
			`- Progress so far`

			`## Whisper theoretical model`

			`Whisper theoretical model. Attempts to encode characteristics of it. Specifically for use case such as one by Status (see [Status Whisper usage spec](https://github.com/status-im/specs/blob/master/status-whisper-usage-spec.md)).`


			`### Goals`
			`1. Ensure network scales by being user or usage bound, as opposed to bandwidth growing in proportion to network size.`
			`2. Staying with in a reasonable bandwidth limit for limited data plans.`
			`3. Do the above without materially impacting existing nodes.`


			```
			`Case 1. Only receiving messages meant for you [naive case]`

			`Assumptions:`
			`- A1. Envelope size (static): 1024kb`
			`- A2. Envelopes / message (static): 10`
			`- A3. Received messages / day (static): 100`
			`- A4. Only receiving messages meant for you.`

			`For 100 users, receiving bandwidth is 1000.0KB/day`
			`For 10k users, receiving bandwidth is 1000.0KB/day`
			`For 1m users, receiving bandwidth is 1000.0KB/day`

			`------------------------------------------------------------`
			`Case 2. Receiving messages for everyone [naive case]`

			`Assumptions:`
			`- A1. Envelope size (static): 1024kb`
			`- A2. Envelopes / message (static): 10`
			`- A3. Received messages / day (static): 100`
			`- A5. Received messages for everyone.`

			`For 100 users, receiving bandwidth is 97.7MB/day`
			`For 10k users, receiving bandwidth is 9.5GB/day`
			`For 1m users, receiving bandwidth is 953.7GB/day`

			`------------------------------------------------------------`
			`Case 3. All private messages go over one discovery topic`

			`Assumptions:`
			`- A1. Envelope size (static): 1024kb`
			`- A2. Envelopes / message (static): 10`
			`- A3. Received messages / day (static): 100`
			`- A6. Proportion of private messages (static): 0.5`
			`- A7. Public messages only received by relevant recipients (static).`
			`- A8. All private messages are received by everyone (same topic) (static).`

			`For 100 users, receiving bandwidth is 49.3MB/day`
			`For 10k users, receiving bandwidth is 4.8GB/day`
			`For 1m users, receiving bandwidth is 476.8GB/day`

			`------------------------------------------------------------`
			`Case 4. All private messages are partitioned into shards [naive case]`

			`Assumptions:`
			`- A1. Envelope size (static): 1024kb`
			`- A2. Envelopes / message (static): 10`
			`- A3. Received messages / day (static): 100`
			`- A6. Proportion of private messages (static): 0.5`
			`- A7. Public messages only received by relevant recipients (static).`
			`- A9. Private messages partitioned across partition shards (static), n=5000`

			`For 100 users, receiving bandwidth is 1000.0KB/day`
			`For 10k users, receiving bandwidth is 1.5MB/day`
			`For 1m users, receiving bandwidth is 98.1MB/day`

			`------------------------------------------------------------`

			`Case 5. 4 + Bloom filter with false positive rate`

			`Assumptions:`
			`- A1. Envelope size (static): 1024kb`
			`- A2. Envelopes / message (static): 10`
			`- A3. Received messages / day (static): 100`
			`- A6. Proportion of private messages (static): 0.5`
			`- A7. Public messages only received by relevant recipients (static).`
			`- A9. Private messages partitioned across partition shards (static), n=5000`
			`- A10. Bloom filter size (m) (static): 512`
			`- A11. Bloom filter hash functions (k) (static): 3`
			`- A12. Bloom filter elements, i.e. topics, (n) (static): 100`
			`- A13. Bloom filter assuming optimal k choice (sensitive to m, n).`
			`- A14. Bloom filter false positive proportion of full traffic, p=0.1`

			`For 100 users, receiving bandwidth is 10.7MB/day`
			`For 10k users, receiving bandwidth is 978.0MB/day`
			`For 1m users, receiving bandwidth is 95.5GB/day`

			`NOTE: Traffic extremely sensitive to bloom false positives`
			`This completely dominates network traffic at scale.`
			`With p=1% we get 10k users ~100MB/day and 1m users ~10gb/day)`
			`------------------------------------------------------------`
			`Case 6. Case 5 + Benign duplicate receives`

			`Assumptions:`
			`- A1. Envelope size (static): 1024kb`
			`- A2. Envelopes / message (static): 10`
			`- A3. Received messages / day (static): 100`
			`- A6. Proportion of private messages (static): 0.5`
			`- A7. Public messages only received by relevant recipients (static).`
			`- A9. Private messages partitioned across partition shards (static), n=5000`
			`- A10. Bloom filter size (m) (static): 512`
			`- A11. Bloom filter hash functions (k) (static): 3`
			`- A12. Bloom filter elements, i.e. topics, (n) (static): 100`
			`- A13. Bloom filter assuming optimal k choice (sensitive to m, n).`
			`- A14. Bloom filter false positive proportion of full traffic, p=0.1`
			`- A15. Benign duplicate receives factor (static): 2`
			`- A16. No bad envelopes, bad PoW, expired, etc (static).`

			`For 100 users, receiving bandwidth is 21.5MB/day`
			`For 10k users, receiving bandwidth is 1.9GB/day`
			`For 1m users, receiving bandwidth is 190.9GB/day`

			`------------------------------------------------------------`

			`Case 7. 6 + Mailserver under good conditions; small bloom fp; mostly offline`

			`Assumptions:`
			`- A1. Envelope size (static): 1024kb`
			`- A2. Envelopes / message (static): 10`
			`- A3. Received messages / day (static): 100`
			`- A6. Proportion of private messages (static): 0.5`
			`- A7. Public messages only received by relevant recipients (static).`
			`- A9. Private messages partitioned across partition shards (static), n=5000`
			`- A10. Bloom filter size (m) (static): 512`
			`- A11. Bloom filter hash functions (k) (static): 3`
			`- A12. Bloom filter elements, i.e. topics, (n) (static): 100`
			`- A13. Bloom filter assuming optimal k choice (sensitive to m, n).`
			`- A14. Bloom filter false positive proportion of full traffic, p=0.1`
			`- A15. Benign duplicate receives factor (static): 2`
			`- A16. No bad envelopes, bad PoW, expired, etc (static).`
			`- A17. User is offline p% of the time (static) p=0.9`
			`- A18. No bad request, dup messages for mailservers; overlap perfect (static).`
			`- A19. Mailserver requests can change false positive rate to be p=0.01`

			`For 100 users, receiving bandwidth is 3.9MB/day`
			`For 10k users, receiving bandwidth is 284.8MB/day`
			`For 1m users, receiving bandwidth is 27.8GB/day`

			`------------------------------------------------------------`

			`Case 8. No metadata protection w bloom filter; 1 node connected; static shard`

			`Next step up is to either only use contact code, or shard more aggressively.`
			`Note that this requires change of other nodes behavior, not just local node.`

			`Assumptions:`
			`- A1. Envelope size (static): 1024kb`
			`- A2. Envelopes / message (static): 10`
			`- A3. Received messages / day (static): 100`
			`- A6. Proportion of private messages (static): 0.5`
			`- A7. Public messages only received by relevant recipients (static).`
			`- A9. Private messages partitioned across partition shards (static), n=5000`

			`For 100 users, receiving bandwidth is 1000.0KB/day`
			`For 10k users, receiving bandwidth is 1.5MB/day`
			`For 1m users, receiving bandwidth is 98.1MB/day`

			`------------------------------------------------------------`
			```

			`See [source](https://github.com/vacp2p/research/tree/master/whisper_scalability)`
			`for more detail on the model and its assumptions.`

			`### Takeaways`

			`The results are summed up in the following graph. Notice the log-log scale. The`
			`colored backgrounds correspond to the following bandwidth usage:`

			`- Blue: <10mb/d (<~300mb/month)`
			`- Green: <30mb/d (<~1gb/month)`
			`- Yellow: <100mb/d (<~3gb/month)`
			`- Red: >100mb/d(>3gb/month)`

			`![](assets/img/whisper_scalability.png)`

			`## Progress so far`

			`,,,`