# Notes

Bloom filter and probability of false positives

## Questions

### How big is an envelope?

`[ Expiry, TTL, Topic, Data, Nonce ]` is 4+4+4+arb+8 bytes, where arb is the
data field, padded to a multiple of 256 bytes (minus salt?) according to
EIP-627. Can't find a definitive number.

512 bytes seems like a minimum and 4 kB a maximum. Let's assume 1 kB for now.

**Question: How big is an envelope in Status? Does it change?**

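A minimal Python sketch of the size arithmetic above. The field sizes and the
256-byte granularity are taken from the note; the padding scheme (rounding the
data field up to a multiple of 256 bytes) is an assumption, not a measured
Status envelope.

```
# Back-of-envelope estimate of a Whisper envelope size.
# Field sizes from the note: Expiry (4) + TTL (4) + Topic (4) + Nonce (8),
# plus a data field assumed to be padded up to a multiple of 256 bytes.
def envelope_size(payload_bytes: int, pad_to: int = 256) -> int:
    fixed = 4 + 4 + 4 + 8
    padded_data = ((payload_bytes + pad_to - 1) // pad_to) * pad_to
    return fixed + padded_data

for payload in (100, 500, 1000, 4000):
    print(payload, "->", envelope_size(payload), "bytes")
```
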
### How many envelopes over some time period for a full node?

Running `time ./shh_basic_client --mainnet --watch --post --port=30000`,
connected to all full nodes with a full bloom filter, we get incoming messages:

`WRN 2019-10-14 13:05:55+09:00 Message Ingress Stats tid=9527 added=6 allowed=58 disallowed=281 disallowedBloom=0 disallowedPow=281 disallowedSize=0 duplicate=27 invalid=23 noHandshake=0`

Note that the PoW check is incorrect here, so that's 281 + 58 ≈ 340 envelopes
in total per minute.

Assuming this is representative of the whole network load, we get ~2k envelopes
per hour and ~50k envelopes per 24 h. This corresponds to roughly ~2 MB/h and
~50 MB/day, assuming ~1 kB per envelope and that the figures below are accurate.

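A quick Python sketch of the extrapolation, using the note's working figures
(~2k envelopes per hour, ~1 kB per envelope) rather than measured values:

```
# Rough bandwidth extrapolation. Both inputs are the note's working
# assumptions, not measurements.
envelopes_per_hour = 2_000
envelope_size_kb = 1          # ~1 kB per envelope, assumed above

mb_per_hour = envelopes_per_hour * envelope_size_kb / 1_000
envelopes_per_day = envelopes_per_hour * 24
mb_per_day = mb_per_hour * 24

print(f"~{envelopes_per_day // 1000}k envelopes/day, "
      f"~{mb_per_hour:.0f} MB/h, ~{mb_per_day:.0f} MB/day")
```
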
Note that in https://discuss.status.im/t/performance-of-mailservers/1215 we see
1,467,846 envelopes over 30 days, which is ~50k/day. While that was back in May,
it is another data point.

### How many per topic?

`time ./shh_basic_client --mainnet --watch --post --port=30000 | grep topicStr > foo.log`
with the same settings as above, for 1 minute:

`wc -l foo.log` gives 159 entries, so about half of the above. Not sure why.

```
cat foo.log | grep 5C6C9B56 | wc -l
159
```

All of them are from that one weird topic. Hypothesis: no one was using Status
during this minute. Let's run again. Sent 4 messages from mobile, public and
private chat (1). Plus launched the Status app, so the discovery topic too.
Indeed:

```
oskarth@localhost /home/oskarth/git/nim-eth/tests/p2p> wc -l foo.log
664 foo.log
```

```
oskarth@localhost /home/oskarth/git/nim-eth/tests/p2p> cat foo.log | grep 5C6C9B56 | wc -l
186
```

So that topic is roughly constant at ~3 envelopes a second.

```
oskarth@localhost /home/oskarth/git/nim-eth/tests/p2p> cat foo.log | grep F8946AAC | wc -l
432
```

Discovery topic, um, that's a lot! Does this imply duplicates?

```
oskarth@localhost /home/oskarth/git/nim-eth/tests/p2p> cat foo.log | grep 9C22FF5F | wc -l
36
```

```
oskarth@localhost /home/oskarth/git/nim-eth/tests/p2p> cat foo.log | grep CD423760 | wc -l
10
```

Not quite sure what to make of this, tbh.

Let's run again, just launching the app (30 s): it's all in either the 5C6C9B56
topic (mystery, 126 in 30 s) or the discovery topic F8946AAC (911 in 30 s)!

The discovery topic is going crazy.

Hypothesis: a lot of duplicate envelopes are getting through here.

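Instead of grepping each topic separately, the tally could be done in one pass.
This is a hypothetical Python sketch: it assumes each `foo.log` line carries a
`topicStr=<hex>` field, which may not match the actual nim-eth log format.

```
import re
from collections import Counter

# Hypothetical log format: assumes each line contains "topicStr=<8 hex chars>".
topic_re = re.compile(r"topicStr=(?:0x)?([0-9A-Fa-f]{8})")

counts = Counter()
with open("foo.log") as f:
    for line in f:
        match = topic_re.search(line)
        if match:
            counts[match.group(1).upper()] += 1

for topic, n in counts.most_common():
    print(f"{topic}: {n}")
```
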
### How many envelopes have already been received?

Checking hashes during a 30 s run; also go to the app and send one message. 159
total hashes, and
`0102C0C42044FDCAC2D64CAF7EF35AA759BEA21703EE8BA7AEFFD176E2280089` appears 4
times:

```
oskarth@localhost /home/oskarth/git/nim-eth/tests/p2p> cat foo.log | awk '{print $8}' | sort | uniq -c
1
8 hash=0002B6BA794DD457D55D67AD68BB8C49C98791AFEF466534FC97E28062F763FB
8 hash=00637F3882A2EFEC89CF40249DC59FDC9A049B78D42EDB6E116B0D76BE0AA523
4 hash=0102C0C42044FDCAC2D64CAF7EF35AA759BEA21703EE8BA7AEFFD176E2280089
24 hash=2D567B7E97FA2510A1299EA84F140B19DA0B2012BE431845B433BA238A22282C
22 hash=40D7D3BCC784CC9D663446A9DFB06D55533F80799261FDD30E30CC70853572CE
21 hash=707DA8C56605C57C930CE910F2700E379C350C845B5DAE20A9F8A6DBA4F59B2B
24 hash=AC8C3ABE198ABE3BF286E80E25B0CFF08B3573F42F61499FB258F519C1CF9F18
24 hash=C4C3D64886ED31A387B7AE57C904D702AEE78036E9446B30E964A149134B0D56
24 hash=D4A1D17641BD08E58589B120E7F8F399D23DA1AF1BA5BD3FED295CD852BC17DA
```

For how many connected nodes is this? WhisperNodes is 24, so assume that's the
duplication factor. Urf. But for Status nodes this should be lower?

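A small Python sketch of the duplication arithmetic implied above. The per-hash
counts are copied from the `uniq -c` output; reading the maximum as a per-peer
duplication factor is the note's assumption.

```
# Per-hash receive counts, copied from the uniq -c output above.
counts = [8, 8, 4, 24, 22, 21, 24, 24, 24]

total = sum(counts)        # 159 received envelopes carrying a hash
unique = len(counts)       # 9 unique envelopes
avg_dup = total / unique   # average duplication factor
max_dup = max(counts)      # worst case; matches the 24 connected WhisperNodes

print(f"total={total} unique={unique} avg_dup={avg_dup:.1f} max_dup={max_dup}")
```
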
### How does this behave with light users?

### How does this vary with Bloom filters?

Scalability trades off privacy against traffic, i.e. more false positives means
more privacy but also more bandwidth.

The bloom filter is m = 512 bits and we insert n topics, which leads to some
false positive rate p.

Assuming the optimal number of hash functions, k = (m/n) ln 2. Note that k is
set to 3 in Whisper, afaict.

https://hur.st/bloomfilter/?n=1000&p=&m=512&k=3

Whoa, so at 512 bits and k = 3 the probability of false positives is about 1%
at ~50 topics, ~10% at 100 topics, and essentially 1 at 1000 topics.

Which makes sense, since the bloom filter is more or less full by then.
Question: is n the number of items actually inserted into the filter, or the
universe of topics? It is the items inserted; the resulting false positive rate
then applies to every topic the filter is tested against that was not inserted.

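A minimal Python sketch that reproduces these numbers from the standard bloom
filter approximation p = (1 - e^(-k*n/m))^k, with m = 512 bits and k = 3 as in
Whisper:

```
import math

def false_positive_rate(n_topics: int, m_bits: int = 512, k: int = 3) -> float:
    # Standard bloom filter approximation: p = (1 - e^(-k*n/m))^k
    return (1 - math.exp(-k * n_topics / m_bits)) ** k

for n in (10, 50, 100, 500, 1000):
    print(f"n={n:4d} topics -> p ~ {false_positive_rate(n):.3f}")
```
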
Rule of thumb: you need about 10 bits (~1 byte) per element for ~1% false
positives, so 512 / 10 ≈ 50 topics checks out. Um... sigh.

Is it accurate that this means 1% traffic overhead if you listen to 50 topics?
How many topics does a normal app listen to? It quickly explodes! Actually, is
this even accurate? Because if the false positive rate reaches 100%, it isn't
100% extra traffic, it is _all_ of the network's traffic.

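A rough Python sketch of that overhead question, under assumed numbers: if the
network carries T envelopes per day and a fraction f of them genuinely match
your topics, a false positive rate p adds roughly p * (1 - f) * T unwanted
envelopes, so at p = 1 you receive all traffic rather than "100% extra". Both T
and f below are illustrative assumptions.

```
import math

def bloom_fp(n_topics: int, m_bits: int = 512, k: int = 3) -> float:
    # Same approximation as above: p = (1 - e^(-k*n/m))^k
    return (1 - math.exp(-k * n_topics / m_bits)) ** k

# Illustrative assumptions only.
total_envelopes_per_day = 50_000   # network-wide figure from the notes
own_fraction = 0.01                # hypothetical: 1% of traffic is actually yours

for n_topics in (10, 50, 100, 500):
    p = bloom_fp(n_topics)
    wanted = own_fraction * total_envelopes_per_day
    extra = p * (1 - own_fraction) * total_envelopes_per_day
    print(f"{n_topics:4d} topics: p={p:.3f}, wanted~{wanted:.0f}, "
          f"extra~{extra:.0f} envelopes/day")
```
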
topicMatch vs bloomMatch.

OK, the main factors:

1. Big topics
   - discovery, then the 5k one
   - what happens
2. Duplicate messages
   - number of peers
   - mailservers
3. Bloom filter
   - false positives
   - direct API call
4. Disconnect bad peers

The offline case dominates over the online one.
|