Add support for batch publishing of messages
Replaces #602.
Batch publishing lets the system know that there are multiple related messages to be published, so it can prioritize sending each distinct message once before sending additional copies. For example, with the default API, when you publish two messages A and B, under the hood A gets sent to D=8 peers before B gets sent out at all. With this MessageBatch API we can now send one copy of A _and then_ one copy of B before sending multiple copies of either.

When a node has bandwidth constraints relative to the messages it is publishing, this improves dissemination time.
For more context see this post:
https://ethresear.ch/t/improving-das-performance-with-gossipsub-batch-publishing/21713
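The scheduling idea, as a self-contained sketch. None of the names below are the PR's actual MessageBatch API; they are stand-ins to show the interleaving:

```go
package batchpub

// Sketch of the batch-publishing idea: send the first copy of every
// message before the second copy of any message, instead of sending all
// D copies of message A before touching message B. All names here are
// illustrative, not the library's MessageBatch API.

type message struct {
	id   string
	data []byte
}

type sendFn func(peer string, msg message)

// publishBatch emits copies round by round: round 0 sends one copy of
// each message, round 1 a second copy of each, and so on up to d copies.
func publishBatch(msgs []message, peers []string, d int, send sendFn) {
	for round := 0; round < d; round++ {
		for i, msg := range msgs {
			// Rotate through peers; a real implementation would pick the
			// message's mesh peers for its topic.
			peer := peers[(round+i)%len(peers)]
			send(peer, msg)
		}
	}
}
```

With two messages and D=8, the default path sends eight copies of A before any copy of B; the loop above sends A, B, A, B, and so on.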
We were sending IDONTWANT to the sender of a received message. This is pointless, as the sender should not repeat a message it has already sent. (The sender could also have tracked that it had sent this peer the message; we don't do this currently, and it's probably not necessary.)
@ppopth
When a new peer wants to graft us into their mesh, we check our current mesh size to determine whether we can accept any more peers. This is done to keep our mesh size from exceeding `Dhi` and to prevent mesh takeover attacks here:
c06df2f9a3/gossipsub.go (L943)
During every heartbeat we check our mesh size, and if it is **greater** than `Dhi` we prune our mesh back down to `D`.
c06df2f9a3/gossipsub.go (L1608)
However, looking closely at both lines reveals a problematic end result: we only stop grafting new peers into our mesh once the current mesh size is **greater than or equal to** `Dhi`, while we only prune peers when the mesh size is strictly **greater** than `Dhi`. The mesh therefore settles into stasis at exactly `Dhi`: rather than floating between `D` and `Dhi`, it stagnates at `Dhi`. This effectively raises the node's target degree from `D` to `Dhi`. We observed this on Ethereum mainnet by recording mesh interactions and message fulfillment from mesh peers.
This PR fixes the issue by adding an equality check to the prune conditional, so the mesh is periodically pruned back down to `D`. The PR also adds a regression test for this particular case.
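The off-by-one interaction is easy to see in isolation. A self-contained sketch, assuming the default parameters D=8 and Dhi=12 (the function names are illustrative, not library code):

```go
package main

import "fmt"

// Default GossipSub degree parameters.
const (
	D   = 8  // target mesh degree
	Dhi = 12 // upper bound before pruning
)

// acceptGraft mirrors the graft-time check: stop accepting new peers once
// the mesh has reached Dhi.
func acceptGraft(meshSize int) bool { return meshSize < Dhi }

// shouldPruneOld is the buggy heartbeat check: strictly greater than Dhi.
func shouldPruneOld(meshSize int) bool { return meshSize > Dhi }

// shouldPruneFixed adds the equality check, so a mesh sitting at exactly
// Dhi gets pruned back down to D.
func shouldPruneFixed(meshSize int) bool { return meshSize >= Dhi }

func main() {
	mesh := Dhi // grafts push the mesh up to exactly Dhi, then stop
	fmt.Println(acceptGraft(mesh))      // false: no new grafts accepted
	fmt.Println(shouldPruneOld(mesh))   // false: never pruned, stasis at Dhi
	fmt.Println(shouldPruneFixed(mesh)) // true: pruned back down to D
}
```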
In high-load scenarios where the consumer is slow, `doDropRPC` is called often and makes unnecessary extra allocations formatting the `log.Debug` message.

Fixed by checking the log level before running the expensive formatting.
Before:
```
BenchmarkAllocDoDropRPC-10 13684732 76.28 ns/op 144 B/op 3 allocs/op
```
After:
```
BenchmarkAllocDoDropRPC-10 28140273 42.88 ns/op 112 B/op 1 allocs/op
```
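The pattern, as a minimal sketch using zap directly (the logger, message, and field names are illustrative, not the library's actual call sites):

```go
package main

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

// dropRPC illustrates the fix: skip building the debug message entirely
// unless debug logging is actually enabled.
func dropRPC(logger *zap.Logger, peer string, queueLen int) {
	if logger.Core().Enabled(zapcore.DebugLevel) {
		logger.Debug("dropping rpc: queue full",
			zap.String("peer", peer),
			zap.Int("queue", queueLen))
	}
}

func main() {
	logger, _ := zap.NewProduction() // info level: the debug branch is skipped
	defer logger.Sync()
	dropRPC(logger, "QmPeer", 32)
}
```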
## GossipSub v1.2 implementation
Specification: libp2p/specs#548
### Work Summary

Sending IDONTWANT
- Implement a smart queue
- Add priorities to the smart queue
- Put IDONTWANT packets into the smart priority queue as soon as the node gets the packets

Handling IDONTWANT
- Use a map to remember the message IDs whose IDONTWANT packets have been received
- Implement max_idontwant_messages (ignore the IDONTWANT packets if the max is reached)
- Clear the message IDs from the cache after 3 heartbeats
- Hash the message IDs before putting them into the cache

More requested features
- Add a feature test to not send IDONTWANT if the other side doesn't support it (see the sketch below)
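A minimal sketch of that feature test, assuming the v1.2 protocol ID defined in the spec (the helper name is illustrative):

```go
package idontwant

// GossipSub v1.2 protocol ID per libp2p/specs#548. Only peers that speak
// it should ever be sent IDONTWANT; older peers do not understand the new
// control message.
const gossipSubIDv12 = "/meshsub/1.2.0"

// supportsIDontWant reports whether a peer's negotiated protocol is new
// enough to understand IDONTWANT.
func supportsIDontWant(peerProtocol string) bool {
	return peerProtocol == gossipSubIDv12
}
```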
### Commit Summary
* Replace sending channel with the smart rpcQueue
Since we want to implement a priority queue later, we need to replace
the normal sending channels with the new smart structures first.
* Implement UrgentPush in the smart rpcQueue
UrgentPush lets you push an RPC packet to the front of the queue so that it is popped out quickly.
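A minimal sketch of the idea as a two-lane queue; the real rpcQueue's internals and blocking semantics differ:

```go
package rpcqueue

import "sync"

// rpcQueue sketches a two-priority queue: UrgentPush places an RPC ahead
// of everything added with Push. The string element type is a stand-in
// for the actual *RPC.
type rpcQueue struct {
	mu     sync.Mutex
	urgent []string
	normal []string
}

func (q *rpcQueue) Push(rpc string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.normal = append(q.normal, rpc)
}

// UrgentPush queues an RPC so that it is popped before any normal RPC.
func (q *rpcQueue) UrgentPush(rpc string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.urgent = append(q.urgent, rpc)
}

// Pop drains urgent RPCs first, then normal ones.
func (q *rpcQueue) Pop() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.urgent) > 0 {
		rpc := q.urgent[0]
		q.urgent = q.urgent[1:]
		return rpc, true
	}
	if len(q.normal) > 0 {
		rpc := q.normal[0]
		q.normal = q.normal[1:]
		return rpc, true
	}
	return "", false
}
```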
* Add IDONTWANT to rpc.proto and trace.proto
* Send IDONTWANT right before validation step
Most importantly, this commit adds a new method called PreValidation to the PubSubRouter interface, which is called right before the gossipsub message is validated.

In GossipSubRouter, PreValidation sends the IDONTWANT control messages to all the mesh peers of the topics of the received messages.
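Roughly, the hook behaves like this self-contained sketch; all types and helpers are illustrative stand-ins for the library's internals:

```go
package idontwant

type peerID string

type message struct {
	Topic string
	ID    string
}

// router carries just enough state for the sketch: the per-topic mesh and
// a stand-in for queueing an IDONTWANT RPC to a peer.
type router struct {
	mesh map[string]map[peerID]struct{}
	send func(p peerID, msgIDs []string)
}

// preValidation runs before message validation so the IDONTWANTs go out
// as early as possible, ahead of the potentially slow validation step.
func (r *router) preValidation(msgs []*message) {
	ids := make(map[peerID][]string)
	for _, msg := range msgs {
		for p := range r.mesh[msg.Topic] {
			ids[p] = append(ids[p], msg.ID)
		}
	}
	for p, msgIDs := range ids {
		r.send(p, msgIDs)
	}
}
```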
* Test GossipSub IDONTWANT sending
* Send IDONTWANT only for large messages
* Handle IDONTWANT control messages
When receiving IDONTWANTs, the host should remember the message IDs contained in them using a hash map. When it later receives messages with those IDs, it shouldn't forward them to the peers who already sent the IDONTWANTs. When the maximum number of IDONTWANTs is reached for any particular peer, the host should ignore any further IDONTWANTs from that peer.
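That bookkeeping, as a self-contained sketch; the cap mirrors GossipSubMaxIDontWantMessages and all other names are illustrative:

```go
package idontwant

// maxIDontWantMessages caps how many IDONTWANT message IDs we keep per
// peer (the library raises its equivalent to 1000 further below).
const maxIDontWantMessages = 1000

type tracker struct {
	// unwanted maps a peer to the set of message IDs it asked us not to send.
	unwanted map[string]map[string]struct{}
}

// addIDontWant records IDs from a peer's IDONTWANT, ignoring anything
// past the per-peer cap.
func (t *tracker) addIDontWant(peer string, msgIDs []string) {
	ids := t.unwanted[peer]
	if ids == nil {
		ids = make(map[string]struct{})
		t.unwanted[peer] = ids
	}
	for _, id := range msgIDs {
		if len(ids) >= maxIDontWantMessages {
			return // cap reached: ignore excessive IDONTWANTs
		}
		ids[id] = struct{}{}
	}
}

// shouldForward reports whether a message may still be forwarded to a peer.
func (t *tracker) shouldForward(peer, msgID string) bool {
	_, skip := t.unwanted[peer][msgID]
	return !skip
}
```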
* Clear expired message IDs from the IDONTWANT cache
If the message IDs received from IDONTWANTs are older than 3 heartbeats, they should be removed from the IDONTWANT cache.
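A sketch of that expiry pass, driven by the heartbeat; the counter-based TTL bookkeeping here is illustrative:

```go
package idontwant

// idontwantTTL is how many heartbeats an IDONTWANT entry stays alive.
const idontwantTTL = 3

// expiring maps each message ID to the heartbeat at which it was recorded.
type expiring struct {
	seenAt    map[string]int
	heartbeat int
}

// tick advances one heartbeat and drops IDs older than the TTL.
func (e *expiring) tick() {
	e.heartbeat++
	for id, at := range e.seenAt {
		if e.heartbeat-at >= idontwantTTL {
			delete(e.seenAt, id)
		}
	}
}
```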
* Keep the hashes of IDONTWANT message ids instead
Rather than keeping the raw message IDs, keep their hashes instead, to save memory and protect against memory DoS attacks.
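A sketch of the hashing step; that the cache uses SHA-256 specifically is an assumption here:

```go
package idontwant

import "crypto/sha256"

// hashMsgID stores a fixed-size digest instead of the raw message ID.
// Raw IDs arrive from remote peers and can be arbitrarily large, so
// hashing bounds the per-entry memory an attacker can make us hold.
func hashMsgID(msgID string) [32]byte {
	return sha256.Sum256([]byte(msgID))
}
```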
* Increase GossipSubMaxIHaveMessages to 1000
* fixup! Clear expired message IDs from the IDONTWANT cache
* Don't send IDONTWANT if the receiver doesn't support it
* fixup! Replace sending channel with the smart rpcQueue
* Not use pointers in rpcQueue
* Simplify rpcQueue by using only one mutex
* Check ctx error in rpc sending worker
Co-authored-by: Steven Allen <steven@stebalien.com>
* fixup! Simplify rpcQueue by using only one mutex
* fixup! Keep the hashes of IDONTWANT message ids instead
* Use AfterFunc instead of implementing our own
* Fix misc lint errors
* fixup! Fix misc lint errors
* Revert "Increase GossipSubMaxIHaveMessages to 1000"
This reverts commit 6fabcdd068a5f5238c5280a3460af9c3998418ec.
* Increase GossipSubMaxIDontWantMessages to 1000
* fixup! Handle IDONTWANT control messages
* Skip TestGossipsubConnTagMessageDeliveries
* Skip FuzzAppendOrMergeRPC
* Revert "Skip FuzzAppendOrMergeRPC"
This reverts commit f141e13234de0960d139339acb636a1afea9e219.
* fixup! Send IDONTWANT only for large messages
* fixup! fixup! Keep the hashes of IDONTWANT message ids instead
* fixup! Implement UrgentPush in the smart rpcQueue
* fixup! Use AfterFunc instead of implementing our own
---------
Co-authored-by: Steven Allen <steven@stebalien.com>
And change it to take into account the fact that libp2p now trims
connections immediately (when no grace-period is specified) instead of
waiting for a timeout.
This removes dependencies on swarm/testing and the blank host.
1. swarm/testing really shouldn't be used at all except for internal
libp2p stuff.
2. The blank host should only be used in _very_ special cases (autonat,
mostly).
* Subscribe to libp2p events to maintain our own Certified Address Book
* Update go version
* Use TestGossipsubStarTopology test instead of new test
* Don't return an error in manageAddrBook
* Return on error while subscribing
* Use null resource manager so that the new IP limit doesn't break tests
* Mod tidy
This will allow us to add more logic around when we split/merge
messages. It will also allow us to build the outgoing rpcs as we go
rather than building one giant rpc and then splitting it.
* Implement Unsubscribe backoff
* Add test to check that prune backoff time is used
* Update which backoff to use in TestGossibSubJoinTopic test
* Fix race in TestGossipSubLeaveTopic
* Wait for all the backoff checks, and check that we aren't missing too many
* Remove open question
* cleanup: fix vet failures and most staticcheck failures
* Fix remaining staticcheck failures
* Give test goroutines chance to exit early when context is canceled
In [drand](https://github.com/drand/drand) we have a gossipsub relay that lets users subscribe to receive random values over pubsub. We want to support pure gossip relays that relay from another relay. For this we need direct peering agreements, and we want to mitigate the possibility of "missing" randomness messages by ensuring the direct connect ticks period is less than the period between updates.

This PR simply adds a new functional option that lets us set the direct connect ticks value without modifying the global variable.
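Usage might look like this, assuming the new option is named WithDirectConnectTicks; the tick value and relay wiring are illustrative:

```go
package main

import (
	"context"

	"github.com/libp2p/go-libp2p"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
	"github.com/libp2p/go-libp2p/core/peer"
)

func main() {
	ctx := context.Background()
	host, err := libp2p.New()
	if err != nil {
		panic(err)
	}
	defer host.Close()

	var relay peer.AddrInfo // fill in the upstream relay's ID and addresses

	// With gossipsub's default 1s heartbeat, 5 ticks means a dropped
	// direct peer is reconnected within ~5s, well under the period
	// between randomness rounds.
	ps, err := pubsub.NewGossipSub(ctx, host,
		pubsub.WithDirectPeers([]peer.AddrInfo{relay}),
		pubsub.WithDirectConnectTicks(5),
	)
	if err != nil {
		panic(err)
	}
	_ = ps
}
```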