We had a very nasty problem: handshakes were serial so incoming
dials would wait for each other to finish handshaking. this was
particularly problematic when handshakes hung-- nodes would not
recover quickly. This led to gateways not bootstrapping peers
fast enough.
The approach taken here is to do what crypto/tls does:
defer the handshake until Read/Write[1]. There are a number of
reasons why this is _the right thing to do_:
- it delays handshaking until it is known to be necessary (doing io)
- it "accepts" before the handshake, getting the handshake out of the
critical path entirely.
- it defers to the user's parallelization of conn handling. users
must implement this in some way already so use that, instead of
picking constants surely to be wrong (how many handshakes to run
in parallel?)
[0] http://golang.org/src/crypto/tls/conn.go#L886
We now consider debugerrors harmful: we've run into cases where
debugerror.Wrap() hid valuable error information (err == io.EOF?).
I've removed them from the main code, but left them in some tests.
Go errors are lacking, but unfortunately, this isn't the solution.
It is possible that debugerros.New or debugerrors.Errorf should
remain still (i.e. only remove debugerrors.Wrap) but we don't use
these errors often enough to keep.
The same keys + nonces in secio were being observed. As described in
https://github.com/ipfs/go-ipfs/issues/1016 -- the handshake must
be talking to itself. This can happen in an outgoing TCP dial with
REUSEPORT on to the same address.
reuseport is a hack. It is necessary for us to do certain kinds of
tcp nat traversal. Ideally, reuseport would be available in go:
https://github.com/golang/go/issues/9661
But until that issue is fixed, we're stuck with this. In some cases,
reuseport is strictly a detriment: nodes are not NATed. This commit
introduces an ENV var IPFS_REUSEPORT that can be set to false to
avoid using reuseport entirely:
IPFS_REUSEPORT=false ipfs daemon
This approach addresses our current need. It could become a config
var if necessary. If reuseport continues to give problems, we should
look into improving it.
humanize bandwidth output
instrument conn.Conn for bandwidth metrics
add poll command for continuous bandwidth reporting
move bandwidth tracking onto multiaddr net connections
another mild refactor of recording locations
address concerns from PR
lower mock nodes in race test due to increased goroutines per connection
- updated go-ctxgroup and goprocess
ctxgroup: AddChildGroup was changed to AddChild. Used in two files:
- p2p/net/mock/mock_net.go
- routing/dht/dht.go
- updated context from hg repo to git
prev. commit in hg was ad01a6fcc8a19d3a4478c836895ffe883bd2ceab. (context: make parentCancelCtx iterative)
represents commit 84f8955a887232b6308d79c68b8db44f64df455c in git repo
- updated context to master (b6fdb7d8a4ccefede406f8fe0f017fb58265054c)
Aaron Jacobs (2):
net/context: Don't accept a context in the DoSomethingSlow example.
context: Be clear that users must cancel the result of WithCancel.
Andrew Gerrand (1):
go.net: use golang.org/x/... import paths
Bryan C. Mills (1):
net/context: Don't leak goroutines in Done example.
Damien Neil (1):
context: fix removal of cancelled timer contexts from parent
David Symonds (2):
context: Fix WithValue example code.
net: add import comments.
Sameer Ajmani (1):
context: fix TestAllocs to account for ints in interfaces
Use of the ratelimiter should be conscious of the ratelimiter's
potential closing. any loops that add work to ratelimiter
should (a) only do so if the rate limiter is not closed,
or (b) prevent limiter while work is added
(i.e. use limiter.Go(addWorkHere))
TODOs:
- need to consolidate all the versioning stuff into one package
- need to do the version check as a handshake, before further
communication happens. we used to do this.
This will mitigate the fd explosion, but slow down dials majorly
as any peer with more addresses than the rate limit will have
to wait a whole dial timeout (~15s)