Commit Graph

215 Commits

Author SHA1 Message Date
Felix Lange 6fb810adaa p2p: throttle all discovery lookups
Lookup calls would spin out of control when network connectivity was
lost. The throttling that was in place only took effect when the table
returned zero results, which doesn't happen very often.

The new throttling should not have a negative impact when the host is
online. Lookups against the network take some time and dials for all
results must complete or hit the cache before a new one is started. This
usually takes longer than four seconds, leaving online lookups
unaffected.

Fixes #1296
2015-06-22 01:07:58 +02:00
Felix Lange 70da79f04c p2p: improve disconnect logging 2015-06-15 15:03:46 +02:00
Felix Lange 8dcbdcad0a p2p: track write errors and prevent writes during shutdown
As of this commit, we no longer rely on the protocol handler to report
write errors in a timely fashion. When a write fails, shutdown is
initiated immediately and no new writes can start. This will also
prevent new writes from starting after Server.Stop has been called.
2015-06-15 15:03:46 +02:00
Felix Lange a8e4cb6dfe p2p/discover: use separate rand.Source instances in tests
rand.Source isn't safe for concurrent use.
2015-06-10 15:18:01 +02:00
Felix Lange 261a8077c4 p2p/discover: deflake TestUDP_successfulPing 2015-06-10 13:08:21 +02:00
Péter Szilágyi 1cbbfbe7fa p2p: fix a close race in the dial test 2015-06-09 22:26:26 +03:00
Felix Lange 3239aca69b p2p: bump global write timeout to 20s
The previous value of 5 seconds causes timeouts for legitimate messages
if large messages are sent.
2015-06-09 17:07:10 +02:00
Péter Szilágyi ff84352fb7 p2p: fix close data race 2015-06-09 16:12:24 +03:00
Felix Lange fc6a5ae3ec p2p/nat: add timeout for UPnP SOAP requests 2015-06-04 22:25:43 +02:00
Felix Lange 8e4512a5e7 p2p/nat: bump timeout in TestAutoDiscRace 2015-05-28 01:09:26 +02:00
Péter Szilágyi 612f01400f p2p/discover: bond with seed nodes too (runs only if findnode failed) 2015-05-26 23:30:41 +02:00
Péter Szilágyi 3630432dfb p2p/discovery: fix a cornercase loop if no seeds or bootnodes are known 2015-05-26 23:30:40 +02:00
Péter Szilágyi f539ed1e66 p2p/discover: force refresh if the table is empty 2015-05-26 23:30:40 +02:00
Péter Szilágyi 5076170f34 p2p/discover: permit temporary bond failures for previously known nodes 2015-05-26 23:30:40 +02:00
Péter Szilágyi 6078aa08eb p2p/discover: watch find failures, evacuate on too many, rebond if failed 2015-05-26 23:30:40 +02:00
Péter Szilágyi 64174f196f p2p/discover: add support for counting findnode failures 2015-05-26 23:30:40 +02:00
Péter Szilágyi 68898a4d6b p2p: fix Self() panic if listening is disabled 2015-05-26 19:16:05 +03:00
Péter Szilágyi e1a0ee8fc5 cmd/geth, cmd/utils, eth, p2p: pass and honor a no discovery flag 2015-05-26 19:07:24 +03:00
Péter Szilágyi 278183c7e7 eth, p2p: start the p2p server even if maxpeers == 0 2015-05-26 17:49:37 +03:00
Felix Lange 9e1fd70b50 p2p: decrease frameReadTimeout to 30s
This detects hanging connections sooner. We send a ping every 15s and
other implementation have similar limits.
2015-05-25 01:17:14 +02:00
Felix Lange 1440f9a37a p2p: new dialer, peer management without locks
The most visible change is event-based dialing, which should be an
improvement over the timer-based system that we have at the moment.
The dialer gets a chance to compute new tasks whenever peers change or
dials complete. This is better than checking peers on a timer because
dials happen faster. The dialer can now make more precise decisions
about whom to dial based on the peer set and we can test those
decisions without actually opening any sockets.

Peer management is easier to test because the tests can inject
connections at checkpoints (after enc handshake, after protocol
handshake).

Most of the handshake stuff is now part of the RLPx code. It could be
exported or move to its own package because it is no longer entangled
with Server logic.
2015-05-25 01:17:14 +02:00
Felix Lange 9f38ef5d97 p2p/discover: add ReadRandomNodes 2015-05-25 01:17:14 +02:00
Felix Lange 64564da20b p2p: decrease maximum message size for devp2p to 1kB
The previous limit was 10MB which is unacceptable for all kinds
of reasons, the most important one being that we don't want to
allow the remote side to make us allocate 10MB at handshake time.
2015-05-25 01:17:14 +02:00
Felix Lange dbdc5fd4b3 p2p: delete Server.Broadcast 2015-05-25 01:17:14 +02:00
Péter Szilágyi cbd3ae6906 p2p/discover: fix #838, evacuate self entries from the node db 2015-05-21 19:41:46 +03:00
Péter Szilágyi af24c271c7 p2p/discover: fix database presistency test folder 2015-05-21 19:28:10 +03:00
Jeffrey Wilcke 90b94e64fc Merge pull request #971 from fjl/p2p-limit-tweaks
p2p: tweak connection limits
2015-05-14 08:15:51 -07:00
Felix Lange d2f119cf9b p2p/discover: limit open files for node database 2015-05-14 15:01:13 +02:00
Felix Lange 206fe25971 p2p: remove testlog 2015-05-14 14:56:34 +02:00
Felix Lange 7fa2607bd1 p2p/discover: bump maxBondingPingPongs to 16
This should increase the speed a bit because all findnode
results (up to 16) can be verified at the same time.
2015-05-14 14:53:29 +02:00
Felix Lange 691cb90284 p2p: log remote reason when disconnect is requested
The returned reason is currently not used except for the log
message. This change makes the log messages a bit more useful.
The handshake code also returns the remote reason.
2015-05-14 14:53:29 +02:00
Felix Lange c14de2e973 p2p/nat: tweak port mapping log messages and levels
People stil get confused about the messages. This commit changes
the levels so that the only thing printed at the default level (info)
is a successful mapping.
2015-05-14 12:54:59 +02:00
Felix Lange 663d4e0aff p2p/nat: add test for UPnP auto discovery via SSDP
The test listens for multicast UDP packets on the default interface
because I couldn't get it to work reliably on loopback without massive
changes to goupnp. This means that the test might fail when there is a
UPnP-enabled router attached on that interface. I checked that locally
by looping the test and it passes reliably because the local SSDP server
always responds faster.
2015-05-14 12:13:19 +02:00
Felix Lange 983f5a717a p2p/nat: fix concurrent access to autodisc Interface
Concurrent calls to Interface methods on autodisc could return a "not
discovered" error if the discovery did not finish before the call.
autodisc.wait expected the done channel to carry the found Interface
but it was closed instead.

The fix is to use sync.Once for now, which is easier to get right.
And there is a test. Finally.

This will have to change again when we introduce re-discovery.
2015-05-14 03:53:11 +02:00
Felix Lange 7efeb4bd96 p2p: bump maxAcceptConns and defaultDialTimout
On the test network, we've seen that it becomes harder to connect
if the queues are so short.
2015-05-14 03:48:28 +02:00
Felix Lange 251846d65a p2p/discover: fix out-of-bounds slicing for chunked neighbors packets
The code assumed that Table.closest always returns at least 13 nodes.
This is not true for small tables (e.g. during bootstrap).
2015-05-13 21:49:04 +02:00
subtly 8eef2b765a fix test. 2015-05-13 20:15:01 +02:00
subtly a32693770c Manual send of multiple neighbours packets. Test receiving multiple neighbours packets. 2015-05-13 20:03:17 +02:00
subtly 7473c93668 UDP Interop. Limit datagrams to 1280bytes.
We don't have a UDP which specifies any messages that will be 4KB. Aside from being implemented for months and a necessity for encryption and piggy-backing packets, 1280bytes is ideal, and, means this TODO can be completed!

Why 1280 bytes?
* It's less than the default MTU for most WAN/LAN networks. That means fewer fragmented datagrams (esp on well-connected networks).
* Fragmented datagrams and dropped packets suck and add latency while OS waits for a dropped fragment to never arrive (blocking readLoop())
* Most of our packets are < 1280 bytes.
* 1280 bytes is minimum datagram size and MTU for IPv6 -- on IPv6, a datagram < 1280bytes will *never* be fragmented.

UDP datagrams are dropped. A lot! And fragmented datagrams are worse. If a datagram has a 30% chance of being dropped, then a fragmented datagram has a 60% chance of being dropped. More importantly, we have signed packets and can't do anything with a packet unless we receive the entire datagram because the signature can't be verified. The same is true when we have encrypted packets.

So the solution here to picking an ideal buffer size for receiving datagrams is a number under 1400bytes. And the lower-bound value for IPv6 of 1280 bytes make's it a non-decision. On IPv4 most ISPs and 3g/4g/let networks have an MTU just over 1400 -- and *never* over 1500. Never -- that means packets over 1500 (in reality: ~1450) bytes are fragmented. And probably dropped a lot.

Just to prove the point, here are pings sending non-fragmented packets over wifi/ISP, and a second set of pings via cell-phone tethering. It's important to note that, if *any* router between my system and the EC2 node has a lower MTU, the message would not go through:

On wifi w/normal ISP:
localhost:Debug $ ping -D -s 1450 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1450 data bytes
1458 bytes from 52.6.250.242: icmp_seq=0 ttl=42 time=104.831 ms
1458 bytes from 52.6.250.242: icmp_seq=1 ttl=42 time=119.004 ms
^C
--- 52.6.250.242 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 104.831/111.918/119.004/7.087 ms
localhost:Debug $ ping -D -s 1480 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1480 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
ping: sendto: Message too long
Request timeout for icmp_seq 1


Tethering to O2:
localhost:Debug $ ping -D -s 1480 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1480 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
^C
--- 52.6.250.242 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss
localhost:Debug $ ping -D -s 1450 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1450 data bytes
1458 bytes from 52.6.250.242: icmp_seq=0 ttl=42 time=107.844 ms
1458 bytes from 52.6.250.242: icmp_seq=1 ttl=42 time=105.127 ms
1458 bytes from 52.6.250.242: icmp_seq=2 ttl=42 time=120.483 ms
1458 bytes from 52.6.250.242: icmp_seq=3 ttl=42 time=102.136 ms
2015-05-13 19:03:00 +02:00
Bas van Kervel 95773b9673 removed redundant newlines in import block 2015-05-12 15:20:53 +02:00
Bas van Kervel b79dd188d9 replaced several path.* with filepath.* which is platform independent 2015-05-12 14:24:11 +02:00
Felix Lange d4f0a67323 p2p: drop connections with no matching protocols 2015-05-08 16:09:55 +02:00
Felix Lange 9c0f36c46d p2p: use maxDialingConns instead of maxAcceptConns as dial limit 2015-05-08 16:09:55 +02:00
Felix Lange 914e57e49b p2p: fix disconnect at capacity
With the introduction of static/trusted nodes, the peer count
can go above MaxPeers. Update the capacity check to handle this.
While here, decouple the trusted nodes check from the handshake
by passing a function instead.
2015-05-08 16:09:54 +02:00
Péter Szilágyi 8735e5addd p2p: increase the handshake timeout in the tests 2015-05-07 15:30:56 +03:00
Péter Szilágyi 4d5a719f25 cmd, eth, p2p: introduce pending peer cli arg, add tests 2015-05-07 15:30:56 +03:00
Péter Szilágyi af93217775 p2p: reduce the concurrent handshakes to 10/10 in/out 2015-05-07 15:22:09 +03:00
Péter Szilágyi 2060bc8bac p2p: fix dial throttling race condition 2015-05-07 15:22:08 +03:00
Péter Szilágyi 29fef349ef p2p: fix a dialing race in the throttler 2015-05-07 15:22:08 +03:00
Péter Szilágyi 3953bf0031 p2p: limit the outbound dialing too 2015-05-07 15:22:08 +03:00