Commit Graph

299 Commits

Author SHA1 Message Date
Felix Lange 04c6369a09 p2p, p2p/discover: track bootstrap state in p2p/discover
This change simplifies the dial scheduling logic because it
no longer needs to track whether the discovery table has been
bootstrapped.
2015-12-17 23:38:54 +01:00
Felix Lange d1f507b7f1 p2p/discover: support incomplete node URLs, add Resolve 2015-12-17 23:38:54 +01:00
Péter Szilágyi abb53644c6 p2p: always allow dynamic dials if network not disabled 2015-12-03 11:45:35 +02:00
Gustav Simonsson c8ad64f33c crypto, crypto/ecies, crypto/secp256k1: libsecp256k1 scalar mult
thanks to Felix Lange (fjl) for help with design & impl
2015-11-30 13:43:32 +01:00
Péter Szilágyi 9e1d9bff3b node: customizable protocol and service stacks 2015-11-27 11:06:12 +02:00
Jeffrey Wilcke e165c2d23c Merge pull request #1934 from karalabe/polish-protocol-infos
eth, p2p, rpc/api: polish protocol info gathering
2015-11-04 11:59:31 +01:00
Felix Lange f570b68ed1 p2p/nat: add docs for discover 2015-10-29 22:54:44 +01:00
Felix Lange bf11a47f22 Godeps: upgrade github.com/huin/goupnp to 90f71cb5 2015-10-29 22:53:59 +01:00
Péter Szilágyi e46ab3bdcd eth, p2p, rpc/api: polish protocol info gathering 2015-10-28 12:44:15 +02:00
Felix Lange 32dda97602 p2p/discover: ignore packet version numbers
The strict matching can get in the way of protocol upgrades.
2015-09-30 16:23:03 +02:00
Felix Lange 631bf36102 p2p/discover: remove unused lastLookup field 2015-09-30 16:23:03 +02:00
Felix Lange b4374436f3 p2p/discover: fix race involving the seed node iterator
nodeDB.querySeeds was not safe for concurrent use but could be called
concurrenty on multiple goroutines in the following case:

- the table was empty
- a timed refresh started
- a lookup was started and initiated refresh

These conditions are unlikely to coincide during normal use, but are
much more likely to occur all at once when the user's machine just woke
from sleep. The root cause of the issue is that querySeeds reused the
same leveldb iterator until it was exhausted.

This commit moves the refresh scheduling logic into its own goroutine
(so only one refresh is ever active) and changes querySeeds to not use
a persistent iterator. The seed node selection is now more random and
ignores nodes that have not been contacted in the last 5 days.
2015-09-30 16:23:03 +02:00
Péter Szilágyi c51e153b5c eth, metrics, p2p: prepare metrics and net packets to eth/62 2015-08-21 10:30:57 +03:00
Jeffrey Wilcke e2d44814a5 Merge pull request #1694 from obscuren/hide-fdtrack
fdtrack: hide message
2015-08-19 13:50:54 -07:00
Jeffrey Wilcke 269c5c7107 Revert "fdtrack: temporary hack for tracking file descriptor usage"
This reverts commit 5c949d3b3b.
2015-08-19 21:46:01 +02:00
Felix Lange dd54fef898 p2p/discover: don't attempt to replace nodes that are being replaced
PR #1621 changed Table locking so the mutex is not held while a
contested node is being pinged. If multiple nodes ping the local node
during this time window, multiple ping packets will be sent to the
contested node. The changes in this commit prevent multiple packets by
tracking whether the node is being replaced.
2015-08-19 14:57:16 +02:00
Felix Lange edccc7ae34 p2p: continue listening after temporary errors 2015-08-19 14:39:04 +02:00
Felix Lange 7d5ff770e2 p2p/discover: continue reading after temporary errors
Might solve #1579
2015-08-19 14:38:55 +02:00
Felix Lange a89cfe92cc Merge pull request #1470 from ebuchman/encHandshake
p2p: validate recovered ephemeral pubkey
2015-08-13 11:59:27 +02:00
Felix Lange 0d10d5a0a5 p2p: fix value of DiscSubprotocolError
We had the wrong value (12) since forever.
2015-08-12 14:15:54 +02:00
Felix Lange 590c99a98f p2p/discover: fix UDP reply packet timeout handling
If the timeout fired (even just nanoseconds) before the deadline of the
next pending reply, the timer was not rescheduled. The timer would've
been rescheduled anyway once the next packet was sent, but there were
cases where no next packet could ever be sent due to the locking issue
fixed in the previous commit.

As timing-related bugs go, this issue had been present for a long time
and I could never reproduce it. The test added in this commit did
reproduce the issue on about one out of 15 runs.
2015-08-11 11:42:17 +02:00
Felix Lange 01ed3fa1a9 p2p/discover: unlock the table during ping replacement
Table.mutex was being held while waiting for a reply packet, which
effectively made many parts of the whole stack block on that packet,
including the net_peerCount RPC call.
2015-08-11 11:42:17 +02:00
Felix Lange 6ee908848c p2p/nat: disable UPnP test on windows 2015-08-06 17:18:59 +02:00
Felix Lange b23b4dbd79 p2p/discover: close Table during testing
Not closing the table used to be fine, but now the table has a database.
2015-08-06 12:27:59 +02:00
Felix Lange 5c949d3b3b fdtrack: temporary hack for tracking file descriptor usage
Package fdtrack logs statistics about open file descriptors.
This should help identify the source of #1549.
2015-08-04 03:10:27 +02:00
Felix Lange bfbcfbe4a9 all: fix license headers one more time
I forgot to update one instance of "go-ethereum" in commit 3f047be5a.
2015-07-23 18:35:11 +02:00
Felix Lange 3f047be5aa all: update license headers to distiguish GPL/LGPL
All code outside of cmd/ is licensed as LGPL. The headers
now reflect this by calling the whole work "the go-ethereum library".
2015-07-22 18:51:45 +02:00
Ethan Buchman 37efd08b42 p2p: validate recovered ephemeral pubkey against checksum in decodeAuthMsg 2015-07-14 03:06:44 +00:00
Felix Lange bdae4fd573 all: add some godoc synopsis comments 2015-07-07 14:12:45 +02:00
Felix Lange ea54283b30 all: update license information 2015-07-07 14:12:44 +02:00
Péter Szilágyi 01fe972113 cmd, core, eth, metrics, p2p: require enabling metrics 2015-06-30 00:51:46 +02:00
Péter Szilágyi 216fc267fa p2p: fix local/remote cap/protocol mixup 2015-06-26 20:45:13 +03:00
Péter Szilágyi d84638bd31 p2p: support protocol version negotiation 2015-06-26 15:48:50 +03:00
Péter Szilágyi 6994a3daaa p2p: instrument P2P networking layer 2015-06-24 18:33:33 +03:00
Felix Lange 6fb810adaa p2p: throttle all discovery lookups
Lookup calls would spin out of control when network connectivity was
lost. The throttling that was in place only took effect when the table
returned zero results, which doesn't happen very often.

The new throttling should not have a negative impact when the host is
online. Lookups against the network take some time and dials for all
results must complete or hit the cache before a new one is started. This
usually takes longer than four seconds, leaving online lookups
unaffected.

Fixes #1296
2015-06-22 01:07:58 +02:00
Felix Lange 70da79f04c p2p: improve disconnect logging 2015-06-15 15:03:46 +02:00
Felix Lange 8dcbdcad0a p2p: track write errors and prevent writes during shutdown
As of this commit, we no longer rely on the protocol handler to report
write errors in a timely fashion. When a write fails, shutdown is
initiated immediately and no new writes can start. This will also
prevent new writes from starting after Server.Stop has been called.
2015-06-15 15:03:46 +02:00
Felix Lange a8e4cb6dfe p2p/discover: use separate rand.Source instances in tests
rand.Source isn't safe for concurrent use.
2015-06-10 15:18:01 +02:00
Felix Lange 261a8077c4 p2p/discover: deflake TestUDP_successfulPing 2015-06-10 13:08:21 +02:00
Péter Szilágyi 1cbbfbe7fa p2p: fix a close race in the dial test 2015-06-09 22:26:26 +03:00
Felix Lange 3239aca69b p2p: bump global write timeout to 20s
The previous value of 5 seconds causes timeouts for legitimate messages
if large messages are sent.
2015-06-09 17:07:10 +02:00
Péter Szilágyi ff84352fb7 p2p: fix close data race 2015-06-09 16:12:24 +03:00
Felix Lange fc6a5ae3ec p2p/nat: add timeout for UPnP SOAP requests 2015-06-04 22:25:43 +02:00
Felix Lange 8e4512a5e7 p2p/nat: bump timeout in TestAutoDiscRace 2015-05-28 01:09:26 +02:00
Péter Szilágyi 612f01400f p2p/discover: bond with seed nodes too (runs only if findnode failed) 2015-05-26 23:30:41 +02:00
Péter Szilágyi 3630432dfb p2p/discovery: fix a cornercase loop if no seeds or bootnodes are known 2015-05-26 23:30:40 +02:00
Péter Szilágyi f539ed1e66 p2p/discover: force refresh if the table is empty 2015-05-26 23:30:40 +02:00
Péter Szilágyi 5076170f34 p2p/discover: permit temporary bond failures for previously known nodes 2015-05-26 23:30:40 +02:00
Péter Szilágyi 6078aa08eb p2p/discover: watch find failures, evacuate on too many, rebond if failed 2015-05-26 23:30:40 +02:00
Péter Szilágyi 64174f196f p2p/discover: add support for counting findnode failures 2015-05-26 23:30:40 +02:00
Péter Szilágyi 68898a4d6b p2p: fix Self() panic if listening is disabled 2015-05-26 19:16:05 +03:00
Péter Szilágyi e1a0ee8fc5 cmd/geth, cmd/utils, eth, p2p: pass and honor a no discovery flag 2015-05-26 19:07:24 +03:00
Péter Szilágyi 278183c7e7 eth, p2p: start the p2p server even if maxpeers == 0 2015-05-26 17:49:37 +03:00
Felix Lange 9e1fd70b50 p2p: decrease frameReadTimeout to 30s
This detects hanging connections sooner. We send a ping every 15s and
other implementation have similar limits.
2015-05-25 01:17:14 +02:00
Felix Lange 1440f9a37a p2p: new dialer, peer management without locks
The most visible change is event-based dialing, which should be an
improvement over the timer-based system that we have at the moment.
The dialer gets a chance to compute new tasks whenever peers change or
dials complete. This is better than checking peers on a timer because
dials happen faster. The dialer can now make more precise decisions
about whom to dial based on the peer set and we can test those
decisions without actually opening any sockets.

Peer management is easier to test because the tests can inject
connections at checkpoints (after enc handshake, after protocol
handshake).

Most of the handshake stuff is now part of the RLPx code. It could be
exported or move to its own package because it is no longer entangled
with Server logic.
2015-05-25 01:17:14 +02:00
Felix Lange 9f38ef5d97 p2p/discover: add ReadRandomNodes 2015-05-25 01:17:14 +02:00
Felix Lange 64564da20b p2p: decrease maximum message size for devp2p to 1kB
The previous limit was 10MB which is unacceptable for all kinds
of reasons, the most important one being that we don't want to
allow the remote side to make us allocate 10MB at handshake time.
2015-05-25 01:17:14 +02:00
Felix Lange dbdc5fd4b3 p2p: delete Server.Broadcast 2015-05-25 01:17:14 +02:00
Péter Szilágyi cbd3ae6906 p2p/discover: fix #838, evacuate self entries from the node db 2015-05-21 19:41:46 +03:00
Péter Szilágyi af24c271c7 p2p/discover: fix database presistency test folder 2015-05-21 19:28:10 +03:00
Jeffrey Wilcke 90b94e64fc Merge pull request #971 from fjl/p2p-limit-tweaks
p2p: tweak connection limits
2015-05-14 08:15:51 -07:00
Felix Lange d2f119cf9b p2p/discover: limit open files for node database 2015-05-14 15:01:13 +02:00
Felix Lange 206fe25971 p2p: remove testlog 2015-05-14 14:56:34 +02:00
Felix Lange 7fa2607bd1 p2p/discover: bump maxBondingPingPongs to 16
This should increase the speed a bit because all findnode
results (up to 16) can be verified at the same time.
2015-05-14 14:53:29 +02:00
Felix Lange 691cb90284 p2p: log remote reason when disconnect is requested
The returned reason is currently not used except for the log
message. This change makes the log messages a bit more useful.
The handshake code also returns the remote reason.
2015-05-14 14:53:29 +02:00
Felix Lange c14de2e973 p2p/nat: tweak port mapping log messages and levels
People stil get confused about the messages. This commit changes
the levels so that the only thing printed at the default level (info)
is a successful mapping.
2015-05-14 12:54:59 +02:00
Felix Lange 663d4e0aff p2p/nat: add test for UPnP auto discovery via SSDP
The test listens for multicast UDP packets on the default interface
because I couldn't get it to work reliably on loopback without massive
changes to goupnp. This means that the test might fail when there is a
UPnP-enabled router attached on that interface. I checked that locally
by looping the test and it passes reliably because the local SSDP server
always responds faster.
2015-05-14 12:13:19 +02:00
Felix Lange 983f5a717a p2p/nat: fix concurrent access to autodisc Interface
Concurrent calls to Interface methods on autodisc could return a "not
discovered" error if the discovery did not finish before the call.
autodisc.wait expected the done channel to carry the found Interface
but it was closed instead.

The fix is to use sync.Once for now, which is easier to get right.
And there is a test. Finally.

This will have to change again when we introduce re-discovery.
2015-05-14 03:53:11 +02:00
Felix Lange 7efeb4bd96 p2p: bump maxAcceptConns and defaultDialTimout
On the test network, we've seen that it becomes harder to connect
if the queues are so short.
2015-05-14 03:48:28 +02:00
Felix Lange 251846d65a p2p/discover: fix out-of-bounds slicing for chunked neighbors packets
The code assumed that Table.closest always returns at least 13 nodes.
This is not true for small tables (e.g. during bootstrap).
2015-05-13 21:49:04 +02:00
subtly 8eef2b765a fix test. 2015-05-13 20:15:01 +02:00
subtly a32693770c Manual send of multiple neighbours packets. Test receiving multiple neighbours packets. 2015-05-13 20:03:17 +02:00
subtly 7473c93668 UDP Interop. Limit datagrams to 1280bytes.
We don't have a UDP which specifies any messages that will be 4KB. Aside from being implemented for months and a necessity for encryption and piggy-backing packets, 1280bytes is ideal, and, means this TODO can be completed!

Why 1280 bytes?
* It's less than the default MTU for most WAN/LAN networks. That means fewer fragmented datagrams (esp on well-connected networks).
* Fragmented datagrams and dropped packets suck and add latency while OS waits for a dropped fragment to never arrive (blocking readLoop())
* Most of our packets are < 1280 bytes.
* 1280 bytes is minimum datagram size and MTU for IPv6 -- on IPv6, a datagram < 1280bytes will *never* be fragmented.

UDP datagrams are dropped. A lot! And fragmented datagrams are worse. If a datagram has a 30% chance of being dropped, then a fragmented datagram has a 60% chance of being dropped. More importantly, we have signed packets and can't do anything with a packet unless we receive the entire datagram because the signature can't be verified. The same is true when we have encrypted packets.

So the solution here to picking an ideal buffer size for receiving datagrams is a number under 1400bytes. And the lower-bound value for IPv6 of 1280 bytes make's it a non-decision. On IPv4 most ISPs and 3g/4g/let networks have an MTU just over 1400 -- and *never* over 1500. Never -- that means packets over 1500 (in reality: ~1450) bytes are fragmented. And probably dropped a lot.

Just to prove the point, here are pings sending non-fragmented packets over wifi/ISP, and a second set of pings via cell-phone tethering. It's important to note that, if *any* router between my system and the EC2 node has a lower MTU, the message would not go through:

On wifi w/normal ISP:
localhost:Debug $ ping -D -s 1450 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1450 data bytes
1458 bytes from 52.6.250.242: icmp_seq=0 ttl=42 time=104.831 ms
1458 bytes from 52.6.250.242: icmp_seq=1 ttl=42 time=119.004 ms
^C
--- 52.6.250.242 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 104.831/111.918/119.004/7.087 ms
localhost:Debug $ ping -D -s 1480 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1480 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
ping: sendto: Message too long
Request timeout for icmp_seq 1


Tethering to O2:
localhost:Debug $ ping -D -s 1480 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1480 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
^C
--- 52.6.250.242 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss
localhost:Debug $ ping -D -s 1450 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1450 data bytes
1458 bytes from 52.6.250.242: icmp_seq=0 ttl=42 time=107.844 ms
1458 bytes from 52.6.250.242: icmp_seq=1 ttl=42 time=105.127 ms
1458 bytes from 52.6.250.242: icmp_seq=2 ttl=42 time=120.483 ms
1458 bytes from 52.6.250.242: icmp_seq=3 ttl=42 time=102.136 ms
2015-05-13 19:03:00 +02:00
Bas van Kervel 95773b9673 removed redundant newlines in import block 2015-05-12 15:20:53 +02:00
Bas van Kervel b79dd188d9 replaced several path.* with filepath.* which is platform independent 2015-05-12 14:24:11 +02:00
Felix Lange d4f0a67323 p2p: drop connections with no matching protocols 2015-05-08 16:09:55 +02:00
Felix Lange 9c0f36c46d p2p: use maxDialingConns instead of maxAcceptConns as dial limit 2015-05-08 16:09:55 +02:00
Felix Lange 914e57e49b p2p: fix disconnect at capacity
With the introduction of static/trusted nodes, the peer count
can go above MaxPeers. Update the capacity check to handle this.
While here, decouple the trusted nodes check from the handshake
by passing a function instead.
2015-05-08 16:09:54 +02:00
Péter Szilágyi 8735e5addd p2p: increase the handshake timeout in the tests 2015-05-07 15:30:56 +03:00
Péter Szilágyi 4d5a719f25 cmd, eth, p2p: introduce pending peer cli arg, add tests 2015-05-07 15:30:56 +03:00
Péter Szilágyi af93217775 p2p: reduce the concurrent handshakes to 10/10 in/out 2015-05-07 15:22:09 +03:00
Péter Szilágyi 2060bc8bac p2p: fix dial throttling race condition 2015-05-07 15:22:08 +03:00
Péter Szilágyi 29fef349ef p2p: fix a dialing race in the throttler 2015-05-07 15:22:08 +03:00
Péter Szilágyi 3953bf0031 p2p: limit the outbound dialing too 2015-05-07 15:22:08 +03:00
Jeffrey Wilcke a0cb1945ae Merge pull request #866 from fjl/p2p-last-minute
Last minute p2p fixes
2015-05-06 14:49:52 -07:00
Felix Lange 3e2a928caa p2p: stop dialing at half the maximum peer count 2015-05-06 23:44:51 +02:00
Felix Lange 6a2fec5309 p2p, whisper: use glog for peer-level logging 2015-05-06 23:19:14 +02:00
Felix Lange bcfd788661 p2p/discover: bump packet timeouts to 500ms 2015-05-06 22:59:00 +02:00
Felix Lange fd4b75cfa8 p2p/nat: less confusing error logging 2015-05-06 22:58:03 +02:00
obscuren 062fa049d0 fixed merge issue 2015-05-06 22:54:21 +02:00
Felix Lange 2adcc31bb4 p2p/discover: new distance metric based on sha3(id)
The previous metric was pubkey1^pubkey2, as specified in the Kademlia
paper. We missed that EC public keys are not uniformly distributed.
Using the hash of the public keys addresses that. It also makes it
a bit harder to generate node IDs that are close to a particular node.
2015-05-06 16:10:41 +02:00
Péter Szilágyi 4accc187d5 eth, p2p: add trusted node list beside static list 2015-05-04 13:59:51 +03:00
Péter Szilágyi 54db54931e p2p: add static node dialing test 2015-05-04 13:08:42 +03:00
Péter Szilágyi e82ddd9198 p2p: correct a leftover trusted -> static 2015-04-30 19:34:33 +03:00
Péter Szilágyi 413ace37d3 eth, p2p: rename trusted nodes to static, drop inbound extra slots 2015-04-30 19:32:48 +03:00
Péter Szilágyi 701591b403 cmd, eth, p2p: fix review issues enumerated by Felix 2015-04-30 16:15:29 +03:00
Péter Szilágyi 1528dbc171 p2p: add trust check to handshake, test privileged connectivity
Conflicts:
	p2p/server_test.go
2015-04-30 16:06:47 +03:00
Péter Szilágyi 14f32a0c3a p2p: reduce the severity of a debug log 2015-04-30 16:04:09 +03:00
Péter Szilágyi de0549fabb cmd/geth, cmd/mist, cmd/utils, eth, p2p: support trusted peers 2015-04-30 16:03:10 +03:00
Felix Lange 72ab6d3255 p2p/discover: track sha3(ID) in Node 2015-04-30 15:02:23 +02:00