Commit Graph

319 Commits

Author SHA1 Message Date
Ferenc Szabo 50b872bf05 p2p, swarm: fix node up races by granular locking (#18976)
* swarm/network: DRY out repeated giga comment

I not necessarily agree with the way we wait for event propagation.
But I truly disagree with having duplicated giga comments.

* p2p/simulations: encapsulate Node.Up field so we avoid data races

The Node.Up field was accessed concurrently without "proper" locking.
There was a lock on Network and that was used sometimes to access
the  field. Other times the locking was missed and we had
a data race.

For example: https://github.com/ethereum/go-ethereum/pull/18464
The case above was solved, but there were still intermittent/hard to
reproduce races. So let's solve the issue permanently.

resolves: ethersphere/go-ethereum#1146

* p2p/simulations: fix unmarshal of simulations.Node

Making Node.Up field private in 13292ee897e345045fbfab3bda23a77589a271c1
broke TestHTTPNetwork and TestHTTPSnapshot. Because the default
UnmarshalJSON does not handle unexported fields.

Important: The fix is partial and not proper to my taste. But I cut
scope as I think the fix may require a change to the current
serialization format. New ticket:
https://github.com/ethersphere/go-ethereum/issues/1177

* p2p/simulations: Add a sanity test case for Node.Config UnmarshalJSON

* p2p/simulations: revert back to defer Unlock() pattern for Network

It's a good patten to call `defer Unlock()` right after `Lock()` so
(new) error cases won't miss to unlock. Let's get back to that pattern.

The patten was abandoned in 85a79b3ad3,
while fixing a data race. That data race does not exist anymore,
since the Node.Up field got hidden behind its own lock.

* p2p/simulations: consistent naming for test providers Node.UnmarshalJSON

* p2p/simulations: remove JSON annotation from private fields of Node

As unexported fields are not serialized.

* p2p/simulations: fix deadlock in Network.GetRandomDownNode()

Problem: GetRandomDownNode() locks -> getDownNodeIDs() ->
GetNodes() tries to lock -> deadlock

On Network type, unexported functions must assume that `net.lock`
is already acquired and should not call exported functions which
might try to lock again.

* p2p/simulations: ensure method conformity for Network

Connect* methods were moved to p2p/simulations.Network from
swarm/network/simulation. However these new methods did not follow
the pattern of Network methods, i.e., all exported method locks
the whole Network either for read or write.

* p2p/simulations: fix deadlock during network shutdown

`TestDiscoveryPersistenceSimulationSimAdapter` often got into deadlock.
The execution was stuck on two locks, i.e, `Kademlia.lock` and
`p2p/simulations.Network.lock`. Usually the test got stuck once in each
20 executions with high confidence.

`Kademlia` was stuck in `Kademlia.EachAddr()` and `Network` in
`Network.Stop()`.

Solution: in `Network.Stop()` `net.lock` must be released before
calling `node.Stop()` as stopping a node (somehow - I did not find
the exact code path) causes `Network.InitConn()` to be called from
`Kademlia.SuggestPeer()` and that blocks on `net.lock`.

Related ticket: https://github.com/ethersphere/go-ethereum/issues/1223

* swarm/state: simplify if statement in DBStore.Put()

* p2p/simulations: remove faulty godoc from private function

The comment started with the wrong method name.

The method is simple and self explanatory. Also, it's private.
=> Let's just remove the comment.
2019-02-18 07:38:14 +01:00
gluk256 12ca3b172a swarm/pss: refactoring (#19110)
* swarm/pss: split pss and keystore

* swarm/pss: moved whisper to keystore

* swarm/pss: goimports fixed
2019-02-17 06:29:41 +01:00
Elad 5b8ae7885e swarm/storage: fix influxdb gc metrics report (#19102) 2019-02-15 07:41:42 +01:00
holisticode 2af24724dd swarm/network: Saturation check for healthy networks (#19071)
* swarm/network: new saturation for  implementation

* swarm/network: re-added saturation func in Kademlia as it is used elsewhere

* swarm/network: saturation with higher MinBinSize

* swarm/network: PeersPerBin with depth check

* swarm/network: edited tests to pass new saturated check

* swarm/network: minor fix saturated check

* swarm/network/simulations/discovery: fixed renamed RPC call

* swarm/network: renamed to isSaturated and returns bool

* swarm/network: early depth check
2019-02-14 19:01:50 +01:00
Elad 3ee09ba035 swarm/storage/netstore: add fetcher cancellation on shutdown (#19049)
swarm/network/stream: remove netstore internal wg
swarm/network/stream: run individual tests with t.Run
2019-02-14 07:51:57 +01:00
Janoš Guljaš 3fd6db2bf6 swarm: fix network/stream data races (#19051)
* swarm/network/stream: newStreamerTester cleanup only if err is nil

* swarm/network/stream: raise newStreamerTester waitForPeers timeout

* swarm/network/stream: fix data races in GetPeerSubscriptions

* swarm/storage: prevent data race on LDBStore.batchesC

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461775049

* swarm/network/stream: fix TestGetSubscriptionsRPC data race

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461768477

* swarm/network/stream: correctly use Simulation.Run callback

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461783804

* swarm/network: protect addrCountC in Kademlia.AddrCountC function

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-462273444

* p2p/simulations: fix a deadlock calling getRandomNode with lock

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-462317407

* swarm/network/stream: terminate disconnect goruotines in tests

* swarm/network/stream: reduce memory consumption when testing data races

* swarm/network/stream: add watchDisconnections helper function

* swarm/network/stream: add concurrent counter for tests

* swarm/network/stream: rename race/norace test files and use const

* swarm/network/stream: remove watchSim and its panic

* swarm/network/stream: pass context in watchDisconnections

* swarm/network/stream: add concurrent safe bool for watchDisconnections

* swarm/storage: fix LDBStore.batchesC data race by not closing it
2019-02-13 13:03:23 +01:00
Elad d596bea2d5 swarm: fix uptime gauge update goroutine leak by introducing cleanup functions (#19040) 2019-02-13 08:15:03 +01:00
holisticode 3d22a46c94 swarm/storage: fix HashExplore concurrency bug ethersphere#1211 (#19028)
* swarm/storage: fix HashExplore concurrency bug ethersphere#1211

*  swarm/storage: lock as value not pointer

* swarm/storage: wait for  to complete

* swarm/storage: fix linter problems

* swarm/storage: append to nil slice
2019-02-13 00:17:44 +01:00
gluk256 b30109df3c swarm/pss: mutex lifecycle fixed (#19045) 2019-02-13 00:12:41 +01:00
Rafael Matias 6cb7d52a29 swarm/docker: add global-store and split docker images (#19038) 2019-02-12 08:34:08 +01:00
Ferenc Szabo 27e3f96819 swarm: CI race detector test adjustments (#19017) 2019-02-08 17:07:11 +01:00
gluk256 cde02e017e swarm/pss: transition to whisper v6 (#19023) 2019-02-08 17:05:10 +01:00
lash 0c10d37606 swarm/network, swarm/storage: Preserve opentracing contexts (#19022) 2019-02-08 16:57:48 +01:00
Janoš Guljaš 4f3d22f06c swarm/storage/localstore: new localstore package (#19015) 2019-02-07 18:40:26 +01:00
holisticode 41597c2856 swarm: Debug API and HasChunks() API endpoint (#18980) 2019-02-07 15:49:19 +01:00
Janoš Guljaš 33d0a0efa6 cmd/swarm/global-store: global store cmd (#19014) 2019-02-07 15:46:58 +01:00
holisticode 7f55b0cbd8 cmd/swarm: hashes command (#19008) 2019-02-07 13:51:24 +01:00
Kiel barry 53b823afc8
contracts/*: golint updates for this or self warning 2019-02-07 13:15:14 +02:00
holisticode 3eff652a7b swarm/storage: Get all chunk references for a given file (#19002) 2019-02-06 12:16:43 +01:00
lash 7c60d0a6a2 swarm/pss: Remove pss service leak in test (#18992) 2019-02-05 14:35:20 +01:00
Ferenc Szabo 1c3aa8d9b1 swarm/storage: fix test timeout with -race by increasing mget timeout 2019-02-05 14:34:34 +01:00
Anton Evangelatov 597597e8b2 swarm/network: refactor simulation tests bootstrap (#18975) 2019-02-01 09:58:46 +01:00
holisticode 43e1b7b124 swarm: GetPeerSubscriptions RPC (#18972) 2019-01-30 21:03:08 +01:00
Janoš Guljaš 592bf6a59c swarm: fix flaky delivery tests (#18971) 2019-01-30 14:03:11 +01:00
lash f9401ae011 swarm/network: Remove extra random peer, connect test sanity, comments (#18964) 2019-01-30 09:49:58 +01:00
Felix Lange f4094d09cd params, swarm/version: Geth v1.9.0 unstable, Swarm v0.3.11-unstable 2019-01-29 17:43:13 +01:00
Anton Evangelatov 21acf0bc8d cmd/utils: allow for multiple influxdb tags (#18520)
This PR is replacing the metrics.influxdb.host.tag cmd-line flag with metrics.influxdb.tags - a comma-separated key/value tags, that are passed to the InfluxDB reporter, so that we can index measurements with multiple tags, and not just one host tag.

This will be useful for Swarm, where we want to index measurements not just with the host tag, but also with bzzkey and git commit version (for long-running deployments).
2019-01-29 09:14:24 +01:00
Janoš Guljaš 104e6b2050 swarm/pss/notify: shutdown net in TestStart to fix OOM issue (#18953) 2019-01-28 16:08:33 +01:00
Ferenc Szabo 2209fede4e swarm/pss: fix data race on topicHandlerCaps map (#18523) 2019-01-25 20:18:28 +01:00
Jerzy Lasyk f28da4f602 swarm/metrics: Send the accounting registry to InfluxDB (#18470) 2019-01-24 18:57:20 +01:00
Elad 2abeb35d54 p2p/testing, swarm: remove unused testing.T in protocol tester (#18500) 2019-01-24 17:23:34 +01:00
Ferenc Szabo 6167dd65b5 swarm/pss: fix data race in notify_test.go (TestStart) (#18518) 2019-01-24 17:07:43 +01:00
gluk256 ad13d2d407 swarm/version: commit version added (#18510) 2019-01-24 12:35:10 +01:00
Ferenc Szabo 3591fc603f swarm/storage: Fix race in TestLDBStoreCollectGarbage. Disable testLDBStoreRemoveThenCollectGarbage (#18512) 2019-01-24 12:34:12 +01:00
Janoš Guljaš fa34429a26 swarm: fix a data race on startTime (#18511) 2019-01-24 12:02:47 +01:00
Anton Evangelatov bbd120354a
swarm: bootnode-mode, new bootnodes and no p2p package discovery (#18498) 2019-01-24 12:02:18 +01:00
gluk256 105008b6a1 swarm/pss: fixing race condition (#18487) 2019-01-21 15:22:51 +01:00
Viktor Trón 15b9b39e6c
swarm/network: unskip tests previously skipped due to suggestPeer issues (#18477) 2019-01-19 08:12:57 +01:00
Ferenc Szabo 19bfcbf911 swarm/network: fix data race in fetcher_test.go (#18469) 2019-01-17 16:45:36 +01:00
Ferenc Szabo 4f8ec44565 swarm/network: fix data race in stream.(*Peer).handleOfferedHashesMsg() (#18468)
* swarm/network: fix data race in stream.(*Peer).handleOfferedHashesMsg()

handleOfferedHashesMsg() contained a data race:
- read => in a goroutine, call to c.batchDone()
- write => in the main thread, write to c.sessionAt

c.batchDone() contained a call to c.AddInterval(). Client was a value
receiver for AddInterval. So on c.AddInterval() call the whole client
struct got copied (read) while one of its field was modified in
handleOfferedHashesMsg() (write).

fixes ethersphere/go-ethereum#1086

* swarm/network: simplify some trivial statements
2019-01-17 14:44:29 +01:00
Elad 81e26d5a48 swarm/network: fix data race warning on TestBzzHandshakeLightNode (#18459) 2019-01-17 11:38:23 +01:00
Viktor Trón bcb2594151
swarm/network: rewrite of peer suggestion engine, fix skipped tests (#18404)
* swarm/network: fix skipped tests related to suggestPeer

* swarm/network: rename depth to radius

* swarm/network: uncomment assertHealth and improve comments

* swarm/network: remove commented code

* swarm/network: kademlia suggestPeer algo correction

* swarm/network: kademlia suggest peer

 * simplify suggest Peer code
 * improve peer suggestion algo
 * add comments
 * kademlia testing improvements
   * assertHealth -> checkHealth (test helper)
   * testSuggestPeer -> checkSuggestPeer (test helper)
   * remove testSuggestPeerBug and TestKademliaCase

* swarm/network: kademlia suggestPeer cleanup, improved comments

* swarm/network: minor comment, discovery test default arg
2019-01-17 07:29:34 +01:00
Elad 34f11e752f cmd/swarm/swarm-snapshot: swarm snapshot generator (#18453)
* cmd/swarm/swarm-snapshot: add binary to create network snapshots

* cmd/swarm/swarm-snapshot: refactor and extend tests

* p2p/simulations: remove unused triggerChecks func and fix linter

* internal/cmdtest: raise the timeout for killing TestCmd

* cmd/swarm/swarm-snapshot: add more comments and other minor adjustments

* cmd/swarm/swarm-snapshot: remove redundant check in createSnapshot

* cmd/swarm/swarm-snapshot: change comment wording

* p2p/simulations: revert Simulation.Run from master

https://github.com/ethersphere/go-ethereum/pull/1077/files#r247078904

* cmd/swarm/swarm-snapshot: address pr comments

* swarm/network/simulations/discovery: removed snapshot write to file

* cmd/swarm/swarm-snapshot, swarm/network/simulations: removed redundant connection event check, fixed lint error
2019-01-16 14:33:02 +01:00
Janoš Guljaš f728837ee6 swarm/storage: fix mockNetFetcher data races (#18462)
fixes: ethersphere/go-ethereum#1117
2019-01-16 14:31:32 +01:00
Janoš Guljaš 96c7c18b18 swarm/network: fix data race in TestNetworkID test (#18460) 2019-01-16 12:56:34 +01:00
Péter Szilágyi 24d66944cb
params, swarm: begin Geth v1.8.22 and Swarm v0.3.10 cycle 2019-01-15 22:52:47 +02:00
Péter Szilágyi 9dc5d1a915
params, swarm: release Geth v1.8.21 and Swarm v0.3.9 2019-01-15 22:51:31 +02:00
gluk256 4aeeecfded swarm/pot: each() functions refactored (#18452) 2019-01-15 11:51:33 +01:00
gluk256 1636d9574b swarm/pot: pot.remove fixed (#18431)
* swarm/pot: refactored pot.remove(), updated comments

* swarm/pot: comments updated
2019-01-11 20:42:33 +01:00
holisticode 88168ff5c5 Stream subscriptions (#18355)
* swarm/network: eachBin now starts at kaddepth for nn

* swarm/network: fix Kademlia.EachBin

* swarm/network: fix kademlia.EachBin

* swarm/network: correct EachBin implementation according to requirements

* swarm/network: less addresses simplified tests

* swarm: calc kad depth outside loop in EachBin test

* swarm/network: removed printResults

* swarm/network: cleanup imports

* swarm/network: remove kademlia.EachBin; fix RequestSubscriptions and add unit test

* swarm/network/stream: address PR comments

* swarm/network/stream: package-wide subscriptionFunc

* swarm/network/stream: refactor to kad.EachConn
2019-01-11 15:08:09 +01:00