op-geth/swarm/network/simulations/overlay_test.go
Ferenc Szabo 50b872bf05 p2p, swarm: fix node up races by granular locking (#18976)
* swarm/network: DRY out repeated giga comment

I do not necessarily agree with the way we wait for event propagation,
but I truly disagree with having duplicated giga comments.

* p2p/simulations: encapsulate Node.Up field so we avoid data races

The Node.Up field was accessed concurrently without "proper" locking.
There was a lock on Network that was sometimes used to access
the field. Other times the locking was missed and we had
a data race.

For example: https://github.com/ethereum/go-ethereum/pull/18464
The case above was solved, but there were still intermittent/hard to
reproduce races. So let's solve the issue permanently.

resolves: ethersphere/go-ethereum#1146
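
A minimal sketch of the encapsulation idea, assuming a hypothetical `node`
type rather than the real `simulations.Node` (the `upMu` field and `SetUp`
name are illustrative assumptions): the flag is only reachable through
accessors that take the node's own lock.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical stand-in for simulations.Node; only the
// up flag and its dedicated lock are modelled here.
type node struct {
	upMu sync.RWMutex
	up   bool
}

// Up reports whether the node is up, holding the read lock.
func (n *node) Up() bool {
	n.upMu.RLock()
	defer n.upMu.RUnlock()
	return n.up
}

// SetUp updates the flag while holding the write lock.
func (n *node) SetUp(up bool) {
	n.upMu.Lock()
	defer n.upMu.Unlock()
	n.up = up
}

func main() {
	n := &node{}
	var wg sync.WaitGroup
	// Readers and writers run concurrently; every access goes through
	// the accessors, so no access bypasses the lock.
	for i := 0; i < 4; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); n.SetUp(true) }()
		go func() { defer wg.Done(); _ = n.Up() }()
	}
	wg.Wait()
	fmt.Println("up:", n.Up())
}
```

Because the field is unexported, callers outside the package cannot skip
the lock by accident.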

* p2p/simulations: fix unmarshal of simulations.Node

Making the Node.Up field private in 13292ee897e345045fbfab3bda23a77589a271c1
broke TestHTTPNetwork and TestHTTPSnapshot, because the default
UnmarshalJSON does not handle unexported fields.

Important: The fix is partial and not proper to my taste. But I cut
scope as I think the fix may require a change to the current
serialization format. New ticket:
https://github.com/ethersphere/go-ethereum/issues/1177

* p2p/simulations: Add a sanity test case for Node.Config UnmarshalJSON

* p2p/simulations: revert to defer Unlock() pattern for Network

It's a good pattern to call `defer Unlock()` right after `Lock()` so
(new) error cases won't forget to unlock. Let's get back to that pattern.

The pattern was abandoned in 85a79b3ad3c5863f8612d25c246bcfad339f36b7,
while fixing a data race. That data race does not exist anymore,
since the Node.Up field got hidden behind its own lock.
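
A minimal sketch of the pattern, using a hypothetical `registry` type
(not part of p2p/simulations): because the unlock is deferred right after
the lock is taken, every early return releases it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// registry is a hypothetical type used only to illustrate the pattern.
type registry struct {
	lock  sync.Mutex
	nodes map[string]bool
}

// add has several error paths; the deferred Unlock covers all of them,
// including any that get added later.
func (r *registry) add(id string) error {
	r.lock.Lock()
	defer r.lock.Unlock()

	if id == "" {
		return errors.New("empty node id")
	}
	if r.nodes[id] {
		return fmt.Errorf("node %q already exists", id)
	}
	r.nodes[id] = true
	return nil
}

func main() {
	r := &registry{nodes: make(map[string]bool)}
	fmt.Println(r.add(""))  // error path, lock still released
	fmt.Println(r.add("a")) // would block forever if the lock had leaked
	fmt.Println(r.add("a")) // second error path
}
```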

* p2p/simulations: consistent naming for test providers Node.UnmarshalJSON

* p2p/simulations: remove JSON annotation from private fields of Node

As unexported fields are not serialized.

* p2p/simulations: fix deadlock in Network.GetRandomDownNode()

Problem: GetRandomDownNode() locks -> getDownNodeIDs() ->
GetNodes() tries to lock -> deadlock

On Network type, unexported functions must assume that `net.lock`
is already acquired and should not call exported functions which
might try to lock again.
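
A minimal sketch of this convention, assuming hypothetical `network` and
`node` types and a simplified `GetDownNodeIDs` method (not the real
Network API): exported methods take the lock once, and unexported helpers
only call other unexported helpers. Go's sync.RWMutex is not reentrant,
so taking it again on the same call path can block forever.

```go
package main

import (
	"fmt"
	"sync"
)

// node and network are hypothetical, simplified types for illustration.
type node struct {
	id string
	up bool
}

type network struct {
	lock  sync.RWMutex
	nodes []*node
}

// GetNodes is exported, so it acquires the lock itself.
func (net *network) GetNodes() []*node {
	net.lock.RLock()
	defer net.lock.RUnlock()
	return net.getNodes()
}

// getNodes assumes net.lock is already held by the caller.
func (net *network) getNodes() []*node {
	return append([]*node(nil), net.nodes...)
}

// GetDownNodeIDs is exported: lock once, then delegate.
func (net *network) GetDownNodeIDs() []string {
	net.lock.RLock()
	defer net.lock.RUnlock()
	return net.getDownNodeIDs()
}

// getDownNodeIDs assumes net.lock is held, so it calls getNodes and
// never the exported GetNodes, which would try to lock again.
func (net *network) getDownNodeIDs() []string {
	var ids []string
	for _, n := range net.getNodes() {
		if !n.up {
			ids = append(ids, n.id)
		}
	}
	return ids
}

func main() {
	net := &network{nodes: []*node{{id: "a", up: true}, {id: "b"}, {id: "c"}}}
	fmt.Println(net.GetDownNodeIDs())
}
```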

* p2p/simulations: ensure method conformity for Network

Connect* methods were moved to p2p/simulations.Network from
swarm/network/simulation. However these new methods did not follow
the pattern of Network methods, i.e., every exported method locks
the whole Network either for read or write.

* p2p/simulations: fix deadlock during network shutdown

`TestDiscoveryPersistenceSimulationSimAdapter` often got into deadlock.
The execution was stuck on two locks, i.e., `Kademlia.lock` and
`p2p/simulations.Network.lock`. Usually the test got stuck about once
in every 20 executions.

`Kademlia` was stuck in `Kademlia.EachAddr()` and `Network` in
`Network.Stop()`.

Solution: in `Network.Stop()` `net.lock` must be released before
calling `node.Stop()` as stopping a node (somehow - I did not find
the exact code path) causes `Network.InitConn()` to be called from
`Kademlia.SuggestPeer()` and that blocks on `net.lock`.

Related ticket: https://github.com/ethersphere/go-ethereum/issues/1223

* swarm/state: simplify if statement in DBStore.Put()

* p2p/simulations: remove faulty godoc from private function

The comment started with the wrong method name.

The method is simple and self-explanatory. Also, it's private.
=> Let's just remove the comment.
2019-02-18 07:38:14 +01:00


// Copyright 2018 The go-ethereum Authors
// This file is part of the go-ethereum library.
//
// The go-ethereum library is free software: you can redistribute it and/or modify
// it under the terms of the GNU Lesser General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// The go-ethereum library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public License
// along with the go-ethereum library. If not, see <http://www.gnu.org/licenses/>.

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io/ioutil"
	"net/http"
	"net/http/httptest"
	"net/url"
	"testing"
	"time"

	"github.com/ethereum/go-ethereum/p2p/enode"
	"github.com/ethereum/go-ethereum/p2p/simulations"
	"github.com/ethereum/go-ethereum/swarm/log"
)

var (
	nodeCount = 10
)

// TestOverlaySim tests the overlay simulation.
// Because the simulation is executed via a main function, it is easily
// missed when the code changes; an automated test prevents that.
// The test connects to the simulation, starts the network, starts the
// mocker, gets the number of nodes, and stops the network again.
// It also documents the steps frontends need to take to use the simulation.
func TestOverlaySim(t *testing.T) {
	// start the simulation
	log.Info("Start simulation backend")
	// get the simulation network; needed to subscribe to up events
	net := newSimulationNetwork()
	// create the overlay simulation
	sim := newOverlaySim(net)
	// create an HTTP test server with it
	srv := httptest.NewServer(sim)
	defer srv.Close()

	log.Debug("Http simulation server started. Start simulation network")
	// start the simulation network (initialization of simulation)
	resp, err := http.Post(srv.URL+"/start", "application/json", nil)
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Fatalf("Expected Status Code %d, got %d", http.StatusOK, resp.StatusCode)
	}

	log.Debug("Start mocker")
	// start the mocker; it needs a node count and a mocker type
	resp, err = http.PostForm(srv.URL+"/mocker/start",
		url.Values{
			"node-count":  {fmt.Sprintf("%d", nodeCount)},
			"mocker-type": {simulations.GetMockerList()[0]},
		})
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		reason, err := ioutil.ReadAll(resp.Body)
		if err != nil {
			t.Fatal(err)
		}
		t.Fatalf("Expected Status Code %d, got %d, response body %s", http.StatusOK, resp.StatusCode, string(reason))
	}

	// variables needed to wait for nodes being up
	var upCount int
	trigger := make(chan enode.ID)

	// wait for all nodes to be up
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// start watching node up events...
	go watchSimEvents(ctx, net, trigger)

	// ...and wait until all expected up events (nodeCount) have been received
LOOP:
	for {
		select {
		case <-trigger:
			// new node up event received, increase counter
			upCount++
			// all expected node up events received
			if upCount == nodeCount {
				break LOOP
			}
		case <-ctx.Done():
			t.Fatal("Timed out waiting for up events")
		}
	}

	// at this point we can query the server
	log.Info("Get number of nodes")
	// get the number of nodes
	resp, err = http.Get(srv.URL + "/nodes")
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Fatalf("Expected Status Code %d, got %d", http.StatusOK, resp.StatusCode)
	}
	b, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		t.Fatal(err)
	}

	// unmarshal the list of nodes from the JSON response
	var nodesArr []simulations.Node
	err = json.Unmarshal(b, &nodesArr)
	if err != nil {
		t.Fatal(err)
	}

	// check that the number of nodes received matches the number requested
	if len(nodesArr) != nodeCount {
		t.Fatalf("Expected %d nodes, got %d", nodeCount, len(nodesArr))
	}

	// let the network run for a little while; stopping it immediately can
	// crash because running nodes try to connect to already stopped nodes
	time.Sleep(1 * time.Second)

	log.Info("Stop the network")
	// stop the network
	resp, err = http.Post(srv.URL+"/stop", "application/json", nil)
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Fatalf("Expected Status Code %d, got %d", http.StatusOK, resp.StatusCode)
	}

	log.Info("Reset the network")
	// reset the network (removes all nodes and connections)
	resp, err = http.Post(srv.URL+"/reset", "application/json", nil)
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Fatalf("Expected Status Code %d, got %d", http.StatusOK, resp.StatusCode)
	}
}

// watchSimEvents watches network events and forwards the ID of every node
// that comes up on trigger, so the caller knows when all nodes are up.
// It returns when ctx is done.
func watchSimEvents(ctx context.Context, net *simulations.Network, trigger chan enode.ID) {
	events := make(chan *simulations.Event)
	sub := net.Events().Subscribe(events)
	defer sub.Unsubscribe()

	for {
		select {
		case ev := <-events:
			// only catch node up events
			if ev.Type == simulations.EventTypeNode {
				if ev.Node.Up() {
					log.Debug("got node up event", "event", ev, "node", ev.Node.Config.ID)
					select {
					case trigger <- ev.Node.Config.ID:
					case <-ctx.Done():
						return
					}
				}
			}
		case <-ctx.Done():
			return
		}
	}
}