Mirror of https://github.com/status-im/consul.git (synced 2025-01-24 20:51:10 +00:00)
8e67d8eaeb
Occasionally the go-test-api job times out at 10 minutes. The stack trace showed two things:

1. Lots of tests blocked on server.Stop in NewTestServerConfigT. This suggests that SIGINT is being sent to the server, but the server is not shutting down properly.
2. Over 20k goroutines that look like this:

    goroutine 16355 [select, 8 minutes]:
    net/http.(*persistConn).readLoop(0xc004270240)
        /usr/local/go/src/net/http/transport.go:2099 +0x99e
    created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1647 +0xc56

Issue 1 seems to be the main problem. We cannot debug it directly, because our buffered logs are not emitted when the tests time out. To mitigate this, I've added a timeout to cmd.Wait() that force-kills the process and returns an error. Because we retry this operation, we still may not see the cause: the next attempt will likely pass. I'm tempted to remove the retry around NewTestServerConfigT.

Issue 2 seems to be caused by not closing the response body. The request is performed many times in a loop. Each unclosed response leaves a goroutine behind, and that goroutine is not released until its body is closed.
Consul Testing Utilities
This package provides some generic helpers to facilitate testing in Consul.
TestServer
TestServer is a harness for managing Consul agents and initializing them with
test data. Using it, you can form test clusters, create services, add health
checks, manipulate the K/V store, etc. This test harness is completely decoupled
from Consul's core and API client, meaning it can be easily imported and used in
external unit tests for various applications. It works by invoking the Consul
CLI, so the consul binary must be installed and available on your $PATH.
The following example shows typical usage:
```go
package my_program

import (
	"testing"

	"github.com/hashicorp/consul/agent/structs"
	"github.com/hashicorp/consul/sdk/testutil"
)

func TestFoo_bar(t *testing.T) {
	// Create a test Consul server
	srv1, err := testutil.NewTestServerConfigT(t, nil)
	if err != nil {
		t.Fatal(err)
	}
	defer srv1.Stop()

	// Create a secondary server, passing in configuration
	// to avoid bootstrapping as we are forming a cluster.
	srv2, err := testutil.NewTestServerConfigT(t, func(c *testutil.TestServerConfig) {
		c.Bootstrap = false
	})
	if err != nil {
		t.Fatal(err)
	}
	defer srv2.Stop()

	// Join the servers together
	srv1.JoinLAN(t, srv2.LANAddr)

	// Create a test key/value pair
	srv1.SetKV(t, "foo", []byte("bar"))

	// Create lots of test key/value pairs
	srv1.PopulateKV(t, map[string][]byte{
		"bar": []byte("123"),
		"baz": []byte("456"),
	})

	// Create a service
	srv1.AddService(t, "redis", structs.HealthPassing, []string{"master"})

	// Create a service with an address and port that the code under test can reach
	srv1.AddAddressableService(t, "redis", structs.HealthPassing, "127.0.0.1", 6379, []string{"master"})

	// Create a service check
	srv1.AddCheck(t, "service:redis", "redis", structs.HealthPassing)

	// Create a node check
	srv1.AddCheck(t, "mem", "", structs.HealthCritical)

	// The HTTPAddr field contains the address of the Consul
	// API on the new test server instance.
	t.Log(srv1.HTTPAddr)

	// All functions also have a wrapper method to limit the passing of "t"
	wrap := srv1.Wrap(t)
	wrap.SetKV("foo", []byte("bar"))
}
```
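The Wrap call at the end of the example binds t once, so later calls can omit it. The following is a minimal, self-contained sketch of that pattern. It uses a hypothetical in-memory fakeServer and a hypothetical TB interface in place of TestServer and testing.TB. *testing.T already has a matching Fatalf method, so it would satisfy TB.

```go
package main

import "fmt"

// TB is a hypothetical stand-in for the subset of testing.TB the helpers use.
type TB interface {
	Fatalf(format string, args ...interface{})
}

// fakeServer is a hypothetical in-memory stand-in for TestServer.
type fakeServer struct {
	kv map[string][]byte
}

// SetKV follows the harness convention of taking t first, so a failure
// is reported against the calling test.
func (s *fakeServer) SetKV(t TB, key string, val []byte) {
	if key == "" {
		t.Fatalf("SetKV: empty key")
		return // reached only when t does not stop execution (like recorder below)
	}
	s.kv[key] = val
}

// wrapped binds a TB to a server so callers need not pass t on every call.
type wrapped struct {
	s *fakeServer
	t TB
}

// Wrap returns a view of the server with t already bound.
func (s *fakeServer) Wrap(t TB) *wrapped { return &wrapped{s: s, t: t} }

// SetKV forwards to the server method using the bound t.
func (w *wrapped) SetKV(key string, val []byte) { w.s.SetKV(w.t, key, val) }

// recorder is a TB that records failures instead of stopping a test.
type recorder struct{ failures []string }

func (r *recorder) Fatalf(format string, args ...interface{}) {
	r.failures = append(r.failures, fmt.Sprintf(format, args...))
}

func main() {
	srv := &fakeServer{kv: map[string][]byte{}}
	rec := &recorder{}
	w := srv.Wrap(rec)
	w.SetKV("foo", []byte("bar"))
	w.SetKV("", []byte("x")) // triggers one recorded failure
	fmt.Println(string(srv.kv["foo"]), len(rec.failures)) // bar 1
}
```

The wrapper holds no state of its own beyond the bound t, so it stays a thin convenience layer over the t-first methods.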