Daniel Nephin 8e67d8eaeb sdk: mitigate api test timeout
Occasionally the go-test-api job times out at 10 minutes.
Looking at the stack trace, I saw the following:

1. Lots of tests blocked on server.Stop in NewTestServerConfigT. This
   suggests that SIGINT is being sent to the server, but the server is
   not properly shutting down.

2. Over 20k goroutines that look like this:

goroutine 16355 [select, 8 minutes]:
net/http.(*persistConn).readLoop(0xc004270240)
    /usr/local/go/src/net/http/transport.go:2099 +0x99e
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:1647 +0xc56

Issue 1 seems to be the main problem, but debugging it directly is not
possible because our buffered logs do not get sent when the tests time
out. To mitigate this problem I've added a timeout to the cmd.Wait()
call, which force kills the process and returns an error.

Unfortunately, because we retry this operation, we still may not see the
cause because the next attempt will likely pass. I'm tempted to remove
the retry around NewTestServerConfigT.

Issue 2 seems to be caused by not closing the response body. Since the
request is performed many times in a loop, many goroutines are created,
and each one keeps running until its response body is closed.

Consul Testing Utilities

This package provides some generic helpers to facilitate testing in Consul.

TestServer

TestServer is a harness for managing Consul agents and initializing them with test data. You can use it to form test clusters, create services, add health checks, manipulate the K/V store, and so on. The harness is completely decoupled from Consul's core and API client, so it can be imported and used in the unit tests of external applications. It works by invoking the Consul CLI, so the consul binary must be installed and available in your $PATH.
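
Because the harness depends on a consul binary being in the $PATH, a test suite may want to check for it up front. The sketch below is one possible way to do that with the standard library. findConsul and skipWithoutConsul are hypothetical helpers and are not part of this package.

```go
package main

import (
	"os/exec"
	"testing"
)

// findConsul returns the path of the consul binary found in $PATH,
// or an error if there is none. Hypothetical helper for this sketch.
func findConsul() (string, error) {
	return exec.LookPath("consul")
}

// skipWithoutConsul skips the calling test when no consul binary is
// available, instead of letting the test fail during server startup.
func skipWithoutConsul(t *testing.T) {
	t.Helper()
	if _, err := findConsul(); err != nil {
		t.Skipf("consul binary not found in $PATH: %v", err)
	}
}
```

Calling skipWithoutConsul(t) as the first line of a test keeps machines without Consul installed from reporting spurious failures. Whether to skip or to fail in that situation is a project decision.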

The following is an example of its usage:

package my_program

import (
	"testing"

	"github.com/hashicorp/consul/consul/structs"
	"github.com/hashicorp/consul/sdk/testutil"
)

func TestFoo_bar(t *testing.T) {
	// Create a test Consul server
	srv1, err := testutil.NewTestServerConfigT(t, nil)
	if err != nil {
		t.Fatal(err)
	}
	defer srv1.Stop()

	// Create a secondary server, passing in configuration
	// to avoid bootstrapping as we are forming a cluster.
	srv2, err := testutil.NewTestServerConfigT(t, func(c *testutil.TestServerConfig) {
		c.Bootstrap = false
	})
	if err != nil {
		t.Fatal(err)
	}
	defer srv2.Stop()

	// Join the servers together
	srv1.JoinLAN(t, srv2.LANAddr)

	// Create a test key/value pair
	srv1.SetKV(t, "foo", []byte("bar"))

	// Create lots of test key/value pairs
	srv1.PopulateKV(t, map[string][]byte{
		"bar": []byte("123"),
		"baz": []byte("456"),
	})

	// Create a service
	srv1.AddService(t, "redis", structs.HealthPassing, []string{"master"})

	// Create a service that will be accessed in target source code
	srv1.AddAccessibleService(t, "redis", structs.HealthPassing, "127.0.0.1", 6379, []string{"master"})

	// Create a service check
	srv1.AddCheck(t, "service:redis", "redis", structs.HealthPassing)

	// Create a node check
	srv1.AddCheck(t, "mem", "", structs.HealthCritical)

	// The HTTPAddr field contains the address of the Consul
	// API on the new test server instance.
	println(srv1.HTTPAddr)

	// All functions also have a wrapper method to limit the passing of "t"
	wrap := srv1.Wrap(t)
	wrap.SetKV("foo", []byte("bar"))
}