consul

Commit Graph

Author	SHA1	Message	Date
Daniel Nephin	5d4df54296	agent: extract dependency creation from New With this change, Agent.New() accepts many of the dependencies instead of creating them in New. Accepting fully constructed dependencies from a constructor makes the type easier to test, and easier to change. There are still a number of dependencies created in Start() which can be addressed in a follow up.	2020-08-18 19:04:55 -04:00
Daniel Nephin	070e843113	testutil: Add t.Cleanup to TempDir TempDir registers a Cleanup so that the directory is always removed. To disable the cleanup, set the TEST_NOCLEANUP env var.	2020-08-14 13:19:10 -04:00
Daniel Nephin	3a4e62836b	testing: Remove TestAgent.Key and change TestAgent.DataDir TestAgent.Key was only used by 3 tests. Extracting it from the common helper that is used in hundreds of tests helps keep the shared part small and more focused. This required a second change (which I was planning on making anyway), which was to change the behaviour of DataDir. Now in all cases the TestAgent will use the DataDir, and clean it up once the test is complete.	2020-08-13 17:53:24 -04:00
Daniel Nephin	b1679508d4	testing: use t.Cleanup in TestAgent for returnPorts	2020-08-13 17:09:37 -04:00
Daniel Nephin	9919e5dfa5	agent: unmethod consulConfig To allow us to move newConsulConfig out of Agent.	2020-08-13 11:58:21 -04:00
Daniel Nephin	38980ebb4c	config: Make Source an interface This will allow us to accept config from auto-config without needing to go through a serialization cycle.	2020-08-10 12:46:28 -04:00
Daniel Nephin	51efba2c7d	testutil: NewLogBuffer - buffer logs until a test fails Replaces #7559 Running tests in parallel, with background goroutines, results in test output not being associated with the correct test. `go test` does not make any guarantees about output from goroutines being attributed to the correct test case. Attaching log output from background goroutines also causes data races. If the goroutine outlives the test, it will race with the test being marked done. Previously this was noticed as a panic when logging, but with the race detector enabled it is shown as a data race. The previous solution did not address the problem of correct test attribution because test output could still be hidden when it was associated with a test that did not fail. You would have to look at all of the log output to find the relevant lines. It also made debugging test failures more difficult because each log line was very long. This commit attempts a new approach. Instead of printing all the logs, only print when a test fails. This should work well when there are a small number of failures, but may not work well when there are many test failures at the same time. In those cases the failures are unlikely a result of a specific test, and the log output is likely less useful. All of the logs are printed from the test goroutine, so they should be associated with the correct test. Also removes some test helpers that were not used, or only had a single caller. Packages which expose many functions with similar names can be difficult to use correctly. Related: https://github.com/golang/go/issues/38458 (may be fixed in go1.15) https://github.com/golang/go/issues/38382#issuecomment-612940030	2020-07-21 12:50:40 -04:00
Daniel Nephin	a5e45defb1	agent/http: un-embed the HTTPServer The embedded HTTPServer struct is not used by the large HTTPServer struct. It is used by tests and the agent. This change is a small first step in the process of removing that field. The eventual goal is to reduce the scope of HTTPServer making it easier to test, and split into separate packages.	2020-07-02 17:21:12 -04:00
Matt Keeler	d6e05482ab	Allow cancelling startup when performing auto-config (#8157) Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>	2020-06-19 15:16:00 -04:00
Matt Keeler	3dbbd2d37d	Implement Client Agent Auto Config There are a couple of things in here. First, just like auto encrypt, any Cluster.AutoConfig RPC will implicitly use the less secure RPC mechanism. This drastically modifies how the Consul Agent starts up and moves most of the responsibilities (other than signal handling) from the cli command and into the Agent.	2020-06-17 16:49:46 -04:00
Daniel Nephin	77101eee82	config: rename Flags to BuilderOpts Flags is an overloaded term in this context. It generally is used to refer to command line flags. This struct, however, is a data object used as input to the construction. It happens to be partially populated by command line flags, but otherwise has very little to do with them. Renaming this struct should make the actual responsibility of this struct more obvious, and remove the possibility that it is confused with command line flags. This change is in preparation for adding additional fields to BuilderOpts.	2020-06-16 12:51:19 -04:00
R.B. Boyer	ffb9c7d6f7	acl: remove the deprecated `acl_enforce_version_8` option (#7991) Fixes #7292	2020-05-29 16:16:03 -05:00
Daniel Nephin	e759daafdd	Rename NewTestAgentWithFields to StartTestAgent This function now only starts the agent. Using: git grep -l 'StartTestAgent(t, true,' | xargs sed -i -e 's/StartTestAgent(t, true,/StartTestAgent(t,/g'	2020-03-31 17:14:55 -04:00
Daniel Nephin	f9f6b14533	Convert the remaining calls to NewTestAgentWithFields After removing the t.Name() parameter with sed, convert the last few tests which use a custom name to call NewTestAgentWithFields instead.	2020-03-31 17:14:55 -04:00
Daniel Nephin	475659a132	Remove name from NewTestAgent Using: git grep -l 'NewTestAgent(t, t.Name(),' | xargs sed -i -e 's/NewTestAgent(t, t.Name(),/NewTestAgent(t,/g'	2020-03-31 16:13:44 -04:00
Daniel Nephin	ad7c78f134	Remove t.Name() from TestAgent.Name And re-add the name to the logger so that log messages from different agents in a single test can be identified.	2020-03-30 16:47:24 -04:00
Daniel Nephin	dd40a1535e	testing: reduce verbosity of output log Previously the log output included the test name twice and a long date format. The test output is already grouped by test, so adding the test name did not add any new information. The date and time are only useful to understand elapsed time, so using a short format should provide sufficient detail. Also fixed a bug in NewTestAgentWithFields where nil was returned instead of the test agent.	2020-03-30 13:23:13 -04:00
R.B. Boyer	6adad71125	wan federation via mesh gateways (#6884 ) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. 
This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
gaoxinge	216eb29d6b	tests: convert windows style path to posix style path to avoid hcl parsing error (#6351)	2020-02-11 10:13:31 +01:00
Chris Piraino	401221de58	Allow users to configure either unstructured or JSON logging (#7130) * hclog Allow users to choose between unstructured and JSON logging	2020-01-28 17:50:41 -06:00
Matt Keeler	8f0ab0129e	Miscellaneous Fixes (#6896) Ensure we close the Sentinel Evaluator so as not to leak goroutines Fix a bunch of test logging so that various warnings when starting a test agent go to the test logger and not straight to stdout. Various canned ent meta types always return a valid pointer (no more nils). This allows us to blindly deref + assign in various places. Update ACL index tracking to ensure oss -> ent upgrades will work as expected. Update ent meta parsing to include function to disallow wildcarding.	2019-12-06 14:01:34 -05:00
Matt Keeler	deb91f3d3c	[Feature] API: Add a internal endpoint to query for ACL authori… (#6888) * Implement endpoint to query whether the given token is authorized for a set of operations * Updates to allow for remote ACL authorization via RPC This is only used when making an authorization request to a different datacenter.	2019-12-06 09:25:26 -05:00
Matt Keeler	923d8671a4	Add support for parameterizing the ACL config used with a TestA… (#6559) * Add support for parameterizing the ACL config used with a TestAgent Using tokens that are UUIDs will get rid of some warnings * Refactor to allow setting all tokens and change the template to ignore unset values.	2019-09-27 17:06:43 -04:00
R.B. Boyer	f9496dc627	sdk: add freelist tracking and ephemeral port range skipping to freeport This should cut down on test flakiness. Problems handled: - If you had enough parallel test cases running, the former circular approach to handling the port block could hand out the same port to multiple cases before they each had a chance to bind them, leading to one of the two tests to fail. - The freeport library would allocate out of the ephemeral port range. This has been corrected for Linux (which should cover CI). - The library now waits until a formerly-in-use port is verified to be free before putting it back into circulation.	2019-09-17 14:30:43 -05:00
R.B. Boyer	a86e63f81e	test: actually wait for the TestAgent to be fully shutdown (#6441)	2019-09-05 13:36:26 -05:00
Sarah Adams	001137e5e5	test: ensure all TestAgent constructions use a constructor (#6443) ensure all TestAgent constructions use a constructor to get start retries + test logs going to the right place Fixes #6435	2019-09-05 10:24:36 -07:00
Sarah Adams	74461406e0	remove funky panic/recover in agent tests (#6442)	2019-09-04 13:59:11 -07:00
Sarah Adams	4ed5515fca	refactor & add better retry logic to NewTestAgent (#6363) Fixes #6361	2019-09-03 15:05:51 -07:00
R.B. Boyer	7deaba63e1	test: ensure the node name is a valid dns name (#6424) The space in the node name was making every test emit a useless warning.	2019-08-29 16:52:13 -05:00
R.B. Boyer	b962fe38cd	test: send testagent logs through testing.Logf (#6411)	2019-08-27 12:21:30 -05:00
R.B. Boyer	91da908d2f	test: fix TestAgent.Start() to not segfault if the DNSServer cannot ListenAndServe (#6409) The embedded `Server` field on a `DNSServer` is only set inside of the `ListenAndServe` method. If that method fails for reasons like the address being in use and is not bindable, then the `Server` field will not be set and the overall `Agent.Start()` will fail. This will trigger the inner loop of `TestAgent.Start()` to invoke `ShutdownEndpoints` which will attempt to pretty print the DNS servers using fields on that inner `Server` field. Because it was never set, this causes a nil pointer dereference and crashes the test.	2019-08-27 10:45:05 -05:00
Mike Morris	65be58703c	connect: remove managed proxies (#6220) * connect: remove managed proxies implementation and all supporting config options and structs * connect: remove deprecated ProxyDestination * command: remove CONNECT_PROXY_TOKEN env var * agent: remove entire proxyprocess proxy manager * test: remove all managed proxy tests * test: remove irrelevant managed proxy note from TestService_ServerTLSConfig * test: update ContentHash to reflect managed proxy removal * test: remove deprecated ProxyDestination test * telemetry: remove managed proxy note * http: remove /v1/agent/connect/proxy endpoint * ci: remove deprecated test exclusion * website: update managed proxies deprecation page to note removal * website: remove managed proxy configuration API docs * website: remove managed proxy note from built-in proxy config * website: add note on removing proxy subdirectory of data_dir	2019-08-09 15:19:30 -04:00
Hans Hasselberg	33a7df3330	tls: auto_encrypt enables automatic RPC cert provisioning for consul clients (#5597)	2019-06-27 22:22:07 +02:00
R.B. Boyer	40336fd353	agent: fix several data races and bugs related to node-local alias checks (#5876 ) The observed bug was that a full restart of a consul datacenter (servers and clients) in conjunction with a restart of a connect-flavored application with bring-your-own-service-registration logic would very frequently cause the envoy sidecar service check to never reflect the aliased service. Over the course of investigation several bugs and unfortunate interactions were corrected: (1) local.CheckState objects were only shallow copied, but the key piece of data that gets read and updated is one of the things not copied (the underlying Check with a Status field). When the stock code was run with the race detector enabled this highly-relevant-to-the-test-scenario field was found to be racy. Changes: a) update the existing Clone method to include the Check field b) copy-on-write when those fields need to change rather than incrementally updating them in place. This made the observed behavior occur slightly less often. (2) If anything about how the runLocal method for node-local alias check logic was ever flawed, there was no fallback option. Those checks are purely edge-triggered and failure to properly notice a single edge transition would leave the alias check incorrect until the next flap of the aliased check. The change was to introduce a fallback timer to act as a control loop to double check the alias check matches the aliased check every minute (borrowing the duration from the non-local alias check logic body). This made the observed behavior eventually go away when it did occur. (3) Originally I thought there were two main actions involved in the data race: A. The act of adding the original check (from disk recovery) and its first health evaluation. B. The act of the HTTP API requests coming in and resetting the local state when re-registering the same services and checks. It took awhile for me to realize that there's a third action at work: C. 
The goroutines associated with the original check and the later checks. The actual sequence of actions that was causing the bad behavior was that the API actions result in the original check to be removed and re-added _without waiting for the original goroutine to terminate_. This means for brief windows of time during check definition edits there are two goroutines that can be sending updates for the alias check status. In extremely unlikely scenarios the original goroutine sees the aliased check start up in `critical` before being removed but does not get the notification about the nearly immediate update of that check to `passing`. This is interlaced with the new goroutine coming up, initializing its base case to `passing` from the current state and then listening for new notifications of edge triggers. If the original goroutine "finishes" its update, it then commits one more write into the local state of `critical` and exits leaving the alias check no longer reflecting the underlying check. The correction here is to enforce that the old goroutines must terminate before spawning the new one for alias checks.	2019-05-24 13:36:56 -05:00
Matt Keeler	f665695b6b	Ensure ServiceName is populated correctly for agent service checks Also update some snapshot agent docs * Enforce correct permissions when registering a check Previously we had attempted to enforce service:write for a check associated with a service instead of node:write on the agent but due to how we decoded the health check from the request it would never do it properly. This commit fixes that. * Update website/source/docs/commands/snapshot/agent.html.markdown.erb Co-Authored-By: mkeeler <mkeeler@users.noreply.github.com>	2019-04-30 19:00:57 -04:00
Kyle Havlovitz	d8f8400fe1	Merge pull request #5700 from hashicorp/service-reg-manager Use centralized service config on agent service registrations	2019-04-25 06:39:50 -07:00
Aestek	f669bb7b0f	Add support for DNS config hot-reload (#4875) The DNS config parameters `recursors` and `dns_config.*` are now hot reloaded on SIGHUP or `consul reload` and do not need an agent restart to be modified. Config is stored in an atomic.Value and loaded at the beginning of each request. Reloading only affects requests that start _after_ the reload. Ongoing requests are not affected. To match the current behavior the recursor handler is loaded and unloaded as needed on config reload.	2019-04-24 14:11:54 -04:00
Kyle Havlovitz	c269369760	Make central service config opt-in and rework the initial registration	2019-04-24 06:11:08 -07:00
Jeff Mitchell	4243c3ae42	Move internal/ to sdk/ (#5568) * Move internal/ to sdk/ * Add a readme to the SDK folder	2019-03-27 08:54:56 -04:00
Jeff Mitchell	47c390025b	Convert to Go Modules (#5517) * First conversion * Use serf 0.8.2 tag and associated updated deps * * Move freeport and testutil into internal/ * Make internal/ its own module * Update imports * Add replace statements so API and normal Consul code are self-referencing for ease of development * Adapt to newer goe/values * Bump to new cleanhttp * Fix ban nonprintable chars test * Update lock bad args test The error message when the duration cannot be parsed changed in Go 1.12 (ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test. * Update another test as well * Bump travis * Bump circleci * Bump go-discover and godo to get rid of launchpad dep * Bump dockerfile go version * fix tar command * Bump go-cleanhttp	2019-03-26 17:04:58 -04:00
Hans Hasselberg	e7134a0dab	agent: only use TestAgent when appropriate (#5502)	2019-03-18 17:06:16 +01:00
Hans Hasselberg	7e11dd82aa	agent: enable reloading of tls config (#5419) This PR introduces reloading tls configuration. Consul will now be able to reload the TLS configuration which previously required a restart. It is not yet possible to turn TLS ON or OFF with these changes. Only when TLS is already turned on, the configuration can be reloaded. Most importantly the certificates and CAs.	2019-03-13 10:29:06 +01:00
Hans Hasselberg	786b3b1095	Centralise tls configuration part 1 (#5366) In order to be able to reload the TLS configuration, we need one way to generate the different configurations. This PR introduces a `tlsutil.Configurator` which holds a `tlsutil.Config`. Afterwards it is responsible for rendering every `tls.Config`. In this particular PR I moved `IncomingHTTPSConfig`, `IncomingTLSConfig`, and `OutgoingTLSWrapper` into `tlsutil.Configurator`. This PR is a pure refactoring - not a single feature added. And not a single test added. I only slightly modified existing tests as necessary.	2019-02-26 16:52:07 +01:00
Matt Keeler	766d771017	Pass a testing.T into NewTestAgent and TestAgent.Start (#5342) This way we can avoid unnecessary panics which cause other tests not to run. This doesn't remove all the possibilities for panics causing other tests not to run, it just fixes the TestAgent	2019-02-14 10:59:14 -05:00
Paul Banks	0638e09b6e	connect: agent leaf cert caching improvements (#5091) * Add State storage and LastResult argument into Cache so that cache.Types can safely store additional data that is eventually expired. * New Leaf cache type working and basic tests passing. TODO: more extensive testing for the Root change jitter across blocking requests, test concurrent fetches for different leaves interact nicely with rootsWatcher. * Add multi-client and delayed rotation tests. * Typos and cleanup error handling in roots watch * Add comment about how the FetchResult can be used and change ca leaf state to use a non-pointer state. * Plumb test override of root CA jitter through TestAgent so that tests are deterministic again! * Fix failing config test	2019-01-10 12:46:11 +00:00
Matt Keeler	e81c85c051	Fix #4515: Segfault when serf_wan port was -1 but reconnect_time_wan was set (#4531) Fixes #4515 This just slightly refactors the logic to only attempt to set the serf wan reconnect timeout when the rest of the serf wan settings are configured - thus avoiding a segfault.	2018-08-17 14:44:25 -04:00
Paul Banks	8aeb7bd206	Disable TestAgent proxy execution properly	2018-06-25 12:25:38 -07:00
Paul Banks	2e223ea2b7	Fix hot loop in cache for RPC returning zero index.	2018-06-25 12:25:37 -07:00
Paul Banks	cdc7cfaa36	Abandon daemonize for simpler solution (preserving history): Reverts: - bdb274852ae469c89092d6050697c0ff97178465 - 2c689179c4f61c11f0016214c0fc127a0b813bfe - d62e25c4a7ab753914b6baccd66f88ffd10949a3 - c727ffbcc98e3e0bf41e1a7bdd40169bd2d22191 - 31b4d18933fd0acbe157e28d03ad59c2abf9a1fb - 85c3f8df3eabc00f490cd392213c3b928a85aa44	2018-06-25 12:24:10 -07:00
Paul Banks	8cf4b3a6eb	Sanity check that we are never trying to self-exec a test binary. Add daemonize bypass for TestAgent so that we don't have to jump through ridiculous self-execution hooks for every package that might possibly invoke a managed proxy	2018-06-25 12:24:09 -07:00
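
Commit f669bb7b0f in the list above says the DNS config is stored in an atomic.Value and loaded at the beginning of each request, so a reload only affects requests that start afterwards. The following is a minimal sketch of that pattern, not the code from the commit: the `dnsConfig` and `server` types and the `reload` and `handle` methods are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// dnsConfig is a hypothetical, trimmed-down stand-in for reloadable DNS settings.
type dnsConfig struct {
	Recursors []string
}

// server keeps the current config in an atomic.Value so that request
// handlers never block on a reload.
type server struct {
	cfg atomic.Value // always holds a *dnsConfig
}

func newServer(c *dnsConfig) *server {
	s := &server{}
	s.cfg.Store(c)
	return s
}

// reload swaps in a new config. Requests that already loaded the old
// pointer keep using it; only later requests see the new one.
func (s *server) reload(c *dnsConfig) {
	s.cfg.Store(c)
}

// handle loads the config once, at the start of the request, and uses
// that snapshot for the rest of the request.
func (s *server) handle(name string) string {
	cfg := s.cfg.Load().(*dnsConfig)
	return fmt.Sprintf("%s via %v", name, cfg.Recursors)
}

func main() {
	s := newServer(&dnsConfig{Recursors: []string{"8.8.8.8"}})
	fmt.Println(s.handle("example.com"))

	s.reload(&dnsConfig{Recursors: []string{"1.1.1.1"}})
	fmt.Println(s.handle("example.com"))
}
```

Storing a pointer to an immutable config value, rather than mutating fields in place, is what keeps in-flight requests consistent in this sketch.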

71 Commits