consul

Commit Graph

Author	SHA1	Message	Date
Hans Hasselberg	33a7df3330	tls: auto_encrypt enables automatic RPC cert provisioning for consul clients (#5597)	2019-06-27 22:22:07 +02:00
Pierre Souchay	0e907f5aa8	Support for maximum size for Output of checks (#5233) * Support for maximum size for Output of checks This PR allows users to limit the size of output produced by checks at the agent and check level. When set at the agent level, it will limit the output for all checks monitored by the agent. When set at the check level, it can override the agent max for a specific check but only if it is lower than the agent max. Default value is 4k, and input must be at least 1.	2019-06-26 09:43:25 -06:00
R.B. Boyer	40336fd353	agent: fix several data races and bugs related to node-local alias checks (#5876) The observed bug was that a full restart of a consul datacenter (servers and clients) in conjunction with a restart of a connect-flavored application with bring-your-own-service-registration logic would very frequently cause the envoy sidecar service check to never reflect the aliased service. Over the course of investigation several bugs and unfortunate interactions were corrected: (1) local.CheckState objects were only shallow copied, but the key piece of data that gets read and updated is one of the things not copied (the underlying Check with a Status field). When the stock code was run with the race detector enabled this highly-relevant-to-the-test-scenario field was found to be racy. Changes: a) update the existing Clone method to include the Check field b) copy-on-write when those fields need to change rather than incrementally updating them in place. This made the observed behavior occur slightly less often. (2) If anything about how the runLocal method for node-local alias check logic was ever flawed, there was no fallback option. Those checks are purely edge-triggered and failure to properly notice a single edge transition would leave the alias check incorrect until the next flap of the aliased check. The change was to introduce a fallback timer to act as a control loop to double check the alias check matches the aliased check every minute (borrowing the duration from the non-local alias check logic body). This made the observed behavior eventually go away when it did occur. (3) Originally I thought there were two main actions involved in the data race: A. The act of adding the original check (from disk recovery) and its first health evaluation. B. The act of the HTTP API requests coming in and resetting the local state when re-registering the same services and checks. It took a while for me to realize that there's a third action at work: C. The goroutines associated with the original check and the later checks. The actual sequence of actions that was causing the bad behavior was that the API actions result in the original check being removed and re-added _without waiting for the original goroutine to terminate_. This means for brief windows of time during check definition edits there are two goroutines that can be sending updates for the alias check status. In extremely unlikely scenarios the original goroutine sees the aliased check start up in `critical` before being removed but does not get the notification about the nearly immediate update of that check to `passing`. This is interlaced with the new goroutine coming up, initializing its base case to `passing` from the current state and then listening for new notifications of edge triggers. If the original goroutine "finishes" its update, it then commits one more write into the local state of `critical` and exits leaving the alias check no longer reflecting the underlying check. The correction here is to enforce that the old goroutines must terminate before spawning the new one for alias checks.	2019-05-24 13:36:56 -05:00
Alvin Huang	f45e495e38	Merge pull request #5376 from hashicorp/fix-tests Fix tests in prep for CircleCI Migration	2019-04-04 17:09:32 -04:00
Jeff Mitchell	4243c3ae42	Move internal/ to sdk/ (#5568) * Move internal/ to sdk/ * Add a readme to the SDK folder	2019-03-27 08:54:56 -04:00
Jeff Mitchell	47c390025b	Convert to Go Modules (#5517) * First conversion * Use serf 0.8.2 tag and associated updated deps * Move freeport and testutil into internal/ * Make internal/ its own module * Update imports * Add replace statements so API and normal Consul code are self-referencing for ease of development * Adapt to newer goe/values * Bump to new cleanhttp * Fix ban nonprintable chars test * Update lock bad args test The error message when the duration cannot be parsed changed in Go 1.12 (ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test. * Update another test as well * Bump travis * Bump circleci * Bump go-discover and godo to get rid of launchpad dep * Bump dockerfile go version * fix tar command * Bump go-cleanhttp	2019-03-26 17:04:58 -04:00
Hans Hasselberg	e7134a0dab	agent: only use TestAgent when appropriate (#5502)	2019-03-18 17:06:16 +01:00
Valentin Fritz	21f149de8b	Fix checks removal when removing service (#5457) Fix my recently discovered issue described here: #5456	2019-03-14 11:02:49 -04:00
Hans Hasselberg	7e11dd82aa	agent: enable reloading of tls config (#5419) This PR introduces reloading tls configuration. Consul will now be able to reload the TLS configuration which previously required a restart. It is not yet possible to turn TLS ON or OFF with these changes. Only when TLS is already turned on, the configuration can be reloaded. Most importantly the certificates and CAs.	2019-03-13 10:29:06 +01:00
Aestek	2aac4d5168	Register and deregister services and their checks atomically in the local state (#5012) Prevent race between register and deregister requests by saving them together in the local state on registration. Also adds more cleaning in case of failure when registering services / checks.	2019-03-04 09:34:05 -05:00
Matt Keeler	118adbb123	ACL Token Persistence and Reloading (#5328) This PR adds two features which will be useful for operators when ACLs are in use. 1. Tokens set in configuration files are now reloadable. 2. If `acl.enable_token_persistence` is set to `true` in the configuration, tokens set via the `v1/agent/token` endpoint are now persisted to disk and loaded when the agent starts (or during configuration reload). Note that token persistence is opt-in so our users who do not want tokens on the local disk will see no change. Some other secondary changes: * Refactored a bunch of places where the replication token is retrieved from the token store. This token isn't just for replicating ACLs and now it is named accordingly. * Allowed better paths in the `v1/agent/token/` API. Instead of paths like: `v1/agent/token/acl_replication_token` the path can now be just `v1/agent/token/replication`. The old paths remain valid. * Added a couple new API functions to set tokens via the new paths. Deprecated the old ones and pointed to the new names. The names are also generally better and don't imply that what you are setting is for ACLs but rather are setting ACL tokens. There is a minor semantic difference there especially for the replication token as again, it's no longer used only for ACL token/policy replication. The new functions will detect 404s and fall back to using the older token paths when talking to pre-1.4.3 agents. * Docs updated to reflect the API additions and to show using the new endpoints. * Updated the ACL CLI set-agent-tokens command to use the non-deprecated APIs.	2019-02-27 14:28:31 -05:00
Alvin Huang	c2a19e5090	add wait to TestAgent_RPCPing	2019-02-22 17:34:45 -05:00
Matt Keeler	766d771017	Pass a testing.T into NewTestAgent and TestAgent.Start (#5342) This way we can avoid unnecessary panics which cause other tests not to run. This doesn't remove all the possibilities for panics causing other tests not to run, it just fixes the TestAgent	2019-02-14 10:59:14 -05:00
Matt Keeler	acfd87c673	Improve Connect with Prepared Queries (#5291) Given a query like: ``` { "Name": "tagged-connect-query", "Service": { "Service": "foo", "Tags": ["tag"], "Connect": true } } ``` And a Consul configuration like: ``` { "services": [ { "name": "foo", "port": 8080, "connect": { "sidecar_service": {} }, "tags": ["tag"] } ] } ``` If you executed the query it would always turn up with 0 results. This was because the sidecar service was being created without any tags. You could instead make your config look like: ``` { "services": [ { "name": "foo", "port": 8080, "connect": { "sidecar_service": { "tags": ["tag"] } }, "tags": ["tag"] } ] } ``` However that is a bit redundant for most cases. This PR ensures that the tags and service meta of the parent service get copied to the sidecar service. If there are any tags or service meta set in the sidecar service definition then this copying does not take place. After the changes, the query will now return the expected results. A second change was made to prepared queries in this PR which is to allow filtering on ServiceMeta just like we allow for filtering on NodeMeta.	2019-02-04 09:36:51 -05:00
Kyle Havlovitz	7118f42950	Fix failing TestAgent_PurgeCheckOnDuplicate after merge	2019-01-28 13:19:38 -08:00
Kyle Havlovitz	1a4978fb94	Re-add ReadableDuration types to health check definition This is to fix the backwards-incompatible change made in 1.4.1 by changing these fields to time.Duration.	2019-01-25 14:47:35 -08:00
Paul Banks	bb7145f27d	agent: add default weights to service in local state to prevent AE churn (#5126) * Add default weights when adding a service with no weights to local state to prevent constant AE re-sync. This fix was contributed by @42wim in https://github.com/hashicorp/consul/pull/5096 but was merged against the wrong base. This adds it to master and adds a test to cover the behaviour. * Fix tests that broke due to comparing internal state which now has default weights	2019-01-08 10:13:49 +00:00
Aestek	8709213d6e	Prevent status flap when re-registering a check (#4904) Fixes point `#2` of: https://github.com/hashicorp/consul/issues/4903 When registering a service each healthcheck status is saved and restored (https://github.com/hashicorp/consul/blob/master/agent/agent.go#L1914) to avoid unnecessary flaps in health state. This change extends this feature to single check registration by moving this protection in `AddCheck()` so that both `PUT /v1/agent/service/register` and `PUT /v1/agent/check/register` behave in the same idempotent way. #### Steps to reproduce 1. Register a check : ``` curl -X PUT \ http://127.0.0.1:8500/v1/agent/check/register \ -H 'Content-Type: application/json' \ -d '{ "Name": "my_check", "ServiceID": "srv", "Interval": "10s", "Args": ["true"] }' ``` 2. The check will initialize and change to `passing` 3. Run the same request again 4. The check status will quickly go from `critical` to `passing` (the delay for this transition is determined by https://github.com/hashicorp/consul/blob/master/agent/checks/check.go#L95)	2019-01-07 13:53:03 -05:00
Aestek	25f04fbd21	[Security] Add finer control over script checks (#4715) * Add -enable-local-script-checks options These options allow for a finer control over when script checks are enabled by giving the option to only allow them when they are declared from the local file system. * Add documentation for the new option * Nitpick doc wording	2018-10-11 13:22:11 +01:00
Paul Banks	1e7eace066	Add SidecarService Syntax sugar to Service Definition (#4686) * Added new Config for SidecarService in ServiceDefinitions. * WIP: all the code needed for SidecarService is written... none of it is tested other than config :). Need API updates too. * Test coverage for the new sidecarServiceFromNodeService method. * Test API registration with SidecarService * Recursive Key Translation 🤦 * Add tests for nested sidecar definition arrays to ensure they are translated correctly * Use dedicated internal state rather than Service Meta for tracking sidecars for deregistration. Add tests for deregistration. * API struct for agent register. No other endpoint should be affected yet. * Additional test cases to cover updates to API registrations	2018-10-10 16:55:34 +01:00
Hans Hasselberg	8e235a72b4	Allow disabling the HTTP API again. (#4655) If you provide an invalid HTTP configuration consul will still start again instead of failing. But if you do so the built-in proxy won't be able to start which you might need for connect.	2018-09-13 16:06:04 +02:00
Pierre Souchay	eddcf228ea	Implementation of Weights Data structures (#4468) * Implementation of Weights Data structures Adding this data structure will allow us to resolve the issues #1088 and #4198 This new structure defaults to values: ``` { Passing: 1, Warning: 0 } ``` Which means, use weight of 0 for a Service in Warning State while use Weight 1 for a Healthy Service. Thus it remains compatible with previous Consul versions. * Implemented weights for DNS SRV Records * DNS properly support agents with weight support while server does not (backwards compatibility) * Use Warning value of Weights of 1 by default When using DNS interface with only_passing = false, all nodes with non-Critical healthcheck used to have a weight value of 1. While having weight.Warning = 0 as default value, this is probably a bad idea as it breaks ascending compatibility. Thus, we put a default value of 1 to be consistent with existing behaviour. * Added documentation for new weight field in service description * Better documentation about weights as suggested by @banks * Return weight = 1 for unknown Check states as suggested by @banks * Fixed typo (of -> or) in error message as requested by @mkeeler * Fixed unstable unit test TestRetryJoin * Fixed unstable tests * Fixed wrong Fatalf format in `testrpc/wait.go` * Added notes regarding DNS SRV lookup limitations regarding number of instances * Documentation fixes and clarification regarding SRV records with weights as requested by @banks * Rephrase docs	2018-09-07 15:30:47 +01:00
Pierre Souchay	92acdaa94c	Fixed flaky tests (#4626)	2018-09-04 12:31:51 +01:00
Matt Keeler	e81c85c051	Fix #4515: Segfault when serf_wan port was -1 but reconnect_time_wan was set (#4531) Fixes #4515 This just slightly refactors the logic to only attempt to set the serf wan reconnect timeout when the rest of the serf wan settings are configured - thus avoiding a segfault.	2018-08-17 14:44:25 -04:00
Siva Prasad	c88900aaa9	PR to fix TestAgent_IndexChurn and TestPreparedQuery_Wrapper. (#4512) * Fixes TestAgent_IndexChurn * Fixes TestPreparedQuery_Wrapper * Increased sleep in agent_test for IndexChurn to 500ms * Made the comment about joinWAN operation much less of a cliffhanger	2018-08-09 12:40:07 -04:00
Mitchell Hashimoto	7fa6bb022f	Merge pull request #4320 from hashicorp/f-alias-check Add "Alias" Check Type	2018-07-20 13:01:33 -05:00
Matt Keeler	ca5851318d	Update a couple erroneous tests.	2018-07-19 09:20:51 -04:00
Matt Keeler	3fe5f566f2	Persist proxies from config files Also change how loadProxies works. Now it will load all persisted proxies into a map, then when loading config file proxies will look up the previous proxy token in that map.	2018-07-18 17:04:35 -04:00
Mitchell Hashimoto	d6ecd97d1d	agent: use the correct ACL token for alias checks	2018-07-12 10:17:53 -07:00
Mitchell Hashimoto	4a67beb734	agent: run alias checks	2018-07-12 09:36:10 -07:00
Paul Banks	8405b41f2b	Update proxy config docs and add test for ipv6	2018-07-12 13:07:48 +01:00
Paul Banks	bb9a5c703b	Default managed proxy TCP check address sanely when proxy is bound to 0.0.0.0. This also provides a mechanism to configure custom address or disable the check entirely from managed proxy config.	2018-07-12 12:57:10 +01:00
Mitchell Hashimoto	a76f652fd2	agent: convert the proxy bind_port to int if it is a float	2018-06-25 12:26:18 -07:00
Paul Banks	17789d4fe3	register TCP check for managed proxies	2018-06-25 12:25:40 -07:00
Mitchell Hashimoto	a82726f0b8	agent: RemoveProxy also removes the proxy service	2018-06-25 12:25:12 -07:00
Mitchell Hashimoto	e2653bec02	Fix broken tests from PR merge related to proxy secure defaults	2018-06-25 12:25:12 -07:00
Paul Banks	df2cb30b01	Make tests pass and clean proxy persistence. No detached child changes yet. This is a good state for persistence stuff to re-start the detached child work that got mixed up last time.	2018-06-25 12:24:10 -07:00
Paul Banks	cdc7cfaa36	Abandon daemonize for simpler solution (preserving history): Reverts: - bdb274852ae469c89092d6050697c0ff97178465 - 2c689179c4f61c11f0016214c0fc127a0b813bfe - d62e25c4a7ab753914b6baccd66f88ffd10949a3 - c727ffbcc98e3e0bf41e1a7bdd40169bd2d22191 - 31b4d18933fd0acbe157e28d03ad59c2abf9a1fb - 85c3f8df3eabc00f490cd392213c3b928a85aa44	2018-06-25 12:24:10 -07:00
Paul Banks	ef9c40643e	Fix import tooling fail	2018-06-25 12:24:09 -07:00
Paul Banks	e21723a891	Persist proxy state through agent restart	2018-06-25 12:24:08 -07:00
Paul Banks	a80559e439	Make invalid clusterID be fatal	2018-06-14 09:42:17 -07:00
Paul Banks	4aeab3897c	Fixed many tests after rebase. Some still failing and seem unrelated to any connect changes.	2018-06-14 09:42:16 -07:00
Mitchell Hashimoto	ba00fa3548	agent: add additional tests for defaulting in AddProxy	2018-06-14 09:42:10 -07:00
Mitchell Hashimoto	171bf8d599	agent: clean up defaulting of proxy configuration This cleans up and unifies how proxy settings defaults are applied.	2018-06-14 09:42:10 -07:00
Mitchell Hashimoto	6539280f2a	agent: fix crash that could happen if proxy was nil on load	2018-06-14 09:42:09 -07:00
Mitchell Hashimoto	aaa2431350	agent: change connect command paths to be slices, not strings This matches other executable configuration and allows us to cleanly separate executable from arguments without trying to emulate shell parsing.	2018-06-14 09:42:08 -07:00
Paul Banks	90c574ebaa	Wire up agent leaf endpoint to cache framework to support blocking.	2018-06-14 09:42:07 -07:00
Paul Banks	1b197d934a	Don't allow connect watches in agent/cli yet	2018-06-14 09:42:06 -07:00
Paul Banks	730da74369	Fix various test failures and vet warnings. Intention de-duplication in previously merged PR actually failed some tests that were not caught by me or CI. I ran the test files for state changes but they happened not to trigger this case so I made sure they did first and then fixed. That fixed some upstream intention endpoint tests that I'd not run as part of testing the previous fix.	2018-06-14 09:41:58 -07:00
Paul Banks	8d09381b96	Super ugly hack to get TeamCity build to work for this PR without adding a vendor that is being added elsewhere and will conflict...	2018-06-14 09:41:58 -07:00
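
Note on 0e907f5aa8 (maximum size for Output of checks): the limit rule described in that commit message can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not Consul's implementation. The names `effectiveMaxOutput`, `truncateOutput`, and `defaultMaxOutputSize` are hypothetical, and treating 0 as "not set" is a convention chosen only for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultMaxOutputSize mirrors the "4k" default named in the commit message.
// The constant name is hypothetical.
const defaultMaxOutputSize = 4 * 1024

// effectiveMaxOutput returns the output limit that applies to a single check.
// Assumption for this sketch: 0 means "not set"; negative values are rejected
// because the commit message says the input must be at least 1.
func effectiveMaxOutput(agentMax, checkMax int) (int, error) {
	if agentMax < 0 || checkMax < 0 {
		return 0, errors.New("max output size must be at least 1")
	}
	limit := agentMax
	if limit == 0 {
		limit = defaultMaxOutputSize
	}
	// A check-level value only takes effect when it is lower than the agent max.
	if checkMax != 0 && checkMax < limit {
		limit = checkMax
	}
	return limit, nil
}

// truncateOutput cuts check output down to at most limit bytes.
func truncateOutput(out string, limit int) string {
	if len(out) <= limit {
		return out
	}
	return out[:limit]
}

func main() {
	limit, err := effectiveMaxOutput(2048, 512)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(limit)                            // 512
	fmt.Println(truncateOutput("hello world", 5)) // hello
}
```

With an agent max of 2048, a check-level value of 8192 would be ignored in this sketch and the limit would stay at 2048, matching the "only if it is lower than the agent max" wording.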

83 Commits