consul

Commit Graph

Author	SHA1	Message	Date
Jared Kirschner	b393c90ce7	Clarify service and check error messages (use ID) Error messages related to service and check operations previously included the following substrings: - service %q - check %q From this error message, it isn't clear that the expected field is the ID for the entity, not the name. For example, if the user has a service named test, the error message would read 'Unknown service "test"'. This is misleading - a service with that name does exist, but not with that ID. The substrings above have been modified to make it clear that ID is needed, not name: - service with ID %q - check with ID %q	2022-01-04 11:42:37 -08:00
Kyle Havlovitz	4f2cfee4b0	consul: add virtual IP generation for connect services	2021-12-02 15:42:47 -08:00
Will Jordan	68efecafed	Update node info sync comment (#11465 )	2021-11-16 11:16:11 -08:00
Daniel Nephin	eb632c53a2	structs: rename the last helper method. This one gets used a bunch, but we can rename it to make the behaviour more obvious.	2021-09-29 11:48:38 -04:00
R.B. Boyer	097e1645e3	agent: ensure that most agent behavior correctly respects partition configuration (#10880 )	2021-08-19 15:09:42 -05:00
R.B. Boyer	188e8dc51f	agent/structs: add a bunch more EnterpriseMeta helper functions to help with partitioning (#10669 )	2021-07-22 13:20:45 -05:00
Daniel Nephin	df5e18c9c5	agent/local: only fallback to agent token for deletes Fallback to the default user token for synching registrations.	2021-05-06 18:44:05 -04:00
Daniel Nephin	0b017e2497	agent/local: do not persist the agent tokens Only default to the user token and agent token for the sync. Change the exported methods to only return the stored tokens associated with a specific check or service.	2021-05-06 13:18:58 -04:00
Daniel Nephin	8a5163b184	local: default to the agent token instead of the user token When de-registering in anti-entropy sync, when there is no service or check token. The agent token will fall back to the default (aka user) token if no agent token is set, so the existing behaviour still works, but it will prefer the agent token over the user token if both are set. ref: https://www.consul.io/docs/agent/options#acl_tokens The agent token seems more approrpiate in this case, since this is an "internal operation", not something initiated by the user.	2021-02-19 18:35:08 -05:00
Daniel Nephin	d113f0e690	structs: Fix printing of IDs These types are used as values (not pointers) in other structs. Using a pointer receiver causes problems when the value is printed. fmt will not call the String method if it is passed a value and the String method has a pointer receiver. By using a value receiver the correct string is printed. Also remove some unused methods.	2021-01-07 18:47:38 -05:00
Daniel Nephin	33b81067f8	local: mark service and checks as InSync when added If the existing service and checks are the same as the new registration.	2020-11-27 15:31:12 -05:00
Kit Patella	15af5ead0b	trim help strings to save a few bytes	2020-11-16 11:02:11 -08:00
Kit Patella	5da2f1efa8	finish adding static server metrics	2020-11-13 16:26:08 -08:00
Kit Patella	06d59c03b9	add the service name in the agent rather than in the definitions themselves	2020-11-13 13:18:04 -08:00
Kit Patella	24a2471029	first pass on agent-configured prometheusDefs and adding defs for every consul metric	2020-11-12 18:12:12 -08:00
Matt Keeler	a3a653342b	Fix a bunch of linter warnings	2020-11-09 09:22:12 -05:00
Matt Keeler	c048e86bb2	Switch to using the external autopilot module	2020-11-09 09:22:11 -05:00
Freddy	d72f72dcd5	Notify alias checks when aliased service is [de]registered (#8456 )	2020-08-12 09:47:41 -06:00
R.B. Boyer	9cfa4a3fc9	tests: ensure that the ServiceExists helper function normalizes entmeta (#8025 ) This fixes a unit test failure over in enterprise due to https://github.com/hashicorp/consul/pull/7384	2020-06-05 10:41:39 +02:00
Pierre Souchay	9813ae512b	checks: when a service does not exists in an alias, consider it failing (#7384 ) In current implementation of Consul, check alias cannot determine if a service exists or not. Because a service without any check is semantically considered as passing, so when no healthchecks are found for an agent, the check was considered as passing. But this make little sense as the current implementation does not make any difference between: * a non-existing service (passing) * a service without any check (passing as well) In order to make it work, we have to ensure that when a check did not find any healthcheck, the service does indeed exists. If it does not, lets consider the check as failing.	2020-06-04 14:50:52 +02:00
Matt Keeler	acccdbe45c	Fix identity resolution on clients and in secondary dcs (#7862 ) Previously this happened to be using the method on the Server/Client that was meant to allow the ACLResolver to locally resolve tokens. On Servers that had tokens (primary or secondary dc + token replication) this function would lookup the token from raft and return the ACLIdentity. On clients this was always a noop. We inadvertently used this function instead of creating a new one when we added logging accessor ids for permission denied RPC requests. With this commit, a new method is used for resolving the identity properly via the ACLResolver which may still resolve locally in the case of being on a server with tokens but also supports remote token resolution.	2020-05-13 13:00:08 -04:00
Freddy	3956cff60f	Fix check deletion in anti-entropy sync (#7690 ) * Incorporate entMeta into service equality check	2020-04-23 10:16:50 -06:00
Daniel Nephin	f46d1b5c94	agent/structs: Remove ServiceID.Init and CheckID.Init The Init method provided the same functionality as the New constructor. The constructor is both more widely used, and more idiomatic, so remove the Init method. This change is in preparation for fixing printing of these IDs.	2020-04-15 12:09:56 -04:00
Hans Hasselberg	6a18f01b42	agent: ensure node info sync and full sync. (#7189 ) This fixes #7020. There are two problems this PR solves: * if the node info changes it is highly likely to get service and check registration permission errors unless those service tokens have node:write. Hopefully services you register don’t have this permission. * the timer for a full sync gets reset for every partial sync which means that many partial syncs are preventing a full sync from happening Instead of syncing node info last, after services and checks, and possibly saving one RPC because it is included in every service sync, I am syncing node info first. It is only ever going to be a single RPC that we are only doing when node info has changed. This way we are guaranteed to sync node info even when something goes wrong with services or checks which is more likely because there are more syncs happening for them.	2020-02-06 15:30:58 +01:00
R.B. Boyer	d78b5008ce	various tweaks on top of the hclog work (#7165 )	2020-01-29 11:16:08 -06:00
Chris Piraino	401221de58	Allow users to configure either unstructured or JSON logging (#7130 ) * hclog Allow users to choose between unstructured and JSON logging	2020-01-28 17:50:41 -06:00
Kit Patella	0d336edb65	Add accessorID of token when ops are denied by ACL system (#7117 ) * agent: add and edit doc comments * agent: add ACL token accessorID to debugging traces * agent: polish acl debugging * agent: minor fix + string fmt over value interp * agent: undo export & fix logging field names * agent: remove note and migrate up to code review * Update agent/consul/acl.go Co-Authored-By: Matt Keeler <mkeeler@users.noreply.github.com> * agent: incorporate review feedback * Update agent/acl.go Co-Authored-By: R.B. Boyer <public@richardboyer.net> Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Co-authored-by: R.B. Boyer <public@richardboyer.net>	2020-01-27 11:54:32 -08:00
Chris Piraino	59f1462801	Fix segfault when removing both a service and associated check (#7108 ) * Fix segfault when removing both a service and associated check updateSyncState creates entries in the services and checks maps for remote services/checks that are not found locally, so that we can then make sure to delete them in our reconciliation process. However, the values added to the map are missing key fields that the rest of the code expects to not be nil. * Add comment stating Check field can be nil	2020-01-23 10:38:32 -06:00
Aestek	5dc8875bd3	agent: do not deregister service checks twice (#6168 ) Deregistering a service from the catalog automatically deregisters its checks, however the agent still performs a deregister call for each service checks even after the service has been deregistered. With ACLs enabled this results in logs like: "message:consul: "Catalog.Deregister" RPC failed to server server_ip:8300: rpc error making call: rpc error making call: Unknown check 'check_id'" This change removes associated checks from the agent state when deregistering a service, which results in less calls to the servers and supresses the error logs.	2020-01-17 14:26:53 +01:00
Matt Keeler	5934f803bf	Sync of OSS changes to support namespaces (#6909 )	2019-12-09 21:26:41 -05:00
Freddy	fdd10dd8b8	Expose HTTP-based paths through Connect proxy (#6446 ) Fixes: #5396 This PR adds a proxy configuration stanza called expose. These flags register listeners in Connect sidecar proxies to allow requests to specific HTTP paths from outside of the node. This allows services to protect themselves by only listening on the loopback interface, while still accepting traffic from non Connect-enabled services. Under expose there is a boolean checks flag that would automatically expose all registered HTTP and gRPC check paths. This stanza also accepts a paths list to expose individual paths. The primary use case for this functionality would be to expose paths for third parties like Prometheus or the kubelet. Listeners for requests to exposed paths are be configured dynamically at run time. Any time a proxy, or check can be registered, a listener can also be created. In this initial implementation requests to these paths are not authenticated/encrypted.	2019-09-25 20:55:52 -06:00
Mike Morris	65be58703c	connect: remove managed proxies (#6220 ) * connect: remove managed proxies implementation and all supporting config options and structs * connect: remove deprecated ProxyDestination * command: remove CONNECT_PROXY_TOKEN env var * agent: remove entire proxyprocess proxy manager * test: remove all managed proxy tests * test: remove irrelevant managed proxy note from TestService_ServerTLSConfig * test: update ContentHash to reflect managed proxy removal * test: remove deprecated ProxyDestination test * telemetry: remove managed proxy note * http: remove /v1/agent/connect/proxy endpoint * ci: remove deprecated test exclusion * website: update managed proxies deprecation page to note removal * website: remove managed proxy configuration API docs * website: remove managed proxy note from built-in proxy config * website: add note on removing proxy subdirectory of data_dir	2019-08-09 15:19:30 -04:00
Aestek	24a0f2bba2	ae: use stale requests when performing full sync (#5873 ) Read requests performed during anti antropy full sync currently target the leader only. This generates a non-negligible load on the leader when the DC is large enough and can be offloaded to the followers following the "eventually consistent" policy for the agent state. We switch the AE read calls to use stale requests with a small (2s) MaxStaleDuration value and make sure we do not read too fast after a write.	2019-06-17 18:05:47 +02:00
R.B. Boyer	40336fd353	agent: fix several data races and bugs related to node-local alias checks (#5876 ) The observed bug was that a full restart of a consul datacenter (servers and clients) in conjunction with a restart of a connect-flavored application with bring-your-own-service-registration logic would very frequently cause the envoy sidecar service check to never reflect the aliased service. Over the course of investigation several bugs and unfortunate interactions were corrected: (1) local.CheckState objects were only shallow copied, but the key piece of data that gets read and updated is one of the things not copied (the underlying Check with a Status field). When the stock code was run with the race detector enabled this highly-relevant-to-the-test-scenario field was found to be racy. Changes: a) update the existing Clone method to include the Check field b) copy-on-write when those fields need to change rather than incrementally updating them in place. This made the observed behavior occur slightly less often. (2) If anything about how the runLocal method for node-local alias check logic was ever flawed, there was no fallback option. Those checks are purely edge-triggered and failure to properly notice a single edge transition would leave the alias check incorrect until the next flap of the aliased check. The change was to introduce a fallback timer to act as a control loop to double check the alias check matches the aliased check every minute (borrowing the duration from the non-local alias check logic body). This made the observed behavior eventually go away when it did occur. (3) Originally I thought there were two main actions involved in the data race: A. The act of adding the original check (from disk recovery) and its first health evaluation. B. The act of the HTTP API requests coming in and resetting the local state when re-registering the same services and checks. It took awhile for me to realize that there's a third action at work: C. The goroutines associated with the original check and the later checks. The actual sequence of actions that was causing the bad behavior was that the API actions result in the original check to be removed and re-added _without waiting for the original goroutine to terminate_. This means for brief windows of time during check definition edits there are two goroutines that can be sending updates for the alias check status. In extremely unlikely scenarios the original goroutine sees the aliased check start up in `critical` before being removed but does not get the notification about the nearly immediate update of that check to `passing`. This is interlaced wit the new goroutine coming up, initializing its base case to `passing` from the current state and then listening for new notifications of edge triggers. If the original goroutine "finishes" its update, it then commits one more write into the local state of `critical` and exits leaving the alias check no longer reflecting the underlying check. The correction here is to enforce that the old goroutines must terminate before spawning the new one for alias checks.	2019-05-24 13:36:56 -05:00
Freddy	f2213f60e0	Update alias checks on local add and remove	2019-04-24 12:17:06 -06:00
R.B. Boyer	f4a3b9d518	fix typos reported by golangci-lint:misspell (#5434 )	2019-03-06 11:13:28 -06:00
Aestek	2aac4d5168	Register and deregisters services and their checks atomically in the local state (#5012 ) Prevent race between register and deregister requests by saving them together in the local state on registration. Also adds more cleaning in case of failure when registering services / checks.	2019-03-04 09:34:05 -05:00
Aestek	5960974db1	[Fix] Services sometimes not being synced with acl_enforce_version_8 = false (#4771 ) Fixes: https://github.com/hashicorp/consul/issues/3676 This fixes a bug were registering an agent with a non-existent ACL token can prevent other services registered with a good token from being synced to the server when using `acl_enforce_version_8 = false`. ## Background When `acl_enforce_version_8` is off the agent does not check the ACL token validity before storing the service in its state. When syncing a service registered with a missing ACL token we fall into the default error handling case (https://github.com/hashicorp/consul/blob/master/agent/local/state.go#L1255) and stop the sync (https://github.com/hashicorp/consul/blob/master/agent/local/state.go#L1082) without setting its Synced property to true like in the permission denied case. This means that the sync will always stop at the faulty service(s). The order in which the services are synced is random since we iterate on a map. So eventually all services with good ACL tokens will be synced, this can however take some time and is influenced by the cluster size, the bigger the slower because retries are less frequent. Having a service in this state also prevent all further sync of checks as they are done after the services. ## Changes This change modify the sync process to continue even if there is an error. This fixes the issue described above as well as making the sync more error tolerant: if the server repeatedly refuses a service (the ACL token could have been deleted by the time the service is synced, the servers were upgraded to a newer version that has more strict checks on the service definition...). Then all services and check that can be synced will, and those that don't will be marked as errors in the logs instead of blocking the whole process.	2019-01-04 10:01:50 -05:00
Paul Banks	0f27ffd163	Proxy Config Manager (#4729 ) * Proxy Config Manager This component watches for local state changes on the agent and ensures that each service registered locally with Kind == connect-proxy has it's state being actively populated in the cache. This serves two purposes: 1. For the built-in proxy, it ensures that the state needed to accept connections is available in RAM shortly after registration and likely before the proxy actually starts accepting traffic. 2. For (future - next PR) xDS server and other possible future proxies that require _push_ based config discovery, this provides a mechanism to subscribe and be notified about updates to a proxy instance's config including upstream service discovery results. * Address review comments * Better comments; Better delivery of latest snapshot for slow watchers; Embed Config * Comment typos * Add upstream Stringer for funsies	2018-10-10 16:55:34 +01:00
Paul Banks	e812f5516a	Add -sidecar-for and new /agent/service/:service_id endpoint (#4691 ) - A new endpoint `/v1/agent/service/:service_id` which is a generic way to look up the service for a single instance. The primary value here is that it: - supports hash-based blocking and so; - replaces `/agent/connect/proxy/:proxy_id` as the mechanism the built-in proxy uses to read its config. - It's not proxy specific and so works for any service. - It has a temporary shim to call through to the existing endpoint to preserve current managed proxy config defaulting behaviour until that is removed entirely (tested). - The built-in proxy now uses the new endpoint exclusively for it's config - The built-in proxy now has a `-sidecar-for` flag that allows the service ID of the _target_ service to be specified, on the condition that there is exactly one "sidecar" proxy (that is one that has `Proxy.DestinationServiceID` set) for the service registered. - Several fixes for edge cases for SidecarService - A fix for `Alias` checks - when running locally they didn't update their state until some external thing updated the target. If the target service has no checks registered as below, then the alias never made it past critical.	2018-10-10 16:55:34 +01:00
Paul Banks	b83bbf248c	Add Proxy Upstreams to Service Definition (#4639 ) * Refactor Service Definition ProxyDestination. This includes: - Refactoring all internal structs used - Updated tests for both deprecated and new input for: - Agent Services endpoint response - Agent Service endpoint response - Agent Register endpoint - Unmanaged deprecated field - Unmanaged new fields - Managed deprecated upstreams - Managed new - Catalog Register - Unmanaged deprecated field - Unmanaged new fields - Managed deprecated upstreams - Managed new - Catalog Services endpoint response - Catalog Node endpoint response - Catalog Service endpoint response - Updated API tests for all of the above too (both deprecated and new forms of register) TODO: - config package changes for on-disk service definitions - proxy config endpoint - built-in proxy support for new fields * Agent proxy config endpoint updated with upstreams * Config file changes for upstreams. * Add upstream opaque config and update all tests to ensure it works everywhere. * Built in proxy working with new Upstreams config * Command fixes and deprecations * Fix key translation, upstream type defaults and a spate of other subtele bugs found with ned to end test scripts... TODO: tests still failing on one case that needs a fix. I think it's key translation for upstreams nested in Managed proxy struct. * Fix translated keys in API registration. ≈ * Fixes from docs - omit some empty undocumented fields in API - Bring back ServiceProxyDestination in Catalog responses to not break backwards compat - this was removed assuming it was only used internally. * Documentation updates for Upstreams in service definition * Fixes for tests broken by many refactors. * Enable travis on f-connect branch in this branch too. * Add consistent Deprecation comments to ProxyDestination uses * Update version number on deprecation notices, and correct upstream datacenter field with explanation in docs	2018-10-10 16:55:34 +01:00
Martin	feb3ce4ee0	Use target service name instead of ID as connect proxy service name (#4620 )	2018-09-05 20:33:17 +01:00
Mitchell Hashimoto	f97bfd5be8	agent: address some basic feedback	2018-07-12 09:36:11 -07:00
Mitchell Hashimoto	7543d270e2	agent/local: support local alias checks	2018-07-12 09:36:10 -07:00
Pierre Souchay	ff53648df2	Merge remote-tracking branch 'origin/master' into ACL_additional_info	2018-07-07 14:09:18 +02:00
Paul Banks	e21723a891	Persist proxy state through agent restart	2018-06-25 12:24:08 -07:00
Mitchell Hashimoto	d9bd4ffebd	agent/local: clarify the non-risk of a full buffer	2018-06-14 09:42:10 -07:00
Mitchell Hashimoto	437689e83c	agent/local: remove outdated comment	2018-06-14 09:42:10 -07:00
Mitchell Hashimoto	2bd39a84a6	agent/local: add Notify mechanism for proxy changes	2018-06-14 09:42:08 -07:00
Mitchell Hashimoto	476ea7b04a	agent: start/stop proxies	2018-06-14 09:42:08 -07:00

1 2

97 Commits