consul

Commit Graph

Author	SHA1	Message	Date
Derek Menteer	0ac8ae6c3b	Fix xDS deadlock due to syncLoop termination. (#20867 ) * Fix xDS deadlock due to syncLoop termination. This fixes an issue where agentless xDS streams can deadlock permanently until a server is restarted. When this issue occurs, no new proxies are able to successfully connect to the server. Effectively, the trigger for this deadlock stems from the following return statement: https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L199-L202 When this happens, the entire `syncLoop()` terminates and stops consuming from the following channel: https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L182-L192 Which results in the `ConfigSource.cleanup()` function never receiving a response and holding a mutex indefinitely: https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L241-L247 Because this mutex is shared, it effectively deadlocks the server's ability to process new xDS streams. ---- The fix to this issue involves removing the `chan chan struct{}` used like an RPC-over-channels pattern and replacing it with two distinct channels: + `stopSyncLoopCh` - indicates that the `syncLoop()` should terminate soon. + `syncLoopDoneCh` - indicates that the `syncLoop()` has terminated. Splitting these two concepts out and deferring a `close(syncLoopDoneCh)` in the `syncLoop()` function ensures that the deadlock above should no longer occur. We also now evict xDS connections of all proxies for the corresponding `syncLoop()` whenever it encounters an irrecoverable error. This is done by hoisting the new `syncLoopDoneCh` upwards so that it's visible to the xDS delta processing. Prior to this fix, the behavior was to simply orphan them so they would never receive catalog-registration or service-defaults updates. * Add changelog.	2024-03-15 13:57:11 -05:00
Derek Menteer	8f4c43727d	[NET-5916] Fix locality-aware routing config and tests (CE) (#19483 ) Fix locality-aware routing config and tests	2023-11-02 14:05:06 -05:00
Chris S. Kim	92ce814693	Remove old build tags (#19128 )	2023-10-10 10:58:06 -04:00
Iryna Shustava	e6b724d062	catalog,mesh,auth: Move resource types to the proto-public module (#18935 )	2023-09-22 15:50:56 -06:00
Dhia Ayachi	b1688ad856	Run copyright after running deep-copy as part of the Makefile/CI (#18741 ) * execute copyright headers after performing deep-copy generation. * fix copyright install * Apply suggestions from code review Co-authored-by: Semir Patel <semir.patel@hashicorp.com> * Apply suggestions from code review Co-authored-by: Semir Patel <semir.patel@hashicorp.com> * rename steps to match codegen naming * remove copywrite install category --------- Co-authored-by: Semir Patel <semir.patel@hashicorp.com>	2023-09-11 13:50:52 -04:00
Ashwin Venkatesh	797e42dc24	Watch the ProxyTracker from xDS controller (#18611 )	2023-08-29 14:39:29 -07:00
John Murret	0e606504bc	NET-4944 - wire up controllers with proxy tracker (#18603 ) Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>	2023-08-29 09:15:34 -06:00
John Murret	051f250edb	NET-5338 - NET-5338 - Run a v2 mode xds server (#18579 ) * NET-5338 - NET-5338 - Run a v2 mode xds server * fix linting	2023-08-24 16:44:14 -06:00
hashicorp-copywrite[bot]	5fb9df1640	[COMPLIANCE] License changes (#18443 ) * Adding explicit MPL license for sub-package This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository. * Adding explicit MPL license for sub-package This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository. * Updating the license from MPL to Business Source License Going forward, this project will be licensed under the Business Source License v1.1. Please see our blog post for more details at <Blog URL>, FAQ at www.hashicorp.com/licensing-faq, and details of the license at www.hashicorp.com/bsl. * add missing license headers * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 * Update copyright file headers to BUSL-1.1 --------- Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>	2023-08-11 09:12:13 -04:00
Ronald	94ec4eb2f4	copyright headers for agent folder (#16704 ) * copyright headers for agent folder * Ignore test data files * fix proto files and remove headers in agent/uiserver folder * ignore deep-copy files	2023-03-28 14:39:22 -04:00
R.B. Boyer	9a485cdb49	proxycfg: ensure that an irrecoverable error in proxycfg closes the xds session and triggers a replacement proxycfg watcher (#16497 ) Receiving an "acl not found" error from an RPC in the agent cache and the streaming/event components will cause any request loops to cease under the assumption that they will never work again if the token was destroyed. This prevents log spam (#14144, #9738). Unfortunately due to things like: - authz requests going to stale servers that may not have witnessed the token creation yet - authz requests in a secondary datacenter happening before the tokens get replicated to that datacenter - authz requests from a primary TO a secondary datacenter happening before the tokens get replicated to that datacenter The caller will get an "acl not found" before the token exists, rather than just after. The machinery added above in the linked PRs will kick in and prevent the request loop from looping around again once the tokens actually exist. For `consul-dataplane` usages, where xDS is served by the Consul servers rather than the clients ultimately this is not a problem because in that scenario the `agent/proxycfg` machinery is on-demand and launched by a new xDS stream needing data for a specific service in the catalog. If the watching goroutines are terminated it ripples down and terminates the xDS stream, which CDP will eventually re-establish and restart everything. For Consul client usages, the `agent/proxycfg` machinery is ahead-of-time launched at service registration time (called "local" in some of the proxycfg machinery) so when the xDS stream comes in the data is already ready to go. If the watching goroutines terminate it should terminate the xDS stream, but there's no mechanism to re-spawn the watching goroutines. If the xDS stream reconnects it will see no `ConfigSnapshot` and will not get one again until the client agent is restarted, or the service is re-registered with something changed in it. This PR fixes a few things in the machinery: - there was an inadvertent deadlock in fetching snapshot from the proxycfg machinery by xDS, such that when the watching goroutine terminated the snapshots would never be fetched. This caused some of the xDS machinery to get indefinitely paused and not finish the teardown properly. - Every 30s we now attempt to re-insert all locally registered services into the proxycfg machinery. - When services are re-inserted into the proxycfg machinery we special case "dead" ones such that we unilaterally replace them rather that doing that conditionally.	2023-03-03 14:27:53 -06:00
Dan Upton	7a55de375c	xds: don't attempt to load-balance sessions for local proxies (#15789 ) Previously, we'd begin a session with the xDS concurrency limiter regardless of whether the proxy was registered in the catalog or in the server's local agent state. This caused problems for users who run `consul connect envoy` directly against a server rather than a client agent, as the server's locally registered proxies wouldn't be included in the limiter's capacity. Now, the `ConfigSource` is responsible for beginning the session and we only do so for services in the catalog. Fixes: https://github.com/hashicorp/consul/issues/15753	2023-01-18 12:33:21 -06:00
Paul Glass	f5231b9157	Add new config_file_service_registration token (#15828 )	2023-01-10 10:24:02 -06:00
Dan Upton	7b2d08d461	chore: remove unused argument from MergeNodeServiceWithCentralConfig (#15024 ) Previously, the MergeNodeServiceWithCentralConfig method accepted a ServiceSpecificRequest argument, of which only the Datacenter and QueryOptions fields were used. Digging a little deeper, it turns out these fields were only passed down to the ComputeResolvedServiceConfig method (through the ServiceConfigRequest struct) which didn't actually use them. As such, not all call-sites passed a valid ServiceSpecificRequest so it's safer to remove the argument altogether to prevent future changes from depending on it.	2022-11-09 14:54:57 +00:00
Dan Upton	cbb4a030c4	xds: properly merge central config for "agentless" services (#14962 )	2022-10-13 12:04:59 +01:00
Dan Upton	0af9f16343	bug: fix goroutine leaks caused by incorrect usage of `WatchCh` (#14916 ) memdb's `WatchCh` method creates a goroutine that will publish to the returned channel when the watchset is triggered or the given context is canceled. Although this is called out in its godoc comment, it's not obvious that this method creates a goroutine who's lifecycle you need to manage. In the xDS capacity controller, we were calling `WatchCh` on each iteration of the control loop, meaning the number of goroutines would grow on each autopilot event until there was catalog churn. In the catalog config source, we were calling `WatchCh` with the background context, meaning that the goroutine would keep running after the sync loop had terminated.	2022-10-13 12:04:27 +01:00
Daniel Upton	6452118c15	proxycfg-sources: fix hot loop when service not found in catalog Fixes a bug where a service getting deleted from the catalog would cause the ConfigSource to spin in a hot loop attempting to look up the service. This is because we were returning a nil WatchSet which would always unblock the select. Kudos to @freddygv for discovering this!	2022-08-02 15:42:29 +01:00
cskh	59e81a728e	chore: removed unused method AddService (#13905 ) - This AddService is not used anywhere. AddServiceWithChecks is place of AddService - Test code is updated	2022-07-26 16:54:53 -04:00
Matt Keeler	3795769729	Fix a flaky test (#13282 ) At the end of this test we were trying to ensure that updating a service in the local state causes it to re-register the service with the config manager. The config manager in the same method will also call RegisteredProxies to determine if any need to be removed. This portion of the test is not attempting to verify that behavior. Because the test is only blocked waiting for the Register event before it can end and assert all the mock expectations were met, we may not see the call to RegisteredProxies. This is especially apparent when tests are run with the race detector. As we don’t actually care if that method is executed before the end of the test we can simply transition from expecting it to be called exactly once to a 0 or 1 times assertion.	2022-05-27 13:25:08 -04:00
Dan Upton	2427e38839	Enable servers to configure arbitrary proxies from the catalog (#13244 ) OSS port of enterprise PR 1822 Includes the necessary changes to the `proxycfg` and `xds` packages to enable Consul servers to configure arbitrary proxies using catalog data. Broadly, `proxycfg.Manager` now has public methods for registering, deregistering, and listing registered proxies — the existing local agent state-sync behavior has been moved into a separate component that makes use of these methods. When an xDS session is started for a proxy service in the catalog, a goroutine will be spawned to watch the service in the server's state store and re-register it with the `proxycfg.Manager` whenever it is updated (and clean it up when the client goes away).	2022-05-27 12:38:52 +01:00

20 Commits