consul

Commit Graph

Author	SHA1	Message	Date
R.B. Boyer	1535844c62	gossip: refactor some gossip related libraries into a central place (#21036 ) This refactors and relocates the following packages to live under internal/gossip instead of either in the toplevel lib or agent/consul: - librtt : related to serf coordinates - libserf : random serf stuff	2024-05-07 10:30:49 -05:00
Nathan Coleman	b5b3a63183	[NET-9098] Narrow scope of peering config on terminating gw filter chain to TCP services (#21054 )	2024-05-06 16:21:09 -04:00
Dan Stough	03ab7367a6	feat(dataplane): allow token and tenancy information for proxied DNS (#20899 ) * feat(dataplane): allow token and tenancy information for proxied DNS * changelog	2024-04-22 14:30:43 -04:00
sarahalsmiller	08761f16c8	Net 6820 customize mesh gateway limits (#20945 ) * add upstream limits to mesh gateway cluster generation * changelog * go mod tidy * readd changelog data * undo reversion from rebase * run codegen * Update .changelog/20945.txt Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> * address notes * gofmt * clean up * gofmt * Update agent/proxycfg/mesh_gateway.go * gofmt * nil check --------- Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>	2024-04-16 10:59:41 -05:00
Nathan Coleman	5e9f02d4be	[NET-8091] Add file-system-certificate config entry for API gateway (#20873 ) * Define file-system-certificate config entry * Collect file-system-certificate(s) referenced by api-gateway onto snapshot * Add file-system-certificate to config entry kind allow lists * Remove inapplicable validation This validation makes sense for inline certificates since Consul server is holding the certificate; however, for file system certificates, Consul server never actually sees the certificate. * Support file-system-certificate as source for listener TLS certificate * Add more required mappings for the new config entry type * Construct proper TLS context based on certificate kind * Add support for SDS in xdscommon * Remove unused param * Adds back verification of certs for inline-certificates * Undo tangential changes to TLS config consumption * Remove stray curly braces * Undo some more tangential changes * Improve function name for generating API gateway secrets * Add changelog entry * Update .changelog/20873.txt Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com> * Add some nil-checking, remove outdated TODO * Update test assertions to include file-system-certificate * Add documentation for file-system-certificate config entry Add new doc to nav * Fix grammar mistake * Rename watchmaps, remove outdated TODO --------- Co-authored-by: Melisa Griffin <melisa.griffin@hashicorp.com> Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>	2024-04-15 16:45:05 -04:00
Michael Zalimeni	a8d08e759f	fix: consume ignored entries in CE downgrade via Ent snapshot (#20977 ) This operation would previously fail due to unconsumed bytes in the decoder buffer when reading the Ent snapshot (the first byte of the record would be misinterpreted as a type indicator, and the remaining bytes would fail to be deserialized or read as invalid data). Ensure restore succeeds by decoding the ignored record as an interface{}, which will consume the record bytes without requiring a concrete target struct, then moving on to the next record.	2024-04-11 21:08:44 +00:00
Eric Haberkorn	e231f0ee9b	Add an agent config option to disable per tenancy usage metrics. (#20976 )	2024-04-11 15:20:09 -04:00
John Murret	d261a987f1	update go-control-plane envoy dependency to 0.12.0 (#20973 ) * update go-control-plane envoy dependency to 0.12.0 * add changelog * go mod tidy * fix linting issues * add agent/grpc-internal to the list of SA1019 ignores	2024-04-10 01:23:04 +00:00
Nathan Coleman	9af713ff17	[NET-5772] Make tcp external service registered on terminating gw reachable from peered cluster (#19881 ) * Include SNI + root PEMs from peered cluster on terminating gw filter chain This allows an external service registered on a terminating gateway to be exported to and reachable from a peered cluster * Abstract existing logic into re-usable function * Regenerate golden files w/ new listener logic * Add changelog entry * Use peering bundles that are stable across test runs	2024-04-03 12:38:09 -04:00
George Ma	44facc2ea3	chore: remove repetitive words (#20890 ) Signed-off-by: availhang <mayangang@outlook.com>	2024-03-28 16:31:55 -07:00
John Murret	39112c7a98	GH-20889 - put conditionals are hcp initialization for consul server (#20926 ) * put conditionals are hcp initialization for consul server * put more things behind configuration flags * add changelog * TestServer_hcpManager * fix TestAgent_scadaProvider	2024-03-28 14:47:11 -06:00
Dan Stough	6026ada0c9	[CE] feat(v2dns): enable v2 dns as default (#20715 ) * feat(v2dns): enable v2 dns as default * changelog	2024-03-25 16:09:01 -04:00
Iryna Shustava	d747b51dab	Handle ACL errors consistently when blocking query timeout is reached. (#20876 ) Currently, when a client starts a blocking query and an ACL token expires within that time, Consul will return ACL not found error with a 403 status code. However, sometimes if an ACL token is invalidated at the same time as the query's deadline is reached, Consul will instead return an empty response with a 200 status code. This is because of the events being executed. 1. Client issues a blocking query request with timeout `t`. 2. ACL is deleted. 3. Server detects a change in ACLs and force closes the gRPC stream. 4. Client resubscribes with the same token and resets its state (view). 5. Client sees "ACL not found" error. If ACL is deleted before step 4, the client is unaware that the stream was closed due to an ACL error and will return an empty view (from the reset state) with the 200 status code. To fix this problem, we introduce another state to the subscription to indicate when a change to ACLs has occurred. If the server sees that there was an error due to ACL change, it will re-authenticate the request and return an error if the token is no longer valid. Fixes #20790	2024-03-22 14:59:54 -06:00
Chris S. Kim	f3f2175edd	Update go-jose library (#20888 )	2024-03-22 10:54:58 -04:00
Derek Menteer	ac83ac1343	Fix streaming RPCs for agentless. (#20868 ) * Fix streaming RPCs for agentless. This PR fixes an issue where cross-dc RPCs were unable to utilize the streaming backend due to having the node name set. The result of this was the agent-cache being utilized, which would cause high cpu utilization and memory consumption due to the fact that it keeps queries alive for 72 hours before purging inactive entries. This resource consumption is compounded by the fact that each pod in consul-k8s gets a unique token. Since the agent-cache uses the token as a component of the key, the same query is duplicated for each pod that is deployed. * Add changelog.	2024-03-15 14:44:51 -05:00
Derek Menteer	0ac8ae6c3b	Fix xDS deadlock due to syncLoop termination. (#20867 ) * Fix xDS deadlock due to syncLoop termination. This fixes an issue where agentless xDS streams can deadlock permanently until a server is restarted. When this issue occurs, no new proxies are able to successfully connect to the server. Effectively, the trigger for this deadlock stems from the following return statement: https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L199-L202 When this happens, the entire `syncLoop()` terminates and stops consuming from the following channel: https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L182-L192 Which results in the `ConfigSource.cleanup()` function never receiving a response and holding a mutex indefinitely: https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L241-L247 Because this mutex is shared, it effectively deadlocks the server's ability to process new xDS streams. ---- The fix to this issue involves removing the `chan chan struct{}` used like an RPC-over-channels pattern and replacing it with two distinct channels: + `stopSyncLoopCh` - indicates that the `syncLoop()` should terminate soon. + `syncLoopDoneCh` - indicates that the `syncLoop()` has terminated. Splitting these two concepts out and deferring a `close(syncLoopDoneCh)` in the `syncLoop()` function ensures that the deadlock above should no longer occur. We also now evict xDS connections of all proxies for the corresponding `syncLoop()` whenever it encounters an irrecoverable error. This is done by hoisting the new `syncLoopDoneCh` upwards so that it's visible to the xDS delta processing. Prior to this fix, the behavior was to simply orphan them so they would never receive catalog-registration or service-defaults updates. * Add changelog.	2024-03-15 13:57:11 -05:00
Derek Menteer	eabff257d7	Various bug-fixes and improvements (#20866 ) * Shuffle the list of servers returned by `pbserverdiscovery.WatchServers`. This randomizes the list of servers to help reduce the chance of clients all connecting to the same server simultaneously. Consul-dataplane is one such client that does not randomize its own list of servers. * Fix potential goroutine leak in xDS recv loop. This commit ensures that the goroutine which receives xDS messages from proxies will not block forever if the stream's context is cancelled but the `processDelta()` function never consumes the message (due to being terminated). * Add changelog.	2024-03-15 13:10:48 -05:00
sarahalsmiller	262f435800	NET-6821 Disable Terminating Gateway Auto Host Header Rewrite (#20802 ) * disable terminating gateway auto host rewrite * add changelog * clean up unneeded additional snapshot fields * add new field to docs * squash * fix test	2024-03-12 15:37:20 -05:00
Michael Zalimeni	d4761c0ccd	security: upgrade google.golang.org/protobuf to 1.33.0 (#20801 ) Resolves CVE-2024-24786.	2024-03-06 23:04:42 +00:00
Matt Keeler	abe14f11e6	Remove redundant usage metrics (#20674 ) * Remove redundant usage metrics * Add the changelog * Update website/content/docs/upgrading/upgrade-specific.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/upgrading/upgrade-specific.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/upgrading/upgrade-specific.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/upgrading/upgrade-specific.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/upgrading/upgrade-specific.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> --------- Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>	2024-03-05 14:09:47 -05:00
Matt Keeler	5c936fba33	Enable callers to control whether per-tenant usage metrics are included in calls to store.ServiceUsage (#20672 ) * Enable callers to control whether per-tenant usage metrics are included in calls to store.ServiceUsage * Add changelog	2024-03-01 13:44:55 -05:00
John Murret	a1c6181677	DNS v2 - split up router into multiple responsibilities & break up router tests into multiple files. (#20688 ) * Update agent/dns.go Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com> * PR feedback * split tests out into multiple files. * Extract responsibilities from router into discoveryResultsFetcher, messageSerializer, responseGenerator. * adding recordmaker tests * add response generator test coverage. * changing tests case name based on PR feedback --------- Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>	2024-03-01 15:36:37 +00:00
John Murret	a15a957a36	NET-8056 - v2 DNS Testing Improvements (#20710 ) * NET-8056 - v2 DNS Testing Improvements * adding TestDNSServer_Lifecycle * add license headers to new files.	2024-03-01 05:42:42 -07:00
sarahalsmiller	670ee90a77	Use correct enterprise meta on wildcard service update (#20721 ) * use correct enterprise meta on wildcard service update * changelog * rename changelog file	2024-02-26 12:03:08 -06:00
John Murret	26eed12f04	NET-7813 - DNS : SERVFAIL when resolving PTR records (#20679 ) * NET-7813 - DNS : SERVFAIL when resolving PTR records * Update agent/dns.go Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com> * PR feedback --------- Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>	2024-02-21 17:44:04 +00:00
Semir Patel	943426bc79	v2tenancy: add optional LicenseFeature to type Registration struct (#20673 )	2024-02-20 14:42:31 -06:00
Dan Stough	14efb28086	fix(v2dns): add node ttl to workloads, comment cleanup, and changelog (#20643 ) * fix(v2dns): add node ttl to workloads, plus comment cleanup * docs(v2dns): changelog	2024-02-14 17:38:11 -05:00
Derek Menteer	9f7626d501	Ensure all topics are refreshed on FSM restore and add supervisor loop to v1 controller subscriptions (#20642 ) Ensure all topics are refreshed on FSM restore and add supervisor loop to v1 controller subscriptions This PR fixes two issues: 1. Not all streams were force closed whenever a snapshot restore happened. This means that anything consuming data from the stream (controllers, queries, etc) were unaware that the data they have is potentially stale / invalid. This first part ensures that all topics are purged. 2. The v1 controllers did not properly handle stream errors (which are likely to appear much more often due to 1 above) and so it introduces a supervisor thread to restart the watches when these errors occur.	2024-02-14 14:17:55 -06:00
Dan Stough	137c9c0973	[CE] Misc cleanup for V2 DNS (#20640 ) * chore: gitignore zed editor * chore(v2dns): remove ent/ce split from router * fix(v2dns): v2 workloads now have tenancy in output * feat(v2dns): support 'cluster' label * chore(v2dns): less chatty debug logs	2024-02-14 12:40:38 -05:00
Melissa Kam	64cd172f30	[CC-7411] Fix environment variable precedence when linking to HCP (#20527 ) Fix so that link API values are used over env vars When a link is created via the API, those values should take precedence over the values set by environment variables. This change loads all the env vars initially as part of the config builder rather than on demand.	2024-02-13 14:06:18 -06:00
Michael Zalimeni	2c1addfd64	[NET-7015] DNS v2 + Catalog v2 int test (#20607 ) test(v2dns): Add Catalog v2 integration test Add a basic integration test covering major functionality tested against Catalog v2 resources. This complements existing tests that ensure compatibility between v1 and v2 DNS when testing against Catalog v1 resources.	2024-02-13 17:40:08 +00:00
Dan Stough	0f0b080514	[CE] feat(v2dns): add v2 style query metrics (#20608 ) feat(v2dns): add v2 style query metrics	2024-02-13 12:08:01 -05:00
Semir Patel	b716a9ef6b	resource: reconcile managed types every ~8hrs (#20606 )	2024-02-13 10:51:54 -06:00
John Murret	7e8f2e5f08	NET-7644/NET-7634 - Implement query lookup for tagged addresses on nodes and services including WAN translation. (#20583 ) NET-7644 - Implement tagged addresses and wan translation	2024-02-12 14:27:25 -05:00
Dan Stough	5802080db1	feat(v2dns): enable peering queries (#20581 )	2024-02-12 14:25:45 -05:00
Nick Cellino	5fb6ab6a3a	Move HCP Manager lifecycle management out of Link controller (#20401 ) * Add function to get update channel for watching HCP Link * Add MonitorHCPLink function This function can be called in a goroutine to manage the lifecycle of the HCP manager. * Update HCP Manager config in link monitor before starting This updates HCPMonitorLink so it updates the HCP manager with an HCP client and management token when a Link is upserted. * Let MonitorHCPManager handle lifecycle instead of link controller * Remove cleanup from Link controller and move it to MonitorHCPLink Previously, the Link Controller was responsible for cleaning up the HCP-related files on the file system. This change makes it so MonitorHCPLink handles this cleanup. As a result, we are able to remove the PlacementEachServer placement strategy for the Link controller because it no longer needs to do this per-node cleanup. * Remove HCP Manager dependency from Link Controller The Link controller does not need to have HCP Manager as a dependency anymore, so this removes that dependency in order to simplify the design. * Add Linked prefix to Linked status variables This is in preparation for adding a new status type to the Link resource. * Add new "validated" status type to link resource The link resource controller will now set a "validated" status in addition to the "linked" status. This is needed so that other components (eg the HCP manager) know when the Link is ready to link with HCP. * Fix tests * Handle new 'EndOfSnapshot' WatchList event * Fix watch test * Remove unnecessary config from TestAgent_scadaProvider Since the Scada provider is now started on agent startup regardless of whether a cloud config is provided, this removes the cloud config override from the relevant test. This change is not exactly related to the changes from this PR, but rather is something small and sort of related that was noticed while working on this PR. * Simplify link watch test and remove sleep from link watch This updates the link watch test so that it uses more mocks and does not require setting up the infrastructure for the HCP Link controller. This also removes the time.Sleep delay in the link watcher loop in favor of an error counter. When we receive 10 consecutive errors, we shut down the link watcher loop. * Add better logging for link validation. Remove EndOfSnapshot test. * Refactor link monitor test into a table test * Add some clarifying comments to link monitor * Simplify link watch test * Test a bunch more errors cases in link monitor test * Use exponential backoff instead of errorCounter in LinkWatch * Move link watch and link monitor into a single goroutine called from server.go * Refactor HCP link watcher to use single go-routine. Previously, if the WatchClient errored, we would've never recovered because we never retry to create the stream. With this change, we have a single goroutine that runs for the life of the server agent and if the WatchClient stream ever errors, we retry the creation of the stream with an exponential backoff.	2024-02-12 10:48:23 -05:00
John Murret	c8e4cea69c	set up ent and CE specific DNS tests to be able to run v1 and v2 (#20571 )	2024-02-09 15:53:56 -07:00
Dan Stough	01001f630e	feat(v2dns): catalog v2 service query support (#20564 )	2024-02-09 17:41:40 -05:00
Dan Stough	24e15cc24e	feat(v2dns): prepared query ttls (#20563 )	2024-02-09 11:26:02 -05:00
John Murret	7cac918811	NET-7637 / NET-7659/NET-7636/NET-7647/NET-7648/NET-7646/NET-7649/NET-7645 - Multiple DNS v2 fixes (#20556 )	2024-02-08 19:56:04 -07:00
Derek Menteer	a1c8d4dd19	Decouple xds capacity controller and raft-autopilot (#20511 ) Decouple xds capacity controller and autopilot This prevents a potential bug where autopilot deadlocks while attempting to execute `AutopilotDelegate.NotifyState()` on an xdscapacity controller that stopped consuming messages.	2024-02-08 15:31:44 -06:00
Chris S. Kim	26661a1c3b	Add default intention policy (#20544 )	2024-02-08 20:25:42 +00:00
Joshua Timmons	242b777547	Fix logging when we fail to export metrics to hcp (#20514 )	2024-02-08 11:00:47 -05:00
Joshua Timmons	c790740cc6	Fix: avoid redundant logs on failures to export metrics (#20519 )	2024-02-08 11:00:20 -05:00
John Murret	8ac54707d6	DNS v2 Multiple fixes. (#20525 ) * DNS v2 Multiple fixes. * add license header * get rid of DefaultIntentionPolicy change that was not supposed to be there.	2024-02-07 21:24:00 -07:00
Nathan Coleman	45d645471b	[NET-7414] Reconcile PST for mesh gateway workloads on change to ComputedExportedServices (#20271 ) * Reconcile ProxyStateTemplate on change to ComputedExportedServices * gofmt changeset --------- Co-authored-by: NiniOak <anita.akaeze@hashicorp.com>	2024-02-07 21:27:13 +00:00
skpratt	57bad0df85	add traffic permissions excludes and tests (#20453 ) * add traffic permissions tests * review fixes * Update internal/mesh/internal/controllers/sidecarproxy/builder/local_app.go Co-authored-by: John Landa <jonathanlanda@gmail.com> --------- Co-authored-by: John Landa <jonathanlanda@gmail.com>	2024-02-07 20:21:44 +00:00
Eric Haberkorn	1bd253021b	V1 Compat Exported Services Controller Optimizations (#20517 ) V1 compat exported services controller optimizations * Don't start the v2 exported services controller in v1 mode. * Use the controller cache.	2024-02-07 14:05:42 -05:00
Matt Keeler	49e6c0232d	Panic for unregistered types (#20476 ) * Panic when controllers attempt to make invalid requests to the resource service This will help to catch bugs in tests that could cause infinite errors to be emitted. * Disable the API GW v2 controller With the previous commit, this would cause a server to panic due to watching a type which has not yet been created/registered. * Ensure that a test server gets the full type registry instead of constructing its own * Skip TestServer_ControllerDependencies * Fix peering tests so that they use the full resource registry.	2024-02-06 11:23:06 -05:00
Dan Stough	fcc43a9a36	feat(v2dns): catalog v2 SOA and NS support (#20480 )	2024-02-06 11:12:04 -05:00


5493 Commits