consul

Commit Graph

Author	SHA1	Message	Date
R.B. Boyer	1535844c62	gossip: refactor some gossip related libraries into a central place (#21036 ) This refactors and relocates the following packages to live under internal/gossip instead of either in the toplevel lib or agent/consul: - librtt : related to serf coordinates - libserf : random serf stuff	2024-05-07 10:30:49 -05:00
Nathan Coleman	5e9f02d4be	[NET-8091] Add file-system-certificate config entry for API gateway (#20873 ) * Define file-system-certificate config entry * Collect file-system-certificate(s) referenced by api-gateway onto snapshot * Add file-system-certificate to config entry kind allow lists * Remove inapplicable validation This validation makes sense for inline certificates since Consul server is holding the certificate; however, for file system certificates, Consul server never actually sees the certificate. * Support file-system-certificate as source for listener TLS certificate * Add more required mappings for the new config entry type * Construct proper TLS context based on certificate kind * Add support or SDS in xdscommon * Remove unused param * Adds back verification of certs for inline-certificates * Undo tangential changes to TLS config consumption * Remove stray curly braces * Undo some more tangential changes * Improve function name for generating API gateway secrets * Add changelog entry * Update .changelog/20873.txt Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com> * Add some nil-checking, remove outdated TODO * Update test assertions to include file-system-certificate * Add documentation for file-system-certificate config entry Add new doc to nav * Fix grammar mistake * Rename watchmaps, remove outdated TODO --------- Co-authored-by: Melisa Griffin <melisa.griffin@hashicorp.com> Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>	2024-04-15 16:45:05 -04:00
Michael Zalimeni	a8d08e759f	fix: consume ignored entries in CE downgrade via Ent snapshot (#20977 ) This operation would previously fail due to unconsumed bytes in the decoder buffer when reading the Ent snapshot (the first byte of the record would be misinterpreted as a type indicator, and the remaining bytes would fail to be deserialized or read as invalid data). Ensure restore succeeds by decoding the ignored record as an interface{}, which will consume the record bytes without requiring a concrete target struct, then moving on to the next record.	2024-04-11 21:08:44 +00:00
Eric Haberkorn	e231f0ee9b	Add an agent config option to disable per tenancy usage metrics. (#20976 )	2024-04-11 15:20:09 -04:00
John Murret	39112c7a98	GH-20889 - put conditionals are hcp initialization for consul server (#20926 ) * put conditionals are hcp initialization for consul server * put more things behind configuration flags * add changelog * TestServer_hcpManager * fix TestAgent_scadaProvider	2024-03-28 14:47:11 -06:00
Dan Stough	6026ada0c9	[CE] feat(v2dns): enable v2 dns as default (#20715 ) * feat(v2dns): enable v2 dns as default * changelog	2024-03-25 16:09:01 -04:00
Iryna Shustava	d747b51dab	Handle ACL errors consistently when blocking query timeout is reached. (#20876 ) Currently, when a client starts a blocking query and an ACL token expires within that time, Consul will return ACL not found error with a 403 status code. However, sometimes if an ACL token is invalidated at the same time as the query's deadline is reached, Consul will instead return an empty response with a 200 status code. This is because of the events being executed. 1. Client issues a blocking query request with timeout `t`. 2. ACL is deleted. 3. Server detects a change in ACLs and force closes the gRPC stream. 4. Client resubscribes with the same token and resets its state (view). 5. Client sees "ACL not found" error. If ACL is deleted before step 4, the client is unaware that the stream was closed due to an ACL error and will return an empty view (from the reset state) with the 200 status code. To fix this problem, we introduce another state to the subscription to indicate when a change to ACLs has occurred. If the server sees that there was an error due to ACL change, it will re-authenticate the request and return an error if the token is no longer valid. Fixes #20790	2024-03-22 14:59:54 -06:00
Chris S. Kim	f3f2175edd	Update go-jose library (#20888 )	2024-03-22 10:54:58 -04:00
sarahalsmiller	262f435800	NET-6821 Disable Terminating Gateway Auto Host Header Rewrite (#20802 ) * disable terminating gateway auto host rewrite * add changelog * clean up unneeded additional snapshot fields * add new field to docs * squash * fix test	2024-03-12 15:37:20 -05:00
Matt Keeler	abe14f11e6	Remove redundant usage metrics (#20674 ) * Remove redundant usage metrics * Add the changelog * Update website/content/docs/upgrading/upgrade-specific.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/upgrading/upgrade-specific.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/upgrading/upgrade-specific.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/upgrading/upgrade-specific.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/upgrading/upgrade-specific.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> --------- Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>	2024-03-05 14:09:47 -05:00
Matt Keeler	5c936fba33	Enable callers to control whether per-tenant usage metrics are included in calls to store.ServiceUsage (#20672 ) * Enable callers to control whether per-tenant usage metrics are included in calls to store.ServiceUsage * Add changelog	2024-03-01 13:44:55 -05:00
sarahalsmiller	670ee90a77	Use correct enterprise meta on wildcard service update (#20721 ) * use correct enterprise meta on wildcard service update * changelog * rename changelog file	2024-02-26 12:03:08 -06:00
Semir Patel	943426bc79	v2tenancy: add optional LicenseFeature to type Registration struct (#20673 )	2024-02-20 14:42:31 -06:00
Derek Menteer	9f7626d501	Ensure all topics are refreshed on FSM restore and add supervisor loop to v1 controller subscriptions (#20642 ) Ensure all topics are refreshed on FSM restore and add supervisor loop to v1 controller subscriptions This PR fixes two issues: 1. Not all streams were force closed whenever a snapshot restore happened. This means that anything consuming data from the stream (controllers, queries, etc) were unaware that the data they have is potentially stale / invalid. This first part ensures that all topics are purged. 2. The v1 controllers did not properly handle stream errors (which are likely to appear much more often due to 1 above) and so it introduces a supervisor thread to restart the watches when these errors occur.	2024-02-14 14:17:55 -06:00
Semir Patel	b716a9ef6b	resource: reconcile managed types every ~8hrs (#20606 )	2024-02-13 10:51:54 -06:00
Nick Cellino	5fb6ab6a3a	Move HCP Manager lifecycle management out of Link controller (#20401 ) * Add function to get update channel for watching HCP Link * Add MonitorHCPLink function This function can be called in a goroutine to manage the lifecycle of the HCP manager. * Update HCP Manager config in link monitor before starting This updates HCPMonitorLink so it updates the HCP manager with an HCP client and management token when a Link is upserted. * Let MonitorHCPManager handle lifecycle instead of link controller * Remove cleanup from Link controller and move it to MonitorHCPLink Previously, the Link Controller was responsible for cleaning up the HCP-related files on the file system. This change makes it so MonitorHCPLink handles this cleanup. As a result, we are able to remove the PlacementEachServer placement strategy for the Link controller because it no longer needs to do this per-node cleanup. * Remove HCP Manager dependency from Link Controller The Link controller does not need to have HCP Manager as a dependency anymore, so this removes that dependency in order to simplify the design. * Add Linked prefix to Linked status variables This is in preparation for adding a new status type to the Link resource. * Add new "validated" status type to link resource The link resource controller will now set a "validated" status in addition to the "linked" status. This is needed so that other components (eg the HCP manager) know when the Link is ready to link with HCP. * Fix tests * Handle new 'EndOfSnapshot' WatchList event * Fix watch test * Remove unnecessary config from TestAgent_scadaProvider Since the Scada provider is now started on agent startup regardless of whether a cloud config is provided, this removes the cloud config override from the relevant test. This change is not exactly related to the changes from this PR, but rather is something small and sort of related that was noticed while working on this PR. * Simplify link watch test and remove sleep from link watch This updates the link watch test so that it uses more mocks and does not require setting up the infrastructure for the HCP Link controller. This also removes the time.Sleep delay in the link watcher loop in favor of an error counter. When we receive 10 consecutive errors, we shut down the link watcher loop. * Add better logging for link validation. Remove EndOfSnapshot test. * Refactor link monitor test into a table test * Add some clarifying comments to link monitor * Simplify link watch test * Test a bunch more errors cases in link monitor test * Use exponential backoff instead of errorCounter in LinkWatch * Move link watch and link monitor into a single goroutine called from server.go * Refactor HCP link watcher to use single go-routine. Previously, if the WatchClient errored, we would've never recovered because we never retry to create the stream. With this change, we have a single goroutine that runs for the life of the server agent and if the WatchClient stream ever errors, we retry the creation of the stream with an exponential backoff.	2024-02-12 10:48:23 -05:00
Derek Menteer	a1c8d4dd19	Decouple xds capacity controller and raft-autopilot (#20511 ) Decouple xds capacity controller and autopilot This prevents a potential bug where autopilot deadlocks while attempting to execute `AutopilotDelegate.NotifyState()` on an xdscapacity controller that stopped consuming messages.	2024-02-08 15:31:44 -06:00
Chris S. Kim	26661a1c3b	Add default intention policy (#20544 )	2024-02-08 20:25:42 +00:00
Nathan Coleman	45d645471b	[NET-7414] Reconcile PST for mesh gateway workloads on change to ComputedExportedServices (#20271 ) * Reconcile ProxyStateTemplate on change to ComputedExportedServices * gofmt changeset --------- Co-authored-by: NiniOak <anita.akaeze@hashicorp.com>	2024-02-07 21:27:13 +00:00
Eric Haberkorn	1bd253021b	V1 Compat Exported Services Controller Optimizations (#20517 ) V1 compat exported services controller optimizations * Don't start the v2 exported services controller in v1 mode. * Use the controller cache.	2024-02-07 14:05:42 -05:00
Matt Keeler	49e6c0232d	Panic for unregistered types (#20476 ) * Panic when controllers attempt to make invalid requests to the resource service This will help to catch bugs in tests that could cause infinite errors to be emitted. * Disable the API GW v2 controller With the previous commit, this would cause a server to panic due to watching a type which has not yet been created/registered. * Ensure that a test server gets the full type registry instead of constructing its own * Skip TestServer_ControllerDependencies * Fix peering tests so that they use the full resource registry.	2024-02-06 11:23:06 -05:00
Tauhid Anjum	88b8a1cc36	NET-6776 - Update Routes controller to use ComputedFailoverPolicy CE (#20496 ) Update Routes controller to use ComputedFailoverPolicy	2024-02-06 13:28:18 +05:30
Derek Menteer	922844b8e0	Fix issue with persisting proxy-defaults (#20481 ) Fix issue with persisting proxy-defaults This resolves an issue introduced in hashicorp/consul#19829 where the proxy-defaults configuration entry with an HTTP protocol cannot be updated after it has been persisted once and a router exists. This occurs because the protocol field is not properly pre-computed before being passed into validation functions.	2024-02-05 16:00:19 -06:00
Eric Haberkorn	543c6a30af	Trigger the V1 Compat exported-services Controller when V1 Config Entries are Updated (#20456 ) * Trigger the v1 compat exported-services controller when the v1 config entry is modified. * Hook up exported-services config entries to the event publisher. * Add tests to the v2 exported services shim. * Use the local materializer trigger updates on the v1 compat exported services controller when exported-services config entries are modified. * stop sleeping when context is cancelled	2024-02-02 15:30:04 -05:00
Eric Haberkorn	d0243b618d	Change the multicluster group to v2 (#20430 )	2024-02-01 12:08:26 -05:00
wangxinyi7	3b44be530d	only forwarding the resource service traffic in client agent to server agent (#20347 ) * only forwarding the resource service traffic in client agent to server agent	2024-01-31 12:05:47 -08:00
Nick Ethier	383d92e9ab	hcp.v2.TelemetryState resource and controller implementation (#20257 ) * pbhcp: add TelemetryState resource * agent/hcp: add GetObservabilitySecrets to client * internal/hcp: add TelemetryState controller logic * hcp/telemetry-state: added config options for hcp sdk and debug key to skip deletion during reconcile * pbhcp: update proto documentation * hcp: address PR feedback, additional validations and code cleanup * internal/hcp: fix type sig change in test * update testdata/v2-resource-dependencies	2024-01-31 14:47:05 -05:00
Ronald	8799c36410	[NET-6231] Handle Partition traffic permissions when reconciling traffic permissions (#20408 ) [NET-6231] Partition traffic permissions Co-authored-by: Chris S. Kim <ckim@hashicorp.com>	2024-01-30 22:14:32 +00:00
Chris S. Kim	7cc88a1577	Handle NamespaceTrafficPermissions when reconciling TrafficPermissions (#20407 )	2024-01-30 21:31:25 +00:00
Melissa Kam	b0e87dbe13	[CC-7049] Stop the HCP manager when link is deleted (#20351 ) * Add Stop method to telemetry provider Stop the main loop of the provider and set the config to disabled. * Add interface for telemetry provider Added for easier testing. Also renamed Run to Start, which better fits with Stop. * Add Stop method to HCP manager * Add manager interface, rename implementation Add interface for easier testing, rename existing Manager to HCPManager. * Stop HCP manager in link Finalizer * Attempt to cleanup if resource has been deleted The link should be cleaned up by the finalizer, but there's an edge case in a multi-server setup where the link is fully deleted on one server before the other server reconciles. This will cover the case where the reconcile happens after the resource is deleted. * Add a delete mananagement token function Passes a function to the HCP manager that deletes the management token that was initially created by the manager. * Delete token as part of stopping the manager * Lock around disabling config, remove descriptions	2024-01-30 09:40:36 -06:00
Melissa Kam	3b9bb8d6f9	[CC-7044] Start HCP manager as part of link creation (#20312 ) * Check for ACL write permissions on write Link eventually will be creating a token, so require acl:write. * Convert Run to Start, only allow to start once * Always initialize HCP components at startup * Support for updating config and client * Pass HCP manager to controller * Start HCP manager in link resource Start as part of link creation rather than always starting. Update the HCP manager with values from the link before starting as well. * Fix metrics sink leaked goroutine * Remove the hardcoded disabled hostname prefix The HCP metrics sink will always be enabled, so the length of sinks will always be greater than zero. This also means that we will also always default to prefixing metrics with the hostname, which is what our documentation states is the expected behavior anyway. * Add changelog * Check and set running status in one method * Check for primary datacenter, add back test * Clarify merge reasoning, fix timing issue in test * Add comment about controller placement * Expand on breaking change, fix typo in changelog	2024-01-29 16:31:44 -06:00
Matt Keeler	34a32d4ce5	Remove V2 PeerName field from pbresource.Tenancy (#19865 ) The peer name will eventually show up elsewhere in the resource. For now though this rips it out of where we don’t want it to be.	2024-01-29 15:08:31 -05:00
sarahalsmiller	37ebaa6920	Net 7155- Consul API Gateway Controller Stub Work (#20324 ) * API Gateway proto * fix lint issue * new line * run make proto format * checkpoint * stub * Update internal/mesh/internal/controllers/apigateways/controller.go	2024-01-25 23:16:20 +00:00
Melissa Kam	7900544249	[CC-7063] Fetch HCP agent bootstrap config in Link reconciler (#20306 ) * Move config-dependent methods to separate package In order to reuse the fetching and file creation part of the bootstrap package, move the code that would cause cyclical dependencies to a different package. * Export needed bootstrap methods and variables Also add back validating persisted config and update tests. * Add support to check for just management token Add a new method that fetches the bootstrap configuration only if there isn't a valid management token file instead of checking for all the hcp-config files. * Pass data dir as a dependency to link controller The link controller needs to check the data directory for the hcp-config files. * Fetch bootstrap config for token in controller Load the management token when reconciling a link resource, which will fetch the agent boostrap configuration if the token is not already persisted locally. Skip this step if the cluster is in read-only mode. * Validate resource ID format in link creation * Handle unauthorized and forbidden errors Check for 401 and 403s when making GNM requests, exit bootstrap fetch loop and return specific failure statuses for link. * Move test function to a testing file * Log load and status write errors	2024-01-24 09:51:43 -06:00
aahel	3446eb3b1b	added computed failover controller (#20329 ) * added computed failover controller * removed some unnecessary changes * removed unnecessary changes * minor refactor * minor refactor fmt * added copyright	2024-01-24 11:50:27 +05:30
skpratt	44bcda8523	Net 7074/decentralized exported services management (#20318 ) * Add decentralized management of V1 exported-services config entries using V2 multicluster resources. * cleanup --------- Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>	2024-01-23 19:44:10 -06:00
Tauhid Anjum	5d294b26d3	NET-5824 Exported services api (#20015 ) * Exported services api implemented * Tests added, refactored code * Adding server tests * changelog added * Proto gen added * Adding codegen changes * changing url, response object * Fixing lint error by having namespace and partition directly * Tests changes * refactoring tests * Simplified uniqueness logic for exported services, sorted the response in order of service name * Fix lint errors, refactored code	2024-01-23 10:06:59 +05:30
R.B. Boyer	2e08a7e1c7	v2: prevent use of the v2 experiments in secondary datacenters for now (#20299 ) Ultimately we will have to rectify wan federation with v2 catalog adjacent experiments, but for now blanket prevent usage of the resource-apis, v2dns, and v2tenancy experiments in secondary datacenters.	2024-01-19 16:31:49 -06:00
Nick Cellino	37a5fddffa	Create HCP management token in HCP manager (#19830 ) * Create HCP management token in HCP manager * Change InitializeManagementToken to ManagementTokenUpserter * Implement and use management token upsert function * Fix race condition in test * Add idea for improvement as comment * Return early in upsertManagementToken if token exists	2024-01-19 13:58:49 -05:00
Melissa Kam	98c9702ba3	[CC-7031] Add initialization support to resource controllers (#20138 ) * Add Initializer to the controller The Initializer adds support for running any required initialization steps when the controller is first started. * Implement HCP Link initializer The link initializer will create a Link resource if the cloud configuration has been set. * Simplify retry logic and testing * Remove internal retry, replace with logging logic	2024-01-19 11:47:48 -06:00
Matt Keeler	f9c04881f9	Failover policy cache (#20244 ) * Migrate the Failover controller to use the controller cache * Remove the Catalog FailoverMapper and its usage in the mesh routes controller.	2024-01-19 09:35:34 -05:00
Dhia Ayachi	d641998641	Fix to not create a watch to `Internal.ServiceDump` when mesh gateway is not used (#20168 ) This add a fix to properly verify the gateway mode before creating a watch specific to mesh gateways. This watch have a high performance cost and when mesh gateways are not used is not used. This also adds an optimization to only return the nodes when watching the Internal.ServiceDump RPC to avoid unnecessary disco chain compilation. As watches in proxy config only need the nodes.	2024-01-18 16:44:53 -06:00
Melissa Kam	c112a6632d	[CC-7042] Update and enable the HCP metrics sink in the HCP manager (#20072 ) * Option to set HCP client at runtime Allows us to initially set a nil HCP client for the telemetry provider and update it later. * Set telemetry provider HCP client in HCP manager Set the telemetry provider as a dependency and pass it to the manager. Update the telemetry provider's HCP client when the HCP manager starts. * Add a provider interface for the metrics client This provider will allow us to configure and reconfigure the retryable HTTP client and the headers for the metrics client. * Move HTTP retryable client to separate file Copied directly from the metrics client. * Abstract HCP specific values in HTTP client Remove HCP specific references and instead initiate with a generic TLS configuration and authentication source. * Set up HTTP client and headers in the provider Move setup from the metrics client to the HCP telemetry provider. * Update the telemetry provider in the HCP manager Initialize the provider without the HCP configs and then update it in the HCP manager to enable it. * Improve test assertion, fix method comment * Move client provider to metrics client * Stop the manager on setup error * Add separate lock for http configuration * Start telemetry provider in HCP manager * Update HCP client and config as part of Run * Remove option to set config at initialization * Simplify and clean up setting HCP configs * Add test for telemetry provider Run method * Fix race condition * Use clone of HTTP headers * Only allow initial update and run once	2024-01-16 10:46:12 -06:00
Matt Keeler	326c0ecfbe	In-Memory gRPC (#19942 ) * Implement In-Process gRPC for use by controller caching/indexing This replaces the pipe base listener implementation we were previously using. The new style CAN avoid cloning resources which our controller caching/indexing is taking advantage of to not duplicate resource objects in memory. To maintain safety for controllers and for them to be able to modify data they get back from the cache and the resource service, the client they are presented in their runtime will be wrapped with an autogenerated client which clones request and response messages as they pass through the client. Another sizable change in this PR is to consolidate how server specific gRPC services get registered and managed. Before this was in a bunch of different methods and it was difficult to track down how gRPC services were registered. Now its all in one place. * Fix race in tests * Ensure the resource service is registered to the multiplexed handler for forwarding from client agents * Expose peer streaming on the internal handler	2024-01-12 11:54:07 -05:00
John Murret	3fa4a21edd	remove the skipping of slow tests in go-tests-ce and go-test-enterprise (#20139 ) * remove the skipping of slow tests in go-tests-ce and go-test-enterprise * add license header	2024-01-10 20:39:34 -07:00
Dan Stough	d52e80b619	[OSS] feat: add experiments flag for v2 dns and skeleton interfaces (#20115 ) feat: add experiments flag for v2 dns and skeleton interfaces	2024-01-10 11:19:20 -05:00
Derek Menteer	131ef2a133	Fix broken tests. (#20134 )	2024-01-09 14:57:27 -06:00
Derek Menteer	6854e1e90d	Fix broken tests. (#20130 ) This fixes some tests that were broken, but not caught, due to the CICD pipeline only running a subset of the overall tests on PRs.	2024-01-09 13:45:29 -06:00
Nick Cellino	0deebaf637	Add Link resource type and controller skeleton (#19788 ) * Add HCCLink resource type * Register HCCLink resource type with basic validation * Add validation for required fields * Add test for default ACLs * Add no-op controller for HCCLink * Add resource-apis semantic validation check in hcclink controller * Add copyright headers * Rename HCCLink to Link * Add hcp_cluster_url to link proto * Update 'disabled' reason with more detail * Update link status name to consul.io/hcp/link * Change link version from v1 to v2 * Use feature flag/experiment to enable v2 resources with HCP	2024-01-09 13:57:59 -05:00
Melissa Kam	5dc8eabcce	[CC-7041] Update and start the SCADA provider in HCP manager (#19976 ) * Update SCADA provider version Also update mocks for SCADA provider. * Create SCADA provider w/o HCP config, then update Adds a placeholder config option to allow us to initialize a SCADA provider without the HCP configuration. Also adds an update method to then add the HCP configuration. We need this to be able to eventually always register a SCADA listener at startup before the HCP config values are known. * Pass cloud configuration to HCP manager Save the entire cloud configuration and pass it to the HCP manager. * Update and start SCADA provider in HCP manager Move config updating and starting to the HCP manager. The HCP manager will eventually be responsible for all processes that contribute to linking to HCP.	2024-01-08 09:49:29 -06:00

2299 Commits