Commit Graph

2299 Commits

Author SHA1 Message Date
R.B. Boyer 1535844c62
gossip: refactor some gossip related libraries into a central place (#21036)
This refactors and relocates the following packages to live under internal/gossip instead of either in the toplevel lib or agent/consul:

- librtt : related to serf coordinates
- libserf : random serf stuff
2024-05-07 10:30:49 -05:00
Nathan Coleman 5e9f02d4be
[NET-8091] Add file-system-certificate config entry for API gateway (#20873)
* Define file-system-certificate config entry

* Collect file-system-certificate(s) referenced by api-gateway onto snapshot

* Add file-system-certificate to config entry kind allow lists

* Remove inapplicable validation

This validation makes sense for inline certificates since Consul server is holding the certificate; however, for file system certificates, Consul server never actually sees the certificate.

* Support file-system-certificate as source for listener TLS certificate

* Add more required mappings for the new config entry type

* Construct proper TLS context based on certificate kind

* Add support or SDS in xdscommon

* Remove unused param

* Adds back verification of certs for inline-certificates

* Undo tangential changes to TLS config consumption

* Remove stray curly braces

* Undo some more tangential changes

* Improve function name for generating API gateway secrets

* Add changelog entry

* Update .changelog/20873.txt

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Add some nil-checking, remove outdated TODO

* Update test assertions to include file-system-certificate

* Add documentation for file-system-certificate config entry

Add new doc to nav

* Fix grammar mistake

* Rename watchmaps, remove outdated TODO

---------

Co-authored-by: Melisa Griffin <melisa.griffin@hashicorp.com>
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
2024-04-15 16:45:05 -04:00
Michael Zalimeni a8d08e759f
fix: consume ignored entries in CE downgrade via Ent snapshot (#20977)
This operation would previously fail due to unconsumed bytes in the
decoder buffer when reading the Ent snapshot (the first byte of the
record would be misinterpreted as a type indicator, and the remaining
bytes would fail to be deserialized or read as invalid data).

Ensure restore succeeds by decoding the ignored record as an
interface{}, which will consume the record bytes without requiring a
concrete target struct, then moving on to the next record.
2024-04-11 21:08:44 +00:00
Eric Haberkorn e231f0ee9b
Add an agent config option to diable per tenancy usage metrics. (#20976) 2024-04-11 15:20:09 -04:00
John Murret 39112c7a98
GH-20889 - put conditionals are hcp initialization for consul server (#20926)
* put conditionals are hcp initialization for consul server

* put more things behind configuration flags

* add changelog

* TestServer_hcpManager

* fix TestAgent_scadaProvider
2024-03-28 14:47:11 -06:00
Dan Stough 6026ada0c9
[CE] feat(v2dns): enable v2 dns as default (#20715)
* feat(v2dns): enable v2 dns as default

* changelog
2024-03-25 16:09:01 -04:00
Iryna Shustava d747b51dab
Handle ACL errors consistently when blocking query timeout is reached. (#20876)
Currently, when a client starts a blocking query and an ACL token expires within
that time, Consul will return ACL not found error with a 403 status code. However,
sometimes if an ACL token is invalidated at the same time as the query's deadline is reached,
Consul will instead return an empty response with a 200 status code.

This is because of the events being executed.
1. Client issues a blocking query request with timeout `t`.
2. ACL is deleted.
3. Server detects a change in ACLs and force closes the gRPC stream.
4. Client resubscribes with the same token and resets its state (view).
5. Client sees "ACL not found" error.

If ACL is deleted before step 4, the client is unaware that the stream was closed due to
an ACL error and will return an empty view (from the reset state) with the 200 status code.

To fix this problem, we introduce another state to the subsciption to indicate when a change
to ACLs has occured. If the server sees that there was an error due to ACL change, it will
re-authenticate the request and return an error if the token is no longer valid.

Fixes #20790
2024-03-22 14:59:54 -06:00
Chris S. Kim f3f2175edd
Update go-jose library (#20888) 2024-03-22 10:54:58 -04:00
sarahalsmiller 262f435800
NET-6821 Disable Terminating Gateway Auto Host Header Rewrite (#20802)
* disable terminating gateway auto host rewrite

* add changelog

* clean up unneeded additional snapshot fields

* add new field to docs

* squash

* fix test
2024-03-12 15:37:20 -05:00
Matt Keeler abe14f11e6
Remove redundant usage metrics (#20674)
* Remove redundant usage metrics

* Add the changelog

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

---------

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2024-03-05 14:09:47 -05:00
Matt Keeler 5c936fba33
Enable callers to control whether per-tenant usage metrics are included in calls to store.ServiceUsage (#20672)
* Enable callers to control whether per-tenant usage metrics are included in calls to store.ServiceUsage

* Add changelog
2024-03-01 13:44:55 -05:00
sarahalsmiller 670ee90a77
Use correct enterprise meta on wildcard service update (#20721)
* use correct enterprise meta on wildcard service update

* changelog

* rename changelog file
2024-02-26 12:03:08 -06:00
Semir Patel 943426bc79
v2tenancy: add optional LicenseFeature to type Registration struct (#20673) 2024-02-20 14:42:31 -06:00
Derek Menteer 9f7626d501
Ensure all topics are refreshed on FSM restore and add supervisor loop to v1 controller subscriptions (#20642)
Ensure all topics are refreshed on FSM restore and add supervisor loop to v1 controller subscriptions

This PR fixes two issues:

1. Not all streams were force closed whenever a snapshot restore happened. This means that anything consuming data from the stream (controllers, queries, etc) were unaware that the data they have is potentially stale / invalid. This first part ensures that all topics are purged.

2. The v1 controllers did not properly handle stream errors (which are likely to appear much more often due to 1 above) and so it introduces a supervisor thread to restart the watches when these errors occur.
2024-02-14 14:17:55 -06:00
Semir Patel b716a9ef6b
resource: reconcile managed types every ~8hrs (#20606) 2024-02-13 10:51:54 -06:00
Nick Cellino 5fb6ab6a3a
Move HCP Manager lifecycle management out of Link controller (#20401)
* Add function to get update channel for watching HCP Link

* Add MonitorHCPLink function

This function can be called in a goroutine to manage the lifecycle
of the HCP manager.

* Update HCP Manager config in link monitor before starting

This updates HCPMonitorLink so it updates the HCP manager
with an HCP client and management token when a Link is upserted.

* Let MonitorHCPManager handle lifecycle instead of link controller

* Remove cleanup from Link controller and move it to MonitorHCPLink

Previously, the Link Controller was responsible for cleaning up the
HCP-related files on the file system. This change makes it so
MonitorHCPLink handles this cleanup. As a result, we are able to remove
the PlacementEachServer placement strategy for the Link controller
because it no longer needs to do this per-node cleanup.

* Remove HCP Manager dependency from Link Controller

The Link controller does not need to have HCP Manager
as a dependency anymore, so this removes that dependency
in order to simplify the design.

* Add Linked prefix to Linked status variables

This is in preparation for adding a new status type to the
Link resource.

* Add new "validated" status type to link resource

The link resource controller will now set a "validated" status
in addition to the "linked" status. This is needed so that other
components (eg the HCP manager) know when the Link is ready to link
with HCP.

* Fix tests

* Handle new 'EndOfSnapshot' WatchList event

* Fix watch test

* Remove unnecessary config from TestAgent_scadaProvider

Since the Scada provider is now started on agent startup
regardless of whether a cloud config is provided, this removes
the cloud config override from the relevant test.

This change is not exactly related to the changes from this PR,
but rather is something small and sort of related that was noticed
while working on this PR.

* Simplify link watch test and remove sleep from link watch

This updates the link watch test so that it uses more mocks
and does not require setting up the infrastructure for the HCP Link
controller.

This also removes the time.Sleep delay in the link watcher loop in favor
of an error counter. When we receive 10 consecutive errors, we shut down
the link watcher loop.

* Add better logging for link validation. Remove EndOfSnapshot test.

* Refactor link monitor test into a table test

* Add some clarifying comments to link monitor

* Simplify link watch test

* Test a bunch more errors cases in link monitor test

* Use exponential backoff instead of errorCounter in LinkWatch

* Move link watch and link monitor into a single goroutine called from server.go

* Refactor HCP link watcher to use single go-routine.

Previously, if the WatchClient errored, we would've never recovered
because we never retry to create the stream. With this change,
we have a single goroutine that runs for the life of the server agent
and if the WatchClient stream ever errors, we retry the creation
of the stream with an exponential backoff.
2024-02-12 10:48:23 -05:00
Derek Menteer a1c8d4dd19
Decouple xds capacity controller and raft-autopilot (#20511)
Decouple xds capacity controller and autopilot

This prevents a potential bug where autopilot deadlocks while attempting
to execute `AutopilotDelegate.NotifyState()` on an xdscapacity controller
that stopped consuming messages.
2024-02-08 15:31:44 -06:00
Chris S. Kim 26661a1c3b
Add default intention policy (#20544) 2024-02-08 20:25:42 +00:00
Nathan Coleman 45d645471b
[NET-7414] Reconcile PST for mesh gateway workloads on change to ComputedExportedServices (#20271)
* Reconcile ProxyStateTemplate on change to ComputedExportedServices

* gofmt changeset

---------

Co-authored-by: NiniOak <anita.akaeze@hashicorp.com>
2024-02-07 21:27:13 +00:00
Eric Haberkorn 1bd253021b
V1 Compat Exported Services Controller Optimizations (#20517)
V1 compat exported services controller optimizations

* Don't start the v2 exported services controller in v1 mode.
* Use the controller cache.
2024-02-07 14:05:42 -05:00
Matt Keeler 49e6c0232d
Panic for unregistered types (#20476)
* Panic when controllers attempt to make invalid requests to the resource service

This will help to catch bugs in tests that could cause infinite errors to be emitted.

* Disable the API GW v2 controller

With the previous commit, this would cause a server to panic due to watching a type which has not yet been created/registered.

* Ensure that a test server gets the full type registry instead of constructing its own

* Skip TestServer_ControllerDependencies

* Fix peering tests so that they use the full resource registry.
2024-02-06 11:23:06 -05:00
Tauhid Anjum 88b8a1cc36
NET-6776 - Update Routes controller to use ComputedFailoverPolicy CE (#20496)
Update Routes controller to use ComputedFailoverPolicy
2024-02-06 13:28:18 +05:30
Derek Menteer 922844b8e0
Fix issue with persisting proxy-defaults (#20481)
Fix issue with persisting proxy-defaults

This resolves an issue introduced in hashicorp/consul#19829
where the proxy-defaults configuration entry with an HTTP protocol
cannot be updated after it has been persisted once and a router
exists. This occurs because the protocol field is not properly
pre-computed before being passed into validation functions.
2024-02-05 16:00:19 -06:00
Eric Haberkorn 543c6a30af
Trigger the V1 Compat exported-services Controller when V1 Config Entries are Updated (#20456)
* Trigger the v1 compat exported-services controller when the v1 config entry is modified.

* Hook up exported-services config entries to the event publisher.
* Add tests to the v2 exported services shim.
* Use the local materializer trigger updates on the v1 compat exported services controller when exported-services config entries are modified.

* stop sleeping when context is cancelled
2024-02-02 15:30:04 -05:00
Eric Haberkorn d0243b618d
Change the multicluster group to v2 (#20430) 2024-02-01 12:08:26 -05:00
wangxinyi7 3b44be530d
only forwarding the resource service traffic in client agent to server agent (#20347)
* only forwarding the resource service traffic in client agent to server agent
2024-01-31 12:05:47 -08:00
Nick Ethier 383d92e9ab
hcp.v2.TelemetryState resource and controller implementation (#20257)
* pbhcp: add TelemetryState resource

* agent/hcp: add GetObservabilitySecrets to client

* internal/hcp: add TelemetryState controller logic

* hcp/telemetry-state: added config options for hcp sdk and debug key to skip deletion during reconcile

* pbhcp: update proto documentation

* hcp: address PR feedback, additional validations and code cleanup

* internal/hcp: fix type sig change in test

* update testdata/v2-resource-dependencies
2024-01-31 14:47:05 -05:00
Ronald 8799c36410
[NET-6231] Handle Partition traffic permissions when reconciling traffic permissions (#20408)
[NET-6231] Partition traffic permissions

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2024-01-30 22:14:32 +00:00
Chris S. Kim 7cc88a1577
Handle NamespaceTrafficPermissions when reconciling TrafficPermissions (#20407) 2024-01-30 21:31:25 +00:00
Melissa Kam b0e87dbe13
[CC-7049] Stop the HCP manager when link is deleted (#20351)
* Add Stop method to telemetry provider

Stop the main loop of the provider and set the config
to disabled.

* Add interface for telemetry provider

Added for easier testing. Also renamed Run to Start, which better
fits with Stop.

* Add Stop method to HCP manager

* Add manager interface, rename implementation

Add interface for easier testing, rename existing Manager to HCPManager.

* Stop HCP manager in link Finalizer

* Attempt to cleanup if resource has been deleted

The link should be cleaned up by the finalizer, but there's an edge
case in a multi-server setup where the link is fully deleted on one
server before the other server reconciles. This will cover the case
where the reconcile happens after the resource is deleted.

* Add a delete mananagement token function

Passes a function to the HCP manager that deletes the management token
that was initially created by the manager.

* Delete token as part of stopping the manager

* Lock around disabling config, remove descriptions
2024-01-30 09:40:36 -06:00
Melissa Kam 3b9bb8d6f9
[CC-7044] Start HCP manager as part of link creation (#20312)
* Check for ACL write permissions on write

Link eventually will be creating a token, so require acl:write.

* Convert Run to Start, only allow to start once

* Always initialize HCP components at startup

* Support for updating config and client

* Pass HCP manager to controller

* Start HCP manager in link resource

Start as part of link creation rather than always starting. Update
the HCP manager with values from the link before starting as well.

* Fix metrics sink leaked goroutine

* Remove the hardcoded disabled hostname prefix

The HCP metrics sink will always be enabled, so the length of sinks will
always be greater than zero. This also means that we will also always
default to prefixing metrics with the hostname, which is what our
documentation states is the expected behavior anyway.

* Add changelog

* Check and set running status in one method

* Check for primary datacenter, add back test

* Clarify merge reasoning, fix timing issue in test

* Add comment about controller placement

* Expand on breaking change, fix typo in changelog
2024-01-29 16:31:44 -06:00
Matt Keeler 34a32d4ce5
Remove V2 PeerName field from pbresource.Tenancy (#19865)
The peer name will eventually show up elsewhere in the resource. For now though this rips it out of where we don’t want it to be.
2024-01-29 15:08:31 -05:00
sarahalsmiller 37ebaa6920
Net 7155- Consul API Gateway Controller Stub Work (#20324)
* API Gateway proto

* fix lint issue

* new line

* run make proto format

* checkpoint

* stub

* Update internal/mesh/internal/controllers/apigateways/controller.go
2024-01-25 23:16:20 +00:00
Melissa Kam 7900544249
[CC-7063] Fetch HCP agent bootstrap config in Link reconciler (#20306)
* Move config-dependent methods to separate package

In order to reuse the fetching and file creation part of the
bootstrap package, move the code that would cause cyclical
dependencies to a different package.

* Export needed bootstrap methods and variables

Also add back validating persisted config and update tests.

* Add support to check for just management token

Add a new method that fetches the bootstrap configuration only if
there isn't a valid management token file instead of checking for
all the hcp-config files.

* Pass data dir as a dependency to link controller

The link controller needs to check the data directory for
the hcp-config files.

* Fetch bootstrap config for token in controller

Load the management token when reconciling a link resource, which will
fetch the agent boostrap configuration if the token is not already
persisted locally. Skip this step if the cluster is in read-only mode.

* Validate resource ID format in link creation

* Handle unauthorized and forbidden errors

Check for 401 and 403s when making GNM requests, exit bootstrap fetch
loop and return specific failure statuses for link.

* Move test function to a testing file

* Log load and status write errors
2024-01-24 09:51:43 -06:00
aahel 3446eb3b1b
added computed failover controller (#20329)
* added computed failover controller

* removed some uncessary changes

* removed uncessary changes

* minor refactor

* minor refactor fmt

* added copyright
2024-01-24 11:50:27 +05:30
skpratt 44bcda8523
Net 7074/decentralized exported services management (#20318)
* Add decentralized management of V1 exported-services config entries using V2 multicluster resources.

* cleanup

---------

Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
2024-01-23 19:44:10 -06:00
Tauhid Anjum 5d294b26d3
NET-5824 Exported services api (#20015)
* Exported services api implemented

* Tests added, refactored code

* Adding server tests

* changelog added

* Proto gen added

* Adding codegen changes

* changing url, response object

* Fixing lint error by having namespace and partition directly

* Tests changes

* refactoring tests

* Simplified uniqueness logic for exported services, sorted the response in order of service name

* Fix lint errors, refactored code
2024-01-23 10:06:59 +05:30
R.B. Boyer 2e08a7e1c7
v2: prevent use of the v2 experiments in secondary datacenters for now (#20299)
Ultimately we will have to rectify wan federation with v2 catalog adjacent
experiments, but for now blanket prevent usage of the resource-apis,
v2dns, and v2tenancy experiments in secondary datacenters.
2024-01-19 16:31:49 -06:00
Nick Cellino 37a5fddffa
Create HCP management token in HCP manager (#19830)
* Create HCP management token in HCP manager

* Change InitializeManagementToken to ManagementTokenUpserter

* Implement and use management token upsert function

* Fix race condition in test

* Add idea for improvement as comment

* Return early in upsertManagementToken if token exists
2024-01-19 13:58:49 -05:00
Melissa Kam 98c9702ba3
[CC-7031] Add initialization support to resource controllers (#20138)
* Add Initializer to the controller

The Initializer adds support for running any required initialization
steps when the controller is first started.

* Implement HCP Link initializer

The link initializer will create a Link resource if the
cloud configuration has been set.

* Simplify retry logic and testing

* Remove internal retry, replace with logging logic
2024-01-19 11:47:48 -06:00
Matt Keeler f9c04881f9
Failover policy cache (#20244)
* Migrate the Failover controller to use the controller cache
* Remove the Catalog FailoverMapper and its usage in the mesh routes controller.
2024-01-19 09:35:34 -05:00
Dhia Ayachi d641998641
Fix to not create a watch to `Internal.ServiceDump` when mesh gateway is not used (#20168)
This add a fix to properly verify the gateway mode before creating a watch specific to mesh gateways. This watch have a high performance cost and when mesh gateways are not used is not used.

This also adds an optimization to only return the nodes when watching the Internal.ServiceDump RPC to avoid unnecessary disco chain compilation. As watches in proxy config only need the nodes.
2024-01-18 16:44:53 -06:00
Melissa Kam c112a6632d
[CC-7042] Update and enable the HCP metrics sink in the HCP manager (#20072)
* Option to set HCP client at runtime

Allows us to initially set a nil HCP client for the
telemetry provider and update it later.

* Set telemetry provider HCP client in HCP manager

Set the telemetry provider as a dependency and pass it to
the manager. Update the telemetry provider's HCP client
when the HCP manager starts.

* Add a provider interface for the metrics client

This provider will allow us to configure and reconfigure the
retryable HTTP client and the headers for the metrics client.

* Move HTTP retryable client to separate file

Copied directly from the metrics client.

* Abstract HCP specific values in HTTP client

Remove HCP specific references and instead initiate with
a generic TLS configuration and authentication source.

* Set up HTTP client and headers in the provider

Move setup from the metrics client to the HCP telemetry
provider.

* Update the telemetry provider in the HCP manager

Initialize the provider without the HCP configs and then update
it in the HCP manager to enable it.

* Improve test assertion, fix method comment

* Move client provider to metrics client

* Stop the manager on setup error

* Add separate lock for http configuration

* Start telemetry provider in HCP manager

* Update HCP client and config as part of Run

* Remove option to set config at initialization

* Simplify and clean up setting HCP configs

* Add test for telemetry provider Run method

* Fix race condition

* Use clone of HTTP headers

* Only allow initial update and run once
2024-01-16 10:46:12 -06:00
Matt Keeler 326c0ecfbe
In-Memory gRPC (#19942)
* Implement In-Process gRPC for use by controller caching/indexing

This replaces the pipe base listener implementation we were previously using. The new style CAN avoid cloning resources which our controller caching/indexing is taking advantage of to not duplicate resource objects in memory.

To maintain safety for controllers and for them to be able to modify data they get back from the cache and the resource service, the client they are presented in their runtime will be wrapped with an autogenerated client which clones request and response messages as they pass through the client.

Another sizable change in this PR is to consolidate how server specific gRPC services get registered and managed. Before this was in a bunch of different methods and it was difficult to track down how gRPC services were registered. Now its all in one place.

* Fix race in tests

* Ensure the resource service is registered to the multiplexed handler for forwarding from client agents

* Expose peer streaming on the internal handler
2024-01-12 11:54:07 -05:00
John Murret 3fa4a21edd
remove the skipping of slow tests in go-tests-ce and go-test-enterprise (#20139)
* remove the skipping of slow tests in go-tests-ce and go-test-enterprise

* add license header
2024-01-10 20:39:34 -07:00
Dan Stough d52e80b619
[OSS] feat: add experiments flag for v2 dns and skeleton interfaces (#20115)
feat: add experiments flag for v2 dns and skeleton interfaces
2024-01-10 11:19:20 -05:00
Derek Menteer 131ef2a133
Fix broken tests. (#20134) 2024-01-09 14:57:27 -06:00
Derek Menteer 6854e1e90d
Fix broken tests. (#20130)
This fixes some tests that were broken, but not caught, due to the CICD
pipeline only running a subset of the overall tests on PRs.
2024-01-09 13:45:29 -06:00
Nick Cellino 0deebaf637
Add Link resource type and controller skeleton (#19788)
* Add HCCLink resource type

* Register HCCLink resource type with basic validation

* Add validation for required fields

* Add test for default ACLs

* Add no-op controller for HCCLink

* Add resource-apis semantic validation check in hcclink controller

* Add copyright headers

* Rename HCCLink to Link

* Add hcp_cluster_url to link proto

* Update 'disabled' reason with more detail

* Update link status name to consul.io/hcp/link

* Change link version from v1 to v2

* Use feature flag/experiment to enable v2 resources with HCP
2024-01-09 13:57:59 -05:00
Melissa Kam 5dc8eabcce
[CC-7041] Update and start the SCADA provider in HCP manager (#19976)
* Update SCADA provider version

Also update mocks for SCADA provider.

* Create SCADA provider w/o HCP config, then update

Adds a placeholder config option to allow us to initialize a SCADA provider
without the HCP configuration. Also adds an update method to then add the
HCP configuration. We need this to be able to eventually always register a
SCADA listener at startup before the HCP config values are known.

* Pass cloud configuration to HCP manager

Save the entire cloud configuration and pass it to the HCP
manager.

* Update and start SCADA provider in HCP manager

Move config updating and starting to the HCP manager. The HCP manager
will eventually be responsible for all processes that contribute
to linking to HCP.
2024-01-08 09:49:29 -06:00