5407 Commits

Author SHA1 Message Date
Nick Cellino
34b343a980
Unconditionally add Access-Control-Expose-Headers HTTP header (#20220)
* Unconditionally add Access-Control-Expose-Headers HTTP header

* Return nil instead of err
2024-01-22 10:18:35 -05:00
Dan Stough
97ae244d8a
feat(v2dns): add grpc DNS support (#20296) 2024-01-22 10:10:03 -05:00
Semir Patel
6d9e8fdd05
resource: retry non-CAS deletes automatically (#20292) 2024-01-22 06:45:01 -08:00
R.B. Boyer
2e08a7e1c7
v2: prevent use of the v2 experiments in secondary datacenters for now (#20299)
Ultimately we will have to rectify wan federation with v2 catalog adjacent
experiments, but for now blanket prevent usage of the resource-apis,
v2dns, and v2tenancy experiments in secondary datacenters.
2024-01-19 16:31:49 -06:00
Nick Cellino
37a5fddffa
Create HCP management token in HCP manager (#19830)
* Create HCP management token in HCP manager

* Change InitializeManagementToken to ManagementTokenUpserter

* Implement and use management token upsert function

* Fix race condition in test

* Add idea for improvement as comment

* Return early in upsertManagementToken if token exists
2024-01-19 13:58:49 -05:00
Melissa Kam
98c9702ba3
[CC-7031] Add initialization support to resource controllers (#20138)
* Add Initializer to the controller

The Initializer adds support for running any required initialization
steps when the controller is first started.

* Implement HCP Link initializer

The link initializer will create a Link resource if the
cloud configuration has been set.

* Simplify retry logic and testing

* Remove internal retry, replace with logging logic
2024-01-19 11:47:48 -06:00
Nick Cellino
fe678e9da1
Sync cluster attributes from GNM to Link resource (#20158)
* Add 'GetCluster' function to HCP client

* Sync cluster data inside Link controller

* Add access mode to HCP Link

* Sync AccessLevel property

* Fix imports and remove outdated comments

* Switch accessMode to access level

* Add comment around HCPClientFn

* Fix spacing in link.proto

* Add helper for writing status. Fix reconciliation loop
2024-01-19 10:02:55 -05:00
Matt Keeler
f9c04881f9
Failover policy cache (#20244)
* Migrate the Failover controller to use the controller cache
* Remove the Catalog FailoverMapper and its usage in the mesh routes controller.
2024-01-19 09:35:34 -05:00
Dan Stough
0edfa74d15
feat(v2dns): recursor support (#20249)
* feat(v2dns): recursor support

* test: fix leaking test agent in dns svc test
2024-01-18 18:30:04 -05:00
Dhia Ayachi
d641998641
Fix to not create a watch to Internal.ServiceDump when mesh gateway is not used (#20168)
This add a fix to properly verify the gateway mode before creating a watch specific to mesh gateways. This watch have a high performance cost and when mesh gateways are not used is not used.

This also adds an optimization to only return the nodes when watching the Internal.ServiceDump RPC to avoid unnecessary disco chain compilation. As watches in proxy config only need the nodes.
2024-01-18 16:44:53 -06:00
John Murret
938d2315e0
DNS v2 - add virtual ip questions (#20245) 2024-01-17 23:46:18 +00:00
John Murret
bc4da5f5d6
check error in TestDNSCycleRecursorCheckAllFail before asserting response to stop panic in CI. (#20231) 2024-01-17 07:25:35 -07:00
Dan Stough
cb384ac068
feat(v2dns): addr. query support (#20224) 2024-01-16 22:36:02 -05:00
Melissa Kam
c112a6632d
[CC-7042] Update and enable the HCP metrics sink in the HCP manager (#20072)
* Option to set HCP client at runtime

Allows us to initially set a nil HCP client for the
telemetry provider and update it later.

* Set telemetry provider HCP client in HCP manager

Set the telemetry provider as a dependency and pass it to
the manager. Update the telemetry provider's HCP client
when the HCP manager starts.

* Add a provider interface for the metrics client

This provider will allow us to configure and reconfigure the
retryable HTTP client and the headers for the metrics client.

* Move HTTP retryable client to separate file

Copied directly from the metrics client.

* Abstract HCP specific values in HTTP client

Remove HCP specific references and instead initiate with
a generic TLS configuration and authentication source.

* Set up HTTP client and headers in the provider

Move setup from the metrics client to the HCP telemetry
provider.

* Update the telemetry provider in the HCP manager

Initialize the provider without the HCP configs and then update
it in the HCP manager to enable it.

* Improve test assertion, fix method comment

* Move client provider to metrics client

* Stop the manager on setup error

* Add separate lock for http configuration

* Start telemetry provider in HCP manager

* Update HCP client and config as part of Run

* Remove option to set config at initialization

* Simplify and clean up setting HCP configs

* Add test for telemetry provider Run method

* Fix race condition

* Use clone of HTTP headers

* Only allow initial update and run once
2024-01-16 10:46:12 -06:00
Derek Menteer
b8b8ad46fc
Various race condition and test fixes. (#20212)
* Increase timeouts for flakey peering test.

* Various test fixes.

* Fix race condition in reconcilePeering.

This resolves an issue where a peering object in the state store was
incorrectly mutated by a function, resulting in the test being flagged as
failing when the -race flag was used.
2024-01-16 08:57:43 -06:00
John Murret
93e06b799e
v1 dns - add doc strings for functions and update function names to be consistent and more descriptive. (#20194)
v1 dns - add doc strings for functions and update function names to be consistent and mre descriptive.
2024-01-12 22:07:42 +00:00
R.B. Boyer
7f9ed032fd
agent: remove data race in agent config (#20200)
To fix an issue displaying the current reloaded config in the 
v1/agent/self endpoint #18681 caused the agent's internal 
config struct member to be deepcopied and replaced on reload.

This is not safe because the field is not protected by a lock, nor 
should it be due to how it is accessed by the rest of the system.

This PR does the same deepcopy, but into a new field solely for 
the point of capturing the current reloaded values for display 
purposes. If there has been no reload then the original config is used.
2024-01-12 15:11:21 -06:00
Matt Keeler
326c0ecfbe
In-Memory gRPC (#19942)
* Implement In-Process gRPC for use by controller caching/indexing

This replaces the pipe base listener implementation we were previously using. The new style CAN avoid cloning resources which our controller caching/indexing is taking advantage of to not duplicate resource objects in memory.

To maintain safety for controllers and for them to be able to modify data they get back from the cache and the resource service, the client they are presented in their runtime will be wrapped with an autogenerated client which clones request and response messages as they pass through the client.

Another sizable change in this PR is to consolidate how server specific gRPC services get registered and managed. Before this was in a bunch of different methods and it was difficult to track down how gRPC services were registered. Now its all in one place.

* Fix race in tests

* Ensure the resource service is registered to the multiplexed handler for forwarding from client agents

* Expose peer streaming on the internal handler
2024-01-12 11:54:07 -05:00
John Murret
3fa4a21edd
remove the skipping of slow tests in go-tests-ce and go-test-enterprise (#20139)
* remove the skipping of slow tests in go-tests-ce and go-test-enterprise

* add license header
2024-01-10 20:39:34 -07:00
Dan Stough
d52e80b619
[OSS] feat: add experiments flag for v2 dns and skeleton interfaces (#20115)
feat: add experiments flag for v2 dns and skeleton interfaces
2024-01-10 11:19:20 -05:00
loshz
7724bb88d5
[NET-6593] agent: check for minimum RSA key size (#20112)
* agent: check for minimum RSA key size

* add changelog

* agent: add test for RSA generated key sizes

* use constants in generating priv key func

* update key size error message
2024-01-10 12:15:36 +00:00
Derek Menteer
131ef2a133
Fix broken tests. (#20134) 2024-01-09 14:57:27 -06:00
Derek Menteer
6854e1e90d
Fix broken tests. (#20130)
This fixes some tests that were broken, but not caught, due to the CICD
pipeline only running a subset of the overall tests on PRs.
2024-01-09 13:45:29 -06:00
Nick Cellino
0deebaf637
Add Link resource type and controller skeleton (#19788)
* Add HCCLink resource type

* Register HCCLink resource type with basic validation

* Add validation for required fields

* Add test for default ACLs

* Add no-op controller for HCCLink

* Add resource-apis semantic validation check in hcclink controller

* Add copyright headers

* Rename HCCLink to Link

* Add hcp_cluster_url to link proto

* Update 'disabled' reason with more detail

* Update link status name to consul.io/hcp/link

* Change link version from v1 to v2

* Use feature flag/experiment to enable v2 resources with HCP
2024-01-09 13:57:59 -05:00
John Murret
21e2bb2a67
Make DNS test run across a matrix of dns and catalog versions. (#20114)
* Make DNS test run across a matrix of dns and catalog versions.

* node tests

* add version hcl config to service lookup tests
2024-01-08 13:14:26 -07:00
Melissa Kam
5dc8eabcce
[CC-7041] Update and start the SCADA provider in HCP manager (#19976)
* Update SCADA provider version

Also update mocks for SCADA provider.

* Create SCADA provider w/o HCP config, then update

Adds a placeholder config option to allow us to initialize a SCADA provider
without the HCP configuration. Also adds an update method to then add the
HCP configuration. We need this to be able to eventually always register a
SCADA listener at startup before the HCP config values are known.

* Pass cloud configuration to HCP manager

Save the entire cloud configuration and pass it to the HCP
manager.

* Update and start SCADA provider in HCP manager

Move config updating and starting to the HCP manager. The HCP manager
will eventually be responsible for all processes that contribute
to linking to HCP.
2024-01-08 09:49:29 -06:00
Ganesh S
0d57acc549
Add sameness group references in exported services controller (#20100) 2024-01-08 11:55:52 +05:30
John Murret
c12245be3c
Break up DNS tests into 3 files to help with GH UI and IDE issues. (#20103) 2024-01-05 13:37:27 -07:00
cskh
15b40f36f3
Use safeio to write server metadata file (#20101)
* Use safeio to write server metadata file

* guard the conversion
2024-01-05 14:46:19 -05:00
John Murret
7a410d7c5b
NET-6945 - Replace usage of deprecated Envoy field envoy.config.core.v3.HeaderValueOption.append (#20078)
* NET-6945 - Replace usage of deprecated Envoy field envoy.config.core.v3.HeaderValueOption.append

* update proto for v2 and then update xds v2 logic

* add changelog

* Update 20078.txt to be consistent with existing changelog entries

* swap enum values tomatch envoy.
2024-01-04 00:36:25 +00:00
Dan Stough
073959866d
feat(v2): add consul service and workloads to catalog (#20077) 2024-01-03 15:14:42 -05:00
John Murret
d925e4b812
NET-6946 / NET-6941 - Replace usage of deprecated Envoy fields envoy.config.route.v3.HeaderMatcher.safe_regex_match and envoy.type.matcher.v3.RegexMatcher.google_re2 (#20013)
* NET-6946 - Replace usage of deprecated Envoy field envoy.config.route.v3.HeaderMatcher.safe_regex_match

* removing unrelated changes

* update golden files

* do not set engine type
2024-01-03 09:53:39 -07:00
John Murret
2f335113f8
NET-6943 - Replace usage of deprecated Envoy field envoy.config.router.v3.WeightedCluster.total_weight. (#20011) 2023-12-22 19:49:44 +00:00
John Murret
90cd56c5c3
NET-4774 - replace usage of deprecated Envoy field match_subject_alt_names (#19954) 2023-12-22 18:34:44 +00:00
John Murret
21ea5c92fd
NET-6944 - Replace usage of deprecated Envoy field envoy.extensions.filters.http.lua.v3.Lua.inline_code (#20012) 2023-12-22 17:20:41 +00:00
Nathan Coleman
ab60fec15a
[NET-6426] Add gateway proxy controller that generates empty proxy state template (#19901)
* NET-6426 Create ProxyStateTemplate when reconciling MeshGateway resource

* Add TODO for switching fetch method based on gateway type

* Use gateway-kind in workload metadata instead of owner reference

* Create ProxyStateTemplate builder for gatewayproxy controller

* Update to use new controller interface

* Add copyright headers

* Set correct name for ProxyStateTemplate identity reference

* Generate empty ProxyStateTemplate by fetching MeshGateway

This cheats and looks up the MeshGateway directly. In the future, we will need a Workload => xGateway mapper

* Specify owner reference when writing ProxyStateTemplate

* Update dependency mapper to account for multiple controllers per resource type

* Regenerate v2 resource dependencies map

* Add helpful trace logs, tag TODOs with ticket identifiers
2023-12-21 16:37:47 -05:00
Nitya Dhanushkodi
9975b8bd73
[NET-5455] Allow disabling request and idle timeouts with negative values in service router and service resolver (#19992)
* add coverage for testing these timeouts
2023-12-19 15:36:07 -08:00
cskh
cff872749d
agent: prevent empty server_metadata.json (#19935) 2023-12-19 10:01:56 -05:00
aahel
ae998a698a
added computed failover policy resource (#19975) 2023-12-18 05:52:24 +00:00
Derek Menteer
bbdbf3e4f8
Fix bug with prepared queries using sameness-groups. (#19970)
This commit fixes an issue where the partition was not properly set
on the peering query failover target created from sameness-groups.
Before this change, it was always empty, meaning that the data
would be queried with respect to the default partition always. This
resulted in a situation where a PQ that was attempting to use a
sameness-group for failover would select peers from the default
partition, rather than the partition of the sameness-group itself.
2023-12-15 11:42:13 -06:00
aahel
a6496898de
added tenancy to TestBuildL4TrafficPermissions (#19932) 2023-12-14 10:41:24 +05:30
Matt Keeler
123bc95e1a
Add Common Controller Caching Infrastructure (#19767)
* Add Common Controller Caching Infrastructure
2023-12-13 10:06:39 -05:00
Dhia Ayachi
f2b26ac194
Hash based config entry replication (#19795)
* add a hash to config entries when normalizing

* add GetHash and implement comparing hashes

* only update if the Hash is different

* only update if the Hash is different and not 0

* fix proto to include the Hash

* fix proto gen

* buf format

* add SetHash and fix tests

* fix config load tests

* fix state test and config test

* recalculate hash when restoring config entries

* fix snapshot restore test

* add changelog

* fix missing normalize, fix proto indexes and add normalize test
2023-12-12 08:29:13 -05:00
Ronald
e13fbc743e
Remove warning for consul 1.17 deprecation (#19897) 2023-12-11 23:28:04 +00:00
Derek Menteer
dfab5ade50
Fix ClusterLoadAssignment timeouts dropping endpoints. (#19871)
When a large number of upstreams are configured on a single envoy
proxy, there was a chance that it would timeout when waiting for
ClusterLoadAssignments. While this doesn't always immediately cause
issues, consul-dataplane instances appear to consistently drop
endpoints from their configurations after an xDS connection is
re-established (the server dies, random disconnect, etc).

This commit adds an `xds_fetch_timeout_ms` config to service registrations
so that users can set the value higher for large instances that have
many upstreams. The timeout can be disabled by setting a value of `0`.

This configuration was introduced to reduce the risk of causing a
breaking change for users if there is ever a scenario where endpoints
would never be received. Rather than just always blocking indefinitely
or for a significantly longer period of time, this config will affect
only the service instance associated with it.
2023-12-11 09:25:11 -06:00
Derek Menteer
0ac958f27b
Fix xDS missing endpoint race condition. (#19866)
This fixes the following race condition:
- Send update endpoints
- Send update cluster
- Recv ACK endpoints
- Recv ACK cluster

Prior to this fix, it would have resulted in the endpoints NOT existing in
Envoy. This occurred because the cluster update implicitly clears the endpoints
in Envoy, but we would never re-send the endpoint data to compensate for the
loss, because we would incorrectly ACK the invalid old endpoint hash. Since the
endpoint's hash did not actually change, they would not be resent.

The fix for this is to effectively clear out the invalid pending ACKs for child
resources whenever the parent changes. This ensures that we do not store the
child's hash as accepted when the race occurs.

An escape-hatch environment variable `XDS_PROTOCOL_LEGACY_CHILD_RESEND` was
added so that users can revert back to the old legacy behavior in the event
that this produces unknown side-effects. Visit the following thread for some
extra context on why certainty around these race conditions is difficult:
https://github.com/envoyproxy/envoy/issues/13009

This bug report and fix was mostly implemented by @ksmiley with some minor
tweaks.

Co-authored-by: Keith Smiley <ksmiley@salesforce.com>
2023-12-08 11:37:12 -06:00
Thomas Eckert
8125a32a4e
Add CE version of Gateway Upstream Disambiguation (#19860)
* Add CE version of gateway-upstream-disambiguation

* Use NamespaceOrDefault and PartitionOrDefault

* Add Changelog entry

* Remove the unneeded reassignment

* Use c.ID()
2023-12-07 17:56:14 -05:00
Dhia Ayachi
d93f7f730d
parse config protocol on write to optimize disco-chain compilation (#19829)
* parse config protocol on write to optimize disco-chain compilation

* add changelog
2023-12-07 13:46:46 -05:00
Jared Kirschner
d3e658b0e7
improve client RPC metrics consistency (#19721)
The client.rpc metric now excludes internal retries for consistency
with client.rpc.exceeded and client.rpc.failed. All of these metrics
now increment at most once per RPC method call, allowing for
accurate calculation of failure / rate limit application occurrence.

Additionally, if an RPC fails because no servers are present,
client.rpc.failed is now incremented.
2023-12-06 13:21:08 -05:00
Matt Keeler
efe279f802
Retry lint fixes (#19151)
* Add a make target to run lint-consul-retry on all the modules
* Cleanup sdk/testutil/retry
* Fix a bunch of retry.Run* usage to not use the outer testing.T
* Fix some more recent retry lint issues and pin to v1.4.0 of lint-consul-retry
* Fix codegen copywrite lint issues
* Don’t perform cleanup after each retry attempt by default.
* Use the common testutil.TestingTB interface in test-integ/tenancy
* Fix retry tests
* Update otel access logging extension test to perform requests within the retry block
2023-12-06 12:11:32 -05:00