5523 Commits

Author SHA1 Message Date
John Murret
21ea5c92fd
NET-6944 - Replace usage of deprecated Envoy field envoy.extensions.filters.http.lua.v3.Lua.inline_code (#20012) 2023-12-22 17:20:41 +00:00
Nathan Coleman
ab60fec15a
[NET-6426] Add gateway proxy controller that generates empty proxy state template (#19901)
* NET-6426 Create ProxyStateTemplate when reconciling MeshGateway resource

* Add TODO for switching fetch method based on gateway type

* Use gateway-kind in workload metadata instead of owner reference

* Create ProxyStateTemplate builder for gatewayproxy controller

* Update to use new controller interface

* Add copyright headers

* Set correct name for ProxyStateTemplate identity reference

* Generate empty ProxyStateTemplate by fetching MeshGateway

This cheats and looks up the MeshGateway directly. In the future, we will need a Workload => xGateway mapper

* Specify owner reference when writing ProxyStateTemplate

* Update dependency mapper to account for multiple controllers per resource type

* Regenerate v2 resource dependencies map

* Add helpful trace logs, tag TODOs with ticket identifiers
2023-12-21 16:37:47 -05:00
Nitya Dhanushkodi
9975b8bd73
[NET-5455] Allow disabling request and idle timeouts with negative values in service router and service resolver (#19992)
* add coverage for testing these timeouts
2023-12-19 15:36:07 -08:00
cskh
cff872749d
agent: prevent empty server_metadata.json (#19935) 2023-12-19 10:01:56 -05:00
aahel
ae998a698a
added computed failover policy resource (#19975) 2023-12-18 05:52:24 +00:00
Derek Menteer
bbdbf3e4f8
Fix bug with prepared queries using sameness-groups. (#19970)
This commit fixes an issue where the partition was not properly set
on the peering query failover target created from sameness-groups.
Before this change, it was always empty, meaning that the data
would be queried with respect to the default partition always. This
resulted in a situation where a PQ that was attempting to use a
sameness-group for failover would select peers from the default
partition, rather than the partition of the sameness-group itself.
2023-12-15 11:42:13 -06:00
aahel
a6496898de
added tenancy to TestBuildL4TrafficPermissions (#19932) 2023-12-14 10:41:24 +05:30
Matt Keeler
123bc95e1a
Add Common Controller Caching Infrastructure (#19767)
* Add Common Controller Caching Infrastructure
2023-12-13 10:06:39 -05:00
Dhia Ayachi
f2b26ac194
Hash based config entry replication (#19795)
* add a hash to config entries when normalizing

* add GetHash and implement comparing hashes

* only update if the Hash is different

* only update if the Hash is different and not 0

* fix proto to include the Hash

* fix proto gen

* buf format

* add SetHash and fix tests

* fix config load tests

* fix state test and config test

* recalculate hash when restoring config entries

* fix snapshot restore test

* add changelog

* fix missing normalize, fix proto indexes and add normalize test
2023-12-12 08:29:13 -05:00
Ronald
e13fbc743e
Remove warning for consul 1.17 deprecation (#19897) 2023-12-11 23:28:04 +00:00
Derek Menteer
dfab5ade50
Fix ClusterLoadAssignment timeouts dropping endpoints. (#19871)
When a large number of upstreams are configured on a single envoy
proxy, there was a chance that it would timeout when waiting for
ClusterLoadAssignments. While this doesn't always immediately cause
issues, consul-dataplane instances appear to consistently drop
endpoints from their configurations after an xDS connection is
re-established (the server dies, random disconnect, etc).

This commit adds an `xds_fetch_timeout_ms` config to service registrations
so that users can set the value higher for large instances that have
many upstreams. The timeout can be disabled by setting a value of `0`.

This configuration was introduced to reduce the risk of causing a
breaking change for users if there is ever a scenario where endpoints
would never be received. Rather than just always blocking indefinitely
or for a significantly longer period of time, this config will affect
only the service instance associated with it.
2023-12-11 09:25:11 -06:00
Derek Menteer
0ac958f27b
Fix xDS missing endpoint race condition. (#19866)
This fixes the following race condition:
- Send update endpoints
- Send update cluster
- Recv ACK endpoints
- Recv ACK cluster

Prior to this fix, it would have resulted in the endpoints NOT existing in
Envoy. This occurred because the cluster update implicitly clears the endpoints
in Envoy, but we would never re-send the endpoint data to compensate for the
loss, because we would incorrectly ACK the invalid old endpoint hash. Since the
endpoint's hash did not actually change, they would not be resent.

The fix for this is to effectively clear out the invalid pending ACKs for child
resources whenever the parent changes. This ensures that we do not store the
child's hash as accepted when the race occurs.

An escape-hatch environment variable `XDS_PROTOCOL_LEGACY_CHILD_RESEND` was
added so that users can revert back to the old legacy behavior in the event
that this produces unknown side-effects. Visit the following thread for some
extra context on why certainty around these race conditions is difficult:
https://github.com/envoyproxy/envoy/issues/13009

This bug report and fix was mostly implemented by @ksmiley with some minor
tweaks.

Co-authored-by: Keith Smiley <ksmiley@salesforce.com>
2023-12-08 11:37:12 -06:00
Thomas Eckert
8125a32a4e
Add CE version of Gateway Upstream Disambiguation (#19860)
* Add CE version of gateway-upstream-disambiguation

* Use NamespaceOrDefault and PartitionOrDefault

* Add Changelog entry

* Remove the unneeded reassignment

* Use c.ID()
2023-12-07 17:56:14 -05:00
Dhia Ayachi
d93f7f730d
parse config protocol on write to optimize disco-chain compilation (#19829)
* parse config protocol on write to optimize disco-chain compilation

* add changelog
2023-12-07 13:46:46 -05:00
Jared Kirschner
d3e658b0e7
improve client RPC metrics consistency (#19721)
The client.rpc metric now excludes internal retries for consistency
with client.rpc.exceeded and client.rpc.failed. All of these metrics
now increment at most once per RPC method call, allowing for
accurate calculation of failure / rate limit application occurrence.

Additionally, if an RPC fails because no servers are present,
client.rpc.failed is now incremented.
2023-12-06 13:21:08 -05:00
Matt Keeler
efe279f802
Retry lint fixes (#19151)
* Add a make target to run lint-consul-retry on all the modules
* Cleanup sdk/testutil/retry
* Fix a bunch of retry.Run* usage to not use the outer testing.T
* Fix some more recent retry lint issues and pin to v1.4.0 of lint-consul-retry
* Fix codegen copywrite lint issues
* Don’t perform cleanup after each retry attempt by default.
* Use the common testutil.TestingTB interface in test-integ/tenancy
* Fix retry tests
* Update otel access logging extension test to perform requests within the retry block
2023-12-06 12:11:32 -05:00
Ronald
dc02fa695f
[NET-6251] Nomad client templated policy (#19827) 2023-12-06 10:32:12 -05:00
Semir Patel
c1bbda8128
resource: block default namespace deletion + test refactorings (#19822) 2023-12-05 14:00:06 -05:00
lornasong
edf4610ed9
[Cloud][CC-6925] Updates to pushing server state (#19682)
* Upgrade hcp-sdk-go to latest version v0.73

Changes:
- go get github.com/hashicorp/hcp-sdk-go
- go mod tidy

* From upgrade: regenerate protobufs for upgrade from 1.30 to 1.31

Ran: `make proto`

Slack: https://hashicorp.slack.com/archives/C0253EQ5B40/p1701105418579429

* From upgrade: fix mock interface implementation

After upgrading, there is the following compile error:

cannot use &mockHCPCfg{} (value of type *mockHCPCfg) as "github.com/hashicorp/hcp-sdk-go/config".HCPConfig value in return statement: *mockHCPCfg does not implement "github.com/hashicorp/hcp-sdk-go/config".HCPConfig (missing method Logout)

Solution: update the mock to have the missing Logout method

* From upgrade: Lint: remove usage of deprecated req.ServerState.TLS

Due to upgrade, linting is erroring due to usage of a newly deprecated field

22:47:56 [consul]: make lint
--> Running golangci-lint (.)
agent/hcp/testing.go:157:24: SA1019: req.ServerState.TLS is deprecated: use server_tls.internal_rpc instead. (staticcheck)
                time.Until(time.Time(req.ServerState.TLS.CertExpiry)).Hours()/24,
                                     ^

* From upgrade: adjust oidc error message

From the upgrade, this test started failing:

=== FAIL: internal/go-sso/oidcauth TestOIDC_ClaimsFromAuthCode/failed_code_exchange (re-run 2) (0.01s)
    oidc_test.go:393: unexpected error: Provider login failed: Error exchanging oidc code: oauth2: "invalid_grant" "unexpected auth code"

Prior to the upgrade, the error returned was:
```
Provider login failed: Error exchanging oidc code: oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"invalid_grant\",\"error_description\":\"unexpected auth code\"}\n
```

Now the error returned is as below and does not contain "cannot fetch token"
```
Provider login failed: Error exchanging oidc code: oauth2: "invalid_grant" "unexpected auth code"

```

* Update AgentPushServerState structs with new fields

HCP-side changes for the new fields are in:
https://github.com/hashicorp/cloud-global-network-manager-service/pull/1195/files

* Minor refactor for hcpServerStatus to abstract tlsInfo into struct

This will make it easier to set the same tls-info information to both
 - status.TLS (deprecated field)
 - status.ServerTLSMetadata (new field to use instead)

* Update hcpServerStatus to parse out information for new fields

Changes:
 - Improve error message and handling (encountered some issues and was confused)
 - Set new field TLSInfo.CertIssuer
 - Collect certificate authority metadata and set on TLSInfo.CertificateAuthorities
 - Set TLSInfo on both server.TLS and server.ServerTLSMetadata.InternalRPC

* Update serverStatusToHCP to convert new fields to GNM rpc

* Add changelog

* Feedback: connect.ParseCert, caCerts

* Feedback: refactor and unit test server status

* Feedback: test to use expected struct

* Feedback: certificate with intermediate

* Feedback: catch no leaf, remove expectedErr

* Feedback: update todos with jira ticket

* Feedback: mock tlsConfigurator
2023-12-04 10:25:18 -05:00
aahel
7936e55807
added node health resource (#19803) 2023-12-02 11:14:03 +05:30
John Maguire
a0240e3794
[NET-5688] APIGateway UI Topology Fixes (#19657)
* Update catalog and ui endpoints to show APIGateway in gateway service
topology view

* Added initial implementation for service view

* updated ui

* Fix topology view for gateways

* Adding tests for gw controller

* remove unused args

* Undo formatting changes

* Fix call sites for upstream/downstream gw changes

* Add config entry tests

* Fix function calls again

* Move from ServiceKey to ServiceName, cleanup from PR review

* Add additional check for length of services in bound apigateway for
IsSame comparison

* fix formatting for proto

* gofmt

* Add DeepCopy for retrieved BoundAPIGateway

* gofmt

* gofmt

* Rename function to be more consistent
2023-11-28 21:27:14 +00:00
Thomas Eckert
419677cc9e
[NET-6420] Add MeshConfiguration Controller stub (#19745)
* Add meshconfiguration/controller

* Add MeshConfiguration Registration function

* Fix the TODOs on the RegisterMeshGateway function

* Call RegisterMeshConfiguration

* Add comment to MeshConfigurationRegistration

* Add a test for Reconcile and some comments
2023-11-28 18:56:07 +00:00
Semir Patel
5930748cb0
resource: ListByOwner returns empty list on non-existent tenancy (#19742) 2023-11-27 14:56:08 -06:00
Ronald
eded2ff347
[NET-6249] Add templated policies description (#19735) 2023-11-27 10:34:22 -05:00
Ronald
c1dbf00a85
NET-6251 API gateway templated policy (#19728) 2023-11-24 17:55:05 +00:00
Poonam Jadhav
78f918a103
feat: create a default namespace (#19681)
* feat: create a default namespace on leader

* refactor: add comment and move inittenancy to leader file

* refactor: rephrase comment
2023-11-22 14:32:57 -05:00
Mike Nomitch
302f994410
[NET-6640] Adds "Policy" BindType to BindingRule (#19499)
feat: add bind type of policy

Co-authored-by: Ronald Ekambi <ronekambi@gmail.com>
2023-11-20 13:11:08 +00:00
Semir Patel
75c2def1ca
resource: preserve deferred deletion metadata on non-CAS writes (#19674) 2023-11-17 10:51:25 -06:00
Ronald
ea0caa3e0f
[NET-6103] Enable query tokens by service name using templated policy (#19666) 2023-11-16 14:32:06 -05:00
John Murret
2591318c82
Skip tests with p95 greater than 30 seconds outside of main and release branches. (#19628)
Skip tests with p95 greater than 30 seconds.
2023-11-15 13:43:33 -07:00
Semir Patel
1eed205286
resource: freeze resources after marked for deletion (4 of 5) (#19603) 2023-11-15 10:58:27 -06:00
Kumar Kavish
68e7f27fd2
[NET-6438] Add tenancy to xDS Tests (#19551)
* [NET-6438] Add tenancy to xDS Tests

* [NET-6438] Add tenancy to xDS Tests
- Fixing imports

* [NET-6438] Add tenancy to xDS Tests
- Added cleanup post test run

* [NET-6356] Add tenancy to xDS Tests
- Added cleanup post test run

* [NET-6438] Add tenancy to xDS Tests
- using t.Cleanup instead of defer delete

* [NET-6438] Add tenancy to xDS Tests
- rebased

* [NET-6438] Add tenancy to xDS Tests
- rebased
2023-11-10 15:32:36 +05:30
aahel
005e1b9926
added exported svc controller (#19589)
* added exported svc controller

* added license headers
2023-11-10 07:27:53 +05:30
Nathan Coleman
40c57f10a0
NET-6391 Initialize controller for MeshGateway resource (#19552)
* Generate resource_types for MeshGateway by specifying spec option

* Register MeshGateway type w/ TODOs for hooks

* Initialize controller for MeshGateway resources

* Add meshgateway to list of v2 resource dependencies for golden test

* Scope MeshGateway resource to partition
2023-11-09 16:33:14 -05:00
John Murret
780e91688d
Migrate remaining individual resource tests for service mesh to TestAllResourcesFromSnapshot (#19583)
* migrate expose checks and paths  tests to resources_test.go

* fix failing expose paths tests

* fix the way endpoint resources get created to make expose tests pass.

* remove endpoint resources that are already inlined on local_app clusters

* renaiming and comments

* migrate remaining service mesh tests to resources_test.go

* cleanup

* update proxystateconverter to skip ading alpn to clusters and listener filterto match v1 behavior
2023-11-09 20:08:37 +00:00
Kumar Kavish
f09dbb99e9
[NET-6356] Add tenancy to Failover Tests (#19547)
* [NET-6356] Add tenancy to Failover Tests

* [NET-6438] Add tenancy to xDS Tests
- Added cleanup post test run

* [NET-6356] Add tenancy to failover Tests
- using t.Cleanup instead of defer delete
2023-11-10 01:14:09 +05:30
John Murret
f5bf256425
Migrate individual resource tests for API Gateway to TestAllResourcesFromSnapshot (#19584)
migrate individual api gateway tests to resources_test.go
2023-11-09 17:01:54 +00:00
John Murret
a94fa4c3ed
Migrate individual resource tests for Mesh Gateway to TestAllResourcesFromSnapshot (#19502)
migrate mesh-gateway tests to resources_test.go
2023-11-09 16:39:16 +00:00
John Murret
4aa95f3d1f
Migrate individual resource tests for Ingress Gateway to TestAllResourcesFromSnapshot (#19506)
migrate ingress-gateway tests to resources_test.go
2023-11-09 16:08:07 +00:00
John Murret
2553d6e8b9
Migrate individual resource tests for Terminating Gateway to TestAllResourcesFromSnapshot (#19505)
migrate terminating-gateway tests to resources_test.go
2023-11-09 08:38:33 -07:00
John Murret
7de0b45ba4
Fix xds v2 from creating envoy endpoint resources when already inlined in the cluster (#19580)
* migrate expose checks and paths  tests to resources_test.go

* fix failing expose paths tests

* fix the way endpoint resources get created to make expose tests pass.

* wip

* remove endpoint resources that are already inlined on local_app clusters

* renaiming and comments
2023-11-08 22:18:51 +00:00
John Murret
5aff19f9bc
Migrate individual resource tests for JWT Provider to TestAllResourcesFromSnapshot (#19511)
migrate jwt provider tests to resources_test.go
2023-11-08 14:34:40 -07:00
John Murret
903ff7fccb
Migrate individual resource tests for custom configuration to TestAllResourcesFromSnapshot (#19512)
* Configure TestAllResourcesFromSnapshot to run V2 tests

* migrate custom configuration tests to resources_test.go
2023-11-08 10:34:23 -07:00
John Murret
09f73d1abf
Migrate individual resource tests for expose paths and checks to TestAllResourcesFromSnapshot (#19513)
* migrate expose checks and paths  tests to resources_test.go

* fix failing expose paths tests
2023-11-08 14:24:27 +00:00
John Murret
7bc2581c81
Migrate individual resource tests for Discovery Chains to TestAllResourcesFromSnapshot (#19508)
migrate disco chain tests to resources_test.go
2023-11-08 01:34:42 +00:00
John Murret
caaff73337
add DeliverLatest as common function for use by Manager and ProxyTracker Open (#19564)
Open
add DeliverLatest as common function for use by Manager and ProxyTracker
2023-11-07 23:03:37 +00:00
Derek Menteer
393f7a429b
Fix more test flakes (#19533)
Fix flaky snapshot and metrics tests.
2023-11-07 10:15:50 -06:00
John Murret
f115cdb1d5
NET-6385 - Static routes that are inlined in listener filters are also created as a resource. (#19459)
* cover all protocols in local_app golden tests

* fix xds tests

* updating latest

* fix broken test

* add sorting of routers to TestBuildLocalApp to get rid of the flaking

* cover all protocols in local_app golden tests

* cover all protocols in local_app golden tests

* cover all protocols in local_app golden tests

* process envoy resource by walking the map.  use a map rather than array for envoy resource to prevent duplication.

* cleanup.  doc strings.

* update to latest

* fix broken test

* update tests after adding sorting of routers in local_app builder tests

* do not make endpoints for local_app

* fix catalog destinations only by creating clusters for any cluster not already created by walking the graph.

* Configure TestAllResourcesFromSnapshot to run V2 tests

* wip

* fix processing of failover groups

* add endpoints and clusters for any clusters that were not created from walking the listener -> path

* fix xds v2 golden files for clusters to include failover group clusters
2023-11-07 08:00:08 -07:00
Semir Patel
2da7dd077a
v2tenancy: register tenancy controller deps (#19531) 2023-11-07 08:06:10 -06:00
Derek Menteer
6baf695cd9
[NET-6459] Fix issue with wanfed lan ip conflicts. (#19503)
Fix issue with wanfed lan ip conflicts.

Prior to this commit, the connection pools were unaware which datacenter the
connection was associated with. This meant that any time servers with
overlapping LAN IP addresses and node shortnames existed, they would be
incorrectly co-located in the same pool. Whenever this occurred, the servers
would get stuck in an infinite loop of forwarding RPCs to themselves (rather
than the intended remote DC) until they eventually run out of memory.

Most notably, this issue can occur whenever wan federation through mesh
gateways is enabled.

This fix adds extra metadata to specify which DC the connection is associated
with in the pool.
2023-11-06 08:47:12 -06:00