When a large number of upstreams are configured on a single envoy
proxy, there was a chance that it would timeout when waiting for
ClusterLoadAssignments. While this doesn't always immediately cause
issues, consul-dataplane instances appear to consistently drop
endpoints from their configurations after an xDS connection is
re-established (the server dies, random disconnect, etc).
This commit adds an `xds_fetch_timeout_ms` config to service registrations
so that users can set the value higher for large instances that have
many upstreams. The timeout can be disabled by setting a value of `0`.
This configuration was introduced to reduce the risk of causing a
breaking change for users if there is ever a scenario where endpoints
would never be received. Rather than just always blocking indefinitely
or for a significantly longer period of time, this config will affect
only the service instance associated with it.
This fixes the following race condition:
- Send update endpoints
- Send update cluster
- Recv ACK endpoints
- Recv ACK cluster
Prior to this fix, it would have resulted in the endpoints NOT existing in
Envoy. This occurred because the cluster update implicitly clears the endpoints
in Envoy, but we would never re-send the endpoint data to compensate for the
loss, because we would incorrectly ACK the invalid old endpoint hash. Since the
endpoint's hash did not actually change, they would not be resent.
The fix for this is to effectively clear out the invalid pending ACKs for child
resources whenever the parent changes. This ensures that we do not store the
child's hash as accepted when the race occurs.
An escape-hatch environment variable `XDS_PROTOCOL_LEGACY_CHILD_RESEND` was
added so that users can revert back to the old legacy behavior in the event
that this produces unknown side-effects. Visit the following thread for some
extra context on why certainty around these race conditions is difficult:
https://github.com/envoyproxy/envoy/issues/13009
This bug report and fix was mostly implemented by @ksmiley with some minor
tweaks.
Co-authored-by: Keith Smiley <ksmiley@salesforce.com>
* Add CE version of gateway-upstream-disambiguation
* Use NamespaceOrDefault and PartitionOrDefault
* Add Changelog entry
* Remove the unneeded reassignment
* Use c.ID()
* migrate expose checks and paths tests to resources_test.go
* fix failing expose paths tests
* fix the way endpoint resources get created to make expose tests pass.
* remove endpoint resources that are already inlined on local_app clusters
* renaiming and comments
* migrate remaining service mesh tests to resources_test.go
* cleanup
* update proxystateconverter to skip ading alpn to clusters and listener filterto match v1 behavior
* migrate expose checks and paths tests to resources_test.go
* fix failing expose paths tests
* fix the way endpoint resources get created to make expose tests pass.
* wip
* remove endpoint resources that are already inlined on local_app clusters
* renaiming and comments
* cover all protocols in local_app golden tests
* fix xds tests
* updating latest
* fix broken test
* add sorting of routers to TestBuildLocalApp to get rid of the flaking
* cover all protocols in local_app golden tests
* cover all protocols in local_app golden tests
* cover all protocols in local_app golden tests
* process envoy resource by walking the map. use a map rather than array for envoy resource to prevent duplication.
* cleanup. doc strings.
* update to latest
* fix broken test
* update tests after adding sorting of routers in local_app builder tests
* do not make endpoints for local_app
* fix catalog destinations only by creating clusters for any cluster not already created by walking the graph.
* Configure TestAllResourcesFromSnapshot to run V2 tests
* wip
* fix processing of failover groups
* add endpoints and clusters for any clusters that were not created from walking the listener -> path
* fix xds v2 golden files for clusters to include failover group clusters
* xds: Ensure v2 route match is populated for gRPC
Similar to HTTP, ensure that route match config (which is required by
Envoy) is populated when default values are used.
Because the default matches generated for gRPC contain a single empty
`GRPCRouteMatch`, and that proto does not directly support prefix-based
config, an interpretation of the empty struct is needed to generate the
same output that the `HTTPRouteMatch` is explicitly configured to
provide in internal/mesh/internal/controllers/routes/generate.go.
* xds: Ensure protocol set for gRPC resources
Add explicit protocol in `ProxyStateTemplate` builders and validate it
is always set on clusters. This ensures that HTTP filters and
`http2_protocol_options` are populated in all the necessary places for
gRPC traffic and prevents future unintended omissions of non-TCP
protocols.
Co-authored-by: John Murret <john.murret@hashicorp.com>
---------
Co-authored-by: John Murret <john.murret@hashicorp.com>
Ensure LB policy set for locality-aware routing (CE)
`overprovisioningFactor` should be overridden with the expected value
(100,000) when there are multiple endpoint groups. Update code and
tests to enforce this.
This is an Enterprise feature. This commit represents the CE portions of
the change; tests are added in the corresponding `consul-enterprise`
change.
* xdsv2: support l7 by adding xfcc policy/headers, tweaking routes, and make a bunch of listeners l7 tests pass
* sidecarproxycontroller: add l7 local app support
* trafficpermissions: make l4 traffic permissions work on l7 workloads
* rename route name field for consistency with l4 cluster name field
* resolve conflicts and rebase
* fix: ensure route name is used in l7 destination route name as well. previously it was only in the route names themselves, now the route name and l7 destination route name line up
* Add InboundPeerTrustBundle maps to Terminating Gateway
* Add notify and cancelation of watch for inbound peer trust bundles
* Pass peer trust bundles to the RBAC creation function
* Regenerate Golden Files
* add changelog, also adds another spot that needed peeredTrustBundles
* Add basic test for terminating gateway with peer trust bundle
* Add intention to cluster peered golden test
* rerun codegen
* update changelog
* really update the changelog
---------
Co-authored-by: Melisa Griffin <melisa.griffin@hashicorp.com>
Fix issues with empty sources
* Validate that each permission on traffic permissions resources has at least one source.
* Don't construct RBAC policies when there aren't any principals. This resulted in Envoy rejecting xDS updates with a validation error.
```
error=
| rpc error: code = Internal desc = Error adding/updating listener(s) public_listener: Proto constraint validation failed (RBACValidationError.Rules: embedded message failed validation | caused by RBACValidationError.Policies[consul-intentions-layer4-1]: embedded message failed validation | caused by PolicyValidationError.Principals: value must contain at least 1 item(s)): rules {
```
Adding coauthors who mobbed/paired at various points throughout last week.
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
Co-authored-by: Iryna Shustava <iryna@hashicorp.com>
Co-authored-by: John Murret <john.murret@hashicorp.com>
Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: Michael Wilkerson <mwilkerson@hashicorp.com>
Configure Envoy to use the same HTTP protocol version used by the
downstream caller when forwarding requests to a local application that
is configured with the protocol set to either `http2` or `grpc`.
This allows upstream applications that support both HTTP/1.1 and
HTTP/2 on a single port to receive requests using either protocol. This
is beneficial when the application primarily communicates using HTTP/2,
but also needs to support HTTP/1.1, such as to respond to Kubernetes
HTTP readiness/liveness probes.
Co-authored-by: Derek Menteer <derek.menteer@hashicorp.com>
Reworks the sidecar controller to accept ComputedRoutes as an input and use it to generate appropriate ProxyStateTemplate resources containing L4/L7 mesh configuration.
* mesh-controller: handle L4 protocols for a proxy without upstreams
* sidecar-controller: Support explicit destinations for L4 protocols and single ports.
* This controller generates and saves ProxyStateTemplate for sidecar proxies.
* It currently supports single-port L4 ports only.
* It keeps a cache of all destinations to make it easier to compute and retrieve destinations.
* It will update the status of the pbmesh.Upstreams resource if anything is invalid.
* endpoints-controller: add workload identity to the service endpoints resource
* small fixes
* review comments
* Address PR comments
* sidecar-proxy controller: Add support for transparent proxy
This currently does not support inferring destinations from intentions.
* PR review comments
* mesh-controller: handle L4 protocols for a proxy without upstreams
* sidecar-controller: Support explicit destinations for L4 protocols and single ports.
* This controller generates and saves ProxyStateTemplate for sidecar proxies.
* It currently supports single-port L4 ports only.
* It keeps a cache of all destinations to make it easier to compute and retrieve destinations.
* It will update the status of the pbmesh.Upstreams resource if anything is invalid.
* endpoints-controller: add workload identity to the service endpoints resource
* small fixes
* review comments
* Make sure endpoint refs route to mesh port instead of an app port
* Address PR comments
* fixing copyright
* tidy imports
* sidecar-proxy controller: Add support for transparent proxy
This currently does not support inferring destinations from intentions.
* tidy imports
* add copyright headers
* Prefix sidecar proxy test files with source and destination.
* Update controller_test.go
* NET-5132 - Configure multiport routing for connect proxies in TProxy mode
* formatting golden files
* reverting golden files and adding changes in manually. build implicit destinations still has some issues.
* fixing files that were incorrectly repeating the outbound listener
* PR comments
* extract AlpnProtocol naming convention to getAlpnProtocolFromPortName(portName)
* removing address level filtering.
* adding license to resources_test.go
---------
Co-authored-by: Iryna Shustava <iryna@hashicorp.com>
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>
* Fixes issues in setting status
* Update golden files for changes to xds generation to not use deprecated
methods
* Fixed default for validation of JWT for route
This PR enables the GetEnvoyBootstrapParams endpoint to construct envoy bootstrap parameters from v2 catalog and mesh resources.
* Make bootstrap request and response parameters less specific to services so that we can re-use them for workloads or service instances.
* Remove ServiceKind from bootstrap params response. This value was unused previously and is not needed for V2.
* Make access logs generation generic so that we can generate them using v1 or v2 resources.
* Add the plumbing for APIGW JWT work
* Remove unneeded import
* Add deep equal function for HTTPMatch
* Added plumbing for status conditions
* Remove unneeded comment
* Fix comments
* Add calls in xds listener for apigateway to setup listener jwt auth