When a large number of upstreams are configured on a single envoy
proxy, there was a chance that it would timeout when waiting for
ClusterLoadAssignments. While this doesn't always immediately cause
issues, consul-dataplane instances appear to consistently drop
endpoints from their configurations after an xDS connection is
re-established (the server dies, random disconnect, etc).
This commit adds an `xds_fetch_timeout_ms` config to service registrations
so that users can set the value higher for large instances that have
many upstreams. The timeout can be disabled by setting a value of `0`.
This configuration was introduced to reduce the risk of causing a
breaking change for users if there is ever a scenario where endpoints
would never be received. Rather than just always blocking indefinitely
or for a significantly longer period of time, this config will affect
only the service instance associated with it.
* Add InboundPeerTrustBundle maps to Terminating Gateway
* Add notify and cancelation of watch for inbound peer trust bundles
* Pass peer trust bundles to the RBAC creation function
* Regenerate Golden Files
* add changelog, also adds another spot that needed peeredTrustBundles
* Add basic test for terminating gateway with peer trust bundle
* Add intention to cluster peered golden test
* rerun codegen
* update changelog
* really update the changelog
---------
Co-authored-by: Melisa Griffin <melisa.griffin@hashicorp.com>
* Adding explicit MPL license for sub-package
This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository.
* Adding explicit MPL license for sub-package
This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository.
* Updating the license from MPL to Business Source License
Going forward, this project will be licensed under the Business Source License v1.1. Please see our blog post for more details at <Blog URL>, FAQ at www.hashicorp.com/licensing-faq, and details of the license at www.hashicorp.com/bsl.
* add missing license headers
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
---------
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
* xds generation for routes api gateway
* Update gateway.go
* move buildHttpRoute into xds package
* Update agent/consul/discoverychain/gateway.go
* remove unneeded function
* convert http route code to only run for http protocol to future proof code path
* Update agent/consul/discoverychain/gateway.go
Co-authored-by: Mike Morris <mikemorris@users.noreply.github.com>
* fix tests, clean up http check logic
* clean up todo
* Fix casing in docstring
* Fix import block, adjust docstrings
* Rename func
* Consolidate docstring onto single line
* Remove ToIngress() conversion for APIGW, which generates its own xDS now
* update name and comment
* use constant value
* use constant
* rename readyUpstreams to readyListeners to better communicate what that function is doing
---------
Co-authored-by: Mike Morris <mikemorris@users.noreply.github.com>
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* API Gateway XDS Primitives, endpoints and clusters (#17002)
* XDS primitive generation for endpoints and clusters
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* server_test
* deleted extra file
* add missing parents to test
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Routes for API Gateway (#17158)
* XDS primitive generation for endpoints and clusters
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* server_test
* deleted extra file
* add missing parents to test
* checkpoint
* delete extra file
* httproute flattening code
* linting issue
* so close on this, calling for tonight
* unit test passing
* add in header manip to virtual host
* upstream rebuild commented out
* Use consistent upstream name whether or not we're rebuilding
* Start working through route naming logic
* Fix typos in test descriptions
* Simplify route naming logic
* Simplify RebuildHTTPRouteUpstream
* Merge additional compiled discovery chains instead of overwriting
* Use correct chain for flattened route, clean up + add TODOs
* Remove empty conditional branch
* Restore previous variable declaration
Limit the scope of this PR
* Clean up, improve TODO
* add logging, clean up todos
* clean up function
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* checkpoint, skeleton, tests not passing
* checkpoint
* endpoints xds cluster configuration
* resources test fix
* fix reversion in resources_test
* checkpoint
* Update agent/proxycfg/api_gateway.go
Co-authored-by: John Maguire <john.maguire@hashicorp.com>
* unit tests passing
* gofmt
* add deterministic sorting to appease the unit test gods
* remove panic
* Find ready upstream matching listener instead of first in list
* Clean up, improve TODO
* Modify getReadyUpstreams to filter upstreams by listener (#17410)
Each listener would previously have all upstreams from any route that bound to the listener. This is problematic when a route bound to one listener also binds to other listeners and so includes upstreams for multiple listeners. The list for a given listener would then wind up including upstreams for other listeners.
* clean up todos, references to api gateway in listeners_ingress
* merge in Nathan's fix
* Update agent/consul/discoverychain/gateway.go
* cleanup current todos, remove snapshot manipulation from generation code
* Update agent/structs/config_entry_gateways.go
Co-authored-by: Thomas Eckert <teckert@hashicorp.com>
* Update agent/consul/discoverychain/gateway.go
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Update agent/consul/discoverychain/gateway.go
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Update agent/proxycfg/snapshot.go
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* clarified header comment for FlattenHTTPRoute, changed RebuildHTTPRouteUpstream to BuildHTTPRouteUpstream
* simplify cert logic
* Delete scratch
* revert route related changes in listener PR
* Update agent/consul/discoverychain/gateway.go
* Update agent/proxycfg/snapshot.go
* clean up uneeded extra lines in endpoints
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Co-authored-by: John Maguire <john.maguire@hashicorp.com>
Co-authored-by: Thomas Eckert <teckert@hashicorp.com>
* endpoints xds cluster configuration
* resources test fix
* fix reversion in resources_test
* Update agent/proxycfg/api_gateway.go
Co-authored-by: John Maguire <john.maguire@hashicorp.com>
* gofmt
* Modify getReadyUpstreams to filter upstreams by listener (#17410)
Each listener would previously have all upstreams from any route that bound to the listener. This is problematic when a route bound to one listener also binds to other listeners and so includes upstreams for multiple listeners. The list for a given listener would then wind up including upstreams for other listeners.
* Update agent/proxycfg/api_gateway.go
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Restore import blocking
* Skip to next route if route has no upstreams
* cleanup
* change set from bool to empty struct
---------
Co-authored-by: John Maguire <john.maguire@hashicorp.com>
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Remove unused are hosts set check
* Remove all traces of unused 'AreHostsSet' parameter
* Remove unused Hosts attribute
* Remove commented out use of snap.APIGateway.Hosts
Protobuf Refactoring for Multi-Module Cleanliness
This commit includes the following:
Moves all packages that were within proto/ to proto/private
Rewrites imports to account for the packages being moved
Adds in buf.work.yaml to enable buf workspaces
Names the proto-public buf module so that we can override the Go package imports within proto/buf.yaml
Bumps the buf version dependency to 1.14.0 (I was trying out the version to see if it would get around an issue - it didn't but it also doesn't break things and it seemed best to keep up with the toolchain changes)
Why:
In the future we will need to consume other protobuf dependencies such as the Google HTTP annotations for openapi generation or grpc-gateway usage.
There were some recent changes to have our own ratelimiting annotations.
The two combined were not working when I was trying to use them together (attempting to rebase another branch)
Buf workspaces should be the solution to the problem
Buf workspaces means that each module will have generated Go code that embeds proto file names relative to the proto dir and not the top level repo root.
This resulted in proto file name conflicts in the Go global protobuf type registry.
The solution to that was to add in a private/ directory into the path within the proto/ directory.
That then required rewriting all the imports.
Is this safe?
AFAICT yes
The gRPC wire protocol doesn't seem to care about the proto file names (although the Go grpc code does tack on the proto file name as Metadata in the ServiceDesc)
Other than imports, there were no changes to any generated code as a result of this.
* Include secret type when building resources from config snapshot
* First pass at generating envoy secrets from api-gateway snapshot
* Update comments for xDS update order
* Add secret type + corresponding golden files to existing tests
* Initialize test helpers for testing api-gateway resource generation
* Generate golden files for new api-gateway xDS resource test
* Support ADS for TLS certificates on api-gateway
* Configure TLS on api-gateway listeners
* Inline TLS cert code
* update tests
* Add SNI support so we can have multiple certificates
* Remove commented out section from helper
* regen deep-copy
* Add tcp tls test
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* [API Gateway] Add integration test for conflicted TCP listeners
* [API Gateway] Update simple test to leverage intentions and multiple listeners
* Fix broken unit test
* [API Gateway] Add integration test for HTTP routes
* [API Gateway] Add integration test for conflicted TCP listeners
* [API Gateway] Update simple test to leverage intentions and multiple listeners
* Fix broken unit test
* PR suggestions
* Stub proxycfg handler for API gateway
* Add Service Kind constants/handling for API Gateway
* Begin stubbing for SDS
* Add new Secret type to xDS order of operations
* Continue stubbing of SDS
* Iterate on proxycfg handler for API gateway
* Handle BoundAPIGateway config entry subscription in proxycfg-glue
* Add API gateway to config snapshot validation
* Add API gateway to config snapshot clone, leaf, etc.
* Subscribe to bound route + cert config entries on bound-api-gateway
* Track routes + certs on API gateway config snapshot
* Generate DeepCopy() for types used in watch.Map
* Watch all active references on api-gateway, unwatch inactive
* Track loading of initial bound-api-gateway config entry
* Use proper proto package for SDS mapping
* Use ResourceReference instead of ServiceName, collect resources
* Fix typo, add + remove TODOs
* Watch discovery chains for TCPRoute
* Add TODO for updating gateway services for api-gateway
* make proto
* Regenerate deep-copy for proxycfg
* Set datacenter on upstream ID from query source
* Watch discovery chains for http-route service backends
* Add ServiceName getter to HTTP+TCP Service structs
* Clean up unwatched discovery chains on API Gateway
* Implement watch for ingress leaf certificate
* Collect upstreams on http-route + tcp-route updates
* Remove unused GatewayServices update handler
* Remove unnecessary gateway services logic for API Gateway
* Remove outdate TODO
* Use .ToIngress where appropriate, including TODO for cleaning up
* Cancel before returning error
* Remove GatewayServices subscription
* Add godoc for handlerAPIGateway functions
* Update terminology from Connect => Consul Service Mesh
Consistent with terminology changes in https://github.com/hashicorp/consul/pull/12690
* Add missing TODO
* Remove duplicate switch case
* Rerun deep-copy generator
* Use correct property on config snapshot
* Remove unnecessary leaf cert watch
* Clean up based on code review feedback
* Note handler properties that are initialized but set elsewhere
* Add TODO for moving helper func into structs pkg
* Update generated DeepCopy code
* gofmt
* Begin stubbing for SDS
* Start adding tests
* Remove second BoundAPIGateway case in glue
* TO BE PICKED: fix formatting of str
* WIP
* Fix merge conflict
* Implement HTTP Route to Discovery Chain config entries
* Stub out function to create discovery chain
* Add discovery chain merging code (#16131)
* Test adding TCP and HTTP routes
* Add some tests for the synthesizer
* Run go mod tidy
* Pairing with N8
* Run deep copy
* Clean up GatewayChainSynthesizer
* Fix missing assignment of BoundAPIGateway topic
* Separate out synthesizeChains and toIngressTLS
* Fix build errors
* Ensure synthesizer skips non-matching routes by protocol
* Rebase on N8s work
* Generate DeepCopy() for API gateway listener types
* Improve variable name
* Regenerate DeepCopy() code
* Fix linting issue
* fix protobuf import
* Fix more merge conflict errors
* Fix synthesize test
* Run deep copy
* Add URLRewrite to proto
* Update agent/consul/discoverychain/gateway_tcproute.go
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
* Remove APIGatewayConfigEntry that was extra
* Error out if route kind is unknown
* Fix formatting errors in proto
---------
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Co-authored-by: Andrew Stucki <andrew.stucki@hashicorp.com>
* Stub proxycfg handler for API gateway
* Add Service Kind constants/handling for API Gateway
* Begin stubbing for SDS
* Add new Secret type to xDS order of operations
* Continue stubbing of SDS
* Iterate on proxycfg handler for API gateway
* Handle BoundAPIGateway config entry subscription in proxycfg-glue
* Add API gateway to config snapshot validation
* Add API gateway to config snapshot clone, leaf, etc.
* Subscribe to bound route + cert config entries on bound-api-gateway
* Track routes + certs on API gateway config snapshot
* Generate DeepCopy() for types used in watch.Map
* Watch all active references on api-gateway, unwatch inactive
* Track loading of initial bound-api-gateway config entry
* Use proper proto package for SDS mapping
* Use ResourceReference instead of ServiceName, collect resources
* Fix typo, add + remove TODOs
* Watch discovery chains for TCPRoute
* Add TODO for updating gateway services for api-gateway
* make proto
* Regenerate deep-copy for proxycfg
* Set datacenter on upstream ID from query source
* Watch discovery chains for http-route service backends
* Add ServiceName getter to HTTP+TCP Service structs
* Clean up unwatched discovery chains on API Gateway
* Implement watch for ingress leaf certificate
* Collect upstreams on http-route + tcp-route updates
* Remove unused GatewayServices update handler
* Remove unnecessary gateway services logic for API Gateway
* Remove outdate TODO
* Use .ToIngress where appropriate, including TODO for cleaning up
* Cancel before returning error
* Remove GatewayServices subscription
* Add godoc for handlerAPIGateway functions
* Update terminology from Connect => Consul Service Mesh
Consistent with terminology changes in https://github.com/hashicorp/consul/pull/12690
* Add missing TODO
* Remove duplicate switch case
* Rerun deep-copy generator
* Use correct property on config snapshot
* Remove unnecessary leaf cert watch
* Clean up based on code review feedback
* Note handler properties that are initialized but set elsewhere
* Add TODO for moving helper func into structs pkg
* Update generated DeepCopy code
* gofmt
* Generate DeepCopy() for API gateway listener types
* Improve variable name
* Regenerate DeepCopy() code
* Fix linting issue
* Temporarily remove the secret type from resource generation
Fix configuration merging for implicit tproxy upstreams.
Change the merging logic so that the wildcard upstream has correct proxy-defaults
and service-defaults values combined into it. It did not previously merge all fields,
and the wildcard upstream did not exist unless service-defaults existed (it ignored
proxy-defaults, essentially).
Change the way we fetch upstream configuration in the xDS layer so that it falls back
to the wildcard when no matching upstream is found. This is what allows implicit peer
upstreams to have the correct "merged" config.
Change proxycfg to always watch local mesh gateway endpoints whenever a peer upstream
is found. This simplifies the logic so that we do not have to inspect the "merged"
configuration on peer upstreams to extract the mesh gateway mode.
Replaces the reflection-based implementation of proxycfg's
ConfigSnapshot.Clone with code generated by deep-copy.
While load testing server-based xDS (for consul-dataplane) we discovered
this method is extremely expensive. The ConfigSnapshot struct, directly
or indirectly, contains a copy of many of the structs in the agent/structs
package, which creates a large graph for copystructure.Copy to traverse
at runtime, on every proxy reconfiguration.
This commit adds the xDS resources needed for INBOUND traffic from peer
clusters:
- 1 filter chain for all inbound peering requests.
- 1 cluster for all inbound peering requests.
- 1 endpoint per voting server with the gRPC TLS port configured.
There is one filter chain and cluster because unlike with WAN
federation, peer clusters will not attempt to dial individual servers.
Peer clusters will only dial the local mesh gateway addresses.
* feat(ingress gateway: support configuring limits in ingress-gateway config entry
- a new Defaults field with max_connections, max_pending_connections, max_requests
is added to ingress gateway config entry
- new field max_connections, max_pending_connections, max_requests in
individual services to overwrite the value in Default
- added unit test and integration test
- updated doc
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
Routing peering control plane traffic through mesh gateways can be
enabled or disabled at runtime with the mesh config entry.
This commit updates proxycfg to add or cancel watches for local servers
depending on this central config.
Note that WAN federation over mesh gateways is determined by a service
metadata flag, and any updates to the gateway service registration will
force the creation of a new snapshot. If enabled, WAN-fed over mesh
gateways will trigger a local server watch on initialize().
Because of this we will only add/remove server watches if WAN federation
over mesh gateways is disabled.
Peered upstreams has a separate loop in xds from discovery chain upstreams. This PR adds similar but slightly modified code to add filters for peered upstream listeners, clusters, and endpoints in the case of transparent proxy.
When the protocol is http-like, and an intention has a peered source
then the normal RBAC mTLS SAN field check is replaces with a joint combo
of:
mTLS SAN field must be the service's local mesh gateway leaf cert
AND
the first XFCC header (from the MGW) must have a URI field that matches the original intention source
Also:
- Update the regex program limit to be much higher than the teeny
defaults, since the RBAC regex constructions are more complicated now.
- Fix a few stray panics in xds generation.
This is only configured in xDS when a service with an L7 protocol is
exported.
They also load any relevant trust bundles for the peered services to
eventually use for L7 SPIFFE validation during mTLS termination.
Mesh gateways can use hostnames in their tagged addresses (#7999). This is useful
if you were to expose a mesh gateway using a cloud networking load balancer appliance
that gives you a DNS name but no reliable static IPs.
Envoy cannot accept hostnames via EDS and those must be configured using CDS.
There was already logic when configuring gateways in other locations in the code, but
given the illusions in play for peering the downstream of a peered service wasn't aware
that it should be doing that.
Also:
- ensuring that we always try to use wan-like addresses to cross peer boundaries.
Mesh gateways will now enable tcp connections with SNI names including peering information so that those connections may be proxied.
Note: this does not change the callers to use these mesh gateways.
Envoy's SPIFFE certificate validation extension allows for us to
validate against different root certificates depending on the trust
domain of the dialing proxy.
If there are any trust bundles from peers in the config snapshot then we
use the SPIFFE validator as the validation context, rather than the
usual TrustedCA.
The injected validation config includes the local root certificates as
well.
For mTLS to work between two proxies in peered clusters with different root CAs,
proxies need to configure their outbound listener to use different root certificates
for validation.
Up until peering was introduced proxies would only ever use one set of root certificates
to validate all mesh traffic, both inbound and outbound. Now an upstream proxy
may have a leaf certificate signed by a CA that's different from the dialing proxy's.
This PR makes changes to proxycfg and xds so that the upstream TLS validation
uses different root certificates depending on which cluster is being dialed.