There is a bug in the error handling code for the Agent cache subsystem discovered:
1. NotifyCallback calls notifyBlockingQuery which calls getWithIndex in
a loop (which backs off on-error up to 1 minute)
2. getWithIndex calls fetch if there’s no valid entry in the cache
3. fetch starts a goroutine which calls Fetch on the cache-type, waits
for a while (again with backoff up to 1 minute for errors) and then
calls fetch to trigger a refresh
The end result being that every 1 minute notifyBlockingQuery spawns an
ancestry of goroutines that essentially lives forever.
This PR ensures that the goroutine started by `fetch` cancels any prior
goroutine spawned by the same line for the same key.
In isolated testing where a cache type was tweaked to indefinitely
error, this patch prevented goroutine counts from skyrocketing.
In practice this was masked by #14956 and was only uncovered fixing the
other bug.
go test ./agent -run TestAgentConnectCALeafCert_goodNotLocal
would fail when only #14956 was fixed.
Adds a user-configurable rate limiter to proxycfg snapshot delivery,
with a default limit of 250 updates per second.
This addresses a problem observed in our load testing of Consul
Dataplane where updating a "global" resource such as a wildcard
intention or the proxy-defaults config entry could starve the Raft or
Memberlist goroutines of CPU time, causing general cluster instability.
Replaces the reflection-based implementation of proxycfg's
ConfigSnapshot.Clone with code generated by deep-copy.
While load testing server-based xDS (for consul-dataplane) we discovered
this method is extremely expensive. The ConfigSnapshot struct, directly
or indirectly, contains a copy of many of the structs in the agent/structs
package, which creates a large graph for copystructure.Copy to traverse
at runtime, on every proxy reconfiguration.
memdb's `WatchCh` method creates a goroutine that will publish to the
returned channel when the watchset is triggered or the given context
is canceled. Although this is called out in its godoc comment, it's
not obvious that this method creates a goroutine who's lifecycle you
need to manage.
In the xDS capacity controller, we were calling `WatchCh` on each
iteration of the control loop, meaning the number of goroutines would
grow on each autopilot event until there was catalog churn.
In the catalog config source, we were calling `WatchCh` with the
background context, meaning that the goroutine would keep running after
the sync loop had terminated.
Adds another datasource for proxycfg.HTTPChecks, for use on server agents. Typically these checks are performed by local client agents and there is no equivalent of this in agentless (where servers configure consul-dataplane proxies).
Hence, the data source is mostly a no-op on servers but in the case where the service is present within the local state, it delegates to the cache data source.
* Move stats.go from grpc-internal to grpc-middleware
* Update grpc server metrics with server type label
* Add stats test to grpc-external
* Remove global metrics instance from grpc server tests
* Configure Envoy alpn_protocols based on service protocol
* define alpnProtocols in a more standard way
* http2 protocol should be h2 only
* formatting
* add test for getAlpnProtocol()
* create changelog entry
* change scope is connect-proxy
* ignore errors on ParseProxyConfig; fixes linter
* add tests for grpc and http2 public listeners
* remove newlines from PR
* Add alpn_protocol configuration for ingress gateway
* Guard against nil tlsContext
* add ingress gateway w/ TLS tests for gRPC and HTTP2
* getAlpnProtocols: add TCP protocol test
* add tests for ingress gateway with grpc/http2 and per-listener TLS config
* add tests for ingress gateway with grpc/http2 and per-listener TLS config
* add Gateway level TLS config with mixed protocol listeners to validate ALPN
* update changelog to include ingress-gateway
* add http/1.1 to http2 ALPN
* go fmt
* fix test on custom-trace-listener
* updating to serf v0.10.1 and memberlist v0.5.0 to get memberlist size metrics and memberlist broadcast queue depth metric
* update changelog
* update changelog
* correcting changelog
* adding "QueueCheckInterval" for memberlist to test
* updating integration test containers to grab latest api
* feat(ingress gateway: support configuring limits in ingress-gateway config entry
- a new Defaults field with max_connections, max_pending_connections, max_requests
is added to ingress gateway config entry
- new field max_connections, max_pending_connections, max_requests in
individual services to overwrite the value in Default
- added unit test and integration test
- updated doc
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>