* Fix streaming RPCs for agentless.
This PR fixes an issue where cross-dc RPCs were unable to use the
streaming backend because the node name was set. As a result, the
agent-cache was used instead, which caused high CPU utilization and
memory consumption because it keeps queries alive for 72 hours
before purging inactive entries.
This resource consumption is compounded by the fact that each pod
in consul-k8s gets a unique token. Since the agent-cache uses the
token as a component of the key, the same query is duplicated for
each pod that is deployed.
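To illustrate the duplication described above, here is a minimal sketch. It assumes a simplified cache keyed by query and token; `cacheKey` and `countEntries` are hypothetical names for illustration and are not the actual agent-cache types.

```go
package main

import "fmt"

// cacheKey is a hypothetical, simplified stand-in for an agent-cache key.
// The only property that matters here is that the token is part of the key.
type cacheKey struct {
	Query string
	Token string
}

// countEntries returns how many distinct cache entries exist after every
// token has issued the same query once.
func countEntries(query string, tokens []string) int {
	entries := make(map[cacheKey]struct{})
	for _, t := range tokens {
		entries[cacheKey{Query: query, Token: t}] = struct{}{}
	}
	return len(entries)
}

func main() {
	// Assumed setup: three pods, each with its own unique token.
	tokens := []string{"pod-a-token", "pod-b-token", "pod-c-token"}
	fmt.Println(countEntries("same-cross-dc-query", tokens)) // prints 3
}
```

One identical query produces one entry per token, so the number of live entries grows with the number of pods.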
* Add changelog.
* Fix xDS deadlock due to syncLoop termination.
This fixes an issue where agentless xDS streams can deadlock permanently until
a server is restarted. When this issue occurs, no new proxies are able to
successfully connect to the server.
Effectively, the trigger for this deadlock stems from the following return
statement:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L199-L202
When this happens, the entire `syncLoop()` terminates and stops consuming from
the following channel:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L182-L192
As a result, the `ConfigSource.cleanup()` function never receives a
response and holds a mutex indefinitely:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L241-L247
Because this mutex is shared, it effectively deadlocks the server's ability to
process new xDS streams.
----
The fix to this issue involves removing the `chan chan struct{}` used like an
RPC-over-channels pattern and replacing it with two distinct channels:
+ `stopSyncLoopCh` - indicates that the `syncLoop()` should terminate soon.
+ `syncLoopDoneCh` - indicates that the `syncLoop()` has terminated.
Splitting these two concepts out and deferring a `close(syncLoopDoneCh)` in the
`syncLoop()` function ensures that the deadlock above should no longer occur.
We also now evict xDS connections of all proxies for the corresponding
`syncLoop()` whenever it encounters an irrecoverable error. This is done by
hoisting the new `syncLoopDoneCh` upwards so that it's visible to the xDS delta
processing. Prior to this fix, the behavior was to simply orphan them so they
would never receive catalog-registration or service-defaults updates.
* Add changelog.
* Shuffle the list of servers returned by `pbserverdiscovery.WatchServers`.
This randomizes the list of servers to help reduce the chance of clients
all connecting to the same server simultaneously. Consul-dataplane is one
such client that does not randomize its own list of servers.
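
As a rough illustration of the idea, the sketch below shuffles a copy of a server list with the standard library's `math/rand.Shuffle`. The `shuffled` helper and the plain string addresses are assumptions for the example and do not reflect the actual `pbserverdiscovery` types.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffled returns a randomly ordered copy of servers and leaves the input
// slice untouched. It is a hypothetical helper for illustration only.
func shuffled(servers []string) []string {
	out := make([]string, len(servers))
	copy(out, servers)
	rand.Shuffle(len(out), func(i, j int) {
		out[i], out[j] = out[j], out[i]
	})
	return out
}

func main() {
	servers := []string{"server-1", "server-2", "server-3"}
	// Each call may yield a different order, so clients that always pick the
	// first entry are spread across servers.
	fmt.Println(shuffled(servers))
}
```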
* Fix potential goroutine leak in xDS recv loop.
This commit ensures that the goroutine which receives xDS messages from
proxies will not block forever if the stream's context is cancelled but
the `processDelta()` function never consumes the message (due to being
terminated).
* Add changelog.
* Revert "feat: add alert to link to hcp modal to ask a user refresh a page; up… (#20682)"
This reverts commit dd833d9a3649402e23ced070121e0d0c131f610e.
* Revert "chor: change cluster name param to have datacenter.name as default value (#20644)"
This reverts commit 8425cd0f9017f640cce711dc32e0fa0d136899d8.
* Revert "chor: adds informative error message when acls disabled and read-only… (#20600)"
This reverts commit 9d712ccfc7a67193423f1a102ac2b9d3c6dc3733.
* Revert "Cc 7147 link to hcp modal (#20474)"
This reverts commit 8c05e57ac1fdc27ea74040e2dfc35192ac6d067b.
* Revert "Add nav bar item to show HCP link status and encourage folks to link (#20370)"
This reverts commit 22e6ce0df10091bc66ee7fbf8e5d1c0f158ab5a9.
* Revert "Cc 7145 hcp link status api (#20330)"
This reverts commit 049ca102c41fbf646b07e34f5f69f652de9fbc6c.
* Revert "💜 Cc 7187/purple banner for linking existing clusters (#20275)"
This reverts commit 5119667cd16c527af111c339594a08354b7a5cb0.
* disable terminating gateway auto host rewrite
* add changelog
* clean up unneeded additional snapshot fields
* add new field to docs
* squash
* fix test
* Update agent/dns.go
Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
* PR feedback
* split tests out into multiple files.
* Extract responsibilities from router into discoveryResultsFetcher, messageSerializer, responseGenerator.
* adding recordmaker tests
* add response generator test coverage.
* changing tests case name based on PR feedback
---------
Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
When executing the test with multiple tenancies, the various mesh controllers remained running across multiple sub-tests. While the resourcetest.Client cleans up resources created with it, resources created by the controllers were not being cleaned up. This could result in subsequent sub-tests encountering unexpected values and failing when they should pass.
docs: Update v2 K8s docs to use virtual port references
Now that service virtual port references are supported in xRoutes and config:
- Call out both formats accepted for service port references
- Update K8s examples to use virtual ports (most likely use case)