Ensure all topics are refreshed on FSM restore and add supervisor loop to v1 controller subscriptions
This PR fixes two issues:
1. Not all streams were force closed whenever a snapshot restore happened. This means that anything consuming data from the stream (controllers, queries, etc) were unaware that the data they have is potentially stale / invalid. This first part ensures that all topics are purged.
2. The v1 controllers did not properly handle stream errors (which are likely to appear much more often due to 1 above) and so it introduces a supervisor thread to restart the watches when these errors occur.
Decouple xds capacity controller and autopilot
This prevents a potential bug where autopilot deadlocks while attempting
to execute `AutopilotDelegate.NotifyState()` on an xdscapacity controller
that stopped consuming messages.
Fix issue with persisting proxy-defaults
This resolves an issue introduced in hashicorp/consul#19829
where the proxy-defaults configuration entry with an HTTP protocol
cannot be updated after it has been persisted once and a router
exists. This occurs because the protocol field is not properly
pre-computed before being passed into validation functions.
* Check for ACL write permissions on write
Link eventually will be creating a token, so require acl:write.
* Convert Run to Start, only allow to start once
* Always initialize HCP components at startup
* Support for updating config and client
* Pass HCP manager to controller
* Start HCP manager in link resource
Start as part of link creation rather than always starting. Update
the HCP manager with values from the link before starting as well.
* Fix metrics sink leaked goroutine
* Remove the hardcoded disabled hostname prefix
The HCP metrics sink will always be enabled, so the length of sinks will
always be greater than zero. This also means that we will also always
default to prefixing metrics with the hostname, which is what our
documentation states is the expected behavior anyway.
* Add changelog
* Check and set running status in one method
* Check for primary datacenter, add back test
* Clarify merge reasoning, fix timing issue in test
* Add comment about controller placement
* Expand on breaking change, fix typo in changelog
* Update ui server to include V2 Catalog flag
* Fix typo
* Add route and redirects for the unavailable warning
* Add qualtrics link
* Remove unneccessary check and redirect
* Change logging of registered v2 resource endpoints to add /api prefix
Previous:
agent.http: Registered resource endpoint: endpoint=/demo/v1/executive
New:
agent.http: Registered resource endpoint: endpoint=/api/demo/v1/executive
This reduces confusion when attempting to call the APIs after looking at
the logs.
* Add Link API docs
* Update website/content/api-docs/hcp-link.mdx
Co-authored-by: Melissa Kam <3768460+mkam@users.noreply.github.com>
* Update website/content/api-docs/hcp-link.mdx
Co-authored-by: Melissa Kam <3768460+mkam@users.noreply.github.com>
* Update website/content/api-docs/hcp-link.mdx
Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
* Update website/content/api-docs/hcp-link.mdx
Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
* Update website/content/api-docs/hcp-link.mdx
Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
* Add summary sentence and move api vs config section up
* Add hcp link endpoint to API Overview page
* Update website/content/api-docs/index.mdx
Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
* Update note about v1 API endpoint prefix
* Add a period at end of v1 prefix note.
* Add link to HCP Consul Central
---------
Co-authored-by: Melissa Kam <3768460+mkam@users.noreply.github.com>
Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
* Adding banner on services page
* Simplified version of setting/unsetting banner
* Translating the text based off of enterprise or not
* Add an integration test
* Adding an acceptance test
* Enable config dismissal as well
* Adding changelog
* Adding some copyrights to the other files
* Revert "Enable config dismissal as well"
This reverts commit e6784c4335bdff99d9183d28571aa6ab4b852cbd.
We'll be doing this in CC-7347
* Exported services api implemented
* Tests added, refactored code
* Adding server tests
* changelog added
* Proto gen added
* Adding codegen changes
* changing url, response object
* Fixing lint error by having namespace and partition directly
* Tests changes
* refactoring tests
* Simplified uniqueness logic for exported services, sorted the response in order of service name
* Fix lint errors, refactored code
Add case insensitive param on service route match
This commit adds in a new feature that allows service routers to specify that
paths and path prefixes should ignore upper / lower casing when matching URLs.
Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>
Ultimately we will have to rectify wan federation with v2 catalog adjacent
experiments, but for now blanket prevent usage of the resource-apis,
v2dns, and v2tenancy experiments in secondary datacenters.
This add a fix to properly verify the gateway mode before creating a watch specific to mesh gateways. This watch have a high performance cost and when mesh gateways are not used is not used.
This also adds an optimization to only return the nodes when watching the Internal.ServiceDump RPC to avoid unnecessary disco chain compilation. As watches in proxy config only need the nodes.
* Upgrade Go to 1.21
* ci: detect Go backwards compatibility test version automatically
For our submodules and other places we choose to test against previous
Go versions, detect this version automatically from the current one
rather than hard-coding it.
* NET-6945 - Replace usage of deprecated Envoy field envoy.config.core.v3.HeaderValueOption.append
* update proto for v2 and then update xds v2 logic
* add changelog
* Update 20078.txt to be consistent with existing changelog entries
* swap enum values tomatch envoy.
* NET-6946 - Replace usage of deprecated Envoy field envoy.config.route.v3.HeaderMatcher.safe_regex_match
* removing unrelated changes
* update golden files
* do not set engine type
This commit fixes an issue where the partition was not properly set
on the peering query failover target created from sameness-groups.
Before this change, it was always empty, meaning that the data
would be queried with respect to the default partition always. This
resulted in a situation where a PQ that was attempting to use a
sameness-group for failover would select peers from the default
partition, rather than the partition of the sameness-group itself.
* cli: Deprecate the `-admin-access-log-path` flag from `consul connect envoy` command in favor of: `-admin-access-log-config`.
* fix changelog
* add in documentation change.
* updating usage of http2_protocol_options and access_log_path
* add changelog
* update template for AdminAccessLogConfig
* remove mucking with AdminAccessLogConfig
* add a hash to config entries when normalizing
* add GetHash and implement comparing hashes
* only update if the Hash is different
* only update if the Hash is different and not 0
* fix proto to include the Hash
* fix proto gen
* buf format
* add SetHash and fix tests
* fix config load tests
* fix state test and config test
* recalculate hash when restoring config entries
* fix snapshot restore test
* add changelog
* fix missing normalize, fix proto indexes and add normalize test
When a large number of upstreams are configured on a single envoy
proxy, there was a chance that it would timeout when waiting for
ClusterLoadAssignments. While this doesn't always immediately cause
issues, consul-dataplane instances appear to consistently drop
endpoints from their configurations after an xDS connection is
re-established (the server dies, random disconnect, etc).
This commit adds an `xds_fetch_timeout_ms` config to service registrations
so that users can set the value higher for large instances that have
many upstreams. The timeout can be disabled by setting a value of `0`.
This configuration was introduced to reduce the risk of causing a
breaking change for users if there is ever a scenario where endpoints
would never be received. Rather than just always blocking indefinitely
or for a significantly longer period of time, this config will affect
only the service instance associated with it.
This fixes the following race condition:
- Send update endpoints
- Send update cluster
- Recv ACK endpoints
- Recv ACK cluster
Prior to this fix, it would have resulted in the endpoints NOT existing in
Envoy. This occurred because the cluster update implicitly clears the endpoints
in Envoy, but we would never re-send the endpoint data to compensate for the
loss, because we would incorrectly ACK the invalid old endpoint hash. Since the
endpoint's hash did not actually change, they would not be resent.
The fix for this is to effectively clear out the invalid pending ACKs for child
resources whenever the parent changes. This ensures that we do not store the
child's hash as accepted when the race occurs.
An escape-hatch environment variable `XDS_PROTOCOL_LEGACY_CHILD_RESEND` was
added so that users can revert back to the old legacy behavior in the event
that this produces unknown side-effects. Visit the following thread for some
extra context on why certainty around these race conditions is difficult:
https://github.com/envoyproxy/envoy/issues/13009
This bug report and fix was mostly implemented by @ksmiley with some minor
tweaks.
Co-authored-by: Keith Smiley <ksmiley@salesforce.com>
* Add CE version of gateway-upstream-disambiguation
* Use NamespaceOrDefault and PartitionOrDefault
* Add Changelog entry
* Remove the unneeded reassignment
* Use c.ID()
* Adding cli command to list exported services to a peer
* Changelog added
* Addressing docs comments
* Adding test case for no exported services scenario