* put conditionals are hcp initialization for consul server
* put more things behind configuration flags
* add changelog
* TestServer_hcpManager
* fix TestAgent_scadaProvider
Currently, when a client starts a blocking query and an ACL token expires within
that time, Consul will return ACL not found error with a 403 status code. However,
sometimes if an ACL token is invalidated at the same time as the query's deadline is reached,
Consul will instead return an empty response with a 200 status code.
This is because of the events being executed.
1. Client issues a blocking query request with timeout `t`.
2. ACL is deleted.
3. Server detects a change in ACLs and force closes the gRPC stream.
4. Client resubscribes with the same token and resets its state (view).
5. Client sees "ACL not found" error.
If ACL is deleted before step 4, the client is unaware that the stream was closed due to
an ACL error and will return an empty view (from the reset state) with the 200 status code.
To fix this problem, we introduce another state to the subsciption to indicate when a change
to ACLs has occured. If the server sees that there was an error due to ACL change, it will
re-authenticate the request and return an error if the token is no longer valid.
Fixes#20790
* Fix streaming RPCs for agentless.
This PR fixes an issue where cross-dc RPCs were unable to utilize
the streaming backend due to having the node name set. The result
of this was the agent-cache being utilized, which would cause high
cpu utilization and memory consumption due to the fact that it
keeps queries alive for 72 hours before purging inactive entries.
This resource consumption is compounded by the fact that each pod
in consul-k8s gets a unique token. Since the agent-cache uses the
token as a component of the key, the same query is duplicated for
each pod that is deployed.
* Add changelog.
* Fix xDS deadlock due to syncLoop termination.
This fixes an issue where agentless xDS streams can deadlock permanently until
a server is restarted. When this issue occurs, no new proxies are able to
successfully connect to the server.
Effectively, the trigger for this deadlock stems from the following return
statement:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L199-L202
When this happens, the entire `syncLoop()` terminates and stops consuming from
the following channel:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L182-L192
Which results in the `ConfigSource.cleanup()` function never receiving a
response and holding a mutex indefinitely:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L241-L247
Because this mutex is shared, it effectively deadlocks the server's ability to
process new xDS streams.
----
The fix to this issue involves removing the `chan chan struct{}` used like an
RPC-over-channels pattern and replacing it with two distinct channels:
+ `stopSyncLoopCh` - indicates that the `syncLoop()` should terminate soon. +
`syncLoopDoneCh` - indicates that the `syncLoop()` has terminated.
Splitting these two concepts out and deferring a `close(syncLoopDoneCh)` in the
`syncLoop()` function ensures that the deadlock above should no longer occur.
We also now evict xDS connections of all proxies for the corresponding
`syncLoop()` whenever it encounters an irrecoverable error. This is done by
hoisting the new `syncLoopDoneCh` upwards so that it's visible to the xDS delta
processing. Prior to this fix, the behavior was to simply orphan them so they
would never receive catalog-registration or service-defaults updates.
* Add changelog.
* Shuffle the list of servers returned by `pbserverdiscovery.WatchServers`.
This randomizes the list of servers to help reduce the chance of clients
all connecting to the same server simultaneously. Consul-dataplane is one
such client that does not randomize its own list of servers.
* Fix potential goroutine leak in xDS recv loop.
This commit ensures that the goroutine which receives xDS messages from
proxies will not block forever if the stream's context is cancelled but
the `processDelta()` function never consumes the message (due to being
terminated).
* Add changelog.
* Revert "feat: add alert to link to hcp modal to ask a user refresh a page; up… (#20682)"
This reverts commit dd833d9a36.
* Revert "chor: change cluster name param to have datacenter.name as default value (#20644)"
This reverts commit 8425cd0f90.
* Revert "chor: adds informative error message when acls disabled and read-only… (#20600)"
This reverts commit 9d712ccfc7.
* Revert "Cc 7147 link to hcp modal (#20474)"
This reverts commit 8c05e57ac1.
* Revert "Add nav bar item to show HCP link status and encourage folks to link (#20370)"
This reverts commit 22e6ce0df1.
* Revert "Cc 7145 hcp link status api (#20330)"
This reverts commit 049ca102c4.
* Revert "💜 Cc 7187/purple banner for linking existing clusters (#20275)"
This reverts commit 5119667cd1.
* disable terminating gateway auto host rewrite
* add changelog
* clean up unneeded additional snapshot fields
* add new field to docs
* squash
* fix test
* NET-7813 - DNS : SERVFAIL when resolving PTR records
* Update agent/dns.go
Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
* PR feedback
---------
Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
Ensure all topics are refreshed on FSM restore and add supervisor loop to v1 controller subscriptions
This PR fixes two issues:
1. Not all streams were force closed whenever a snapshot restore happened. This means that anything consuming data from the stream (controllers, queries, etc) were unaware that the data they have is potentially stale / invalid. This first part ensures that all topics are purged.
2. The v1 controllers did not properly handle stream errors (which are likely to appear much more often due to 1 above) and so it introduces a supervisor thread to restart the watches when these errors occur.
Decouple xds capacity controller and autopilot
This prevents a potential bug where autopilot deadlocks while attempting
to execute `AutopilotDelegate.NotifyState()` on an xdscapacity controller
that stopped consuming messages.
Fix issue with persisting proxy-defaults
This resolves an issue introduced in hashicorp/consul#19829
where the proxy-defaults configuration entry with an HTTP protocol
cannot be updated after it has been persisted once and a router
exists. This occurs because the protocol field is not properly
pre-computed before being passed into validation functions.
* Check for ACL write permissions on write
Link eventually will be creating a token, so require acl:write.
* Convert Run to Start, only allow to start once
* Always initialize HCP components at startup
* Support for updating config and client
* Pass HCP manager to controller
* Start HCP manager in link resource
Start as part of link creation rather than always starting. Update
the HCP manager with values from the link before starting as well.
* Fix metrics sink leaked goroutine
* Remove the hardcoded disabled hostname prefix
The HCP metrics sink will always be enabled, so the length of sinks will
always be greater than zero. This also means that we will also always
default to prefixing metrics with the hostname, which is what our
documentation states is the expected behavior anyway.
* Add changelog
* Check and set running status in one method
* Check for primary datacenter, add back test
* Clarify merge reasoning, fix timing issue in test
* Add comment about controller placement
* Expand on breaking change, fix typo in changelog
* Update ui server to include V2 Catalog flag
* Fix typo
* Add route and redirects for the unavailable warning
* Add qualtrics link
* Remove unneccessary check and redirect
* Change logging of registered v2 resource endpoints to add /api prefix
Previous:
agent.http: Registered resource endpoint: endpoint=/demo/v1/executive
New:
agent.http: Registered resource endpoint: endpoint=/api/demo/v1/executive
This reduces confusion when attempting to call the APIs after looking at
the logs.
* Add Link API docs
* Update website/content/api-docs/hcp-link.mdx
Co-authored-by: Melissa Kam <3768460+mkam@users.noreply.github.com>
* Update website/content/api-docs/hcp-link.mdx
Co-authored-by: Melissa Kam <3768460+mkam@users.noreply.github.com>
* Update website/content/api-docs/hcp-link.mdx
Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
* Update website/content/api-docs/hcp-link.mdx
Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
* Update website/content/api-docs/hcp-link.mdx
Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
* Add summary sentence and move api vs config section up
* Add hcp link endpoint to API Overview page
* Update website/content/api-docs/index.mdx
Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
* Update note about v1 API endpoint prefix
* Add a period at end of v1 prefix note.
* Add link to HCP Consul Central
---------
Co-authored-by: Melissa Kam <3768460+mkam@users.noreply.github.com>
Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
* Adding banner on services page
* Simplified version of setting/unsetting banner
* Translating the text based off of enterprise or not
* Add an integration test
* Adding an acceptance test
* Enable config dismissal as well
* Adding changelog
* Adding some copyrights to the other files
* Revert "Enable config dismissal as well"
This reverts commit e6784c4335bdff99d9183d28571aa6ab4b852cbd.
We'll be doing this in CC-7347
* Exported services api implemented
* Tests added, refactored code
* Adding server tests
* changelog added
* Proto gen added
* Adding codegen changes
* changing url, response object
* Fixing lint error by having namespace and partition directly
* Tests changes
* refactoring tests
* Simplified uniqueness logic for exported services, sorted the response in order of service name
* Fix lint errors, refactored code
Add case insensitive param on service route match
This commit adds in a new feature that allows service routers to specify that
paths and path prefixes should ignore upper / lower casing when matching URLs.
Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>
Ultimately we will have to rectify wan federation with v2 catalog adjacent
experiments, but for now blanket prevent usage of the resource-apis,
v2dns, and v2tenancy experiments in secondary datacenters.
This add a fix to properly verify the gateway mode before creating a watch specific to mesh gateways. This watch have a high performance cost and when mesh gateways are not used is not used.
This also adds an optimization to only return the nodes when watching the Internal.ServiceDump RPC to avoid unnecessary disco chain compilation. As watches in proxy config only need the nodes.
* Upgrade Go to 1.21
* ci: detect Go backwards compatibility test version automatically
For our submodules and other places we choose to test against previous
Go versions, detect this version automatically from the current one
rather than hard-coding it.
* NET-6945 - Replace usage of deprecated Envoy field envoy.config.core.v3.HeaderValueOption.append
* update proto for v2 and then update xds v2 logic
* add changelog
* Update 20078.txt to be consistent with existing changelog entries
* swap enum values tomatch envoy.
* NET-6946 - Replace usage of deprecated Envoy field envoy.config.route.v3.HeaderMatcher.safe_regex_match
* removing unrelated changes
* update golden files
* do not set engine type