Commit Graph

1345 Commits

Author SHA1 Message Date
Dan Stough 6026ada0c9
[CE] feat(v2dns): enable v2 dns as default (#20715)
* feat(v2dns): enable v2 dns as default

* changelog
2024-03-25 16:09:01 -04:00
Iryna Shustava d747b51dab
Handle ACL errors consistently when blocking query timeout is reached. (#20876)
Currently, when a client starts a blocking query and an ACL token expires within
that time, Consul will return ACL not found error with a 403 status code. However,
sometimes if an ACL token is invalidated at the same time as the query's deadline is reached,
Consul will instead return an empty response with a 200 status code.

This is because of the events being executed.
1. Client issues a blocking query request with timeout `t`.
2. ACL is deleted.
3. Server detects a change in ACLs and force closes the gRPC stream.
4. Client resubscribes with the same token and resets its state (view).
5. Client sees "ACL not found" error.

If ACL is deleted before step 4, the client is unaware that the stream was closed due to
an ACL error and will return an empty view (from the reset state) with the 200 status code.

To fix this problem, we introduce another state to the subsciption to indicate when a change
to ACLs has occured. If the server sees that there was an error due to ACL change, it will
re-authenticate the request and return an error if the token is no longer valid.

Fixes #20790
2024-03-22 14:59:54 -06:00
Derek Menteer ac83ac1343
Fix streaming RPCs for agentless. (#20868)
* Fix streaming RPCs for agentless.

This PR fixes an issue where cross-dc RPCs were unable to utilize
the streaming backend due to having the node name set. The result
of this was the agent-cache being utilized, which would cause high
cpu utilization and memory consumption due to the fact that it
keeps queries alive for 72 hours before purging inactive entries.

This resource consumption is compounded by the fact that each pod
in consul-k8s gets a unique token. Since the agent-cache uses the
token as a component of the key, the same query is duplicated for
each pod that is deployed.

* Add changelog.
2024-03-15 14:44:51 -05:00
Derek Menteer 0ac8ae6c3b
Fix xDS deadlock due to syncLoop termination. (#20867)
* Fix xDS deadlock due to syncLoop termination.

This fixes an issue where agentless xDS streams can deadlock permanently until
a server is restarted. When this issue occurs, no new proxies are able to
successfully connect to the server.

Effectively, the trigger for this deadlock stems from the following return
statement:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L199-L202

When this happens, the entire `syncLoop()` terminates and stops consuming from
the following channel:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L182-L192

Which results in the `ConfigSource.cleanup()` function never receiving a
response and holding a mutex indefinitely:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L241-L247

Because this mutex is shared, it effectively deadlocks the server's ability to
process new xDS streams.

----

The fix to this issue involves removing the `chan chan struct{}` used like an
RPC-over-channels pattern and replacing it with two distinct channels:

+ `stopSyncLoopCh` - indicates that the `syncLoop()` should terminate soon.  +
`syncLoopDoneCh` - indicates that the `syncLoop()` has terminated.

Splitting these two concepts out and deferring a `close(syncLoopDoneCh)` in the
`syncLoop()` function ensures that the deadlock above should no longer occur.

We also now evict xDS connections of all proxies for the corresponding
`syncLoop()` whenever it encounters an irrecoverable error. This is done by
hoisting the new `syncLoopDoneCh` upwards so that it's visible to the xDS delta
processing. Prior to this fix, the behavior was to simply orphan them so they
would never receive catalog-registration or service-defaults updates.

* Add changelog.
2024-03-15 13:57:11 -05:00
Derek Menteer eabff257d7
Various bug-fixes and improvements (#20866)
* Shuffle the list of servers returned by `pbserverdiscovery.WatchServers`.

This randomizes the list of servers to help reduce the chance of clients
all connecting to the same server simultaneously. Consul-dataplane is one
such client that does not randomize its own list of servers.

* Fix potential goroutine leak in xDS recv loop.

This commit ensures that the goroutine which receives xDS messages from
proxies will not block forever if the stream's context is cancelled but
the `processDelta()` function never consumes the message (due to being
terminated).

* Add changelog.
2024-03-15 13:10:48 -05:00
Matt Keeler 8fcafb139c
Add `consul snapshot decode` command (#20824)
Add snapshot decoding command
2024-03-14 12:59:06 -04:00
Deniz Onur Duzgun e9029ccd7a
[NET-8368] security: bump Go version to 1.21.8 (#20812)
* [NET-8368] Bump Go version
2024-03-14 09:46:15 -04:00
Chris Hut bfbc0ee4fd
Revert link existing but better 🪦 (#20830)
* Revert "feat: add alert to link to hcp modal to ask a user refresh a page; up… (#20682)"

This reverts commit dd833d9a36.

* Revert "chor: change cluster name param to have datacenter.name as default value (#20644)"

This reverts commit 8425cd0f90.

* Revert "chor: adds informative error message when acls disabled and read-only… (#20600)"

This reverts commit 9d712ccfc7.

* Revert "Cc 7147 link to hcp modal (#20474)"

This reverts commit 8c05e57ac1.

* Revert "Add nav bar item to show HCP link status and encourage folks to link (#20370)"

This reverts commit 22e6ce0df1.

* Revert "Cc 7145 hcp link status api (#20330)"

This reverts commit 049ca102c4.

* Revert "💜 Cc 7187/purple banner for linking existing clusters (#20275)"

This reverts commit 5119667cd1.
2024-03-13 13:59:00 -07:00
sarahalsmiller 262f435800
NET-6821 Disable Terminating Gateway Auto Host Header Rewrite (#20802)
* disable terminating gateway auto host rewrite

* add changelog

* clean up unneeded additional snapshot fields

* add new field to docs

* squash

* fix test
2024-03-12 15:37:20 -05:00
Michael Zalimeni d4761c0ccd
security: upgrade google.golang.org/protobuf to 1.33.0 (#20801)
Resolves CVE-2024-24786.
2024-03-06 23:04:42 +00:00
Matt Keeler abe14f11e6
Remove redundant usage metrics (#20674)
* Remove redundant usage metrics

* Add the changelog

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

---------

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2024-03-05 14:09:47 -05:00
Matt Keeler 5c936fba33
Enable callers to control whether per-tenant usage metrics are included in calls to store.ServiceUsage (#20672)
* Enable callers to control whether per-tenant usage metrics are included in calls to store.ServiceUsage

* Add changelog
2024-03-01 13:44:55 -05:00
Chris Hut a58f346f55
Hardcode links to CCM to be false (#20732)
Hardcode links to CCM to be false - due to CCM deprecation

Remove changelog item for breaking Ui change
 - since CCM linking no longer exists
2024-02-26 12:34:01 -08:00
sarahalsmiller 670ee90a77
Use correct enterprise meta on wildcard service update (#20721)
* use correct enterprise meta on wildcard service update

* changelog

* rename changelog file
2024-02-26 12:03:08 -06:00
Matt Keeler 16a0800777
Update API and API Docs regarding disabling gossip for a partition. (#20669) 2024-02-26 12:14:39 -05:00
John Murret 26eed12f04
NET-7813 - DNS : SERVFAIL when resolving PTR records (#20679)
* NET-7813 - DNS : SERVFAIL when resolving PTR records

* Update agent/dns.go

Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>

* PR feedback

---------

Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
2024-02-21 17:44:04 +00:00
Dan Stough 14efb28086
fix(v2dns): add node ttl to workloads, comment cleanup, and changelog (#20643)
* fix(v2dns): add node ttl to workloads, plus comment cleanup

* docs(v2dns): changelog
2024-02-14 17:38:11 -05:00
Derek Menteer 9f7626d501
Ensure all topics are refreshed on FSM restore and add supervisor loop to v1 controller subscriptions (#20642)
Ensure all topics are refreshed on FSM restore and add supervisor loop to v1 controller subscriptions

This PR fixes two issues:

1. Not all streams were force closed whenever a snapshot restore happened. This means that anything consuming data from the stream (controllers, queries, etc) were unaware that the data they have is potentially stale / invalid. This first part ensures that all topics are purged.

2. The v1 controllers did not properly handle stream errors (which are likely to appear much more often due to 1 above) and so it introduces a supervisor thread to restart the watches when these errors occur.
2024-02-14 14:17:55 -06:00
Michael Zalimeni 5862c52642
[NET-7948] Bump Envoy version to address multiple CVEs (#20589)
security: Bump Envoy versions to address CVEs
2024-02-12 22:29:50 +00:00
Valeriia Ruban 8c05e57ac1
Cc 7147 link to hcp modal (#20474)
* add link hcp modal component

* integrate modal with SideNav and link to hcp banner
---------

Co-authored-by: Chris Hut <tophernuts@gmail.com>
2024-02-09 18:23:13 +00:00
skpratt 738dc8c89d
use go 1.21.7 (#20545)
* 1.21.7

* changelog
2024-02-08 23:39:11 +00:00
Derek Menteer a1c8d4dd19
Decouple xds capacity controller and raft-autopilot (#20511)
Decouple xds capacity controller and autopilot

This prevents a potential bug where autopilot deadlocks while attempting
to execute `AutopilotDelegate.NotifyState()` on an xdscapacity controller
that stopped consuming messages.
2024-02-08 15:31:44 -06:00
Chris S. Kim 26661a1c3b
Add default intention policy (#20544) 2024-02-08 20:25:42 +00:00
Joshua Timmons 242b777547
Fix logging when we fail to export metrics to hcp (#20514) 2024-02-08 11:00:47 -05:00
Ashesh Vidyut cffb5d7c6e
Fix audit-log encoding issue (CC-7337) (#20345)
* add changes

* added changelog

* change update

* CE chnages

* Removed gzip size fix

* fix changelog

* Update .changelog/20345.txt

Co-authored-by: Hans Hasselberg <hans@hashicorp.com>

* Adding comments

---------

Co-authored-by: Abhishek Sahu <abhishek.sahu@hashicorp.com>
Co-authored-by: Hans Hasselberg <hans@hashicorp.com>
Co-authored-by: srahul3 <rahulsharma@hashicorp.com>
2024-02-06 16:40:07 +05:30
Tauhid Anjum 0c509a60a4
Exported services CLI and docs (#20331)
* Exported services CLI and docs

* Changelog added

* Added format option for pretty print

* Update command/exportedservices/exported_services.go

Co-authored-by: Ashesh Vidyut <134911583+absolutelightning@users.noreply.github.com>

* Addressing PR comments, moving the command under services category

* Add consumer peer and partition filter

* Adding bexpr filter, change format of data

---------

Co-authored-by: Ashesh Vidyut <134911583+absolutelightning@users.noreply.github.com>
2024-02-06 09:01:20 +05:30
Derek Menteer 922844b8e0
Fix issue with persisting proxy-defaults (#20481)
Fix issue with persisting proxy-defaults

This resolves an issue introduced in hashicorp/consul#19829
where the proxy-defaults configuration entry with an HTTP protocol
cannot be updated after it has been persisted once and a router
exists. This occurs because the protocol field is not properly
pre-computed before being passed into validation functions.
2024-02-05 16:00:19 -06:00
natemollica-dev 2b07b326c4
Resolve Consul DNS in OpenShift (#20439) 2024-02-01 14:00:27 -08:00
Chris S. Kim b6f10bc58f
Skip filter chain created by permissive mtls (#20406) 2024-01-31 16:39:12 -05:00
Derek Menteer 3e8ec8d18e
Fix SAN matching on terminating gateways (#20417)
Fixes issue: hashicorp/consul#20360

A regression was introduced in hashicorp/consul#19954 where the SAN validation
matching was reduced from 4 potential types down to just the URI.

Terminating gateways will need to match on many fields depending on user
configuration, since they make egress calls outside of the cluster. Having more
than one matcher behaves like an OR operation, where any match is sufficient to
pass the certificate validation. To maintain backwards compatibility with the
old untyped `match_subject_alt_names` Envoy behavior, we should match on all 4
enum types.

https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/transport_sockets/tls/v3/common.proto#enum-extensions-transport-sockets-tls-v3-subjectaltnamematcher-santype
2024-01-31 12:17:45 -06:00
Melissa Kam 3b9bb8d6f9
[CC-7044] Start HCP manager as part of link creation (#20312)
* Check for ACL write permissions on write

Link eventually will be creating a token, so require acl:write.

* Convert Run to Start, only allow to start once

* Always initialize HCP components at startup

* Support for updating config and client

* Pass HCP manager to controller

* Start HCP manager in link resource

Start as part of link creation rather than always starting. Update
the HCP manager with values from the link before starting as well.

* Fix metrics sink leaked goroutine

* Remove the hardcoded disabled hostname prefix

The HCP metrics sink will always be enabled, so the length of sinks will
always be greater than zero. This also means that we will also always
default to prefixing metrics with the hostname, which is what our
documentation states is the expected behavior anyway.

* Add changelog

* Check and set running status in one method

* Check for primary datacenter, add back test

* Clarify merge reasoning, fix timing issue in test

* Add comment about controller placement

* Expand on breaking change, fix typo in changelog
2024-01-29 16:31:44 -06:00
Tyler Wendlandt b9f3e5e247
NET-5398: V2 unavailable UI message (#20359)
* Update ui server to include V2 Catalog flag

* Fix typo

* Add route and redirects for the unavailable warning

* Add qualtrics link

* Remove unneccessary check and redirect
2024-01-29 14:28:41 -07:00
Tyler Wendlandt 7e08d8988c
NET-5398: Update UI server to include if v2 is enabled (#20353)
* Update ui server to include V2 Catalog flag

* Fix typo
2024-01-26 14:38:51 -07:00
Luke Kysow 840f11a0c5
Change logging of registered v2 resource endpoints to add /api prefix (#20352)
* Change logging of registered v2 resource endpoints to add /api prefix

Previous:

    agent.http: Registered resource endpoint: endpoint=/demo/v1/executive

New:

    agent.http: Registered resource endpoint: endpoint=/api/demo/v1/executive

This reduces confusion when attempting to call the APIs after looking at
the logs.
2024-01-25 14:18:54 -08:00
Nick Cellino 4801c9cbdc
Add Link API docs (#20308)
* Add Link API docs

* Update website/content/api-docs/hcp-link.mdx

Co-authored-by: Melissa Kam <3768460+mkam@users.noreply.github.com>

* Update website/content/api-docs/hcp-link.mdx

Co-authored-by: Melissa Kam <3768460+mkam@users.noreply.github.com>

* Update website/content/api-docs/hcp-link.mdx

Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>

* Update website/content/api-docs/hcp-link.mdx

Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>

* Update website/content/api-docs/hcp-link.mdx

Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>

* Add summary sentence and move api vs config section up

* Add hcp link endpoint to API Overview page

* Update website/content/api-docs/index.mdx

Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>

* Update note about v1 API endpoint prefix

* Add a period at end of v1 prefix note.

* Add link to HCP Consul Central

---------

Co-authored-by: Melissa Kam <3768460+mkam@users.noreply.github.com>
Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
2024-01-25 10:13:46 -05:00
Chris Hut 5119667cd1
💜 Cc 7187/purple banner for linking existing clusters (#20275)
* Adding banner on services page

* Simplified version of setting/unsetting banner

* Translating the text based off of enterprise or not

* Add an integration test

* Adding an acceptance test

* Enable config dismissal as well

* Adding changelog

* Adding some copyrights to the other files

* Revert "Enable config dismissal as well"

This reverts commit e6784c4335bdff99d9183d28571aa6ab4b852cbd.

We'll be doing this in CC-7347
2024-01-23 14:29:53 -08:00
Tauhid Anjum 5d294b26d3
NET-5824 Exported services api (#20015)
* Exported services api implemented

* Tests added, refactored code

* Adding server tests

* changelog added

* Proto gen added

* Adding codegen changes

* changing url, response object

* Fixing lint error by having namespace and partition directly

* Tests changes

* refactoring tests

* Simplified uniqueness logic for exported services, sorted the response in order of service name

* Fix lint errors, refactored code
2024-01-23 10:06:59 +05:30
Lord-Y 758ddf84e9
Case sensitive route match (#19647)
Add case insensitive param on service route match

This commit adds in a new feature that allows service routers to specify that
paths and path prefixes should ignore upper / lower casing when matching URLs.

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>
2024-01-22 09:23:24 -06:00
Nick Cellino 34b343a980
Unconditionally add Access-Control-Expose-Headers HTTP header (#20220)
* Unconditionally add Access-Control-Expose-Headers HTTP header

* Return nil instead of err
2024-01-22 10:18:35 -05:00
R.B. Boyer 2e08a7e1c7
v2: prevent use of the v2 experiments in secondary datacenters for now (#20299)
Ultimately we will have to rectify wan federation with v2 catalog adjacent
experiments, but for now blanket prevent usage of the resource-apis,
v2dns, and v2tenancy experiments in secondary datacenters.
2024-01-19 16:31:49 -06:00
Dhia Ayachi d641998641
Fix to not create a watch to `Internal.ServiceDump` when mesh gateway is not used (#20168)
This add a fix to properly verify the gateway mode before creating a watch specific to mesh gateways. This watch have a high performance cost and when mesh gateways are not used is not used.

This also adds an optimization to only return the nodes when watching the Internal.ServiceDump RPC to avoid unnecessary disco chain compilation. As watches in proxy config only need the nodes.
2024-01-18 16:44:53 -06:00
Michael Zalimeni 76b5de5039
[NET-4968] Upgrade Go to 1.21 (#20062)
* Upgrade Go to 1.21

* ci: detect Go backwards compatibility test version automatically

For our submodules and other places we choose to test against previous
Go versions, detect this version automatically from the current one
rather than hard-coding it.
2024-01-12 09:57:38 -05:00
loshz 7724bb88d5
[NET-6593] agent: check for minimum RSA key size (#20112)
* agent: check for minimum RSA key size

* add changelog

* agent: add test for RSA generated key sizes

* use constants in generating priv key func

* update key size error message
2024-01-10 12:15:36 +00:00
Ashesh Vidyut 69f775da9a
Fixes issue - 20109 (#20111)
* Fixes #20109

* add @hasA11yRefocus false

* add changelog

* Update ui/packages/consul-ui/app/components/hashicorp-consul/index.hbs

Co-authored-by: Tauhid Anjum <tauhidanjum@gmail.com>

---------

Co-authored-by: Tauhid Anjum <tauhidanjum@gmail.com>
2024-01-09 09:47:48 -07:00
John Murret 7a410d7c5b
NET-6945 - Replace usage of deprecated Envoy field envoy.config.core.v3.HeaderValueOption.append (#20078)
* NET-6945 - Replace usage of deprecated Envoy field envoy.config.core.v3.HeaderValueOption.append

* update proto for v2 and then update xds v2 logic

* add changelog

* Update 20078.txt to be consistent with existing changelog entries

* swap enum values tomatch envoy.
2024-01-04 00:36:25 +00:00
John Murret 55d7e95a3e
Clean up and make the changelog entries consistent for the replacement of Envoy deprecated fields. (#20079)
Clean up and make the changelog entries consistent for the replacement of Envoy deprrecated fields.
2024-01-03 13:31:56 -07:00
John Murret d925e4b812
NET-6946 / NET-6941 - Replace usage of deprecated Envoy fields envoy.config.route.v3.HeaderMatcher.safe_regex_match and envoy.type.matcher.v3.RegexMatcher.google_re2 (#20013)
* NET-6946 - Replace usage of deprecated Envoy field envoy.config.route.v3.HeaderMatcher.safe_regex_match

* removing unrelated changes

* update golden files

* do not set engine type
2024-01-03 09:53:39 -07:00
John Murret 2f335113f8
NET-6943 - Replace usage of deprecated Envoy field envoy.config.router.v3.WeightedCluster.total_weight. (#20011) 2023-12-22 19:49:44 +00:00
John Murret 90cd56c5c3
NET-4774 - replace usage of deprecated Envoy field match_subject_alt_names (#19954) 2023-12-22 18:34:44 +00:00
John Murret 21ea5c92fd
NET-6944 - Replace usage of deprecated Envoy field envoy.extensions.filters.http.lua.v3.Lua.inline_code (#20012) 2023-12-22 17:20:41 +00:00
John Murret a19df32fa5
NET-6942 - Replace usage of deprecated Envoy field envoy.config.cluster.v3.Cluster.http_protocol_options. (#20010)
* NET-6942 - Replace usage of deprecated Envoy field envoy.config.cluster.v3.Cluster.http_protocol_options.

* add changelog
2023-12-21 15:41:05 -05:00
Michael Zalimeni fe10339caa
[NET-7009] security: update x/crypto to 0.17.0 (#20023)
security: update x/crypto to 0.17.0

This addresses CVE-2023-48795 (x/crypto/ssh).
2023-12-21 20:11:19 +00:00
David Yu e7c7bc74c4
Dockerfile: bump up to `ubi-minimal:9.3` (#20014)
* Update Dockerfile
2023-12-21 11:55:20 -08:00
Nitya Dhanushkodi 9975b8bd73
[NET-5455] Allow disabling request and idle timeouts with negative values in service router and service resolver (#19992)
* add coverage for testing these timeouts
2023-12-19 15:36:07 -08:00
Derek Menteer bbdbf3e4f8
Fix bug with prepared queries using sameness-groups. (#19970)
This commit fixes an issue where the partition was not properly set
on the peering query failover target created from sameness-groups.
Before this change, it was always empty, meaning that the data
would be queried with respect to the default partition always. This
resulted in a situation where a PQ that was attempting to use a
sameness-group for failover would select peers from the default
partition, rather than the partition of the sameness-group itself.
2023-12-15 11:42:13 -06:00
John Murret 83cbe15b44
cli: Deprecate the `-admin-access-log-path` flag from `consul connect envoy` command in favor of: `-admin-access-log-config`. (#19943)
* cli: Deprecate the `-admin-access-log-path` flag from `consul connect envoy` command in favor of: `-admin-access-log-config`.

* fix changelog

* add in documentation change.
2023-12-14 20:36:47 +00:00
John Murret a995505976
NET-6317 - update usage of deprecated fields: http2_protocol_options and access_log_path (#19940)
* updating usage of http2_protocol_options and access_log_path

* add changelog

* update template for AdminAccessLogConfig

* remove mucking with AdminAccessLogConfig
2023-12-14 13:08:53 -07:00
Valeriia Ruban d7e0fca28b
fix: token list in Role details page is updated with tokens linked to… (#19912) 2023-12-12 09:36:50 -08:00
Tyler Wendlandt e8164c7c04
NET-6900: stop reconciling services when peering is enabled (#19907)
stop reconciling services when peering is enabled
2023-12-12 07:36:35 -07:00
Dhia Ayachi f2b26ac194
Hash based config entry replication (#19795)
* add a hash to config entries when normalizing

* add GetHash and implement comparing hashes

* only update if the Hash is different

* only update if the Hash is different and not 0

* fix proto to include the Hash

* fix proto gen

* buf format

* add SetHash and fix tests

* fix config load tests

* fix state test and config test

* recalculate hash when restoring config entries

* fix snapshot restore test

* add changelog

* fix missing normalize, fix proto indexes and add normalize test
2023-12-12 08:29:13 -05:00
Derek Menteer dfab5ade50
Fix ClusterLoadAssignment timeouts dropping endpoints. (#19871)
When a large number of upstreams are configured on a single envoy
proxy, there was a chance that it would timeout when waiting for
ClusterLoadAssignments. While this doesn't always immediately cause
issues, consul-dataplane instances appear to consistently drop
endpoints from their configurations after an xDS connection is
re-established (the server dies, random disconnect, etc).

This commit adds an `xds_fetch_timeout_ms` config to service registrations
so that users can set the value higher for large instances that have
many upstreams. The timeout can be disabled by setting a value of `0`.

This configuration was introduced to reduce the risk of causing a
breaking change for users if there is ever a scenario where endpoints
would never be received. Rather than just always blocking indefinitely
or for a significantly longer period of time, this config will affect
only the service instance associated with it.
2023-12-11 09:25:11 -06:00
John Murret 5ec84dbfd8
security: update supported envoy version 1.28.0 in addition to 1.25.11, 1.26.6, 1.27.2, 1.28.0 to address CVE-2023-44487 (#19879)
* update too support envoy 1.28.0

* add changelog

* update docs
2023-12-08 14:42:04 -07:00
Derek Menteer 0ac958f27b
Fix xDS missing endpoint race condition. (#19866)
This fixes the following race condition:
- Send update endpoints
- Send update cluster
- Recv ACK endpoints
- Recv ACK cluster

Prior to this fix, it would have resulted in the endpoints NOT existing in
Envoy. This occurred because the cluster update implicitly clears the endpoints
in Envoy, but we would never re-send the endpoint data to compensate for the
loss, because we would incorrectly ACK the invalid old endpoint hash. Since the
endpoint's hash did not actually change, they would not be resent.

The fix for this is to effectively clear out the invalid pending ACKs for child
resources whenever the parent changes. This ensures that we do not store the
child's hash as accepted when the race occurs.

An escape-hatch environment variable `XDS_PROTOCOL_LEGACY_CHILD_RESEND` was
added so that users can revert back to the old legacy behavior in the event
that this produces unknown side-effects. Visit the following thread for some
extra context on why certainty around these race conditions is difficult:
https://github.com/envoyproxy/envoy/issues/13009

This bug report and fix was mostly implemented by @ksmiley with some minor
tweaks.

Co-authored-by: Keith Smiley <ksmiley@salesforce.com>
2023-12-08 11:37:12 -06:00
Thomas Eckert 8125a32a4e
Add CE version of Gateway Upstream Disambiguation (#19860)
* Add CE version of gateway-upstream-disambiguation

* Use NamespaceOrDefault and PartitionOrDefault

* Add Changelog entry

* Remove the unneeded reassignment

* Use c.ID()
2023-12-07 17:56:14 -05:00
Dhia Ayachi d93f7f730d
parse config protocol on write to optimize disco-chain compilation (#19829)
* parse config protocol on write to optimize disco-chain compilation

* add changelog
2023-12-07 13:46:46 -05:00
Tauhid Anjum ab68ddff91
NET-6784: Adding cli command to list exported services to a peer (#19821)
* Adding cli command to list exported services to a peer

* Changelog added

* Addressing docs comments

* Adding test case for no exported services scenario
2023-12-07 12:55:15 +05:30
Ronald 053367a3b2
[NET-6650] Bump go version to 1.20.12 (#19840) 2023-12-06 13:22:00 -05:00
Jared Kirschner d3e658b0e7
improve client RPC metrics consistency (#19721)
The client.rpc metric now excludes internal retries for consistency
with client.rpc.exceeded and client.rpc.failed. All of these metrics
now increment at most once per RPC method call, allowing for
accurate calculation of failure / rate limit application occurrence.

Additionally, if an RPC fails because no servers are present,
client.rpc.failed is now incremented.
2023-12-06 13:21:08 -05:00
Ronald dc02fa695f
[NET-6251] Nomad client templated policy (#19827) 2023-12-06 10:32:12 -05:00
Ashesh Vidyut 6c88122fdb
NET-3860 - [Supportability] consul troubleshoot CLI for verifying ports (#18329)
* init

* udp

* added support for custom port

* removed grpc

* rename constants

* removed udp

* added change log

* fix synopsis

* pr comment chagnes

* make private

* added tests

* added one more test case

* defer close results channel

* removed unwanted comment

* licence update

* updated docs

* fix indent

* fix path

* example update

* Update website/content/commands/troubleshoot/ports.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/commands/troubleshoot/ports.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update command/troubleshoot/ports/troubleshoot_ports.go

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/commands/troubleshoot/ports.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/commands/troubleshoot/index.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update command/troubleshoot/ports/troubleshoot_ports.go

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update command/troubleshoot/ports/troubleshoot_ports.go

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/commands/troubleshoot/ports.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/commands/troubleshoot/ports.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/commands/troubleshoot/ports.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* pr comment resolved

---------

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
2023-12-06 11:12:15 +05:30
lornasong edf4610ed9
[Cloud][CC-6925] Updates to pushing server state (#19682)
* Upgrade hcp-sdk-go to latest version v0.73

Changes:
- go get github.com/hashicorp/hcp-sdk-go
- go mod tidy

* From upgrade: regenerate protobufs for upgrade from 1.30 to 1.31

Ran: `make proto`

Slack: https://hashicorp.slack.com/archives/C0253EQ5B40/p1701105418579429

* From upgrade: fix mock interface implementation

After upgrading, there is the following compile error:

cannot use &mockHCPCfg{} (value of type *mockHCPCfg) as "github.com/hashicorp/hcp-sdk-go/config".HCPConfig value in return statement: *mockHCPCfg does not implement "github.com/hashicorp/hcp-sdk-go/config".HCPConfig (missing method Logout)

Solution: update the mock to have the missing Logout method

* From upgrade: Lint: remove usage of deprecated req.ServerState.TLS

Due to upgrade, linting is erroring due to usage of a newly deprecated field

22:47:56 [consul]: make lint
--> Running golangci-lint (.)
agent/hcp/testing.go:157:24: SA1019: req.ServerState.TLS is deprecated: use server_tls.internal_rpc instead. (staticcheck)
                time.Until(time.Time(req.ServerState.TLS.CertExpiry)).Hours()/24,
                                     ^

* From upgrade: adjust oidc error message

From the upgrade, this test started failing:

=== FAIL: internal/go-sso/oidcauth TestOIDC_ClaimsFromAuthCode/failed_code_exchange (re-run 2) (0.01s)
    oidc_test.go:393: unexpected error: Provider login failed: Error exchanging oidc code: oauth2: "invalid_grant" "unexpected auth code"

Prior to the upgrade, the error returned was:
```
Provider login failed: Error exchanging oidc code: oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"invalid_grant\",\"error_description\":\"unexpected auth code\"}\n
```

Now the error returned is as below and does not contain "cannot fetch token"
```
Provider login failed: Error exchanging oidc code: oauth2: "invalid_grant" "unexpected auth code"

```

* Update AgentPushServerState structs with new fields

HCP-side changes for the new fields are in:
https://github.com/hashicorp/cloud-global-network-manager-service/pull/1195/files

* Minor refactor for hcpServerStatus to abstract tlsInfo into struct

This will make it easier to set the same tls-info information to both
 - status.TLS (deprecated field)
 - status.ServerTLSMetadata (new field to use instead)

* Update hcpServerStatus to parse out information for new fields

Changes:
 - Improve error message and handling (encountered some issues and was confused)
 - Set new field TLSInfo.CertIssuer
 - Collect certificate authority metadata and set on TLSInfo.CertificateAuthorities
 - Set TLSInfo on both server.TLS and server.ServerTLSMetadata.InternalRPC

* Update serverStatusToHCP to convert new fields to GNM rpc

* Add changelog

* Feedback: connect.ParseCert, caCerts

* Feedback: refactor and unit test server status

* Feedback: test to use expected struct

* Feedback: certificate with intermediate

* Feedback: catch no leaf, remove expectedErr

* Feedback: update todos with jira ticket

* Feedback: mock tlsConfigurator
2023-12-04 10:25:18 -05:00
Michael Zalimeni cc14ccf34a
[NET-6617] security: Bump github.com/golang-jwt/jwt/v4 to 4.5.0 (#19705)
security: Bump github.com/golang-jwt/jwt/v4 to 4.5.0

This version is accepted by Prisma/Twistlock, resolving scan results for
issue PRISMA-2022-0270. Chosen over later versions to avoid a major
version with breaking changes that is otherwise unnecessary.

Note that in practice this is a false positive (see
https://github.com/golang-jwt/jwt/issues/258), but we should update the
version to aid customers relying on scanners that flag it.
2023-11-27 11:03:26 -05:00
Ronald eded2ff347
[NET-6249] Add templated policies description (#19735) 2023-11-27 10:34:22 -05:00
Ronald c1dbf00a85
NET-6251 API gateway templated policy (#19728) 2023-11-24 17:55:05 +00:00
Ashvitha bfb3a43648
Default "stats_flush_interval" to 1 minute for Consul Telemetry Collector (#19663)
* Set default of 1m for StatsFlushInterval when the collector is setup

* Add documentation on the stats_flush_interval value

* Do not default in two conditions 1) preconfigured sinks exist 2) preconfigured flush interval exists

* Fix wording of docs

* Add changelog

* Fix docs
2023-11-20 16:18:30 -05:00
Dhia Ayachi f027d61014
fix a panic in the CLI when deleting an acl policy with an unknown name (#19679)
* fix a panic in the CLI when deleting an acl policy with an unknown name

* add changelog
2023-11-20 09:47:44 -05:00
Mike Nomitch 302f994410
[NET-6640] Adds "Policy" BindType to BindingRule (#19499)
feat: add bind type of policy

Co-authored-by: Ronald Ekambi <ronekambi@gmail.com>
2023-11-20 13:11:08 +00:00
Ronald ea0caa3e0f
[NET-6103] Enable query tokens by service name using templated policy (#19666) 2023-11-16 14:32:06 -05:00
Tyler Wendlandt 4d64ef0961
ui: move queries for selectors within the dropdowns (#19594)
Move queries for selectors within the dropdowns
2023-11-10 00:59:21 +00:00
Tyler Wendlandt 7699fb12eb
NET-5414: sameness group service show (#19586)
Fix viewing peered services on different namespaces
2023-11-09 15:25:01 -07:00
Tyler Wendlandt 1f5aa83a9e
ui: clear peer on home link (#19549)
Clear peer on home link
2023-11-07 10:27:20 -07:00
Tyler Wendlandt e5948e8eb4
CC-5545: Side Nav (#19342)
* Initial work for sidenav

* Use HDS::Text

* Add resolution for ember-element-helper

* WIP dc selector

* Update HCP Home link

* DC selector

* Hook up remaining selectors

* Fix settings and tutorial links

* Remove comments

* Remove skip-links

* Replace auth with new dropdown

* Use href-to helper for sidenav links

* Changelog

* Add description to NavSelector

* Wrap version in footer and role

* Fix login tests

* Add data-test selectors for namespaces

* Fix datacenter disclosure menu test

* Stop rendering auth dialog if acls are disabled

* Update disabled selector state and token selector

* Fix logic in ACL selector

* Fix HCP Home integration test

* Remove toggling the sidenav in tests

* Add sidenav to eng docs

* Re-add debug navigation for eng docs

* Remove ember-in-viewport

* Remove unused styles

* Upgrade @hashicorp/design-system-componentseee

* Add translations for side-nav

* Only show back to hcp link if url is present

* Disable responsive due to a11y-dialog issue
2023-11-06 08:18:48 -07:00
Derek Menteer 6baf695cd9
[NET-6459] Fix issue with wanfed lan ip conflicts. (#19503)
Fix issue with wanfed lan ip conflicts.

Prior to this commit, the connection pools were unaware which datacenter the
connection was associated with. This meant that any time servers with
overlapping LAN IP addresses and node shortnames existed, they would be
incorrectly co-located in the same pool. Whenever this occurred, the servers
would get stuck in an infinite loop of forwarding RPCs to themselves (rather
than the intended remote DC) until they eventually run out of memory.

Most notably, this issue can occur whenever wan federation through mesh
gateways is enabled.

This fix adds extra metadata to specify which DC the connection is associated
with in the pool.
2023-11-06 08:47:12 -06:00
Nitya Dhanushkodi 2bc0bc30b9
update v2 changelog (#19446) 2023-11-02 14:59:55 -07:00
Michael Zalimeni 42647de35d
[NET-6138] security: Bump `google.golang.org/grpc` to 1.56.3 (CVE-2023-44487) (#19414)
Bump google.golang.org/grpc to 1.56.3

This resolves [CVE-2023-44487](https://nvd.nist.gov/vuln/detail/CVE-2023-44487).

Co-authored-by: Chris Thain <chris.m.thain@gmail.com>
2023-10-30 08:44:22 -04:00
Ronald ea91e58045
Stop use of templated-policy and templated-policy-file simultaneously (#19389) 2023-10-26 18:15:12 +00:00
Andrew Stucki e414cbee4a
Use strict DNS for mesh gateways with hostnames (#19268)
* Use strict DNS for mesh gateways with hostnames

* Add changelog
2023-10-24 15:04:14 -04:00
Dhia Ayachi 12ef115b61
bump raft-wal version to 0.4.1 (#19314)
* bump raft-wal version to 0.4.1

* changelog

* go mod tidy integration tests

* go mod tidy test-integ
2023-10-24 10:47:46 -04:00
Derek Menteer 48c4a5b736
Add grpc keepalive configuration. (#19339)
Prior to the introduction of this configuration, grpc keepalive messages were
sent after 2 hours of inactivity on the stream. This posed issues in various
scenarios where the server-side xds connection balancing was unaware that envoy
instances were uncleanly killed / force-closed, since the connections would
only be cleaned up after ~5 minutes of TCP timeouts occurred. Setting this
config to a 30 second interval with a 20 second timeout ensures that at most,
it should take up to 50 seconds for a dead xds connection to be closed.
2023-10-24 08:05:31 -05:00
aahel 1280f45485
added ent to ce downgrade changes (#19311)
* added ent to ce downgrade changes

* added changelog

* added busl headers
2023-10-20 22:34:25 +05:30
Chris Thain b1871fd08c
Backout Envoy 1.28.0 (#19306) 2023-10-20 17:03:54 +00:00
Chris S. Kim 9d00b13140
Vault CA bugfixes (#19285)
* Re-add retry logic to Vault token renewal

* Fix goroutine leak

* Add test for detecting goroutine leak

* Add changelog

* Rename tests

* Add comment
2023-10-20 15:03:27 +00:00
Chris Thain 681aef31e9
Update supported Envoy versions (#19276) 2023-10-19 21:08:20 +00:00
Poonam Jadhav 2bd38d8806
fix: allow snake case keys for ip based rate limit config entry (#19277)
* fix: allow snake case keys for ip based rate limit config entry

* chore: add changelog
2023-10-19 10:54:00 -04:00
Michael Zalimeni 8eb074e7c1
[NET-5944] security: Update Go version to 1.20.10 and `x/net` to 0.17.0 (#19225)
* Bump golang.org/x/net to 0.17.0

This resolves [CVE-2023-39325](https://nvd.nist.gov/vuln/detail/CVE-2023-39325)
/ [CVE-2023-44487](https://nvd.nist.gov/vuln/detail/CVE-2023-44487).

* Update Go version to 1.20.10

This resolves [CVE-2023-39325](https://nvd.nist.gov/vuln/detail/CVE-2023-39325)
/ [CVE-2023-44487](https://nvd.nist.gov/vuln/detail/CVE-2023-44487)
(`net/http`).
2023-10-16 17:49:04 -04:00
Semir Patel ad177698f7
resource: enforce lowercase v2 resource names (#19218) 2023-10-16 12:55:30 -05:00
Chris Thain dcdf2fc6ba
Update Vault CA provider namespace configuration (#19095) 2023-10-10 13:53:00 +00:00
Ashesh Vidyut a30ccdf5dc
NET-4135 - Fix NodeMeta filtering Catalog List Services API (#18322)
* logs for debugging

* Init

* white spaces fix

* added change log

* Fix tests

* fix typo

* using queryoptionfilter to populate args.filter

* tests

* fix test

* fix tests

* fix tests

* fix tests

* fix tests

* fix variable name

* fix tests

* fix tests

* fix tests

* Update .changelog/18322.txt

Co-authored-by: Ganesh S <ganesh.seetharaman@hashicorp.com>

* fix change log

* address nits

* removed unused line

* doing join only when filter has nodemeta

* fix tests

* fix tests

* Update agent/consul/catalog_endpoint.go

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* fix tests

* removed unwanted code

---------

Co-authored-by: Ganesh S <ganesh.seetharaman@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2023-10-08 12:48:31 +00:00
Derek Menteer af3439b53d
Ensure that upstream configuration is properly normalized. (#19076)
This PR fixes an issue where upstreams did not correctly inherit the proper
namespace / partition from the parent service when attempting to fetch the
upstream protocol due to inconsistent normalization.

Some of the merge-service-configuration logic would normalize to default, while
some of the proxycfg logic would normalize to match the parent service. Due to
this mismatch in logic, an incorrect service-defaults configuration entry would
be fetched and have its protocol applied to the upstream.
2023-10-06 13:59:47 -05:00
Thomas Eckert 342306c312
Allow connections through Terminating Gateways from peered clusters NET-3463 (#18959)
* Add InboundPeerTrustBundle maps to Terminating Gateway

* Add notify and cancelation of watch for inbound peer trust bundles

* Pass peer trust bundles to the RBAC creation function

* Regenerate Golden Files

* add changelog, also adds another spot that needed peeredTrustBundles

* Add basic test for terminating gateway with peer trust bundle

* Add intention to cluster peered golden test

* rerun codegen

* update changelog

* really update the changelog

---------

Co-authored-by: Melisa Griffin <melisa.griffin@hashicorp.com>
2023-10-05 21:54:23 +00:00