Commit Graph

1311 Commits

Author SHA1 Message Date
John Murret 11bcf521ae
dns v2 - both empty string and default should be allowed for namespace and partition in CE (#21230)
* dns v2 - both empty string and default should be allowed for namespace and partition in Ce

* add changelog

* use default partition constant

* use constants in validation.

---------

Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
2024-05-28 16:20:59 -06:00
Dan Stough cf1c030043
feat: update supported envoy to 1.29 (#21142) 2024-05-24 13:26:07 -04:00
Dhia Ayachi 1f4caaedf2
upgrade deep-copy version, upgrade go to 1.22.3 (#21113)
* upgrade deep-copy version, upgrade go to 1.22.3

* add changelog
2024-05-16 13:40:15 -04:00
Michael Zalimeni f56405e745
security: Upgrade Go to 1.21.10 (#21074)
This resolves CVE-2024-24787 and CVE-2024-24788.
2024-05-09 11:11:01 -04:00
Michael Zalimeni 86b0818c1f
[NET-8601] security: upgrade vault/api to remove go-jose.v2 (#20910)
security: upgrade vault/api to remove go-jose.v2

This dependency has an open vulnerability (GO-2024-2631), and is no
longer needed by the latest `vault/api`. This is a follow-up to the
upgrade of `go-jose/v3` in this repository to make all our dependencies
consolidate on v3.

Also remove the recently added security scan triage block for
GO-2024-2631, which was added due to incorrect reports that
`go-jose/v3@3.0.3` was impacted; in reality, is was this indirect
client dependency (not impacted by CVE) that the scanner was flagging. A
bug report has been filed to address the incorrect reporting.
2024-05-04 00:18:51 +00:00
Deniz Onur Duzgun 3a6f2fba18
security: bump envoy version and k8s.io/apimachinery (#21017)
* security: bump envoy version

* add changelog
2024-05-02 13:36:02 -04:00
Dan Stough 03ab7367a6
feat(dataplane): allow token and tenancy information for proxied DNS (#20899)
* feat(dataplane): allow token and tenancy information for proxied DNS

* changelog
2024-04-22 14:30:43 -04:00
sarahalsmiller 08761f16c8
Net 6820 customize mesh gateway limits (#20945)
* add upstream limits to mesh gateway cluster generation

* changelog

* go mod tidy

* readd changelog data

* undo reversion from rebase

* run codegen

* Update .changelog/20945.txt

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* address notes

* gofmt

* clean up

* gofmt

* Update agent/proxycfg/mesh_gateway.go

* gofmt

* nil check

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
2024-04-16 10:59:41 -05:00
Nathan Coleman 5e9f02d4be
[NET-8091] Add file-system-certificate config entry for API gateway (#20873)
* Define file-system-certificate config entry

* Collect file-system-certificate(s) referenced by api-gateway onto snapshot

* Add file-system-certificate to config entry kind allow lists

* Remove inapplicable validation

This validation makes sense for inline certificates since Consul server is holding the certificate; however, for file system certificates, Consul server never actually sees the certificate.

* Support file-system-certificate as source for listener TLS certificate

* Add more required mappings for the new config entry type

* Construct proper TLS context based on certificate kind

* Add support or SDS in xdscommon

* Remove unused param

* Adds back verification of certs for inline-certificates

* Undo tangential changes to TLS config consumption

* Remove stray curly braces

* Undo some more tangential changes

* Improve function name for generating API gateway secrets

* Add changelog entry

* Update .changelog/20873.txt

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Add some nil-checking, remove outdated TODO

* Update test assertions to include file-system-certificate

* Add documentation for file-system-certificate config entry

Add new doc to nav

* Fix grammar mistake

* Rename watchmaps, remove outdated TODO

---------

Co-authored-by: Melisa Griffin <melisa.griffin@hashicorp.com>
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
2024-04-15 16:45:05 -04:00
Michael Zalimeni a8d08e759f
fix: consume ignored entries in CE downgrade via Ent snapshot (#20977)
This operation would previously fail due to unconsumed bytes in the
decoder buffer when reading the Ent snapshot (the first byte of the
record would be misinterpreted as a type indicator, and the remaining
bytes would fail to be deserialized or read as invalid data).

Ensure restore succeeds by decoding the ignored record as an
interface{}, which will consume the record bytes without requiring a
concrete target struct, then moving on to the next record.
2024-04-11 21:08:44 +00:00
Eric Haberkorn e231f0ee9b
Add an agent config option to diable per tenancy usage metrics. (#20976) 2024-04-11 15:20:09 -04:00
John Murret d261a987f1
update go-control-plane envoy dependency to 0.12.0 (#20973)
* update go-control-plane envoy dependency to 0.12.0

* add changelog

* go mod tidy

* fix linting issues

* add agent/grpc-internal to the list of SA1019 ignores
2024-04-10 01:23:04 +00:00
Deniz Onur Duzgun 3152ac3702
security: bump go, x/net and envoy versions (#20956)
* Bump go version

* Bump x/net

* Bump envoy version

* Add changelog

---------

Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
2024-04-08 19:18:40 +00:00
Nathan Coleman 9af713ff17
[NET-5772] Make tcp external service registered on terminating gw reachable from peered cluster (#19881)
* Include SNI + root PEMs from peered cluster on terminating gw filter chain

This allows an external service registered on a terminating gateway to be exported to and reachable from a peered cluster

* Abstract existing logic into re-usable function

* Regenerate golden files w/ new listener logic

* Add changelog entry

* Use peering bundles that are stable across test runs
2024-04-03 12:38:09 -04:00
John Murret 39112c7a98
GH-20889 - put conditionals are hcp initialization for consul server (#20926)
* put conditionals are hcp initialization for consul server

* put more things behind configuration flags

* add changelog

* TestServer_hcpManager

* fix TestAgent_scadaProvider
2024-03-28 14:47:11 -06:00
David Yu 4259b7b33c
Update Dockerfile: bump alpine (#20897)
* Update Dockerfile: bump alpine

* Create 20897

* Rename 20897 to 20897.txt
2024-03-27 14:43:14 -07:00
Dan Stough 6026ada0c9
[CE] feat(v2dns): enable v2 dns as default (#20715)
* feat(v2dns): enable v2 dns as default

* changelog
2024-03-25 16:09:01 -04:00
Iryna Shustava d747b51dab
Handle ACL errors consistently when blocking query timeout is reached. (#20876)
Currently, when a client starts a blocking query and an ACL token expires within
that time, Consul will return ACL not found error with a 403 status code. However,
sometimes if an ACL token is invalidated at the same time as the query's deadline is reached,
Consul will instead return an empty response with a 200 status code.

This is because of the events being executed.
1. Client issues a blocking query request with timeout `t`.
2. ACL is deleted.
3. Server detects a change in ACLs and force closes the gRPC stream.
4. Client resubscribes with the same token and resets its state (view).
5. Client sees "ACL not found" error.

If ACL is deleted before step 4, the client is unaware that the stream was closed due to
an ACL error and will return an empty view (from the reset state) with the 200 status code.

To fix this problem, we introduce another state to the subsciption to indicate when a change
to ACLs has occured. If the server sees that there was an error due to ACL change, it will
re-authenticate the request and return an error if the token is no longer valid.

Fixes #20790
2024-03-22 14:59:54 -06:00
Derek Menteer ac83ac1343
Fix streaming RPCs for agentless. (#20868)
* Fix streaming RPCs for agentless.

This PR fixes an issue where cross-dc RPCs were unable to utilize
the streaming backend due to having the node name set. The result
of this was the agent-cache being utilized, which would cause high
cpu utilization and memory consumption due to the fact that it
keeps queries alive for 72 hours before purging inactive entries.

This resource consumption is compounded by the fact that each pod
in consul-k8s gets a unique token. Since the agent-cache uses the
token as a component of the key, the same query is duplicated for
each pod that is deployed.

* Add changelog.
2024-03-15 14:44:51 -05:00
Derek Menteer 0ac8ae6c3b
Fix xDS deadlock due to syncLoop termination. (#20867)
* Fix xDS deadlock due to syncLoop termination.

This fixes an issue where agentless xDS streams can deadlock permanently until
a server is restarted. When this issue occurs, no new proxies are able to
successfully connect to the server.

Effectively, the trigger for this deadlock stems from the following return
statement:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L199-L202

When this happens, the entire `syncLoop()` terminates and stops consuming from
the following channel:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L182-L192

Which results in the `ConfigSource.cleanup()` function never receiving a
response and holding a mutex indefinitely:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L241-L247

Because this mutex is shared, it effectively deadlocks the server's ability to
process new xDS streams.

----

The fix to this issue involves removing the `chan chan struct{}` used like an
RPC-over-channels pattern and replacing it with two distinct channels:

+ `stopSyncLoopCh` - indicates that the `syncLoop()` should terminate soon.  +
`syncLoopDoneCh` - indicates that the `syncLoop()` has terminated.

Splitting these two concepts out and deferring a `close(syncLoopDoneCh)` in the
`syncLoop()` function ensures that the deadlock above should no longer occur.

We also now evict xDS connections of all proxies for the corresponding
`syncLoop()` whenever it encounters an irrecoverable error. This is done by
hoisting the new `syncLoopDoneCh` upwards so that it's visible to the xDS delta
processing. Prior to this fix, the behavior was to simply orphan them so they
would never receive catalog-registration or service-defaults updates.

* Add changelog.
2024-03-15 13:57:11 -05:00
Derek Menteer eabff257d7
Various bug-fixes and improvements (#20866)
* Shuffle the list of servers returned by `pbserverdiscovery.WatchServers`.

This randomizes the list of servers to help reduce the chance of clients
all connecting to the same server simultaneously. Consul-dataplane is one
such client that does not randomize its own list of servers.

* Fix potential goroutine leak in xDS recv loop.

This commit ensures that the goroutine which receives xDS messages from
proxies will not block forever if the stream's context is cancelled but
the `processDelta()` function never consumes the message (due to being
terminated).

* Add changelog.
2024-03-15 13:10:48 -05:00
Matt Keeler 8fcafb139c
Add `consul snapshot decode` command (#20824)
Add snapshot decoding command
2024-03-14 12:59:06 -04:00
Deniz Onur Duzgun e9029ccd7a
[NET-8368] security: bump Go version to 1.21.8 (#20812)
* [NET-8368] Bump Go version
2024-03-14 09:46:15 -04:00
Chris Hut bfbc0ee4fd
Revert link existing but better 🪦 (#20830)
* Revert "feat: add alert to link to hcp modal to ask a user refresh a page; up… (#20682)"

This reverts commit dd833d9a36.

* Revert "chor: change cluster name param to have datacenter.name as default value (#20644)"

This reverts commit 8425cd0f90.

* Revert "chor: adds informative error message when acls disabled and read-only… (#20600)"

This reverts commit 9d712ccfc7.

* Revert "Cc 7147 link to hcp modal (#20474)"

This reverts commit 8c05e57ac1.

* Revert "Add nav bar item to show HCP link status and encourage folks to link (#20370)"

This reverts commit 22e6ce0df1.

* Revert "Cc 7145 hcp link status api (#20330)"

This reverts commit 049ca102c4.

* Revert "💜 Cc 7187/purple banner for linking existing clusters (#20275)"

This reverts commit 5119667cd1.
2024-03-13 13:59:00 -07:00
sarahalsmiller 262f435800
NET-6821 Disable Terminating Gateway Auto Host Header Rewrite (#20802)
* disable terminating gateway auto host rewrite

* add changelog

* clean up unneeded additional snapshot fields

* add new field to docs

* squash

* fix test
2024-03-12 15:37:20 -05:00
Michael Zalimeni d4761c0ccd
security: upgrade google.golang.org/protobuf to 1.33.0 (#20801)
Resolves CVE-2024-24786.
2024-03-06 23:04:42 +00:00
Matt Keeler abe14f11e6
Remove redundant usage metrics (#20674)
* Remove redundant usage metrics

* Add the changelog

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

---------

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2024-03-05 14:09:47 -05:00
Matt Keeler 5c936fba33
Enable callers to control whether per-tenant usage metrics are included in calls to store.ServiceUsage (#20672)
* Enable callers to control whether per-tenant usage metrics are included in calls to store.ServiceUsage

* Add changelog
2024-03-01 13:44:55 -05:00
Chris Hut a58f346f55
Hardcode links to CCM to be false (#20732)
Hardcode links to CCM to be false - due to CCM deprecation

Remove changelog item for breaking Ui change
 - since CCM linking no longer exists
2024-02-26 12:34:01 -08:00
sarahalsmiller 670ee90a77
Use correct enterprise meta on wildcard service update (#20721)
* use correct enterprise meta on wildcard service update

* changelog

* rename changelog file
2024-02-26 12:03:08 -06:00
Matt Keeler 16a0800777
Update API and API Docs regarding disabling gossip for a partition. (#20669) 2024-02-26 12:14:39 -05:00
John Murret 26eed12f04
NET-7813 - DNS : SERVFAIL when resolving PTR records (#20679)
* NET-7813 - DNS : SERVFAIL when resolving PTR records

* Update agent/dns.go

Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>

* PR feedback

---------

Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
2024-02-21 17:44:04 +00:00
Dan Stough 14efb28086
fix(v2dns): add node ttl to workloads, comment cleanup, and changelog (#20643)
* fix(v2dns): add node ttl to workloads, plus comment cleanup

* docs(v2dns): changelog
2024-02-14 17:38:11 -05:00
Derek Menteer 9f7626d501
Ensure all topics are refreshed on FSM restore and add supervisor loop to v1 controller subscriptions (#20642)
Ensure all topics are refreshed on FSM restore and add supervisor loop to v1 controller subscriptions

This PR fixes two issues:

1. Not all streams were force closed whenever a snapshot restore happened. This means that anything consuming data from the stream (controllers, queries, etc) were unaware that the data they have is potentially stale / invalid. This first part ensures that all topics are purged.

2. The v1 controllers did not properly handle stream errors (which are likely to appear much more often due to 1 above) and so it introduces a supervisor thread to restart the watches when these errors occur.
2024-02-14 14:17:55 -06:00
Michael Zalimeni 5862c52642
[NET-7948] Bump Envoy version to address multiple CVEs (#20589)
security: Bump Envoy versions to address CVEs
2024-02-12 22:29:50 +00:00
Valeriia Ruban 8c05e57ac1
Cc 7147 link to hcp modal (#20474)
* add link hcp modal component

* integrate modal with SideNav and link to hcp banner
---------

Co-authored-by: Chris Hut <tophernuts@gmail.com>
2024-02-09 18:23:13 +00:00
skpratt 738dc8c89d
use go 1.21.7 (#20545)
* 1.21.7

* changelog
2024-02-08 23:39:11 +00:00
Derek Menteer a1c8d4dd19
Decouple xds capacity controller and raft-autopilot (#20511)
Decouple xds capacity controller and autopilot

This prevents a potential bug where autopilot deadlocks while attempting
to execute `AutopilotDelegate.NotifyState()` on an xdscapacity controller
that stopped consuming messages.
2024-02-08 15:31:44 -06:00
Chris S. Kim 26661a1c3b
Add default intention policy (#20544) 2024-02-08 20:25:42 +00:00
Joshua Timmons 242b777547
Fix logging when we fail to export metrics to hcp (#20514) 2024-02-08 11:00:47 -05:00
Ashesh Vidyut cffb5d7c6e
Fix audit-log encoding issue (CC-7337) (#20345)
* add changes

* added changelog

* change update

* CE chnages

* Removed gzip size fix

* fix changelog

* Update .changelog/20345.txt

Co-authored-by: Hans Hasselberg <hans@hashicorp.com>

* Adding comments

---------

Co-authored-by: Abhishek Sahu <abhishek.sahu@hashicorp.com>
Co-authored-by: Hans Hasselberg <hans@hashicorp.com>
Co-authored-by: srahul3 <rahulsharma@hashicorp.com>
2024-02-06 16:40:07 +05:30
Tauhid Anjum 0c509a60a4
Exported services CLI and docs (#20331)
* Exported services CLI and docs

* Changelog added

* Added format option for pretty print

* Update command/exportedservices/exported_services.go

Co-authored-by: Ashesh Vidyut <134911583+absolutelightning@users.noreply.github.com>

* Addressing PR comments, moving the command under services category

* Add consumer peer and partition filter

* Adding bexpr filter, change format of data

---------

Co-authored-by: Ashesh Vidyut <134911583+absolutelightning@users.noreply.github.com>
2024-02-06 09:01:20 +05:30
Derek Menteer 922844b8e0
Fix issue with persisting proxy-defaults (#20481)
Fix issue with persisting proxy-defaults

This resolves an issue introduced in hashicorp/consul#19829
where the proxy-defaults configuration entry with an HTTP protocol
cannot be updated after it has been persisted once and a router
exists. This occurs because the protocol field is not properly
pre-computed before being passed into validation functions.
2024-02-05 16:00:19 -06:00
natemollica-dev 2b07b326c4
Resolve Consul DNS in OpenShift (#20439) 2024-02-01 14:00:27 -08:00
Chris S. Kim b6f10bc58f
Skip filter chain created by permissive mtls (#20406) 2024-01-31 16:39:12 -05:00
Derek Menteer 3e8ec8d18e
Fix SAN matching on terminating gateways (#20417)
Fixes issue: hashicorp/consul#20360

A regression was introduced in hashicorp/consul#19954 where the SAN validation
matching was reduced from 4 potential types down to just the URI.

Terminating gateways will need to match on many fields depending on user
configuration, since they make egress calls outside of the cluster. Having more
than one matcher behaves like an OR operation, where any match is sufficient to
pass the certificate validation. To maintain backwards compatibility with the
old untyped `match_subject_alt_names` Envoy behavior, we should match on all 4
enum types.

https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/transport_sockets/tls/v3/common.proto#enum-extensions-transport-sockets-tls-v3-subjectaltnamematcher-santype
2024-01-31 12:17:45 -06:00
Melissa Kam 3b9bb8d6f9
[CC-7044] Start HCP manager as part of link creation (#20312)
* Check for ACL write permissions on write

Link eventually will be creating a token, so require acl:write.

* Convert Run to Start, only allow to start once

* Always initialize HCP components at startup

* Support for updating config and client

* Pass HCP manager to controller

* Start HCP manager in link resource

Start as part of link creation rather than always starting. Update
the HCP manager with values from the link before starting as well.

* Fix metrics sink leaked goroutine

* Remove the hardcoded disabled hostname prefix

The HCP metrics sink will always be enabled, so the length of sinks will
always be greater than zero. This also means that we will also always
default to prefixing metrics with the hostname, which is what our
documentation states is the expected behavior anyway.

* Add changelog

* Check and set running status in one method

* Check for primary datacenter, add back test

* Clarify merge reasoning, fix timing issue in test

* Add comment about controller placement

* Expand on breaking change, fix typo in changelog
2024-01-29 16:31:44 -06:00
Tyler Wendlandt b9f3e5e247
NET-5398: V2 unavailable UI message (#20359)
* Update ui server to include V2 Catalog flag

* Fix typo

* Add route and redirects for the unavailable warning

* Add qualtrics link

* Remove unneccessary check and redirect
2024-01-29 14:28:41 -07:00
Tyler Wendlandt 7e08d8988c
NET-5398: Update UI server to include if v2 is enabled (#20353)
* Update ui server to include V2 Catalog flag

* Fix typo
2024-01-26 14:38:51 -07:00
Luke Kysow 840f11a0c5
Change logging of registered v2 resource endpoints to add /api prefix (#20352)
* Change logging of registered v2 resource endpoints to add /api prefix

Previous:

    agent.http: Registered resource endpoint: endpoint=/demo/v1/executive

New:

    agent.http: Registered resource endpoint: endpoint=/api/demo/v1/executive

This reduces confusion when attempting to call the APIs after looking at
the logs.
2024-01-25 14:18:54 -08:00