Commit Graph

4961 Commits

Author SHA1 Message Date
Dhia Ayachi 225ae55e83
Leadership transfer cmd (#14132)
* add leadership transfer command

* add RPC call test (flaky)

* add missing import

* add changelog

* add command registration

* Apply suggestions from code review

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

* add the possibility of providing an id to raft leadership transfer. Add few tests.

* delete old file from cherry pick

* rename changelog filename to PR #

* rename changelog and fix import

* fix failing test

* check for OperatorWrite

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

* rename from leader-transfer to transfer-leader

* remove version check and add test for operator read

* move struct to operator.go

* first pass

* add code for leader transfer in the grpc backend and tests

* wire the http endpoint to the new grpc endpoint

* remove the RPC endpoint

* remove non needed struct

* fix naming

* add mog glue to API

* fix comment

* remove dead code

* fix linter error

* change package name for proto file

* remove error wrapping

* fix failing test

* add command registration

* add grpc service mock tests

* fix receiver to be pointer

* use defined values

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

* reuse MockAclAuthorizer

* add documentation

* remove usage of external.TokenFromContext

* fix failing tests

* fix proto generation

* Apply suggestions from code review

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Apply suggestions from code review

* add more context in doc for the reason

* Apply suggestions from docs code review

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* regenerate proto

* fix linter errors

Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2022-11-14 15:35:12 -05:00
Freddy 706866fa00
Ensure that NodeDump imported nodes are filtered (#15356) 2022-11-14 12:35:20 -07:00
Freddy c58f86a00f
Fixup authz for data imported from peers (#15347)
There are a few changes that needed to be made to to handle authorizing
reads for imported data:

- If the data was imported from a peer we should not attempt to read the
  data using the traditional authz rules. This is because the name of
  services/nodes in a peer cluster are not equivalent to those of the
  importing cluster.

- If the data was imported from a peer we need to check whether the
  token corresponds to a service, meaning that it has service:write
  permissions, or to a local read only token that can read all
  nodes/services in a namespace.

This required changes at the policyAuthorizer level, since that is the
only view available to OSS Consul, and at the enterprise
partition/namespace level.
2022-11-14 11:36:27 -07:00
Kyle Havlovitz dde5c524ad
connect: strip port from DNS SANs for ingress gateway leaf cert (#15320)
* connect: strip port from DNS SANs for ingress gateway leaf cert

* connect: format DNS SANs in CreateCSR

* connect: Test wildcard case when formatting SANs
2022-11-14 10:27:03 -08:00
Derek Menteer 931cec42b3
Prevent serving TLS via ports.grpc (#15339)
Prevent serving TLS via ports.grpc

We remove the ability to run the ports.grpc in TLS mode to avoid
confusion and to simplify configuration. This breaking change
ensures that any user currently using ports.grpc in an encrypted
mode will receive an error message indicating that ports.grpc_tls
must be explicitly used.

The suggested action for these users is to simply swap their ports.grpc
to ports.grpc_tls in the configuration file. If both ports are defined,
or if the user has not configured TLS for grpc, then the error message
will not be printed.
2022-11-11 14:29:22 -06:00
Dan Stough 626249fbf5
[OSS] fix: wait and try longer to peer through mesh gw (#15328) 2022-11-10 13:54:00 -05:00
Kyle Schochenmaier bf0f61a878
removes ioutil usage everywhere which was deprecated in go1.16 (#15297)
* update go version to 1.18 for api and sdk, go mod tidy
* removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward.
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-11-10 10:26:01 -06:00
malizz b51f0e25e9
update ACLs for cluster peering (#15317)
* update ACLs for cluster peering

* add changelog

* Update .changelog/15317.txt

Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>

Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>
2022-11-09 13:02:58 -08:00
malizz b9a9e1219c
update config defaults, add docs (#15302)
* update config defaults, add docs

* update grpc tls port for non-default values

* add changelog

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>

* Update website/content/docs/agent/config/config-files.mdx

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>

* update logic for setting grpc tls port value

* move default config to default.go, update changelog

* update docs

* Fix config tests.

* Fix linter error.

* Fix ConnectCA tests.

* Cleanup markdown on upgrade notes.

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>
Co-authored-by: Derek Menteer <derek.menteer@hashicorp.com>
2022-11-09 09:29:55 -08:00
Eric Haberkorn c340922991
Log Warnings When Peering With Mesh Gateway Mode None (#15304)
warn when mesh gateway mode is set to none for peering
2022-11-09 11:48:58 -05:00
Derek Menteer 418bd62c44
Fix mesh gateway configuration with proxy-defaults (#15186)
* Fix mesh gateway proxy-defaults not affecting upstreams.

* Clarify distinction with upstream settings

Top-level mesh gateway mode in proxy-defaults and service-defaults gets
merged into NodeService.Proxy.MeshGateway, and only gets merged with
the mode attached to an an upstream in proxycfg/xds.

* Fix mgw mode usage for peered upstreams

There were a couple issues with how mgw mode was being handled for
peered upstreams.

For starters, mesh gateway mode from proxy-defaults
and the top-level of service-defaults gets stored in
NodeService.Proxy.MeshGateway, but the upstream watch for peered data
was only considering the mesh gateway config attached in
NodeService.Proxy.Upstreams[i]. This means that applying a mesh gateway
mode via global proxy-defaults or service-defaults on the downstream
would not have an effect.

Separately, transparent proxy watches for peered upstreams didn't
consider mesh gateway mode at all.

This commit addresses the first issue by ensuring that we overlay the
upstream config for peered upstreams as we do for non-peered. The second
issue is addressed by re-using setupWatchesForPeeredUpstream when
handling transparent proxy updates.

Note that for transparent proxies we do not yet support mesh gateway
mode per upstream, so the NodeService.Proxy.MeshGateway mode is used.

* Fix upstream mesh gateway mode handling in xds

This commit ensures that when determining the mesh gateway mode for
peered upstreams we consider the NodeService.Proxy.MeshGateway config as
a baseline.

In absense of this change, setting a mesh gateway mode via
proxy-defaults or the top-level of service-defaults will not have an
effect for peered upstreams.

* Merge service/proxy defaults in cfg resolver

Previously the mesh gateway mode for connect proxies would be
merged at three points:

1. On servers, in ComputeResolvedServiceConfig.
2. On clients, in MergeServiceConfig.
3. On clients, in proxycfg/xds.

The first merge returns a ServiceConfigResponse where there is a
top-level MeshGateway config from proxy/service-defaults, along with
per-upstream config.

The second merge combines per-upstream config specified at the service
instance with per-upstream config specified centrally.

The third merge combines the NodeService.Proxy.MeshGateway
config containing proxy/service-defaults data with the per-upstream
mode. This third merge is easy to miss, which led to peered upstreams
not considering the mesh gateway mode from proxy-defaults.

This commit removes the third merge, and ensures that all mesh gateway
config is available at the upstream. This way proxycfg/xds do not need
to do additional overlays.

* Ensure that proxy-defaults is considered in wc

Upstream defaults become a synthetic Upstream definition under a
wildcard key "*". Now that proxycfg/xds expect Upstream definitions to
have the final MeshGateway values, this commit ensures that values from
proxy-defaults/service-defaults are the default for this synthetic
upstream.

* Add changelog.

Co-authored-by: freddygv <freddy@hashicorp.com>
2022-11-09 10:14:29 -06:00
Dan Upton 7b2d08d461
chore: remove unused argument from MergeNodeServiceWithCentralConfig (#15024)
Previously, the MergeNodeServiceWithCentralConfig method accepted a
ServiceSpecificRequest argument, of which only the Datacenter and
QueryOptions fields were used.

Digging a little deeper, it turns out these fields were only passed
down to the ComputeResolvedServiceConfig method (through the
ServiceConfigRequest struct) which didn't actually use them.

As such, not all call-sites passed a valid ServiceSpecificRequest
so it's safer to remove the argument altogether to prevent future
changes from depending on it.
2022-11-09 14:54:57 +00:00
Derek Menteer b64972d486
Bring back parameter ServerExternalAddresses in GenerateToken endpoint (#15267)
Re-add ServerExternalAddresses parameter in GenerateToken endpoint

This reverts commit 5e156772f6
and adds extra functionality to support newer peering behaviors.
2022-11-08 14:55:18 -06:00
cskh a3f57cc5e8
fix(mesh-gateway): remove deregistered service from mesh gateway (#15272)
* fix(mesh-gateway): remove deregistered service from mesh gateway

* changelog

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>
Co-authored-by: Evan Culver <eculver@users.noreply.github.com>
2022-11-07 20:30:15 -05:00
Freddy 7f5f7e9cf9
Avoid blocking child type updates on parent ack (#15083) 2022-11-07 18:10:42 -07:00
Derek Menteer c064ddf606
Backport test fix from ent. (#15279) 2022-11-07 12:17:46 -06:00
Chris S. Kim 985a4ee1b1
Update hcp-scada-provider to fix diamond dependency problem with go-msgpack (#15185) 2022-11-07 11:34:30 -05:00
Eric Haberkorn 1804b58799
Fix a bug in mesh gateway proxycfg where ACL tokens aren't passed. (#15273) 2022-11-07 10:00:11 -05:00
Dan Stough 553312ef61
fix: persist peering CA updates to dialing clusters (#15243)
fix: persist peering CA updates to dialing clusters
2022-11-04 12:53:20 -04:00
Derek Menteer 18d6c338f4
Backport tests from ent. (#15260)
* Backport agent tests.

Original commit: 0710b2d12fb51a29cedd1119b5fb086e5c71f632
Original commit: aaedb3c28bfe247266f21013d500147d8decb7cd (partial)

* Backport test fix and reduce flaky failures.
2022-11-04 10:19:24 -05:00
Derek Menteer 0834fe349b
Backport test from ENT: "Fix missing test fields" (#15258)
* Backport test from ENT: "Fix missing test fields"

Original Author: Sarah Pratt
Original Commit: a5c88bef7a969ea5d06ed898d142ab081ba65c69

* Update with proper linting.
2022-11-04 09:29:16 -05:00
Derek Menteer f4cb2f82bf
Backport various fixes from ENT. (#15254)
* Regenerate golden files.

* Backport from ENT: "Avoid race"

Original commit: 5006c8c858b0e332be95271ef9ba35122453315b
Original author: freddygv

* Backport from ENT: "chore: fix flake peerstream test"

Original commit: b74097e7135eca48cc289798c5739f9ef72c0cc8
Original author: DanStough
2022-11-03 16:34:57 -05:00
malizz 617a5f2dc2
convert stream status time fields to pointers (#15252) 2022-11-03 11:51:22 -07:00
sarahalsmiller 436160e155
Added check for empty peeringsni in restrictPeeringEndpoints (#15239)
Add check for empty peeringSNI in restrictPeeringEndpoints

Co-authored-by: Derek Menteer <derek.menteer@hashicorp.com>
2022-11-02 17:20:52 -05:00
Derek Menteer bd1019fadb
Prevent peering acceptor from subscribing to addr updates. (#15214) 2022-11-02 07:55:41 -05:00
Dan Stough 05e93f7569
test: refactor testcontainers and add peering integ tests (#15084) 2022-11-01 15:03:23 -04:00
Derek Menteer fa5d87c116 Decrease retry time for failed peering connections. 2022-10-31 14:30:27 -05:00
R.B. Boyer 97b9fcbf48
test: fix flaky TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages test (#15195)
Allow for some message duplication in subscription events during assertions.

I'm pretty sure the subscriptions machinery allows for messages to occasionally
be duplicated instead of dropping them, as a once-and-only-once queue is a pipe
dream and you have to pick one of the other two options.
2022-10-31 12:10:43 -05:00
Evan Culver 62d4517f9e
connect: Add Envoy 1.24 to integration tests, remove Envoy 1.20 (#15093) 2022-10-31 10:50:45 -05:00
Derek Menteer 693c8a4706 Allow peering endpoints to bypass verify_incoming. 2022-10-31 09:56:30 -05:00
Derek Menteer 2d4b62be3c Add tests. 2022-10-31 08:45:00 -05:00
Derek Menteer 1483c94531 Fix peered service protocols using proxy-defaults. 2022-10-31 08:45:00 -05:00
Eric Haberkorn cf50bdbe20
Fix peering metrics bug (#15178)
This bug was caused by the peering health metric being set to NaN.
2022-10-28 10:51:12 -04:00
Chris S. Kim 0e176dd6aa
Allow consul debug on non-ACL consul servers (#15155) 2022-10-27 09:25:18 -04:00
cskh a9427e1310
fix(peering): nil pointer in calling handleUpdateService (#15160)
* fix(peering): nil pointer in calling handleUpdateService

* changelog
2022-10-26 11:50:34 -04:00
Eric Haberkorn 1bdad89026
fix bug that resulted in generating Envoy configs that use CDS with an EDS configuration (#15140) 2022-10-25 14:49:57 -04:00
Luke Kysow d3aa2bd9c5
ingress-gateways: don't log error when registering gateway (#15001)
* ingress-gateways: don't log error when registering gateway

Previously, when an ingress gateway was registered without a
corresponding ingress gateway config entry, an error was logged
because the watch on the config entry returned a nil result.
This is expected so don't log an error.
2022-10-25 10:55:44 -07:00
Luke Kysow 9999672fd7
autoencrypt: helpful error for clients with wrong dc (#14832)
* autoencrypt: helpful error for clients with wrong dc

If clients have set a different datacenter than the servers they're
connecting with for autoencrypt, give a helpful error message.
2022-10-25 10:13:41 -07:00
R.B. Boyer 3c44116a8f
cache: refactor agent cache fetching to prevent unnecessary fetches on error (#14956)
This continues the work done in #14908 where a crude solution to prevent a
goroutine leak was implemented. The former code would launch a perpetual
goroutine family every iteration (+1 +1) and the fixed code simply caused a
new goroutine family to first cancel the prior one to prevent the
leak (-1 +1 == 0).

This PR refactors this code completely to:

- make it more understandable
- remove the recursion-via-goroutine strangeness
- prevent unnecessary RPC fetches when the prior one has errored.

The core issue arose from a conflation of the entry.Fetching field to mean:

- there is an RPC (blocking query) in flight right now
- there is a goroutine running to manage the RPC fetch retry loop

The problem is that the goroutine-leak-avoidance check would treat
Fetching like (2), but within the body of a goroutine it would flip that
boolean back to false before the retry sleep. This would cause a new
chain of goroutines to launch which #14908 would correct crudely.

The refactored code uses a plain for-loop and changes the semantics
to track state for "is there a goroutine associated with this cache entry"
instead of the former.

We use a uint64 unique identity per goroutine instead of a boolean so
that any orphaned goroutines can tell when they've been replaced when
the expiry loop deletes a cache entry while the goroutine is still running
and is later replaced.
2022-10-25 10:27:26 -05:00
R.B. Boyer da70daba43
test: ensure that all dependencies in a test agent use the test logger (#14996) 2022-10-24 17:02:38 -05:00
Chris S. Kim 9f0ed81cfd Remove invalid 1xx HTTP codes
These tests started failing in go1.19, presumably due to
support for valid 1xx responses being added.

https://github.com/golang/go/issues/56346
2022-10-24 16:12:08 -04:00
Chris S. Kim bde57c0dd0 Regenerate files according to 1.19.2 formatter 2022-10-24 16:12:08 -04:00
cskh db82ffe503
fix(peering): replicating wan address (#15108)
* fix(peering): replicating wan address

* add changelog

* unit test
2022-10-24 15:44:57 -04:00
Iryna Shustava 176abb5ff2
proxycfg: watch service-defaults config entries (#15025)
To support Destinations on the service-defaults (for tproxy with terminating gateway), we need to now also make servers watch service-defaults config entries.
2022-10-24 12:50:28 -06:00
Chris S. Kim b236e86030 Move oss-only test to its own file 2022-10-24 14:17:43 -04:00
R.B. Boyer d04cf25fa8
test: fix flaky TestHealthServiceNodes_NodeMetaFilter by waiting until the streaming subsystem has a valid grpc connection (#15019)
Also potentially unflakes TestHealthIngressServiceNodes for similar
reasons.
2022-10-24 13:09:53 -05:00
R.B. Boyer 300860412c
chore: update golangci-lint to v1.50.1 (#15022) 2022-10-24 11:48:02 -05:00
Venu Yanamandra efc813e92d
Update error message when restoring ENT snapshot in OSS (#15066) 2022-10-24 11:40:26 -04:00
freddygv d65e60de86 Return forbidden on permission denied
This commit updates the establish endpoint to bubble up a 403 status
code to callers when the establishment secret from the token is invalid.
This is a signal that a new peering token must be generated.
2022-10-20 17:11:49 -06:00
Chris S. Kim a7ea26192b Update expected encoding in test
go-memdb was updated in v1.3.3 to make integers in indexes sortable, which changed how integers were encoded.
2022-10-20 14:32:42 -04:00
freddygv 6d9be5fb15 Use plain TaggedAddressWAN 2022-10-19 16:32:44 -06:00
freddygv 8d211cc9cc Add unit test 2022-10-19 16:26:15 -06:00
cskh 058ee4fb84 fix: wan address isn't used by peering token 2022-10-19 16:33:25 -04:00
Nitya Dhanushkodi 5e156772f6
Remove ability to specify external addresses in GenerateToken endpoint (#14930)
* Reverts "update generate token endpoint to take external addresses (#13844)"

This reverts commit f47319b7c6.
2022-10-19 09:31:36 -07:00
Kyle Havlovitz 5c3427608b
Merge pull request #15035 from hashicorp/vault-ttl-update-warn
Warn instead of returning error when missing intermediate mount tune permissions
2022-10-18 15:41:52 -07:00
cskh d562d363fc
peering: skip registering duplicate node and check from the peer (#14994)
* peering: skip register duplicate node and check from the peer

* Prebuilt the nodes map and checks map to avoid repeated for loop

* use key type to struct: node id, service id, and check id
2022-10-18 16:19:24 -04:00
Chris S. Kim 29a297d3e9
Refactor client RPC timeouts (#14965)
Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created.

Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts.
2022-10-18 15:05:09 -04:00
Kyle Havlovitz d122108992 Warn instead of returning an error when intermediate mount tune permission is missing 2022-10-18 12:01:25 -07:00
R.B. Boyer 0cca4c088d
test: possibly fix flake in TestIntentionGetExact (#15021)
Restructure test setup to be similar to TestAgent_ServerCertificate
and see if that's enough to avoid flaking after join.
2022-10-18 10:51:20 -05:00
R.B. Boyer fe2d41ddad
cache: prevent goroutine leak in agent cache (#14908)
There is a bug in the error handling code for the Agent cache subsystem discovered:

1. NotifyCallback calls notifyBlockingQuery which calls getWithIndex in
   a loop (which backs off on-error up to 1 minute)

2. getWithIndex calls fetch if there’s no valid entry in the cache

3. fetch starts a goroutine which calls Fetch on the cache-type, waits
   for a while (again with backoff up to 1 minute for errors) and then
   calls fetch to trigger a refresh

The end result being that every 1 minute notifyBlockingQuery spawns an
ancestry of goroutines that essentially lives forever.

This PR ensures that the goroutine started by `fetch` cancels any prior
goroutine spawned by the same line for the same key.

In isolated testing where a cache type was tweaked to indefinitely
error, this patch prevented goroutine counts from skyrocketing.
2022-10-17 14:38:10 -05:00
R.B. Boyer 02a858efa0
ca: fix a masked bug in leaf cert generation that would not be notified of root cert rotation after the first one (#15005)
In practice this was masked by #14956 and was only uncovered fixing the
other bug.

  go test ./agent -run TestAgentConnectCALeafCert_goodNotLocal

would fail when only #14956 was fixed.
2022-10-17 13:24:27 -05:00
Chris S. Kim 3d2dffff16
Merge pull request #13388 from deblasis/feature/health-checks_windows_service
Feature: Health checks windows service
2022-10-17 09:26:19 -04:00
Dan Upton f8b4b41205
proxycfg: fix goroutine leak when service is re-registered (#14988)
Fixes a bug where we'd leak a goroutine in state.run when the given
context was canceled while there was a pending update.
2022-10-17 11:31:10 +01:00
Kyle Havlovitz aaf892a383 Extend tcp keepalive settings to work for terminating gateways as well 2022-10-14 17:05:46 -07:00
Kyle Havlovitz 2c569f6b9c Update docs and add tcp_keepalive_probes setting 2022-10-14 17:05:46 -07:00
Kyle Havlovitz 2242d1ec4a Add TCP keepalive settings to proxy config for mesh gateways 2022-10-14 17:05:46 -07:00
Derek Menteer 2a33d0ff96 Fix issue with incorrect method signature on test. 2022-10-14 11:04:57 -05:00
Freddy 24d0c8801a
Merge pull request #14981 from hashicorp/peering/dial-through-gateways 2022-10-14 09:44:56 -06:00
Dan Upton 328e3ff563
proxycfg: rate-limit delivery of config snapshots (#14960)
Adds a user-configurable rate limiter to proxycfg snapshot delivery,
with a default limit of 250 updates per second.

This addresses a problem observed in our load testing of Consul
Dataplane where updating a "global" resource such as a wildcard
intention or the proxy-defaults config entry could starve the Raft or
Memberlist goroutines of CPU time, causing general cluster instability.
2022-10-14 15:52:00 +01:00
Derek Menteer 29ebcf5ff0 Add tests for peering state snapshots / restores. 2022-10-14 09:48:04 -05:00
Derek Menteer e3ff9912d0 Add test for ExportedServicesForAllPeersByName 2022-10-14 09:48:04 -05:00
Dan Upton e6b55d1d81
perf: remove expensive reflection from xDS hot path (#14934)
Replaces the reflection-based implementation of proxycfg's
ConfigSnapshot.Clone with code generated by deep-copy.

While load testing server-based xDS (for consul-dataplane) we discovered
this method is extremely expensive. The ConfigSnapshot struct, directly
or indirectly, contains a copy of many of the structs in the agent/structs
package, which creates a large graph for copystructure.Copy to traverse
at runtime, on every proxy reconfiguration.
2022-10-14 10:26:42 +01:00
freddygv c77123a2aa Use split var in tests 2022-10-13 17:12:47 -06:00
freddygv bf51021c07 Use split wildcard partition name
This way OSS avoids passing a non-empty label, which will be rejected in
OSS consul.
2022-10-13 16:55:28 -06:00
Freddy ee4cdc4985
Merge pull request #14935 from hashicorp/fix/alias-leak 2022-10-13 16:31:15 -06:00
freddygv 573aa408a1 Lint 2022-10-13 15:55:55 -06:00
Derek Menteer 0f424e3cdf Reset wait on ensureServerAddrSubscription 2022-10-13 15:58:26 -05:00
freddygv 96fdd3728a Fix CA init error code 2022-10-13 14:58:11 -06:00
freddygv 2c99a21596 Update leader routine to maybe use gateways 2022-10-13 14:58:00 -06:00
freddygv e69bc727ec Update peering establishment to maybe use gateways
When peering through mesh gateways we expect outbound dials to peer
servers to flow through the local mesh gateway addresses.

Now when establishing a peering we get a list of dial addresses as a
ring buffer that includes local mesh gateway addresses if the local DC
is configured to peer through mesh gateways. The ring buffer includes
the mesh gateway addresses first, but also includes the remote server
addresses as a fallback.

This fallback is present because it's possible that direct egress from
the servers may be allowed. If not allowed then the leader will cycle
back to a mesh gateway address through the ring.

When attempting to dial the remote servers we retry up to a fixed
timeout. If using mesh gateways we also have an initial wait in
order to allow for the mesh gateways to configure themselves.

Note that if we encounter a permission denied error we do not retry
since that error indicates that the secret in the peering token is
invalid.
2022-10-13 14:57:55 -06:00
malizz b0b0cbb8ee
increase protobuf size limit for cluster peering (#14976) 2022-10-13 13:46:51 -07:00
Derek Menteer 4e140c98bc Address PR comments. 2022-10-13 14:11:02 -05:00
Derek Menteer 1e394da400 Disallow peering to the same cluster. 2022-10-13 14:11:02 -05:00
Derek Menteer 8742fbe14f Prevent consul peer-exports by discovery chain. 2022-10-13 12:45:09 -05:00
Derek Menteer f366edcb8d Prevent the "consul" service from being exported. 2022-10-13 12:45:09 -05:00
Derek Menteer caa1396255 Add remote peer partition and datacenter info. 2022-10-13 10:37:41 -05:00
Dan Upton cbb4a030c4
xds: properly merge central config for "agentless" services (#14962) 2022-10-13 12:04:59 +01:00
Dan Upton 0af9f16343
bug: fix goroutine leaks caused by incorrect usage of `WatchCh` (#14916)
memdb's `WatchCh` method creates a goroutine that will publish to the
returned channel when the watchset is triggered or the given context
is canceled. Although this is called out in its godoc comment, it's
not obvious that this method creates a goroutine who's lifecycle you
need to manage.

In the xDS capacity controller, we were calling `WatchCh` on each
iteration of the control loop, meaning the number of goroutines would
grow on each autopilot event until there was catalog churn.

In the catalog config source, we were calling `WatchCh` with the
background context, meaning that the goroutine would keep running after
the sync loop had terminated.
2022-10-13 12:04:27 +01:00
Hans Hasselberg 0d5935ab83
adding configuration option cloud.scada_address (#14936)
* adding scada_address

* config tests

* add changelog entry
2022-10-13 11:31:28 +02:00
Paul Glass bcda205f88
Add consul.xds.server.streamStart metric (#14957)
This adds a new consul.xds.server.streamStart metric to measure the time taken to first generate xDS resources after an xDS stream is opened.
2022-10-12 14:17:58 -05:00
Riddhi Shah 345191a0df
Service http checks data source for agentless proxies (#14924)
Adds another datasource for proxycfg.HTTPChecks, for use on server agents. Typically these checks are performed by local client agents and there is no equivalent of this in agentless (where servers configure consul-dataplane proxies).
Hence, the data source is mostly a no-op on servers but in the case where the service is present within the local state, it delegates to the cache data source.
2022-10-12 07:49:56 -07:00
Freddy 9ca8bb8ec4
Merge pull request #14958 from hashicorp/peering/nonce 2022-10-12 08:18:15 -06:00
freddygv 1b46b35041 Actually track nonce in test 2022-10-12 07:50:17 -06:00
Derek Menteer f330438a45 Fix incorrect backoff-wait logic. 2022-10-12 08:01:10 -05:00
freddygv 7f9a5d0f58 Add basic nonce management
This commit adds a monotonically increasing nonce to include in peering
replication response messages. Every ack/nack from the peer handling a
response will include this nonce, allowing to correlate the ack/nack
with a specific resource.

At the moment nothing is done with the nonce when it is received. In the
future we may want to add functionality such as retries on NACKs,
depending on the class of error.
2022-10-11 19:02:04 -06:00
Paul Glass d17af23641
gRPC server metrics (#14922)
* Move stats.go from grpc-internal to grpc-middleware
* Update grpc server metrics with server type label
* Add stats test to grpc-external
* Remove global metrics instance from grpc server tests
2022-10-11 17:00:32 -05:00
cskh e0356e1502
fix(peering): add missing grpc_tls_port for server address reconciliation (#14944) 2022-10-11 10:56:29 -04:00
freddygv f4cc4577ca Fix alias check leak
Preivously when alias check was removed it would not be stopped nor
cleaned up from the associated aliasChecks map.

This means that any time an alias check was deregistered we would
leak a goroutine for CheckAlias.run() because the stopCh would never
be closed.

This issue mostly affects service mesh deployments on platforms where
the client agent is mostly static but proxy services come and go
regularly, since by default sidecars are registered with an alias check.
2022-10-10 16:42:29 -06:00
James Oulman b8bd7a3058
Configure Envoy alpn_protocols based on service protocol (#14356)
* Configure Envoy alpn_protocols based on service protocol

* define alpnProtocols in a more standard way

* http2 protocol should be h2 only

* formatting

* add test for getAlpnProtocol()

* create changelog entry

* change scope is connect-proxy

* ignore errors on ParseProxyConfig; fixes linter

* add tests for grpc and http2 public listeners

* remove newlines from PR

* Add alpn_protocol configuration for ingress gateway

* Guard against nil tlsContext

* add ingress gateway w/ TLS tests for gRPC and HTTP2

* getAlpnProtocols: add TCP protocol test

* add tests for ingress gateway with grpc/http2 and per-listener TLS config

* add tests for ingress gateway with grpc/http2 and per-listener TLS config

* add Gateway level TLS config with mixed protocol listeners to validate ALPN

* update changelog to include ingress-gateway

* add http/1.1 to http2 ALPN

* go fmt

* fix test on custom-trace-listener
2022-10-10 13:13:56 -07:00
freddygv bf72df7b0e Fixup test 2022-10-10 13:20:14 -06:00
Chris S. Kim 4f4112662e Fix nil pointer 2022-10-10 13:20:14 -06:00
Chris S. Kim b0a4c5c563 Include stream-related information in peering endpoints 2022-10-10 13:20:14 -06:00
Paul Glass c0c187f1c5
Merge central config for GetEnvoyBootstrapParams (#14869)
This fixes GetEnvoyBootstrapParams to merge in proxy-defaults and service-defaults.

Co-authored-by: Dan Upton <daniel@floppy.co>
2022-10-10 12:40:27 -05:00
Freddy 4abad02abd
Merge pull request #14796 from hashicorp/peering/use-connect-ca 2022-10-07 10:37:37 -06:00
freddygv 7d4da6eb22 Fixup test 2022-10-07 09:34:16 -06:00
freddygv 3034df6a5c Require Connect and TLS to generate peering tokens
By requiring Connect and a gRPC TLS listener we can automatically
configure TLS for all peering control-plane traffic.
2022-10-07 09:06:29 -06:00
freddygv fac3ddc857 Use internal server certificate for peering TLS
A previous commit introduced an internally-managed server certificate
to use for peering-related purposes.

Now the peering token has been updated to match that behavior:
- The server name matches the structure of the server cert
- The CA PEMs correspond to the Connect CA

Note that if Conect is disabled, and by extension the Connect CA, we
fall back to the previous behavior of returning the manually configured
certs and local server SNI.

Several tests were updated to use the gRPC TLS port since they enable
Connect by default. This means that the peering token will embed the
Connect CA, and the dialer will expect a TLS listener.
2022-10-07 09:05:32 -06:00
freddygv 5f97223822 Simplify mgw watch mgmt 2022-10-07 08:54:37 -06:00
freddygv d54db25421 Use existing query options to build ctx 2022-10-07 08:46:53 -06:00
DanStough 77ab28c5c7 feat: xDS updates for peerings control plane through mesh gw 2022-10-07 08:46:42 -06:00
Eric Haberkorn 1633cf20ea
Make the mesh gateway changes to allow `local` mode for cluster peering data plane traffic (#14817)
Make the mesh gateway changes to allow `local` mode for cluster peering data plane traffic
2022-10-06 09:54:14 -04:00
cskh c1b5f34fb7
fix: missing UDP field in checkType (#14885)
* fix: missing UDP field in checkType

* Add changelog

* Update doc
2022-10-05 15:57:21 -04:00
Derek Menteer a279d2d329
Fix explicit tproxy listeners with discovery chains. (#14751)
Fix explicit tproxy listeners with discovery chains.
2022-10-05 14:38:25 -05:00
Alex Oskotsky 13da2c5fad
Add the ability to retry on reset connection to service-routers (#12890) 2022-10-05 13:06:44 -04:00
John Murret 79a541fd7d
Upgrade serf to v0.10.1 and memberlist to v0.5.0 to get memberlist size metrics and broadcast queue depth metric (#14873)
* updating to serf v0.10.1 and memberlist v0.5.0 to get memberlist size metrics and memberlist broadcast queue depth metric

* update changelog

* update changelog

* correcting changelog

* adding "QueueCheckInterval" for memberlist to test

* updating integration test containers to grab latest api
2022-10-04 17:51:37 -06:00
Evan Culver a3be5a5a82
connect: Bump Envoy 1.20 to 1.20.7, 1.21 to 1.21.5 and 1.22 to 1.22.5 (#14831) 2022-10-04 13:15:01 -07:00
Eric Haberkorn 1b565444be
Rename `PeerName` to `Peer` on prepared queries and exported services (#14854) 2022-10-04 14:46:15 -04:00
Freddy d9fe3578ac
Merge pull request #14734 from hashicorp/NET-643-update-mesh-gateway-envoy-config-for-inbound-peering-control-plane-traffic 2022-10-03 12:54:11 -06:00
freddygv b15d41534f Update xds generation for peering over mesh gws
This commit adds the xDS resources needed for INBOUND traffic from peer
clusters:

- 1 filter chain for all inbound peering requests.
- 1 cluster for all inbound peering requests.
- 1 endpoint per voting server with the gRPC TLS port configured.

There is one filter chain and cluster because unlike with WAN
federation, peer clusters will not attempt to dial individual servers.
Peer clusters will only dial the local mesh gateway addresses.
2022-10-03 12:42:27 -06:00
freddygv a8c4d6bc55 Share mgw addrs in peering stream if needed
This commit adds handling so that the replication stream considers
whether the user intends to peer through mesh gateways.

The subscription will return server or mesh gateway addresses depending
on the mesh configuration setting. These watches can be updated at
runtime by modifying the mesh config entry.
2022-10-03 11:42:20 -06:00
freddygv 4ff9d475b0 Return mesh gateway addrs if peering through mgw 2022-10-03 11:35:10 -06:00
chappie ad7295e5d9
Merge pull request #14811 from hashicorp/chappie/dns
Add DNS gRPC proxying support
2022-10-03 08:02:48 -07:00
Chris Chapman d7b5351b66
Making suggested comments 2022-09-30 15:03:33 -07:00
Chris Chapman 46bea72212
Making suggested changes 2022-09-30 14:51:12 -07:00
Chris Chapman a05563b788
Update comment 2022-09-30 09:35:01 -07:00
DanStough 7f8971d77f chore: fix flakey scada provider test 2022-09-30 11:56:40 -04:00
Chris Chapman 81e267171b
Bind a dns mux handler to gRPC proxy 2022-09-29 21:44:45 -07:00
Chris Chapman 7bc9cad180
Adding grpc handler for dns proxy 2022-09-29 21:19:51 -07:00
Eric Haberkorn 80e51ff907
Add exported services event to cluster peering replication. (#14797) 2022-09-29 15:37:19 -04:00
Ashwin Venkatesh 4ba260958c
bug: watch local mesh gateways in non-default partitions with agentless (#14799) 2022-09-29 13:19:04 -04:00
cskh 69f40df548
feat(ingress gateway: support configuring limits in ingress-gateway c… (#14749)
* feat(ingress gateway: support configuring limits in ingress-gateway config entry

- a new Defaults field with max_connections, max_pending_connections, max_requests
  is added to ingress gateway config entry
- new field max_connections, max_pending_connections, max_requests in
  individual services to overwrite the value in Default
- added unit test and integration test
- updated doc

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
2022-09-28 14:56:46 -04:00
malizz 84b0f408fa
Support Stale Queries for Trust Bundle Lookups (#14724)
* initial commit

* add tags, add conversations

* add test for query options utility functions

* update previous tests

* fix test

* don't error out on empty context

* add changelog

* update decode config
2022-09-28 09:56:59 -07:00
Eric Haberkorn 6570d5f004
Enable outbound peered requests to go through local mesh gateway (#14763) 2022-09-27 09:49:28 -04:00
Nick Ethier 1c1b0994b8
add HCP integration component (#14723)
* add HCP integration

* lint: use non-deprecated logging interface
2022-09-26 14:58:15 -04:00
Derek Menteer aa4709ab74
Add envoy connection balancing. (#14616)
Add envoy connection balancing config.
2022-09-26 11:29:06 -05:00
Chris S. Kim 2203cdc4db Add new internal endpoint to list exported services to a peer 2022-09-23 09:43:56 -04:00
freddygv d818d7b096 Manage local server watches depending on mesh cfg
Routing peering control plane traffic through mesh gateways can be
enabled or disabled at runtime with the mesh config entry.

This commit updates proxycfg to add or cancel watches for local servers
depending on this central config.

Note that WAN federation over mesh gateways is determined by a service
metadata flag, and any updates to the gateway service registration will
force the creation of a new snapshot. If enabled, WAN-fed over mesh
gateways will trigger a local server watch on initialize().

Because of this we will only add/remove server watches if WAN federation
over mesh gateways is disabled.
2022-09-22 19:32:10 -06:00
Alessandro De Blasis 461b42ed48 fix(check): added missing OSService props 2022-09-21 13:10:21 +01:00
Alessandro De Blasis 5719fd6560 fix(checks): os_service OK message in output 2022-09-21 09:27:33 +01:00
Alessandro De Blasis f440966a38 fix(checks): os_service lifecycle bugfix 2022-09-21 09:26:47 +01:00
Alessandro De Blasis fc0dd92dcf fix(agent): uninitialized map panic error 2022-09-21 09:25:54 +01:00
malizz 1a0aa38a82
increase the size of txn to support vault (#14599)
* increase the size of txn to support vault

* add test, revert change to acl endpoint

* add changelog

* update test, add passing test case

* Update .changelog/14599.txt

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2022-09-19 09:07:19 -07:00
freddygv 5fbb26525b Add awareness of server mode to TLS configurator
Preivously the TLS configurator would default to presenting auto TLS
certificates as client certificates.

Server agents should not have this behavior and should instead present
the manually configured certs. The autoTLS certs for servers are
exclusively used for peering and should not be used as the default for
outbound communication.
2022-09-16 17:57:10 -06:00
freddygv f30bc96239 Test fixes
- Pulls in CLI test fix from main
- Updates psutils to fix TestAgent_Host on M1 Mac
2022-09-16 17:57:10 -06:00
freddygv 02d3ce1039 Add server certificate manager
This certificate manager will request a leaf certificate for server
agents and then keep them up to date.
2022-09-16 17:57:10 -06:00
freddygv 0e5131bd33 Generate ACL token for server management
This commit introduces a new ACL token used for internal server
management purposes.

It has a few key properties:
- It has unlimited permissions.
- It is persisted through Raft as System Metadata rather than in the
ACL tokens table. This is to avoid users seeing or modifying it.
- It is re-generated on leadership establishment.
2022-09-16 17:54:34 -06:00
freddygv 0ea3353537 Add handling in agent cache for server leaf certs 2022-09-16 17:54:34 -06:00
Kyle Havlovitz 0d9ae52643
Merge pull request #14598 from hashicorp/root-removal-fix
connect/ca: Don't discard old roots on primaryInitialize
2022-09-15 14:36:01 -07:00
Kyle Havlovitz 6105a7fd9f connect/ca: don't discard old roots on primaryInitialize 2022-09-15 12:59:09 -07:00
Gabriel Santos e53af28bd7
Middleware: `RequestRecorder` reports calls below 1ms as decimal value (#12905)
* Typos

* Test failing

* Convert values <1ms to decimal

* Fix test

* Update docs and test error msg

* Applied suggested changes to test case

* Changelog file and suggested changes

* Update .changelog/12905.txt

Co-authored-by: Chris S. Kim <kisunji92@gmail.com>

* suggested change - start duration with microseconds instead of nanoseconds

* fix error

* suggested change - floats

Co-authored-by: alex <8968914+acpana@users.noreply.github.com>
Co-authored-by: Chris S. Kim <kisunji92@gmail.com>
2022-09-15 13:04:37 -04:00
Daniel Graña 8c98172f53
[BUGFIX] Do not use interval as timeout (#14619)
Do not use interval as timeout
2022-09-15 12:39:48 -04:00
Evan Culver d0416f593c
connect: Bump latest Envoy to 1.23.1 in test matrix (#14573) 2022-09-14 13:20:16 -07:00
DanStough 485e1b5d4e fix(peering): generate token metrics only for leader 2022-09-14 11:37:30 -04:00
DanStough 2a2debee64 feat(peering): validate server name conflicts on establish 2022-09-14 11:37:30 -04:00
Kyle Havlovitz 60cee76746
Merge pull request #14516 from hashicorp/ca-ttl-fixes
Fix inconsistent TTL behavior in CA providers
2022-09-13 16:07:36 -07:00
Kyle Havlovitz d67bccd210 Update intermediate pki mount/role when reconfiguring Vault provider 2022-09-13 15:42:26 -07:00
Kyle Havlovitz f46955101a connect/ca: Clarify behavior around IntermediateCertTTL in CA config 2022-09-13 15:42:26 -07:00
DanStough 0150e88200 feat: add PeerThroughMeshGateways to mesh config 2022-09-13 17:19:54 -04:00
Derek Menteer 0aa13733a0
Add CSR check for number of URIs. (#14579)
Add CSR check for number of URIs.
2022-09-13 14:21:47 -05:00
Derek Menteer db83ff4fa6 Add input validation for auto-config JWT authorization checks. 2022-09-13 11:16:36 -05:00
cskh f22685b969
Config-entry: Support proxy config in service-defaults (#14395)
* Config-entry: Support proxy config in service-defaults

* Update website/content/docs/connect/config-entries/service-defaults.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2022-09-12 10:41:58 -04:00
Eric Haberkorn aa8268e50c
Implement Cluster Peering Redirects (#14445)
implement cluster peering redirects
2022-09-09 13:58:28 -04:00
skpratt b761589340
add non-double-prefixed metrics (#14193) 2022-09-09 12:13:43 -05:00
skpratt 19f79aa9a6
PR #14057 follow up fix: service id parsing from sidecar id (#14541)
* fix service id parsing from sidecar id

* simplify suffix trimming
2022-09-09 09:47:10 -05:00
Dan Upton 1c2c975b0b
xDS Load Balancing (#14397)
Prior to #13244, connect proxies and gateways could only be configured by an
xDS session served by the local client agent.

In an upcoming release, it will be possible to deploy a Consul service mesh
without client agents. In this model, xDS sessions will be handled by the
servers themselves, which necessitates load-balancing to prevent a single
server from receiving a disproportionate amount of load and becoming
overwhelmed.

This introduces a simple form of load-balancing where Consul will attempt to
achieve an even spread of load (xDS sessions) between all healthy servers.
It does so by implementing a concurrent session limiter (limiter.SessionLimiter)
and adjusting the limit according to autopilot state and proxy service
registrations in the catalog.

If a server is already over capacity (i.e. the session limit is lowered),
Consul will begin draining sessions to rebalance the load. This will result
in the client receiving a `RESOURCE_EXHAUSTED` status code. It is the client's
responsibility to observe this response and reconnect to a different server.

Users of the gRPC client connection brokered by the
consul-server-connection-manager library will get this for free.

The rate at which Consul will drain sessions to rebalance load is scaled
dynamically based on the number of proxies in the catalog.
2022-09-09 15:02:01 +01:00
Derek Menteer f7c884f0af Merge branch 'main' of github.com:hashicorp/consul into derekm/split-grpc-ports 2022-09-08 14:53:08 -05:00
Derek Menteer bfe7c5e8af Remove rebuilding grpc server. 2022-09-08 13:45:44 -05:00
Derek Menteer 80d31458e5 Various cleanups. 2022-09-08 10:51:50 -05:00
Chris S. Kim 03df6c3ac6
Reuse http.DefaultTransport in UIMetricsProxy (#14521)
http.Transport keeps a pool of connections and should be reused when possible. We instantiate a new http.DefaultTransport for every metrics request, making large numbers of concurrent requests inefficiently spin up new connections instead of reusing open ones.
2022-09-08 11:02:05 -04:00
Chris S. Kim 1c4a6eef4f
Merge pull request #14285 from hashicorp/NET-638-push-server-address-updates-to-the-peer
peering: Subscribe to server address changes and push updates to peers
2022-09-07 09:30:45 -04:00
skpratt 3bf1edfb3f
move port and default check logic to locked step (#14057) 2022-09-06 19:35:31 -05:00
Freddy f4dfd42e0a
Add SpiffeID for Consul server agents (#14485)
Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>

By adding a SpiffeID for server agents, servers can now request a leaf
certificate from the Connect CA.

This new Spiffe ID has a key property: servers are identified by their
datacenter name and trust domain. All servers that share these
attributes will share a ServerURI.

The aim is to use these certificates to verify the server name of ANY
server in a Consul datacenter.
2022-09-06 17:58:13 -06:00
Daniel Upton 8c46e48e0d proxycfg-glue: server-local implementation of IntentionUpstreamsDestination
This is the OSS portion of enterprise PR 2463.

Generalises the serverIntentionUpstreams type to support matching on a
service or destination.
2022-09-06 23:27:25 +01:00
Daniel Upton f8dba7e9ac proxycfg-glue: server-local implementation of InternalServiceDump
This is the OSS portion of enterprise PR 2489.

This PR introduces a server-local implementation of the
proxycfg.InternalServiceDump interface that sources data from a blocking query
against the server's state store.

For simplicity, it only implements the subset of the Internal.ServiceDump RPC
handler actually used by proxycfg - as such the result type has been changed
to IndexedCheckServiceNodes to avoid confusion.
2022-09-06 23:27:25 +01:00
Daniel Upton a31738f76f proxycfg-glue: server-local implementation of ResolvedServiceConfig
This is the OSS portion of enterprise PR 2460.

Introduces a server-local implementation of the proxycfg.ResolvedServiceConfig
interface that sources data from a blocking query against the server's state
store.

It moves the service config resolution logic into the agent/configentry package
so that it can be used in both the RPC handler and data source.

I've also done a little re-arranging and adding comments to call out data
sources for which there is to be no server-local equivalent.
2022-09-06 23:27:25 +01:00
Derek Menteer bf769daae4 Merge branch 'main' of github.com:hashicorp/consul into derekm/split-grpc-ports 2022-09-06 10:51:04 -05:00
Derek Menteer 02ae66bda8 Add kv txn get-not-exists operation. 2022-09-06 10:28:59 -05:00
Chris S. Kim 953808e899 PR feedback on terminated state checking 2022-09-06 10:28:20 -04:00
Chris S. Kim ddb9375cb6 Add testcase for parsing grpc_port 2022-09-06 10:17:44 -04:00
Kyle Havlovitz d97ccccdd5
Merge pull request #14429 from hashicorp/ca-prune-intermediates
Prune old expired intermediate certs when appending a new one
2022-09-02 15:34:33 -07:00
cskh 0f7d4efac3
fix(txn api): missing proxy config in registering proxy service (#14471)
* fix(txn api): missing proxy config in registering proxy service
2022-09-02 14:28:05 -04:00
Chris S. Kim ec36755cc0 Properly assert for ServerAddresses replication request 2022-09-02 11:44:54 -04:00
Chris S. Kim d1d9dbff8e Fix terminate not returning early 2022-09-02 11:44:38 -04:00
Derek Menteer f64771c707 Address PR comments. 2022-09-01 16:54:24 -05:00
Kyle Havlovitz 0c2fb7252d Prune intermediates before appending new one 2022-09-01 14:24:30 -07:00
Luke Kysow 81d7cc41dc
Use proxy address for default check (#14433)
When a sidecar proxy is registered, a check is automatically added.
Previously, the address this check used was the underlying service's
address instead of the proxy's address, even though the check is testing
if the proxy is up.

This worked in most cases because the proxy ran on the same IP as the
underlying service but it's not guaranteed and so the proper default
address should be the proxy's address.
2022-09-01 14:03:35 -07:00
malizz f1054dada9
fix TestProxyConfigEntry (#14435) 2022-09-01 11:37:47 -07:00
malizz b3ac8f48ca
Add additional parameters to envoy passive health check config (#14238)
* draft commit

* add changelog, update test

* remove extra param

* fix test

* update type to account for nil value

* add test for custom passive health check

* update comments and tests

* update description in docs

* fix missing commas
2022-09-01 09:59:11 -07:00
Chris S. Kim f2b147e575 Add Internal.ServiceDump support for querying by PeerName 2022-09-01 10:32:59 -04:00
Chris S. Kim e62f830fa8
Merge pull request #13998 from jorgemarey/f-new-tracing-envoy
Add new envoy tracing configuration
2022-09-01 08:57:23 -04:00
Derek Menteer cf7f24a6ec Change serf-tag references to field references. 2022-08-31 16:38:42 -05:00
malizz a80e0bcd00
validate args before deleting proxy defaults (#14290)
* validate args before deleting proxy defaults

* add changelog

* validate name when normalizing proxy defaults

* add test for proxyConfigEntry

* add comments
2022-08-31 13:03:38 -07:00
Kyle Havlovitz 113454645d Prune old expired intermediate certs when appending a new one 2022-08-31 11:41:58 -07:00
Alessandro De Blasis 60c7c831c6 Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service 2022-08-30 18:49:20 +01:00
Eric Haberkorn 3726a0ab7a
Finish up cluster peering failover (#14396) 2022-08-30 11:46:34 -04:00
Chris S. Kim 560d410c6d Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer
# Conflicts:
#	agent/grpc-external/services/peerstream/stream_test.go
2022-08-30 11:09:25 -04:00
Jorge Marey 3f3bb8831e Fix typos. Add test. Add documentation 2022-08-30 16:59:02 +02:00
Jorge Marey ed7b34128f Add new tracing configuration 2022-08-30 16:59:02 +02:00
Freddy 97d1db759f
Merge pull request #13496 from maxb/fix-kv_entries-metric 2022-08-29 15:35:11 -06:00
Freddy 829a2a8722
Merge pull request #14364 from hashicorp/peering/term-delete 2022-08-29 15:33:18 -06:00
Max Bowsher decc9231ee Merge branch 'main' into fix-kv_entries-metric 2022-08-29 22:22:10 +01:00
Chris S. Kim 5010fa5c03
Merge pull request #14371 from hashicorp/kisunji/peering-metrics-update
Adjust metrics reporting for peering tracker
2022-08-29 17:16:19 -04:00
Chris S. Kim 74ddf040dd Add heartbeat timeout grace period when accounting for peering health 2022-08-29 16:32:26 -04:00
Derek Menteer 0ceec9017b Expose `grpc_tls` via serf for cluster peering. 2022-08-29 13:43:49 -05:00
Derek Menteer 1255a8a20d Add separate grpc_tls port.
To ease the transition for users, the original gRPC
port can still operate in a deprecated mode as either
plain-text or TLS mode. This behavior should be removed
in a future release whenever we no longer support this.

The resulting behavior from this commit is:
  `ports.grpc > 0 && ports.grpc_tls > 0` spawns both plain-text and tls ports.
  `ports.grpc > 0 && grpc.tls == undefined` spawns a single plain-text port.
  `ports.grpc > 0 && grpc.tls != undefined` spawns a single tls port (backwards compat mode).
2022-08-29 13:43:43 -05:00
freddygv 310608fb19 Add validation to prevent switching dialing mode
This prevents unexpected changes to the output of ShouldDial, which
should never change unless a peering is deleted and recreated.
2022-08-29 12:31:13 -06:00
Eric Haberkorn 72f90754ae
Update max_ejection_percent on outlier detection for peered clusters to 100% (#14373)
We can't trust health checks on peered services when service resolvers,
splitters and routers are used.
2022-08-29 13:46:41 -04:00
Alessandro De Blasis 26cc56bc68 fix(agent): removed redundant code in docker check as well 2022-08-29 18:15:59 +01:00
Alessandro De Blasis c0d647d11e fix(agent): removed redundant check on prev. running check 2022-08-29 17:53:39 +01:00
Chris S. Kim def529edd3 Rename test 2022-08-29 10:34:50 -04:00
Chris S. Kim 93271f649c Fix test 2022-08-29 10:20:30 -04:00
Eric Haberkorn 1099665473
Update the structs and discovery chain for service resolver redirects to cluster peers. (#14366) 2022-08-29 09:51:32 -04:00
Alessandro De Blasis f3437eaf05 Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
2022-08-28 18:09:31 +01:00
Alessandro De Blasis f634e36811 fix(OSServiceCheck): fixes following code-review 2022-08-28 17:56:30 +01:00
Chris S. Kim 4d97e2f936 Adjust metrics reporting for peering tracker 2022-08-26 17:34:17 -04:00
freddygv 650e48624d Allow terminated peerings to be deleted
Peerings are terminated when a peer decides to delete the peering from
their end. Deleting a peering sends a termination message to the peer
and triggers them to mark the peering as terminated but does NOT delete
the peering itself. This is to prevent peerings from disappearing from
both sides just because one side deleted them.

Previously the Delete endpoint was skipping the deletion if the peering
was not marked as active. However, terminated peerings are also
inactive.

This PR makes some updates so that peerings marked as terminated can be
deleted by users.
2022-08-26 10:52:47 -06:00
Chris S. Kim 937a8ec742 Fix casing 2022-08-26 11:56:26 -04:00
Chris S. Kim 87962b9713 Merge branch 'main' into catalog-service-list-filter 2022-08-26 11:16:06 -04:00
Chris S. Kim e2fe8b8d65 Fix tests for enterprise 2022-08-26 11:14:02 -04:00
Chris S. Kim 1c43a1a7b4 Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer
# Conflicts:
#	agent/grpc-external/services/peerstream/stream_test.go
2022-08-26 10:43:56 -04:00
Chris S. Kim 6ddcc04613
Replace ring buffer with async version (#14314)
We need to watch for changes to peerings and update the server addresses which get served by the ring buffer.

Also, if there is an active connection for a peer, we are getting up-to-date server addresses from the replication stream and can safely ignore the token's addresses which may be stale.
2022-08-26 10:27:13 -04:00
alex 30ff2e9a35
peering: add peer health metric (#14004)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-08-25 16:32:59 -07:00
Chris S. Kim 181063cd23 Exit loop when context is cancelled 2022-08-25 11:48:25 -04:00
cskh 41aea65214
Fix: the inboundconnection limit filter should be placed in front of http co… (#14325)
* fix: the inboundconnection limit should be placed in front of http connection manager

Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2022-08-24 14:13:10 -04:00
Chris S. Kim 8c94d1a80c Update test comment 2022-08-24 13:50:24 -04:00
Chris S. Kim 5f2959329f Add check for zero-length server addresses 2022-08-24 13:30:52 -04:00
skpratt 919da33331
no-op: refactor usagemetrics tests for clarity and DRY cases (#14313) 2022-08-24 12:00:09 -05:00
Pablo Ruiz García 1f293e5244
Added new auto_encrypt.grpc_server_tls config option to control AutoTLS enabling of GRPC Server's TLS usage
Fix for #14253

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
2022-08-24 12:31:38 -04:00
Dan Upton 3b993f2da7
dataplane: update envoy bootstrap params for consul-dataplane (#14017)
Contains 2 changes to the GetEnvoyBootstrapParams response to support
consul-dataplane.

Exposing node_name and node_id:

consul-dataplane will support providing either the node_id or node_name in its
configuration. Unfortunately, supporting both in the xDS meta adds a fair amount
of complexity (partly because most tables are currently indexed on node_name)
so for now we're going to return them both from the bootstrap params endpoint,
allowing consul-dataplane to exchange a node_id for a node_name (which it will
supply in the xDS meta).

Properly setting service for gateways:

To avoid the need to special case gateways in consul-dataplane, service will now
either be the destination service name for connect proxies, or the gateway
service name. This means it can be used as-is in Envoy configuration (i.e. as a
cluster name or in metric tags).
2022-08-24 12:03:15 +01:00
Daniel Upton 13c04a13af proxycfg: terminate stream on irrecoverable errors
This is the OSS portion of enterprise PR 2339.

It improves our handling of "irrecoverable" errors in proxycfg data sources.

The canonical example of this is what happens when the ACL token presented by
Envoy is deleted/revoked. Previously, the stream would get "stuck" until the
xDS server re-checked the token (after 5 minutes) and terminated the stream.

Materializers would also sit burning resources retrying something that could
never succeed.

Now, it is possible for data sources to mark errors as "terminal" which causes
the xDS stream to be closed immediately. Similarly, the submatview.Store will
evict materializers when it observes they have encountered such an error.
2022-08-23 20:17:49 +01:00
Chris S. Kim 81e965479b PR feedback to specify Node name in test mock 2022-08-23 11:51:04 -04:00
Eric Haberkorn 58901ad7df
Cluster peering failover disco chain changes (#14296) 2022-08-23 09:13:43 -04:00
Chris S. Kim cdc8b0634d Fix flakes 2022-08-22 14:45:31 -04:00
Chris S. Kim 03e92826aa Increase heartbeat rate to reduce test flakes 2022-08-22 14:24:05 -04:00
Chris S. Kim 06ba9775ee Remove check for ResponseNonce 2022-08-22 13:55:01 -04:00
Chris S. Kim 547fb9570e Add missing mock assertions 2022-08-22 13:55:01 -04:00
Chris S. Kim adff2eef16 Fix data race
newMockSnapshotHandler has an assertion on t.Cleanup which gets called before the event publisher is cancelled. This commit reorders the context.WithCancel so it properly gets cancelled before the assertion is made.
2022-08-22 13:55:01 -04:00
cskh 060531a29a
Fix: add missing ent meta for test (#14289) 2022-08-22 13:51:04 -04:00
Chris S. Kim 4e40e1d222 Handle server addresses update as client 2022-08-22 13:42:12 -04:00
Chris S. Kim 584d3409c4 Send server addresses on update from server 2022-08-22 13:41:44 -04:00
Chris S. Kim c9d8ad3939 Add new subscription for server addresses 2022-08-22 13:40:25 -04:00
Chris S. Kim 028b87d51f Cleanup unused logger 2022-08-22 13:40:23 -04:00
Chris S. Kim df951bd601 Expose external gRPC port in autopilot
The grpc_port was added to a NodeService's meta in ea58f235f5
2022-08-22 10:07:00 -04:00
cskh 527ebd068a
fix: missing MaxInboundConnections field in service-defaults config entry (#14072)
* fix:  missing max_inbound_connections field in merge config
2022-08-19 14:11:21 -04:00
cskh e84e4b8868
Fix: upgrade pkg imdario/merg to prevent merge config panic (#14237)
* upgrade imdario/merg to prevent merge config panic

* test: service definition takes precedence over service-defaults in merged results
2022-08-17 21:14:04 -04:00
James Hartig f92883bbce Use the maximum jitter when calculating the timeout
The timeout should include the maximum possible
jitter since the server will randomly add to it's
timeout a jitter. If the server's timeout is less
than the client's timeout then the client will
return an i/o deadline reached error.

Before:
```
time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644'
rpc error making call: i/o deadline reached
real    10m11.469s
user    0m0.018s
sys     0m0.023s
```

After:
```
time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644'
[...]
real    10m35.835s
user    0m0.021s
sys     0m0.021s
```
2022-08-17 10:24:09 -04:00
Eric Haberkorn 1a73b0ca20
Add `Targets` field to service resolver failovers. (#14162)
This field will be used for cluster peering failover.
2022-08-15 09:20:25 -04:00
Alessandro De Blasis 5dee555888 Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
2022-08-15 08:26:55 +01:00
Alessandro De Blasis ab611eabc3 Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
2022-08-15 08:09:56 +01:00
cskh d46b515b64
fix: missing segment and partition (#14194) 2022-08-12 15:21:39 -04:00