4782 Commits

Author SHA1 Message Date
freddygv
e69bc727ec Update peering establishment to maybe use gateways
When peering through mesh gateways we expect outbound dials to peer
servers to flow through the local mesh gateway addresses.

Now when establishing a peering we get a list of dial addresses as a
ring buffer that includes local mesh gateway addresses if the local DC
is configured to peer through mesh gateways. The ring buffer includes
the mesh gateway addresses first, but also includes the remote server
addresses as a fallback.

This fallback is present because it's possible that direct egress from
the servers may be allowed. If not allowed then the leader will cycle
back to a mesh gateway address through the ring.

When attempting to dial the remote servers we retry up to a fixed
timeout. If using mesh gateways we also have an initial wait in
order to allow for the mesh gateways to configure themselves.

Note that if we encounter a permission denied error we do not retry
since that error indicates that the secret in the peering token is
invalid.
2022-10-13 14:57:55 -06:00
malizz
b0b0cbb8ee
increase protobuf size limit for cluster peering (#14976) 2022-10-13 13:46:51 -07:00
Derek Menteer
4e140c98bc Address PR comments. 2022-10-13 14:11:02 -05:00
Derek Menteer
1e394da400 Disallow peering to the same cluster. 2022-10-13 14:11:02 -05:00
Derek Menteer
8742fbe14f Prevent consul peer-exports by discovery chain. 2022-10-13 12:45:09 -05:00
Derek Menteer
f366edcb8d Prevent the "consul" service from being exported. 2022-10-13 12:45:09 -05:00
Derek Menteer
caa1396255 Add remote peer partition and datacenter info. 2022-10-13 10:37:41 -05:00
Dan Upton
cbb4a030c4
xds: properly merge central config for "agentless" services (#14962) 2022-10-13 12:04:59 +01:00
Dan Upton
0af9f16343
bug: fix goroutine leaks caused by incorrect usage of WatchCh (#14916)
memdb's `WatchCh` method creates a goroutine that will publish to the
returned channel when the watchset is triggered or the given context
is canceled. Although this is called out in its godoc comment, it's
not obvious that this method creates a goroutine who's lifecycle you
need to manage.

In the xDS capacity controller, we were calling `WatchCh` on each
iteration of the control loop, meaning the number of goroutines would
grow on each autopilot event until there was catalog churn.

In the catalog config source, we were calling `WatchCh` with the
background context, meaning that the goroutine would keep running after
the sync loop had terminated.
2022-10-13 12:04:27 +01:00
Hans Hasselberg
0d5935ab83
adding configuration option cloud.scada_address (#14936)
* adding scada_address

* config tests

* add changelog entry
2022-10-13 11:31:28 +02:00
Paul Glass
bcda205f88
Add consul.xds.server.streamStart metric (#14957)
This adds a new consul.xds.server.streamStart metric to measure the time taken to first generate xDS resources after an xDS stream is opened.
2022-10-12 14:17:58 -05:00
Riddhi Shah
345191a0df
Service http checks data source for agentless proxies (#14924)
Adds another datasource for proxycfg.HTTPChecks, for use on server agents. Typically these checks are performed by local client agents and there is no equivalent of this in agentless (where servers configure consul-dataplane proxies).
Hence, the data source is mostly a no-op on servers but in the case where the service is present within the local state, it delegates to the cache data source.
2022-10-12 07:49:56 -07:00
Freddy
9ca8bb8ec4
Merge pull request #14958 from hashicorp/peering/nonce 2022-10-12 08:18:15 -06:00
freddygv
1b46b35041 Actually track nonce in test 2022-10-12 07:50:17 -06:00
Derek Menteer
f330438a45 Fix incorrect backoff-wait logic. 2022-10-12 08:01:10 -05:00
freddygv
7f9a5d0f58 Add basic nonce management
This commit adds a monotonically increasing nonce to include in peering
replication response messages. Every ack/nack from the peer handling a
response will include this nonce, allowing to correlate the ack/nack
with a specific resource.

At the moment nothing is done with the nonce when it is received. In the
future we may want to add functionality such as retries on NACKs,
depending on the class of error.
2022-10-11 19:02:04 -06:00
Paul Glass
d17af23641
gRPC server metrics (#14922)
* Move stats.go from grpc-internal to grpc-middleware
* Update grpc server metrics with server type label
* Add stats test to grpc-external
* Remove global metrics instance from grpc server tests
2022-10-11 17:00:32 -05:00
cskh
e0356e1502
fix(peering): add missing grpc_tls_port for server address reconciliation (#14944) 2022-10-11 10:56:29 -04:00
freddygv
f4cc4577ca Fix alias check leak
Preivously when alias check was removed it would not be stopped nor
cleaned up from the associated aliasChecks map.

This means that any time an alias check was deregistered we would
leak a goroutine for CheckAlias.run() because the stopCh would never
be closed.

This issue mostly affects service mesh deployments on platforms where
the client agent is mostly static but proxy services come and go
regularly, since by default sidecars are registered with an alias check.
2022-10-10 16:42:29 -06:00
James Oulman
b8bd7a3058
Configure Envoy alpn_protocols based on service protocol (#14356)
* Configure Envoy alpn_protocols based on service protocol

* define alpnProtocols in a more standard way

* http2 protocol should be h2 only

* formatting

* add test for getAlpnProtocol()

* create changelog entry

* change scope is connect-proxy

* ignore errors on ParseProxyConfig; fixes linter

* add tests for grpc and http2 public listeners

* remove newlines from PR

* Add alpn_protocol configuration for ingress gateway

* Guard against nil tlsContext

* add ingress gateway w/ TLS tests for gRPC and HTTP2

* getAlpnProtocols: add TCP protocol test

* add tests for ingress gateway with grpc/http2 and per-listener TLS config

* add tests for ingress gateway with grpc/http2 and per-listener TLS config

* add Gateway level TLS config with mixed protocol listeners to validate ALPN

* update changelog to include ingress-gateway

* add http/1.1 to http2 ALPN

* go fmt

* fix test on custom-trace-listener
2022-10-10 13:13:56 -07:00
freddygv
bf72df7b0e Fixup test 2022-10-10 13:20:14 -06:00
Chris S. Kim
4f4112662e Fix nil pointer 2022-10-10 13:20:14 -06:00
Chris S. Kim
b0a4c5c563 Include stream-related information in peering endpoints 2022-10-10 13:20:14 -06:00
Paul Glass
c0c187f1c5
Merge central config for GetEnvoyBootstrapParams (#14869)
This fixes GetEnvoyBootstrapParams to merge in proxy-defaults and service-defaults.

Co-authored-by: Dan Upton <daniel@floppy.co>
2022-10-10 12:40:27 -05:00
Freddy
4abad02abd
Merge pull request #14796 from hashicorp/peering/use-connect-ca 2022-10-07 10:37:37 -06:00
freddygv
7d4da6eb22 Fixup test 2022-10-07 09:34:16 -06:00
freddygv
3034df6a5c Require Connect and TLS to generate peering tokens
By requiring Connect and a gRPC TLS listener we can automatically
configure TLS for all peering control-plane traffic.
2022-10-07 09:06:29 -06:00
freddygv
fac3ddc857 Use internal server certificate for peering TLS
A previous commit introduced an internally-managed server certificate
to use for peering-related purposes.

Now the peering token has been updated to match that behavior:
- The server name matches the structure of the server cert
- The CA PEMs correspond to the Connect CA

Note that if Conect is disabled, and by extension the Connect CA, we
fall back to the previous behavior of returning the manually configured
certs and local server SNI.

Several tests were updated to use the gRPC TLS port since they enable
Connect by default. This means that the peering token will embed the
Connect CA, and the dialer will expect a TLS listener.
2022-10-07 09:05:32 -06:00
freddygv
5f97223822 Simplify mgw watch mgmt 2022-10-07 08:54:37 -06:00
freddygv
d54db25421 Use existing query options to build ctx 2022-10-07 08:46:53 -06:00
DanStough
77ab28c5c7 feat: xDS updates for peerings control plane through mesh gw 2022-10-07 08:46:42 -06:00
Eric Haberkorn
1633cf20ea
Make the mesh gateway changes to allow local mode for cluster peering data plane traffic (#14817)
Make the mesh gateway changes to allow `local` mode for cluster peering data plane traffic
2022-10-06 09:54:14 -04:00
cskh
c1b5f34fb7
fix: missing UDP field in checkType (#14885)
* fix: missing UDP field in checkType

* Add changelog

* Update doc
2022-10-05 15:57:21 -04:00
Derek Menteer
a279d2d329
Fix explicit tproxy listeners with discovery chains. (#14751)
Fix explicit tproxy listeners with discovery chains.
2022-10-05 14:38:25 -05:00
Alex Oskotsky
13da2c5fad
Add the ability to retry on reset connection to service-routers (#12890) 2022-10-05 13:06:44 -04:00
John Murret
79a541fd7d
Upgrade serf to v0.10.1 and memberlist to v0.5.0 to get memberlist size metrics and broadcast queue depth metric (#14873)
* updating to serf v0.10.1 and memberlist v0.5.0 to get memberlist size metrics and memberlist broadcast queue depth metric

* update changelog

* update changelog

* correcting changelog

* adding "QueueCheckInterval" for memberlist to test

* updating integration test containers to grab latest api
2022-10-04 17:51:37 -06:00
Evan Culver
a3be5a5a82
connect: Bump Envoy 1.20 to 1.20.7, 1.21 to 1.21.5 and 1.22 to 1.22.5 (#14831) 2022-10-04 13:15:01 -07:00
Eric Haberkorn
1b565444be
Rename PeerName to Peer on prepared queries and exported services (#14854) 2022-10-04 14:46:15 -04:00
Freddy
d9fe3578ac
Merge pull request #14734 from hashicorp/NET-643-update-mesh-gateway-envoy-config-for-inbound-peering-control-plane-traffic 2022-10-03 12:54:11 -06:00
freddygv
b15d41534f Update xds generation for peering over mesh gws
This commit adds the xDS resources needed for INBOUND traffic from peer
clusters:

- 1 filter chain for all inbound peering requests.
- 1 cluster for all inbound peering requests.
- 1 endpoint per voting server with the gRPC TLS port configured.

There is one filter chain and cluster because unlike with WAN
federation, peer clusters will not attempt to dial individual servers.
Peer clusters will only dial the local mesh gateway addresses.
2022-10-03 12:42:27 -06:00
freddygv
a8c4d6bc55 Share mgw addrs in peering stream if needed
This commit adds handling so that the replication stream considers
whether the user intends to peer through mesh gateways.

The subscription will return server or mesh gateway addresses depending
on the mesh configuration setting. These watches can be updated at
runtime by modifying the mesh config entry.
2022-10-03 11:42:20 -06:00
freddygv
4ff9d475b0 Return mesh gateway addrs if peering through mgw 2022-10-03 11:35:10 -06:00
chappie
ad7295e5d9
Merge pull request #14811 from hashicorp/chappie/dns
Add DNS gRPC proxying support
2022-10-03 08:02:48 -07:00
Chris Chapman
d7b5351b66
Making suggested comments 2022-09-30 15:03:33 -07:00
Chris Chapman
46bea72212
Making suggested changes 2022-09-30 14:51:12 -07:00
Chris Chapman
a05563b788
Update comment 2022-09-30 09:35:01 -07:00
DanStough
7f8971d77f chore: fix flakey scada provider test 2022-09-30 11:56:40 -04:00
Chris Chapman
81e267171b
Bind a dns mux handler to gRPC proxy 2022-09-29 21:44:45 -07:00
Chris Chapman
7bc9cad180
Adding grpc handler for dns proxy 2022-09-29 21:19:51 -07:00
Eric Haberkorn
80e51ff907
Add exported services event to cluster peering replication. (#14797) 2022-09-29 15:37:19 -04:00