Commit Graph

162 Commits

Author SHA1 Message Date
Freddy 00b5b0a0a2
Update filter chain creation for sidecar/ingress listeners (#11245)
The duo of `makeUpstreamFilterChainForDiscoveryChain` and `makeListenerForDiscoveryChain` were really hard to reason about, and led to concealing a bug in their branching logic. There were several issues here:

- They tried to accomplish too much: determining filter name, cluster name, and whether RDS should be used. 
- They embedded logic to handle significantly different kinds of upstream listeners (passthrough, prepared query, typical services, and catch-all)
- They needed to coalesce different data sources (Upstream and CompiledDiscoveryChain)

Rather than handling all of those tasks inside of these functions, this PR pulls out the RDS/clusterName/filterName logic.

This refactor also fixed a bug with the handling of [UpstreamDefaults](https://www.consul.io/docs/connect/config-entries/service-defaults#defaults). These defaults get stored as UpstreamConfig in the proxy snapshot with a DestinationName of "*", since they apply to all upstreams. However, this wildcard destination name must not be used when creating the name of the associated upstream cluster. The coalescing logic in the original functions here was in some situations creating clusters with a `*.` prefix, which is not a valid destination.
2021-11-09 14:43:51 -07:00
Daniel Upton 50a1f20ff9
xds: prefer fed state gateway definitions if they're fresher (#11522)
Fixes an issue described in #10132, where if two DCs are WAN federated
over mesh gateways, and the gateway in the non-primary DC is terminated
and receives a new IP address (as is commonly the case when running them
on ephemeral compute instances) the primary DC is unable to re-establish
its connection until the agent running on its own gateway is restarted.

This was happening because we always preferred gateways discovered by
the `Internal.ServiceDump` RPC (which would fail because there's no way
to dial the remote DC) over those discovered in the federation state,
which is replicated as long as the primary DC's gateway is reachable.
2021-11-09 16:45:36 +00:00
Evan Culver 61be9371f5
connect: Remove support for Envoy 1.16 (#11354) 2021-10-27 18:51:35 -07:00
Evan Culver bec08f4ec3
connect: Add support for Envoy 1.20 (#11277) 2021-10-27 18:38:10 -07:00
freddygv e1691d1627 Update XDS for sidecars dialing through gateways 2021-10-26 23:35:48 -06:00
Paul Banks c891f30c24 Rebase and rebuild golden files for Envoy version bump 2021-10-19 21:37:58 +01:00
Paul Banks 78a00f2e1c Add support for enabling connect-based ingress TLS per listener. 2021-10-19 20:58:28 +01:00
Evan Culver fdbb742ffd
regenerate more envoy golden files 2021-09-30 10:57:47 -07:00
Evan Culver 585d9363ed
Merge branch 'main' into eculver/envoy-1.19.1 2021-09-28 11:54:33 -07:00
Paul Banks a9119e36a5 Fix merge conflict in xds tests 2021-09-23 10:12:37 +01:00
Paul Banks 2a3d3d3c23 Update xDS routes to support ingress services with different TLS config 2021-09-23 10:08:02 +01:00
Paul Banks 16b3b1c737 Update xDS Listeners with SDS support 2021-09-23 10:08:02 +01:00
Chris S. Kim f972048ebc
connect: Allow upstream listener escape hatch for prepared queries (#11109) 2021-09-22 15:27:10 -04:00
Evan Culver 2798383dbc
regenerate envoy golden files 2021-09-21 16:21:00 -07:00
Paul Banks e22cc9c53a Header manip for split legs plumbing 2021-09-10 21:09:24 +01:00
Paul Banks 83fc8723a3 Header manip for service-router plumbed through 2021-09-10 21:09:24 +01:00
Paul Banks f439dfc04f Ingress gateway header manip plumbing 2021-09-10 21:09:24 +01:00
Dhia Ayachi bc0e4f2f46
partition dicovery chains (#10983)
* partition dicovery chains

* fix default partition for OSS
2021-09-07 16:29:32 -04:00
freddygv af52d21884 Update prepared query cluster SAN validation
Previously SAN validation for prepared queries was broken because we
validated against the name, namespace, and datacenter for prepared
queries.

However, prepared queries can target:

- Services with a name that isn't their own
- Services in multiple datacenters

This means that the SpiffeID to validate needs to be based on the
prepared query endpoints, and not the prepared query's upstream
definition.

This commit updates prepared query clusters to account for that.
2021-08-20 17:40:33 -06:00
freddygv 85878685b7 Fixup proxy config test fixtures
- The TestNodeService helper created services with the fixed name "web",
and now that name is overridable.

- The discovery chain snapshot didn't have prepared query endpoints so
the endpoints tests were missing data for prepared queries
2021-08-20 17:38:57 -06:00
Freddy 12b7e07d5c
Merge pull request #10621 from hashicorp/vuln/validate-sans 2021-07-15 09:43:55 -06:00
R.B. Boyer 20feb42d3a
xds: ensure single L7 deny intention with default deny policy does not result in allow action (CVE-2021-36213) (#10619) 2021-07-15 10:09:00 -05:00
freddygv 5a82656510 Update golden files 2021-07-14 22:21:55 -06:00
freddygv 5454147c09 Update golden files to account for SAN validation 2021-07-14 22:21:55 -06:00
freddygv 924a5ba642 Regen golden files 2021-06-15 14:18:25 -06:00
freddygv 0aec6761dc Update ingress gateway stats labeling
In the absence of stats_tags to handle this pattern, when we pass
"ingress_upstream.$port" as the stat_prefix, Envoy splits up that prefix
and makes the port a part of the metric name.

For example:
- stat_prefix: ingress_upstream.8080

This leads to metric names like envoy_http_8080_no_route. Changing the
stat_prefix to ingress_upstream_80880 yields the expected metric names
such as envoy_http_no_route.

Note that we don't encode the destination's name/ns/dc in this
stat_prefix because for HTTP services ingress gateways use a single
filter chain. Only cluster metrics are available on a per-upstream
basis.
2021-06-15 08:52:18 -06:00
freddygv 6f8c6043b6 Update terminating gateway stats labeling
This change makes it so that the stat prefix for terminating gateways
matches that of connect proxies. By using the structure of
"upstream.svc.ns.dc" we can extract labels for the destination service,
namespace, and datacenter.
2021-06-15 08:52:18 -06:00
Freddy 429f9d8bb8
Add flag for transparent proxies to dial individual instances (#10329) 2021-06-09 14:34:17 -06:00
Freddy 7577f0e991
Revert "Avoid adding original_dst filter when not needed" (#10365) 2021-06-08 13:18:41 -06:00
Freddy 353280660f
Ensure passthrough clusters can be created (#10301) 2021-05-26 15:05:14 -06:00
Freddy 19334e8abf
Avoid adding original_dst filter when not needed (#10302) 2021-05-26 15:04:45 -06:00
Mark Anderson ff7fca756b Add simple test for downstream sockets
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-05-04 12:41:43 -07:00
Mark Anderson 6be9cebad0 Add tests for xds/listeners
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-05-04 12:41:43 -07:00
Freddy 2ca3f481f8
Only consider virtual IPs for transparent proxies (#10162)
Initially we were loading every potential upstream address into Envoy
and then routing traffic to the logical upstream service. The downside
of this behavior is that traffic meant to go to a specific instance
would be load balanced across ALL instances.

Traffic to specific instance IPs should be forwarded to the original
destination and if it's a destination in the mesh then we should ensure
the appropriate certificates are used.

This PR makes transparent proxying a Kubernetes-only feature for now
since support for other environments requires generating virtual IPs,
and Consul does not do that at the moment.
2021-05-03 14:15:22 -06:00
R.B. Boyer abc1dc0fe9
connect: update supported envoy versions to 1.18.2, 1.17.2, 1.16.3, and 1.15.4 (#10101)
The only thing that needed fixing up pertained to this section of the 1.18.x release notes:

> grpc_stats: the default value for stats_for_all_methods is switched from true to false, in order to avoid possible memory exhaustion due to an untrusted downstream sending a large number of unique method names. The previous default value was deprecated in version 1.14.0. This only changes the behavior when the value is not set. The previous behavior can be used by setting the value to true. This behavior change by be overridden by setting runtime feature envoy.deprecated_features.grpc_stats_filter_enable_stats_for_all_methods_by_default.

For now to maintain status-quo I'm explicitly setting `stats_for_all_methods=true` in all versions to avoid relying upon the default.

Additionally the naming of the emitted metrics for these gRPC requests changed slightly so the integration test assertions for `case-grpc` needed adjusting.
2021-04-29 15:22:03 -05:00
R.B. Boyer 06848ce67e fix broken golden tests 2021-04-14 11:36:47 -05:00
Freddy e385e5992f
Merge pull request #9042 from lawliet89/tg-rewrite 2021-04-08 11:49:23 -06:00
R.B. Boyer 499fee73b3
connect: add toggle to globally disable wildcard outbound network access when transparent proxy is enabled (#9973)
This adds a new config entry kind "cluster" with a single special name "cluster" where this can be controlled.
2021-04-06 13:19:59 -05:00
Yong Wen Chua 409768d6e5
Merge branch 'master' of github.com:hashicorp/consul into tg-rewrite 2021-04-06 17:05:26 +08:00
freddygv ce964f8ea5 Update xds for transparent proxy 2021-03-17 13:40:49 -06:00
R.B. Boyer 398b766532
xds: default to speaking xDS v3, but allow for v2 to be spoken upon request (#9658)
- Also add support for envoy 1.17.0
2021-02-26 16:23:15 -06:00
R.B. Boyer be89557fb4
test: omit envoy golden test files that differ from the latest version (#9807)
Since we currently do no version switching this removes 75% of the PR
noise.

To generate all *.golden files were removed and then I ran:

    go test ./agent/xds -update
2021-02-24 14:04:31 -06:00
Yong Wen Chua 58b553704a
Update test fixtures 2021-02-24 16:24:32 +08:00
R.B. Boyer 3b6ffc447b
xds: remove deprecated usages of xDS (#9602)
Note that this does NOT upgrade to xDS v3. That will come in a future PR.

Additionally:

- Ignored staticcheck warnings about how github.com/golang/protobuf is deprecated.
- Shuffled some agent/xds imports in advance of a later xDS v3 upgrade.
- Remove support for envoy 1.13.x but don't add in 1.17.x yet. We have to wait until the xDS v3 support is added in a follow-up PR.

Fixes #8425
2021-02-22 15:00:15 -06:00
R.B. Boyer 39effd620c
xds: only try to create an ipv6 expose checks listener if ipv6 is supported by the kernel (#9765)
Fixes #9311

This only fails if the kernel has ipv6 hard-disabled. It is not sufficient to merely not provide an ipv6 address for a network interface.
2021-02-19 14:38:43 -06:00
R.B. Boyer 6eeccc93ce
connect: update supported envoy point releases to 1.16.2, 1.15.3, 1.14.6, 1.13.7 (#9737) 2021-02-10 13:11:15 -06:00
Chris Boulton 8a35df81c7
connect: add local_request_timeout_ms to configure local_app http timeouts (#9554) 2021-01-25 13:50:00 -06:00
Freddy fe728855ed
Add DC and NS support for Envoy metrics (#9207)
This PR updates the tags that we generate for Envoy stats.

Several of these come with breaking changes, since we can't keep two stats prefixes for a filter.
2020-11-16 16:37:19 -07:00
R.B. Boyer 8baf158ea8
Revert "Add namespace support for metrics (OSS) (#9117)" (#9124)
This reverts commit 06b3b017d3.
2020-11-06 10:24:32 -06:00
Freddy 06b3b017d3
Add namespace support for metrics (OSS) (#9117) 2020-11-05 18:24:29 -07:00
R.B. Boyer a2c50d3303
connect: add support for envoy 1.16.0, drop support for 1.12.x, and bump point releases as well (#8944)
Supported versions will be: "1.16.0", "1.15.2", "1.14.5", "1.13.6"
2020-10-22 13:46:19 -05:00
R.B. Boyer 1b413b0444
connect: support defining intentions using layer 7 criteria (#8839)
Extend Consul’s intentions model to allow for request-based access control enforcement for HTTP-like protocols in addition to the existing connection-based enforcement for unspecified protocols (e.g. tcp).
2020-10-06 17:09:13 -05:00
freddygv 768dbaa68d Add session flag to cookie config 2020-09-11 18:34:03 -06:00
freddygv 403a180430 Set tgw filter router config name to cluster name 2020-09-04 12:45:05 -06:00
freddygv 00f2794bfa Update golden files after default route fix for tgw 2020-09-03 12:35:11 -06:00
freddygv 30ba080d25 Add explicit protocol overrides in tgw xds test cases 2020-09-03 08:57:48 -06:00
freddygv 63f79e5f9b Restructure structs and other PR comments 2020-09-02 09:10:50 -06:00
freddygv 28d0602fc1 Pass LB config to Envoy via xDS 2020-08-28 14:27:40 -06:00
R.B. Boyer 74d5df7c7a
xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569) 2020-08-27 12:20:58 -05:00
R.B. Boyer c599a2f5f4
xds: add support for envoy 1.15.0 and drop support for 1.11.x (#8424)
Related changes:

- hard-fail the xDS connection attempt if the envoy version is known to be too old to be supported
- remove the RouterMatchSafeRegex proxy feature since all supported envoy versions have it
- stop using --max-obj-name-len (due to: envoyproxy/envoy#11740)
2020-07-31 15:52:49 -05:00
R.B. Boyer 1eef096dfe
xds: version sniff envoy and switch regular expressions from 'regex' to 'safe_regex' on newer envoy versions (#8222)
- cut down on extra node metadata transmission
- split the golden file generation to compare all envoy version
2020-07-09 17:04:51 -05:00
Chris Piraino 735337b170
Append port number to ingress host domain (#8190)
A port can be sent in the Host header as defined in the HTTP RFC, so we
take any hosts that we want to match traffic to and also add another
host with the listener port added.

Also fix an issue with envoy integration tests not running the
case-ingress-gateway-tls test.
2020-07-07 10:43:04 -05:00
Freddy 5baa7b1b04
Always return a gateway cluster (#8158) 2020-06-19 13:31:39 -06:00
Freddy 166a8b2a58
Only pass one hostname via EDS and prefer healthy ones (#8084)
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Currently when passing hostname clusters to Envoy, we set each service instance registered with Consul as an LbEndpoint for the cluster.

However, Envoy can only handle one per cluster:
[2020-06-04 18:32:34.094][1][warning][config] [source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Cluster rejected: Error adding/updating cluster(s) dc2.internal.ddd90499-9b47-91c5-4616-c0cbf0fc358a.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint, server.dc2.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint

Envoy is currently handling this gracefully by only picking one of the endpoints. However, we should avoid passing multiple to avoid these warning logs.

This PR:

* Ensures we only pass one endpoint, which is tied to one service instance.
* We prefer sending an endpoint which is marked as Healthy by Consul.
* If no endpoints are healthy we emit a warning and skip the cluster.
* If multiple unique hostnames are spread across service instances we emit a warning and let the user know which will be resolved.
2020-06-12 13:46:17 -06:00
Chris Piraino 1a853fc954
Always require Host header values for http services (#7990)
Previously, we did not require the 'service-name.*' host header value
when on a single http service was exposed. However, this allows a user
to get into a situation where, if they add another service to the
listener, suddenly the previous service's traffic might not be routed
correctly. Thus, we always require the Host header, even if there is
only 1 service.

Also, we add the make the default domain matching more restrictive by
matching "service-name.ingress.*" by default. This lines up better with
the namespace case and more accurately matches the Consul DNS value we
expect people to use in this case.
2020-06-08 13:16:24 -05:00
Freddy 9ed325ba8b
Enable gateways to resolve hostnames to IPv4 addresses (#7999)
The DNS resolution will be handled by Envoy and defaults to LOGICAL_DNS. This discovery type can be overridden on a per-gateway basis with the envoy_dns_discovery_type Gateway Option.

If a service contains an instance with a hostname as an address we set the Envoy cluster to use DNS as the discovery type rather than EDS. Since both mesh gateways and terminating gateways route to clusters using SNI, whenever there is a mix of hostnames and IP addresses associated with a service we use the hostname + CDS rather than the IPs + EDS.

Note that we detect hostnames by attempting to parse the service instance's address as an IP. If it is not a valid IP we assume it is a hostname.
2020-06-03 15:28:45 -06:00
Raphaël Rondeau 0d2f178b7b
connect: fix endpoints clusterName when using cluster escape hatch (#7319)
```changelog
* fix(connect): fix endpoints clusterName when using cluster escape hatch
```
2020-05-26 10:57:22 +02:00
Kyle Havlovitz b14696e32a
Standardize support for Tagged and BindAddresses in Ingress Gateways (#7924)
* Standardize support for Tagged and BindAddresses in Ingress Gateways

This updates the TaggedAddresses and BindAddresses behavior for Ingress
to match Mesh/Terminating gateways. The `consul connect envoy` command
now also allows passing an address without a port for tagged/bind
addresses.

* Update command/connect/envoy/envoy.go

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* PR comments

* Check to see if address is an actual IP address

* Update agent/xds/listeners.go

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* fix whitespace

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2020-05-21 09:08:12 -05:00
Kyle Havlovitz 136549205c
Merge pull request #7759 from hashicorp/ingress/tls-hosts
Add TLS option for Ingress Gateway listeners
2020-05-11 09:18:43 -07:00
Freddy c32a4f1ece
Fix up enterprise compatibility for gateways (#7813) 2020-05-08 09:44:34 -06:00
Kyle Havlovitz f14c54e25e Add TLS option and DNS SAN support to ingress config
xds: Only set TLS context for ingress listener when requested
2020-05-06 15:12:02 -05:00
Chris Piraino 881760f701 xds: Use only the port number as the configured route name
This removes duplication of protocol from the stats_prefix
2020-05-06 15:06:13 -05:00
Chris Piraino f40833d094 Allow Hosts field to be set on an ingress config entry
- Validate that this cannot be set on a 'tcp' listener nor on a wildcard
service.
- Add Hosts field to api and test in consul config write CLI
- xds: Configure envoy with user-provided hosts from ingress gateways
2020-05-06 15:06:13 -05:00
Kyle Havlovitz 711d1389aa Support multiple listeners referencing the same service in gateway definitions 2020-05-06 15:06:13 -05:00
Kyle Havlovitz 247f9eaf13 Allow ingress gateways to route traffic based on Host header
This commit adds the necessary changes to allow an ingress gateway to
route traffic from a single defined port to multiple different upstream
services in the Consul mesh.

To do this, we now require all HTTP requests coming into the ingress
gateway to specify a Host header that matches "<service-name>.*" in
order to correctly route traffic to the correct service.

- Differentiate multiple listener's route names by port
- Adds a case in xds for allowing default discovery chains to create a
  route configuration when on an ingress gateway. This allows default
  services to easily use host header routing
- ingress-gateways have a single route config for each listener
  that utilizes domain matching to route to different services.
2020-05-06 15:06:13 -05:00
Freddy 137a2c32c6
TLS Origination for Terminating Gateways (#7671) 2020-04-27 16:25:37 -06:00
freddygv 6abc71f915 Skip filter chain creation if no client cert 2020-04-27 11:08:41 -06:00
freddygv 09a8e5f36d Use golden files for gateway certs and fix listener test flakiness 2020-04-27 11:08:41 -06:00
freddygv 913b13f31f Add subset support 2020-04-27 11:08:40 -06:00
freddygv 219c78e586 Add xds cluster/listener/endpoint management 2020-04-27 11:08:40 -06:00
Chris Piraino ecc8a2d6f7 Allow ingress gateways to route through mesh gateways
- Adds integration test for mesh gateways local + remote modes with ingress
- ingress golden files updated for mesh gateway endpoints
2020-04-24 09:31:32 -05:00
Chris Piraino cb9df538d5 Add all the xds ingress tests
This commit copies many of the connect-proxy xds testcases and reuses
for ingress gateways. This allows us to more easily see changes to the
envoy configuration when make updates to ingress gateways.
2020-04-24 09:31:32 -05:00
Kyle Havlovitz e9e8c0e730
Ingress Gateways for TCP services (#7509)
* Implements a simple, tcp ingress gateway workflow

This adds a new type of gateway for allowing Ingress traffic into Connect from external services.

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-04-16 14:00:48 -07:00
Andy Lindeman c1cb18c648
proxycfg: support path exposed with non-HTTP2 protocol (#7510)
If a proxied service is a gRPC or HTTP2 service, but a path is exposed
using the HTTP1 or TCP protocol, Envoy should not be configured with
`http2ProtocolOptions` for the cluster backing the path.

A situation where this comes up is a gRPC service whose healthcheck or
metrics route (e.g. for Prometheus) is an HTTP1 service running on
a different port. Previously, if these were exposed either using
`Expose: { Checks: true }` or `Expose: { Paths: ... }`, Envoy would
still be configured to communicate with the path over HTTP2, which would
not work properly.
2020-04-02 09:35:04 +02:00
Kim Ngo bef693df9c
agent/xds: Update mesh gateway to use service router timeout (#7444)
* website/connect/proxy/envoy: specify timeout precedence for services behind mesh gateway
2020-03-17 14:50:14 -05:00
R.B. Boyer 6adad71125
wan federation via mesh gateways (#6884)
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:

There are several distinct chunks of code that are affected:

* new flags and config options for the server

* retry join WAN is slightly different

* retry join code is shared to discover primary mesh gateways from secondary datacenters

* because retry join logic runs in the *agent* and the results of that
  operation for primary mesh gateways are needed in the *server* there are
  some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
  at multiple layers of abstraction just to pass the data down to the right
  layer.

* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers

* the function signature for RPC dialing picked up a new required field (the
  node name of the destination)

* several new RPCs for manipulating a FederationState object:
  `FederationState:{Apply,Get,List,ListMeshGateways}`

* 3 read-only internal APIs for debugging use to invoke those RPCs from curl

* raft and fsm changes to persist these FederationStates

* replication for FederationStates as they are canonically stored in the
  Primary and replicated to the Secondaries.

* a special derivative of anti-entropy that runs in secondaries to snapshot
  their local mesh gateway `CheckServiceNodes` and sync them into their upstream
  FederationState in the primary (this works in conjunction with the
  replication to distribute addresses for all mesh gateways in all DCs to all
  other DCs)

* a "gateway locator" convenience object to make use of this data to choose
  the addresses of gateways to use for any given RPC or gossip operation to a
  remote DC. This gets data from the "retry join" logic in the agent and also
  directly calls into the FSM.

* RPC (`:8300`) on the server sniffs the first byte of a new connection to
  determine if it's actually doing native TLS. If so it checks the ALPN header
  for protocol determination (just like how the existing system uses the
  type-byte marker).

* 2 new kinds of protocols are exclusively decoded via this native TLS
  mechanism: one for ferrying "packet" operations (udp-like) from the gossip
  layer and one for "stream" operations (tcp-like). The packet operations
  re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
  overhead.

* the server instances specially wrap the `memberlist.NetTransport` when running
  with gateway federation enabled (in a `wanfed.Transport`). The general gist is
  that if it tries to dial a node in the SAME datacenter (deduced by looking
  at the suffix of the node name) there is no change. If dialing a DIFFERENT
  datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
  gateways to eventually end up in a server's :8300 port.

* a new flag when launching a mesh gateway via `consul connect envoy` to
  indicate that the servers are to be exposed. This sets a special service
  meta when registering the gateway into the catalog.

* `proxycfg/xds` notice this metadata blob to activate additional watches for
  the FederationState objects as well as the location of all of the consul
  servers in that datacenter.

* `xds:` if the extra metadata is in place additional clusters are defined in a
  DC to bulk sink all traffic to another DC's gateways. For the current
  datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
  balances all servers as well as one mini-cluster per node
  (`<node>.server.<dc>.consul`)

* the `consul tls cert create` command got a new flag (`-node`) to help create
  an additional SAN in certs that can be used with this flavor of federation.
2020-03-09 15:59:02 -05:00
Matt Keeler 4c9577678e
xDS Mesh Gateway Resolver Subset Fixes (#7294)
* xDS Mesh Gateway Resolver Subset Fixes

The first fix was that clusters were being generated for every service resolver subset regardless of there being any service instances of the associated service in that dc. The previous logic didn’t care at all but now it will omit generating those clusters unless we also have service instances that should be proxied.

The second fix was to respect the DefaultSubset of a service resolver so that mesh-gateways would configure the endpoints of the unnamed subset cluster to only those endpoints matched by the default subsets filters.

* Refactor the gateway endpoint generation to be a little easier to read
2020-02-19 11:57:55 -05:00
Chris Piraino 47ff532735
Fixes envoy config when both RetryOn* values are set (#7280) 2020-02-18 09:25:47 -06:00
Chris Piraino f3b54fa535
Allow configuration of upstream connection limits in Envoy (#6829)
* Adds 'limits' field to the upstream configuration of a connect proxy

This allows a user to configure the envoy connect proxy with
'max_connections', 'max_queued_requests', and 'max_concurrent_requests'. These
values are defined in the local proxy on a per-service instance basis
and should thus NOT be thought of as a global-level or even service-level value.
2019-12-03 14:13:33 -06:00
R.B. Boyer 97aa050c20
agent: allow mesh gateways to initialize even if there are no connect services registered yet (#6576)
Fixes #6543

Also improved some of the proxycfg tests to cover snapshot validity
better.
2019-10-17 16:46:49 -05:00
R.B. Boyer 8dcba472a2
xds: tcp services using the discovery chain should not assume RDS during LDS (#6623)
Previously the logic for configuring RDS during LDS for L7 upstreams was
overapplied to TCP proxies resulting in a cluster name of <emptystring>
being used incorrectly.

Fixes #6621
2019-10-17 16:44:59 -05:00
Freddy fdd10dd8b8
Expose HTTP-based paths through Connect proxy (#6446)
Fixes: #5396

This PR adds a proxy configuration stanza called expose. These flags register
listeners in Connect sidecar proxies to allow requests to specific HTTP paths from outside of the node. This allows services to protect themselves by only
listening on the loopback interface, while still accepting traffic from non
Connect-enabled services.

Under expose there is a boolean checks flag that would automatically expose all
registered HTTP and gRPC check paths.

This stanza also accepts a paths list to expose individual paths. The primary
use case for this functionality would be to expose paths for third parties like
Prometheus or the kubelet.

Listeners for requests to exposed paths are be configured dynamically at run
time. Any time a proxy, or check can be registered, a listener can also be
created.

In this initial implementation requests to these paths are not
authenticated/encrypted.
2019-09-25 20:55:52 -06:00
R.B. Boyer dfcdc41ef8
connect: allow 'envoy_cluster_json' escape hatch to continue to function (#6378) 2019-08-22 15:11:56 -05:00
R.B. Boyer ae79cdab1b
connect: introduce ExternalSNI field on service-defaults (#6324)
Compiling this will set an optional SNI field on each DiscoveryTarget.
When set this value should be used for TLS connections to the instances
of the target. If not set the default should be used.

Setting ExternalSNI will disable mesh gateway use for that target. It also 
disables several service-resolver features that do not make sense for an 
external service.
2019-08-19 12:19:44 -05:00
R.B. Boyer 72207256b9
xds: improve how envoy metrics are emitted (#6312)
Since generated envoy clusters all are named using (mostly) SNI syntax
we can have envoy read the various fields out of that structure and emit
it as stats labels to the various telemetry backends.

I changed the delimiter for the 'customization hash' from ':' to '~'
because ':' is always reencoded by envoy as '_' when generating metrics
keys.
2019-08-16 09:30:17 -05:00
R.B. Boyer 8e22d80e35
connect: fix failover through a mesh gateway to a remote datacenter (#6259)
Failover is pushed entirely down to the data plane by creating envoy
clusters and putting each successive destination in a different load
assignment priority band. For example this shows that normally requests
go to 1.2.3.4:8080 but when that fails they go to 6.7.8.9:8080:

- name: foo
  load_assignment:
    cluster_name: foo
    policy:
      overprovisioning_factor: 100000
    endpoints:
    - priority: 0
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 1.2.3.4
              port_value: 8080
    - priority: 1
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 6.7.8.9
              port_value: 8080

Mesh gateways route requests based solely on the SNI header tacked onto
the TLS layer. Envoy currently only lets you configure the outbound SNI
header at the cluster layer.

If you try to failover through a mesh gateway you ideally would
configure the SNI value per endpoint, but that's not possible in envoy
today.

This PR introduces a simpler way around the problem for now:

1. We identify any target of failover that will use mesh gateway mode local or
   remote and then further isolate any resolver node in the compiled discovery
   chain that has a failover destination set to one of those targets.

2. For each of these resolvers we will perform a small measurement of
   comparative healths of the endpoints that come back from the health API for the
   set of primary target and serial failover targets. We walk the list of targets
   in order and if any endpoint is healthy we return that target, otherwise we
   move on to the next target.

3. The CDS and EDS endpoints both perform the measurements in (2) for the
   affected resolver nodes.

4. For CDS this measurement selects which TLS SNI field to use for the cluster
   (note the cluster is always going to be named for the primary target)

5. For EDS this measurement selects which set of endpoints will populate the
   cluster. Priority tiered failover is ignored.

One of the big downsides to this approach to failover is that the failover
detection and correction is going to be controlled by consul rather than
deferring that entirely to the data plane as with the prior version. This also
means that we are bound to only failover using official health signals and
cannot make use of data plane signals like outlier detection to affect
failover.

In this specific scenario the lack of data plane signals is ok because the
effectiveness is already muted by the fact that the ultimate destination
endpoints will have their data plane signals scrambled when they pass through
the mesh gateway wrapper anyway so we're not losing much.

Another related fix is that we now use the endpoint health from the
underlying service, not the health of the gateway (regardless of
failover mode).
2019-08-05 13:30:35 -05:00
R.B. Boyer c395affc93
connect: expose an API endpoint to compile the discovery chain (#6248)
In addition to exposing compilation over the API cleaned up the structures that would be exchanged to be cleaner and easier to support and understand.

Also removed ability to configure the envoy OverprovisioningFactor.
2019-08-02 15:34:54 -05:00
R.B. Boyer 6393edba53
connect: reconcile how upstream configuration works with discovery chains (#6225)
* connect: reconcile how upstream configuration works with discovery chains

The following upstream config fields for connect sidecars sanely
integrate into discovery chain resolution:

- Destination Namespace/Datacenter: Compilation occurs locally but using
different default values for namespaces and datacenters. The xDS
clusters that are created are named as they normally would be.

- Mesh Gateway Mode (single upstream): If set this value overrides any
value computed for any resolver for the entire discovery chain. The xDS
clusters that are created may be named differently (see below).

- Mesh Gateway Mode (whole sidecar): If set this value overrides any
value computed for any resolver for the entire discovery chain. If this
is specifically overridden for a single upstream this value is ignored
in that case. The xDS clusters that are created may be named differently
(see below).

- Protocol (in opaque config): If set this value overrides the value
computed when evaluating the entire discovery chain. If the normal chain
would be TCP or if this override is set to TCP then the result is that
we explicitly disable L7 Routing and Splitting. The xDS clusters that
are created may be named differently (see below).

- Connect Timeout (in opaque config): If set this value overrides the
value for any resolver in the entire discovery chain. The xDS clusters
that are created may be named differently (see below).

If any of the above overrides affect the actual result of compiling the
discovery chain (i.e. "tcp" becomes "grpc" instead of being a no-op
override to "tcp") then the relevant parameters are hashed and provided
to the xDS layer as a prefix for use in naming the Clusters. This is to
ensure that if one Upstream discovery chain has no overrides and
tangentially needs a cluster named "api.default.XXX", and another
Upstream does have overrides for "api.default.XXX" that they won't
cross-pollinate against the operator's wishes.

Fixes #6159
2019-08-01 22:03:34 -05:00
Matt Keeler fcc18c1675
Fix prepared query upstream endpoint generation (#6236)
Use the correct SNI value for prepared query upstreams
2019-07-29 11:15:55 -04:00
R.B. Boyer ad9e7b6ae9
connect: allow L7 routers to match on http methods (#6164)
Fixes #6158
2019-07-23 20:56:39 -05:00