Commit Graph

356 Commits

Author SHA1 Message Date
R.B. Boyer e79ce8ab03
xds: adding control of the mesh-wide min/max TLS versions and cipher suites from the mesh config entry (#12601)
- `tls.incoming`: applies to the inbound mTLS targeting the public
  listener on `connect-proxy` and `terminating-gateway` envoy instances

- `tls.outgoing`: applies to the outbound mTLS dialing upstreams from
  `connect-proxy` and `ingress-gateway` envoy instances

Fixes #11966
2022-03-30 13:43:59 -05:00
R.B. Boyer ac5bea862a
server: ensure that service-defaults meta is incorporated into the discovery chain response (#12511)
Also add a new "Default" field to the discovery chain response to clients
2022-03-30 10:04:18 -05:00
Eric cf3e517d0e Create and wire up the serverless patcher 2022-03-15 10:12:57 -04:00
R.B. Boyer 2a56e0055b
proxycfg: change how various proxycfg test helpers for making ConfigSnapshot copies works to be more correct and less error prone (#12531)
Prior to this PR for the envoy xDS golden tests in the agent/xds package we
were hand-creating a proxycfg.ConfigSnapshot structure in the proper format for
input to the xDS generator. Over time this intermediate structure has gotten
trickier to build correctly for the various tests.

This PR proposes to switch to using the existing mechanism for turning a
structs.NodeService and a sequence of cache.UpdateEvent copies into a
proxycfg.ConfigSnapshot, as that is less error prone to construct and aligns
more with how the data arrives.

NOTE: almost all of this is in test-related code. I tried super hard to craft
correct event inputs to get the golden files to be the same, or similar enough
after construction to feel ok that i recreated the spirit of the original test
cases.
2022-03-07 11:47:14 -06:00
freddygv ceb52d649a Account for upstream targets in another DC.
Transparent proxies typically cannot dial upstreams in remote
datacenters. However, if their upstream configures a redirect to a
remote DC then the upstream targets will be in another datacenter.

In that sort of case we should use the WAN address for the passthrough.
2022-02-10 17:01:57 -07:00
freddygv cbea3d203c Fix race of upstreams with same passthrough ip
Due to timing, a transparent proxy could have two upstreams to dial
directly with the same address.

For example:
- The orders service can dial upstreams shipping and payment directly.
- An instance of shipping at address 10.0.0.1 is deregistered.
- Payments is scaled up and scheduled to have address 10.0.0.1.
- The orders service receives the event for the new payments instance
before seeing the deregistration for the shipping instance. At this
point two upstreams have the same passthrough address and Envoy will
reject the listener configuration.

To disambiguate this commit considers the Raft index when storing
passthrough addresses. In the example above, 10.0.0.1 would only be
associated with the newer payments service instance.
2022-02-10 17:01:57 -07:00
freddygv 659ebc05a9 Ensure passthrough addresses get cleaned up
Transparent proxies can set up filter chains that allow direct
connections to upstream service instances. Services that can be dialed
directly are stored in the PassthroughUpstreams map of the proxycfg
snapshot.

Previously these addresses were not being cleaned up based on new
service health data. The list of addresses associated with an upstream
service would only ever grow.

As services scale up and down, eventually they will have instances
assigned to an IP that was previously assigned to a different service.
When IP addresses are duplicated across filter chain match rules the
listener config will be rejected by Envoy.

This commit updates the proxycfg snapshot management so that passthrough
addresses can get cleaned up when no longer associated with a given
upstream.

There is still the possibility of a race condition here where due to
timing an address is shared between multiple passthrough upstreams.
That concern is mitigated by #12195, but will be further addressed
in a follow-up.
2022-02-10 17:01:57 -07:00
freddygv c31c1158a6 Add failing test
The updated test fails because passthrough upstream addresses are not
being cleaned up.
2022-01-27 18:56:47 -07:00
R.B. Boyer b60d89e7ef bulk rewrite using this script
set -euo pipefail

    unset CDPATH

    cd "$(dirname "$0")"

    for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
        echo "=== require: $f ==="
        sed -i '/require := require.New(t)/d' $f
        # require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
        # require.XXX(tblah) but not require.XXX(t, blah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
        # require.XXX(rblah) but not require.XXX(r, blah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
        gofmt -s -w $f
    done

    for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
        echo "=== assert: $f ==="
        sed -i '/assert := assert.New(t)/d' $f
        # assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
        # assert.XXX(tblah) but not assert.XXX(t, blah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
        # assert.XXX(rblah) but not assert.XXX(r, blah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
        gofmt -s -w $f
    done
2022-01-20 10:46:23 -06:00
R.B. Boyer 424f3cdd2c
proxycfg: introduce explicit UpstreamID in lieu of bare string (#12125)
The gist here is that now we use a value-type struct proxycfg.UpstreamID
as the map key in ConfigSnapshot maps where we used to use "upstream
id-ish" strings. These are internal only and used just for bidirectional
trips through the agent cache keyspace (like the discovery chain target
struct).

For the few places where the upstream id needs to be projected into xDS,
that's what (proxycfg.UpstreamID).EnvoyID() is for. This lets us ALWAYS
inject the partition and namespace into these things without making
stuff like the golden testdata diverge.
2022-01-20 10:12:04 -06:00
Dhia Ayachi e653f81919
reset `coalesceTimer` to nil as soon as the event is consumed (#11924)
* reset `coalesceTimer` to nil as soon as the event is consumed

* add change log

* refactor to add relevant test.

* fix linter

* Apply suggestions from code review

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* remove non needed check

Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2022-01-05 12:17:47 -05:00
freddygv 21f2c2e68d Purge chain if it shouldn't be there 2021-12-13 18:56:44 -07:00
freddygv d26b4860fd Account for new upstreams constraint in tests 2021-12-13 18:56:28 -07:00
freddygv 2fe27b748d Check ingress upstreams when gating chain watches 2021-12-13 18:56:28 -07:00
freddygv 6af9a0d8cf Avoid storing chain without an upstream 2021-12-13 18:56:14 -07:00
freddygv ba12dc215b Clean up chains separately from their watches 2021-12-13 18:56:14 -07:00
freddygv 70d6358426 Store intention upstreams in snapshot 2021-12-13 18:56:13 -07:00
R.B. Boyer 81ea8129d7
proxycfg: ensure all of the watches are canceled if they are cancelable (#11824) 2021-12-13 15:56:17 -06:00
R.B. Boyer 4aabbe529c
proxycfg: use external addresses in tproxy when crossing partition boundaries (#11823) 2021-12-13 14:34:49 -06:00
R.B. Boyer 631c649291
various partition related todos (#11822) 2021-12-13 11:43:33 -06:00
R.B. Boyer 1e02460bd1
re-run gofmt on 1.17 (#11579)
This should let freshly recompiled golangci-lint binaries using Go 1.17
pass 'make lint'
2021-11-16 12:04:01 -06:00
freddygv 0e507492d0 Update proxycfg for ingress service partitions 2021-11-12 14:33:31 -07:00
Freddy 00b5b0a0a2
Update filter chain creation for sidecar/ingress listeners (#11245)
The duo of `makeUpstreamFilterChainForDiscoveryChain` and `makeListenerForDiscoveryChain` were really hard to reason about, and led to concealing a bug in their branching logic. There were several issues here:

- They tried to accomplish too much: determining filter name, cluster name, and whether RDS should be used. 
- They embedded logic to handle significantly different kinds of upstream listeners (passthrough, prepared query, typical services, and catch-all)
- They needed to coalesce different data sources (Upstream and CompiledDiscoveryChain)

Rather than handling all of those tasks inside of these functions, this PR pulls out the RDS/clusterName/filterName logic.

This refactor also fixed a bug with the handling of [UpstreamDefaults](https://www.consul.io/docs/connect/config-entries/service-defaults#defaults). These defaults get stored as UpstreamConfig in the proxy snapshot with a DestinationName of "*", since they apply to all upstreams. However, this wildcard destination name must not be used when creating the name of the associated upstream cluster. The coalescing logic in the original functions here was in some situations creating clusters with a `*.` prefix, which is not a valid destination.
2021-11-09 14:43:51 -07:00
Daniel Upton 50a1f20ff9
xds: prefer fed state gateway definitions if they're fresher (#11522)
Fixes an issue described in #10132, where if two DCs are WAN federated
over mesh gateways, and the gateway in the non-primary DC is terminated
and receives a new IP address (as is commonly the case when running them
on ephemeral compute instances) the primary DC is unable to re-establish
its connection until the agent running on its own gateway is restarted.

This was happening because we always preferred gateways discovered by
the `Internal.ServiceDump` RPC (which would fail because there's no way
to dial the remote DC) over those discovered in the federation state,
which is replicated as long as the primary DC's gateway is reachable.
2021-11-09 16:45:36 +00:00
freddygv 60066e5154 Exclude default partition from GatewayKey string
This will behave the way we handle SNI and SPIFFE IDs, where the default
partition is excluded.

Excluding the default ensures that don't attempt to compare default.dc2
to dc2 in OSS.
2021-11-01 14:45:52 -06:00
freddygv e3666b0bc4 Update GatewayKeys deduplication
Federation states data is only keyed on datacenter, so it cannot be
directly compared against keys for gateway groups.
2021-11-01 13:58:53 -06:00
freddygv 90ce897456 Store GatewayKey in proxycfg snapshot for re-use 2021-11-01 13:58:53 -06:00
freddygv 4d4ccedb3a Update locality check in proxycfg 2021-11-01 13:58:53 -06:00
freddygv 3a2061544d Fixup partitions assertion 2021-10-27 11:15:25 -06:00
freddygv d28b9052b2 Move the exportingpartitions constant to enterprise 2021-10-27 11:15:25 -06:00
freddygv 448701dbd8 Replace default partition check 2021-10-27 11:15:25 -06:00
freddygv 12923f5ebc PR comments 2021-10-27 11:15:25 -06:00
freddygv a33b6923e0 Account for partitions in xds gen for mesh gw
This commit avoids skipping gateways in remote partitions of the local
DC when generating listeners/clusters/endpoints.
2021-10-27 11:15:25 -06:00
freddygv 110fae820a Update xds pkg to account for GatewayKey 2021-10-27 09:03:56 -06:00
freddygv 7e65678c52 Update mesh gateway proxy watches for partitions
This commit updates mesh gateway watches for cross-partitions
communication.

* Mesh gateways are keyed by partition and datacenter.

* Mesh gateways will now watch gateways in partitions that export
services to their partition.

* Mesh gateways in non-default partitions will not have cross-datacenter
watches. They are not involved in traditional WAN federation.
2021-10-27 09:03:56 -06:00
freddygv 37a16e9487 Replace Split with SplitN 2021-10-26 23:36:01 -06:00
freddygv b9b6447977 Finish removing useInDatacenter 2021-10-26 23:36:01 -06:00
freddygv 62e0fc62c1 Configure sidecars to watch gateways in partitions
Previously the datacenter of the gateway was the key identifier, now it
is the datacenter and partition.

When dialing services in other partitions or datacenters we now watch
the appropriate partition.
2021-10-26 23:35:37 -06:00
Paul Banks 78a00f2e1c Add support for enabling connect-based ingress TLS per listener. 2021-10-19 20:58:28 +01:00
Daniel Nephin eb632c53a2 structs: rename the last helper method.
This one gets used a bunch, but we can rename it to make the behaviour more obvious.
2021-09-29 11:48:38 -04:00
Daniel Nephin 6d72517682 structs: remove two methods that were only used once each.
These methods only called a single function. Wrappers like this end up making code harder to read
because it adds extra ways of doing things.

We already have many helper functions for constructing these types, we don't need additional methods.
2021-09-29 11:47:03 -04:00
Paul Banks 136928a90f Minor PR typo and cleanup fixes 2021-09-23 10:13:19 +01:00
Paul Banks 20d0bf81f7 Revert abandonned changes to proxycfg for Ent test consistency 2021-09-23 10:13:19 +01:00
Paul Banks 659321d008 Handle namespaces in route names correctly; add tests for enterprise 2021-09-23 10:09:11 +01:00
Paul Banks ccbda0c285 Update proxycfg to hold more ingress config state 2021-09-23 10:08:02 +01:00
Paul Banks 4e39f03d5b Add ingress-gateway config for SDS 2021-09-23 10:08:02 +01:00
freddygv 49248a0802 Fixup proxycfg tproxy case 2021-09-16 15:05:28 -06:00
freddygv 95a6db9cfa Account for partitions in ixn match/decision 2021-09-16 14:39:01 -06:00
freddygv 3f3a61c6e1 Fixup manager tests 2021-09-15 17:24:05 -06:00
freddygv 77681b9f6c Pass partition to intention match query 2021-09-15 17:23:52 -06:00
Paul Banks e22cc9c53a Header manip for split legs plumbing 2021-09-10 21:09:24 +01:00
Paul Banks 83fc8723a3 Header manip for service-router plumbed through 2021-09-10 21:09:24 +01:00
Paul Banks f439dfc04f Ingress gateway header manip plumbing 2021-09-10 21:09:24 +01:00
Dhia Ayachi bc0e4f2f46
partition dicovery chains (#10983)
* partition dicovery chains

* fix default partition for OSS
2021-09-07 16:29:32 -04:00
Dhia Ayachi 09197c989c
add partition to SNI when partition is non default (#10917) 2021-09-01 10:35:39 -04:00
freddygv f52bd80f6d Update comment for test function 2021-08-20 17:40:33 -06:00
freddygv af52d21884 Update prepared query cluster SAN validation
Previously SAN validation for prepared queries was broken because we
validated against the name, namespace, and datacenter for prepared
queries.

However, prepared queries can target:

- Services with a name that isn't their own
- Services in multiple datacenters

This means that the SpiffeID to validate needs to be based on the
prepared query endpoints, and not the prepared query's upstream
definition.

This commit updates prepared query clusters to account for that.
2021-08-20 17:40:33 -06:00
freddygv 85878685b7 Fixup proxy config test fixtures
- The TestNodeService helper created services with the fixed name "web",
and now that name is overridable.

- The discovery chain snapshot didn't have prepared query endpoints so
the endpoints tests were missing data for prepared queries
2021-08-20 17:38:57 -06:00
Dhia Ayachi 1950ebbe1f
oss portion of ent #1069 (#10883) 2021-08-20 12:57:45 -04:00
R.B. Boyer 097e1645e3
agent: ensure that most agent behavior correctly respects partition configuration (#10880) 2021-08-19 15:09:42 -05:00
Daniel Nephin 0575498d0d proxycfg: Lookup the agent token as a default
When no ACL token is provided with the service registration.
2021-08-12 15:51:34 -04:00
Daniel Nephin b313f495b8 proxycfg: Add a test to show the bug
When a token is not provided at registration, the agent token is not being used.
2021-08-12 15:47:59 -04:00
Freddy 19f6e1ca31
Log the correlation ID when blocking queries fire (#10689)
Knowing that blocking queries are firing does not provide much
information on its own. If we know the correlation IDs we can
piece together which parts of the snapshot have been populated.

Some of these responses might be empty from the blocking
query timing out. But if they're returning quickly I think we
can reasonably assume they contain data.
2021-07-23 16:36:17 -06:00
R.B. Boyer 188e8dc51f
agent/structs: add a bunch more EnterpriseMeta helper functions to help with partitioning (#10669) 2021-07-22 13:20:45 -05:00
freddygv b4c5c58c9b Add TODOs about partition handling 2021-07-14 22:21:55 -06:00
freddygv 47da00d3c7 Validate SANs for passthrough clusters and failovers 2021-07-14 22:21:55 -06:00
Daniel Nephin 10051cf6d3 proxycfg: remove unused method
This method was accidentally re-introduced in an earlier rebase. It was
removed in ed1082510d as part of the tproxy work.
2021-06-21 15:54:40 -04:00
Daniel Nephin 6bc5255028 proxycfg: move each handler into a seprate file
There is no interaction between these handlers, so splitting them into separate files
makes it easier to discover the full implementation of each kindHandler.
2021-06-21 15:48:40 -04:00
Daniel Nephin 19d3eeff3c
Merge pull request #9489 from hashicorp/dnephin/proxycfg-state-2
proxycfg: split state into a handler for each kind
2021-06-18 13:57:28 -04:00
Nitya Dhanushkodi 52043830b4 proxycfg: reference to entry in map should not panic 2021-06-17 11:49:04 -07:00
Daniel Nephin e738fa3b80 Replace type conversion with embedded structs 2021-06-17 13:23:35 -04:00
Daniel Nephin 32c15d9a88 proxycfg: split state into kind-specific types
This commit extracts all the kind-specific logic into handler types, and
keeps the generic parts on the state struct. This change should make it
easier to add new kinds, and see the implementation of each kind more
clearly.
2021-06-16 14:04:01 -04:00
Daniel Nephin cd05df7157 proxycfg: unmethod hostnameEndpoints
the method receiver can be replaced by the first argument.

This will allow us to extract more from the state struct in the future.
2021-06-16 14:03:30 -04:00
Daniel Nephin 97c6ee00d7 Remove duplicate import
because two PRs crossed paths.
2021-06-16 13:19:54 -04:00
Daniel Nephin 0547d0c046
Merge pull request #9466 from hashicorp/dnephin/proxycfg-state
proxycfg: prepare state for split by kind
2021-06-16 13:14:26 -04:00
Nitya Dhanushkodi b8b44419a0
proxycfg: Ensure that endpoints for explicit upstreams in other datacenters are watched in transparent mode (#10391)
Co-authored-by: Freddy Vallenilla <freddy@hashicorp.com>
2021-06-15 11:00:26 -07:00
Daniel Nephin 016c5611d1 proxycfg: extract two types from state struct
These two new struct types will allow us to make polymorphic handler for each kind, instad of
having all the logic for each proxy kind on the state struct.
2021-06-10 17:42:17 -04:00
Daniel Nephin 9c40aa729f proxycfg: pass context around where it is needed
context.Context should never be stored on a struct (as it says in the godoc) because it is easy to
to end up with the wrong context when it is stored.

Also see https://blog.golang.org/context-and-structs

This change is also in preparation for splitting state into kind-specific handlers so that the
implementation of each kind is grouped together.
2021-06-10 17:34:50 -04:00
Freddy 429f9d8bb8
Add flag for transparent proxies to dial individual instances (#10329) 2021-06-09 14:34:17 -06:00
freddygv c73703c08b Ensure entmeta is encoded in test correlationID 2021-05-05 12:31:23 -06:00
Daniel Nephin 347f3d2128
Merge pull request #10155 from hashicorp/dnephin/config-entry-remove-fields
config-entry: remove Kind and Name field from Mesh config entry
2021-05-04 17:27:56 -04:00
Mark Anderson 6be9cebad0 Add tests for xds/listeners
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-05-04 12:41:43 -07:00
Mark Anderson 06f0f79218 Continue working through proxy and agent
Rework/listeners, rename makeListener

Refactor, tests pass

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-05-04 12:41:43 -07:00
Freddy ed1082510d
Fixup discovery chain handling in transparent mode (#10168)
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

Previously we would associate the address of a discovery chain target
with the discovery chain's filter chain. This was broken for a few reasons:

- If the upstream is a virtual service, the client proxy has no way of
dialing it because virtual services are not targets of their discovery
chains. The targets are distinct services. This is addressed by watching
the endpoints of all upstream services, not just their discovery chain
targets.

- If multiple discovery chains resolve to the same target, that would
lead to multiple filter chains attempting to match on the target's
virtual IP. This is addressed by only matching on the upstream's virtual
IP.

NOTE: this implementation requires an intention to the redirecting
virtual service and not just to the final destination. This is how
we can know that the virtual service is an upstream to watch.

A later PR will look into traversing discovery chains when computing
upstreams so that intentions are only required to the discovery chain
targets.
2021-05-04 08:45:19 -06:00
Daniel Nephin 62efaaab21 config-entry: remove Kind and Name field from Mesh config entry
No config entry needs a Kind field. It is only used to determine the Go type to
target. As we introduce new config entries (like this one) we can remove the kind field
and have the GetKind method return the single supported value.

In this case (similar to proxy-defaults) the Name field is also unnecessary. We always
use the same value. So we can omit the name field entirely.
2021-04-29 17:11:21 -04:00
R.B. Boyer 71d45a3460
Support Incremental xDS mode (#9855)
This adds support for the Incremental xDS protocol when using xDS v3. This is best reviewed commit-by-commit and will not be squashed when merged.

Union of all commit messages follows to give an overarching summary:

xds: exclusively support incremental xDS when using xDS v3

Attempts to use SoTW via v3 will fail, much like attempts to use incremental via v2 will fail.
Work around a strange older envoy behavior involving empty CDS responses over incremental xDS.
xds: various cleanups and refactors that don't strictly concern the addition of incremental xDS support

Dissolve the connectionInfo struct in favor of per-connection ResourceGenerators instead.
Do a better job of ensuring the xds code uses a well configured logger that accurately describes the connected client.
xds: pull out checkStreamACLs method in advance of a later commit

xds: rewrite SoTW xDS protocol tests to use protobufs rather than hand-rolled json strings

In the test we very lightly reuse some of the more boring protobuf construction helper code that is also technically under test. The important thing of the protocol tests is testing the protocol. The actual inputs and outputs are largely already handled by the xds golden output tests now so these protocol tests don't have to do double-duty.

This also updates the SoTW protocol test to exclusively use xDS v2 which is the only variant of SoTW that will be supported in Consul 1.10.

xds: default xds.Server.AuthCheckFrequency at use-time instead of construction-time
2021-04-29 13:54:05 -05:00
Freddy 078c40425f
Rename "cluster" config entry to "mesh" (#10127)
This config entry is being renamed primarily because in k8s the name
cluster could be confusing given that the config entry applies across
federated datacenters.

Additionally, this config entry will only apply to Consul as a service
mesh, so the more generic "cluster" name is not needed.
2021-04-28 16:13:29 -06:00
Daniel Nephin 2a26085b2c connect: do not set QuerySource.Node
Setting this field to a value is equivalent to using the 'near' query paramter.
The intent is to sort the results by proximity to the node requesting
them. However with connect we send the results to envoy, which doesn't
care about the order, so setting this field is increasing the work
performed for no gain.

It is necessary to unset this field now because we would like connect
to use streaming, but streaming does not support sorting by proximity.
2021-04-27 19:03:16 -04:00
Freddy 439a7fce2d
Split Upstream.Identifier() so non-empty namespace is always prepended in ent (#10031) 2021-04-15 13:54:40 -06:00
freddygv 8857195437 Fixup wildcard ent assertion 2021-04-12 17:04:33 -06:00
freddygv 7bd51ff536 Replace TransparentProxy bool with ProxyMode
This PR replaces the original boolean used to configure transparent
proxy mode. It was replaced with a string mode that can be set to:

- "": Empty string is the default for when the setting should be
defaulted from other configuration like config entries.
- "direct": Direct mode is how applications originally opted into the
mesh. Proxy listeners need to be dialed directly.
- "transparent": Transparent mode enables configuring Envoy as a
transparent proxy. Traffic must be captured and redirected to the
inbound and outbound listeners.

This PR also adds a struct for transparent proxy specific configuration.
Initially this is not stored as a pointer. Will revisit that decision
before GA.
2021-04-12 09:35:14 -06:00
freddygv b21224a4c8 PR comments 2021-04-08 11:16:03 -06:00
freddygv 49a4a78fd5 Ensure mesh gateway mode override is set for upstreams for intentions 2021-04-07 09:32:48 -06:00
freddygv 5140c3e51f Finish resolving upstream defaults in proxycfg 2021-04-07 09:32:48 -06:00
R.B. Boyer 499fee73b3
connect: add toggle to globally disable wildcard outbound network access when transparent proxy is enabled (#9973)
This adds a new config entry kind "cluster" with a single special name "cluster" where this can be controlled.
2021-04-06 13:19:59 -05:00
freddygv 098b9af901 Fixup enterprise tests from tproxy changes 2021-03-17 23:05:00 -06:00
freddygv eb1e0a1751 Cancel watch on all errors 2021-03-17 21:44:14 -06:00
freddygv f4f45af6d0 Merge master and fix upstream config protocol defaulting 2021-03-17 21:13:40 -06:00
freddygv 0da8702f34 PR comments 2021-03-17 16:18:56 -06:00
freddygv a54d6a9010 Update proxycfg for transparent proxy 2021-03-17 13:40:39 -06:00
Daniel Nephin f40b76af2d proxycfg: use rpcclient/health.Client instead of passing around cache name
This should allow us to swap out the implementation with something other
than `agent/cache` without making further code changes.
2021-03-12 11:46:04 -05:00
Daniel Nephin 906834ce8e proxycfg: Use streaming in connect state 2021-03-12 11:35:42 -05:00
Freddy 82c269a7c5
Avoid potential proxycfg/xDS deadlock using non-blocking send 2021-02-08 16:14:06 -07:00
freddygv ec5f75776b Update comments on avoiding proxycfg deadlock 2021-02-08 09:45:45 -07:00
R.B. Boyer 43193a35c6
xds: prevent LDS flaps in mesh gateways due to unstable datacenter lists (#9651)
Also fix a similar issue in Terminating Gateways that was masked by an overzealous test.
2021-02-08 10:19:57 -06:00
freddygv 6e443e5536 Retry send after timer fires, in case no updates occur 2021-02-05 18:00:59 -07:00
freddygv 95e7641faa Update proxycfg logging, labels were already attached 2021-02-05 15:14:49 -07:00
freddygv 5ba14ad41d Add trace logs to proxycfg state runner and xds srv 2021-02-02 12:26:38 -07:00
freddygv 37190c0d0d Avoid potential deadlock using non-blocking send
Deadlock scenario:
    1. Due to scheduling, the state runner sends one snapshot into
    snapCh and then attempts to send a second. The first send succeeds
    because the channel is buffered, but the second blocks.
    2. Separately, Manager.Watch is called by the xDS server after
    getting a discovery request from Envoy. This function acquires the
    manager lock and then blocks on receiving the CurrentSnapshot from
    the state runner.
    3. Separately, there is a Manager goroutine that reads the snapshots
    from the channel in step 1. These reads are done to notify proxy
    watchers, but they require holding the manager lock. This goroutine
    goes to acquire that lock, but can't because it is held by step 2.

Now, the goroutine from step 3 is waiting on the one from step 2 to
release the lock. The goroutine from step 2 won't release the lock until
the goroutine in step 1 advances. But the goroutine in step 1 is waiting
for the one in step 3. Deadlock.

By making this send non-blocking step 1 above can proceed. The coalesce
timer will be reset and a new valid snapshot will be delivered after it
elapses or when one is requested by xDS.
2021-02-02 11:31:14 -07:00
Daniel Nephin b9e60c0775 testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
freddygv 856d5a25ee Fix text type assertion 2020-09-14 16:28:40 -06:00
freddygv 7fd518ff1d Merge master 2020-09-14 16:17:43 -06:00
freddygv 87541ab80a Fix type assertion 2020-09-14 16:12:21 -06:00
freddygv 768dbaa68d Add session flag to cookie config 2020-09-11 18:34:03 -06:00
freddygv eab90ea9fa Revert EnvoyConfig nesting 2020-09-11 09:21:43 -06:00
freddygv 30ba080d25 Add explicit protocol overrides in tgw xds test cases 2020-09-03 08:57:48 -06:00
freddygv f81fe6a1a1 Remove LB infix and move injection to xds 2020-09-02 15:13:50 -06:00
freddygv 63f79e5f9b Restructure structs and other PR comments 2020-09-02 09:10:50 -06:00
freddygv 28d0602fc1 Pass LB config to Envoy via xDS 2020-08-28 14:27:40 -06:00
R.B. Boyer 74d5df7c7a
xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569) 2020-08-27 12:20:58 -05:00
Matt Keeler be01c4241d
Default Cache rate limiting options in New
Also get rid of the TestCache helper which was where these defaults were happening previously.
2020-07-28 12:34:35 -04:00
Pierre Souchay 505de6dc29
Added ratelimit to handle throtling cache (#8226)
This implements a solution for #7863

It does:

    Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
    Add cache.entry_fetch_max_burst size of rate limit (default value = 2)

The new configuration now supports the following syntax for instance to allow 1 query every 3s:

    command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
    in JSON

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 23:11:11 +02:00
Matt Keeler 12acdd7481
Disable background cache refresh for Connect Leaf Certs
The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) use the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself.

In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API which just like normal cache notification mechanism will cause the blocking fetch to be initiated and to get leaf certs as soon as needed.

If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond.

This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.
2020-07-21 12:19:25 -04:00
Daniel Nephin 010a609912 Fix a bunch of unparam lint issues 2020-06-24 13:00:14 -04:00
Freddy 5baa7b1b04
Always return a gateway cluster (#8158) 2020-06-19 13:31:39 -06:00
Daniel Nephin 5afcf5c1bc
Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4
ci: enable SA4006 staticcheck check and add ineffassign
2020-06-17 12:16:02 -04:00
Daniel Nephin 068b43df90 Enable gofmt simplify
Code changes done automatically with 'gofmt -s -w'
2020-06-16 13:21:11 -04:00
Daniel Nephin cb050b280c ci: enable SA4006 staticcheck check
And fix the 'value not used' issues.

Many of these are not bugs, but a few are tests not checking errors, and
one appears to be a missed error in non-test code.
2020-06-16 13:10:11 -04:00
freddygv 19e3954603 Move compound service names to use ServiceName type 2020-06-12 13:47:43 -06:00
Freddy 166a8b2a58
Only pass one hostname via EDS and prefer healthy ones (#8084)
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Currently when passing hostname clusters to Envoy, we set each service instance registered with Consul as an LbEndpoint for the cluster.

However, Envoy can only handle one per cluster:
[2020-06-04 18:32:34.094][1][warning][config] [source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Cluster rejected: Error adding/updating cluster(s) dc2.internal.ddd90499-9b47-91c5-4616-c0cbf0fc358a.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint, server.dc2.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint

Envoy is currently handling this gracefully by only picking one of the endpoints. However, we should avoid passing multiple to avoid these warning logs.

This PR:

* Ensures we only pass one endpoint, which is tied to one service instance.
* We prefer sending an endpoint which is marked as Healthy by Consul.
* If no endpoints are healthy we emit a warning and skip the cluster.
* If multiple unique hostnames are spread across service instances we emit a warning and let the user know which will be resolved.
2020-06-12 13:46:17 -06:00
Freddy 9ed325ba8b
Enable gateways to resolve hostnames to IPv4 addresses (#7999)
The DNS resolution will be handled by Envoy and defaults to LOGICAL_DNS. This discovery type can be overridden on a per-gateway basis with the envoy_dns_discovery_type Gateway Option.

If a service contains an instance with a hostname as an address we set the Envoy cluster to use DNS as the discovery type rather than EDS. Since both mesh gateways and terminating gateways route to clusters using SNI, whenever there is a mix of hostnames and IP addresses associated with a service we use the hostname + CDS rather than the IPs + EDS.

Note that we detect hostnames by attempting to parse the service instance's address as an IP. If it is not a valid IP we assume it is a hostname.
2020-06-03 15:28:45 -06:00
Daniel Nephin c88fae0aac ci: Add staticcheck and fix most errors
Three of the checks are temporarily disabled to limit the size of the
diff, and allow us to enable all the other checks in CI.

In a follow up we can fix the issues reported by the other checks one
at a time, and enable them.
2020-05-28 11:59:58 -04:00
Kyle Havlovitz b14696e32a
Standardize support for Tagged and BindAddresses in Ingress Gateways (#7924)
* Standardize support for Tagged and BindAddresses in Ingress Gateways

This updates the TaggedAddresses and BindAddresses behavior for Ingress
to match Mesh/Terminating gateways. The `consul connect envoy` command
now also allows passing an address without a port for tagged/bind
addresses.

* Update command/connect/envoy/envoy.go

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* PR comments

* Check to see if address is an actual IP address

* Update agent/xds/listeners.go

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* fix whitespace

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2020-05-21 09:08:12 -05:00
Chris Piraino 9d9e23cc44 Add service id context to the proxycfg logger
This is especially useful when multiple proxies are all querying the
same Consul agent.
2020-05-18 09:08:05 -05:00
Kyle Havlovitz 136549205c
Merge pull request #7759 from hashicorp/ingress/tls-hosts
Add TLS option for Ingress Gateway listeners
2020-05-11 09:18:43 -07:00
Chris Piraino a0e1f57ac2 Remove development log line 2020-05-08 20:24:18 -07:00
Chris Piraino 26f92e74f6 Compute all valid DNSSANs for ingress gateways
For DNSSANs we take into account the following and compute the
appropriate wildcard values:
- source datacenter
- namespaces
- alt domains
2020-05-08 20:23:17 -07:00
Freddy c32a4f1ece
Fix up enterprise compatibility for gateways (#7813) 2020-05-08 09:44:34 -06:00
Chris Piraino 0bd5618cb2 Cleanup proxycfg for TLS
- Use correct enterprise metadata for finding config entry
- nil out cancel functions on config snapshot copy
- Look at HostsSet when checking validity
2020-05-07 10:22:57 -05:00
Freddy b069887b2a
Remove timeout and call to Fatal from goroutine (#7797) 2020-05-06 14:33:17 -06:00
Kyle Havlovitz f14c54e25e Add TLS option and DNS SAN support to ingress config
xds: Only set TLS context for ingress listener when requested
2020-05-06 15:12:02 -05:00
Chris Piraino 881760f701 xds: Use only the port number as the configured route name
This removes duplication of protocol from the stats_prefix
2020-05-06 15:06:13 -05:00
Chris Piraino f40833d094 Allow Hosts field to be set on an ingress config entry
- Validate that this cannot be set on a 'tcp' listener nor on a wildcard
service.
- Add Hosts field to api and test in consul config write CLI
- xds: Configure envoy with user-provided hosts from ingress gateways
2020-05-06 15:06:13 -05:00
Kyle Havlovitz 711d1389aa Support multiple listeners referencing the same service in gateway definitions 2020-05-06 15:06:13 -05:00
Kyle Havlovitz 247f9eaf13 Allow ingress gateways to route traffic based on Host header
This commit adds the necessary changes to allow an ingress gateway to
route traffic from a single defined port to multiple different upstream
services in the Consul mesh.

To do this, we now require all HTTP requests coming into the ingress
gateway to specify a Host header that matches "<service-name>.*" in
order to correctly route traffic to the correct service.

- Differentiate multiple listener's route names by port
- Adds a case in xds for allowing default discovery chains to create a
  route configuration when on an ingress gateway. This allows default
  services to easily use host header routing
- ingress-gateways have a single route config for each listener
  that utilizes domain matching to route to different services.
2020-05-06 15:06:13 -05:00
Freddy 137a2c32c6
TLS Origination for Terminating Gateways (#7671) 2020-04-27 16:25:37 -06:00
freddygv 034d7d83d4 Fix snapshot IsEmpty 2020-04-27 11:08:41 -06:00
Freddy 3b1b24c2ce Update agent/proxycfg/state_test.go 2020-04-27 11:08:41 -06:00
freddygv eddd5bd73b PR comments 2020-04-27 11:08:41 -06:00
freddygv 6abc71f915 Skip filter chain creation if no client cert 2020-04-27 11:08:41 -06:00
freddygv 09a8e5f36d Use golden files for gateway certs and fix listener test flakiness 2020-04-27 11:08:41 -06:00
freddygv 840d27a9d5 Un-nest switch in gateway update handler 2020-04-27 11:08:40 -06:00
freddygv c0e1751878 Allow terminating-gateway to setup listener before servicegroups are known 2020-04-27 11:08:40 -06:00
freddygv 913b13f31f Add subset support 2020-04-27 11:08:40 -06:00
freddygv 219c78e586 Add xds cluster/listener/endpoint management 2020-04-27 11:08:40 -06:00
freddygv 24207226ca Add proxycfg state management for terminating-gateways 2020-04-27 11:07:06 -06:00
Chris Piraino cb9df538d5 Add all the xds ingress tests
This commit copies many of the connect-proxy xds testcases and reuses
for ingress gateways. This allows us to more easily see changes to the
envoy configuration when make updates to ingress gateways.
2020-04-24 09:31:32 -05:00
Chris Piraino 0ca9b606e8 Pull out setupTestVariationConfigEntriesAndSnapshot in proxycfg
This allows us to reuse the same variations for ingress gateway testing
2020-04-24 09:31:32 -05:00
Daniel Nephin 5fe7043439 agent/cache: Make all cache options RegisterOptions
Previously the SupportsBlocking option was specified by a method on the
type, and all the other options were specified from RegisterOptions.

This change moves RegisterOptions to a method on the type, and moves
SupportsBlocking into the options struct.

Currently there are only 2 cache-types. So all cache-types can implement
this method by embedding a struct with those predefined values. In the
future if a cache type needs to be registered more than once with different
options it can remove the embedded type and implement the method in a way
that allows for paramaterization.
2020-04-16 18:56:34 -04:00
Kyle Havlovitz e9e8c0e730
Ingress Gateways for TCP services (#7509)
* Implements a simple, tcp ingress gateway workflow

This adds a new type of gateway for allowing Ingress traffic into Connect from external services.

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-04-16 14:00:48 -07:00
Chris Piraino 584f90bbeb
Fix flapping of mesh gateway connect-service watches (#7575) 2020-04-02 10:12:13 -05:00
Andy Lindeman c1cb18c648
proxycfg: support path exposed with non-HTTP2 protocol (#7510)
If a proxied service is a gRPC or HTTP2 service, but a path is exposed
using the HTTP1 or TCP protocol, Envoy should not be configured with
`http2ProtocolOptions` for the cluster backing the path.

A situation where this comes up is a gRPC service whose healthcheck or
metrics route (e.g. for Prometheus) is an HTTP1 service running on
a different port. Previously, if these were exposed either using
`Expose: { Checks: true }` or `Expose: { Paths: ... }`, Envoy would
still be configured to communicate with the path over HTTP2, which would
not work properly.
2020-04-02 09:35:04 +02:00
R.B. Boyer 6adad71125
wan federation via mesh gateways (#6884)
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:

There are several distinct chunks of code that are affected:

* new flags and config options for the server

* retry join WAN is slightly different

* retry join code is shared to discover primary mesh gateways from secondary datacenters

* because retry join logic runs in the *agent* and the results of that
  operation for primary mesh gateways are needed in the *server* there are
  some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
  at multiple layers of abstraction just to pass the data down to the right
  layer.

* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers

* the function signature for RPC dialing picked up a new required field (the
  node name of the destination)

* several new RPCs for manipulating a FederationState object:
  `FederationState:{Apply,Get,List,ListMeshGateways}`

* 3 read-only internal APIs for debugging use to invoke those RPCs from curl

* raft and fsm changes to persist these FederationStates

* replication for FederationStates as they are canonically stored in the
  Primary and replicated to the Secondaries.

* a special derivative of anti-entropy that runs in secondaries to snapshot
  their local mesh gateway `CheckServiceNodes` and sync them into their upstream
  FederationState in the primary (this works in conjunction with the
  replication to distribute addresses for all mesh gateways in all DCs to all
  other DCs)

* a "gateway locator" convenience object to make use of this data to choose
  the addresses of gateways to use for any given RPC or gossip operation to a
  remote DC. This gets data from the "retry join" logic in the agent and also
  directly calls into the FSM.

* RPC (`:8300`) on the server sniffs the first byte of a new connection to
  determine if it's actually doing native TLS. If so it checks the ALPN header
  for protocol determination (just like how the existing system uses the
  type-byte marker).

* 2 new kinds of protocols are exclusively decoded via this native TLS
  mechanism: one for ferrying "packet" operations (udp-like) from the gossip
  layer and one for "stream" operations (tcp-like). The packet operations
  re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
  overhead.

* the server instances specially wrap the `memberlist.NetTransport` when running
  with gateway federation enabled (in a `wanfed.Transport`). The general gist is
  that if it tries to dial a node in the SAME datacenter (deduced by looking
  at the suffix of the node name) there is no change. If dialing a DIFFERENT
  datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
  gateways to eventually end up in a server's :8300 port.

* a new flag when launching a mesh gateway via `consul connect envoy` to
  indicate that the servers are to be exposed. This sets a special service
  meta when registering the gateway into the catalog.

* `proxycfg/xds` notice this metadata blob to activate additional watches for
  the FederationState objects as well as the location of all of the consul
  servers in that datacenter.

* `xds:` if the extra metadata is in place additional clusters are defined in a
  DC to bulk sink all traffic to another DC's gateways. For the current
  datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
  balances all servers as well as one mini-cluster per node
  (`<node>.server.<dc>.consul`)

* the `consul tls cert create` command got a new flag (`-node`) to help create
  an additional SAN in certs that can be used with this flavor of federation.
2020-03-09 15:59:02 -05:00
Matt Keeler 4c9577678e
xDS Mesh Gateway Resolver Subset Fixes (#7294)
* xDS Mesh Gateway Resolver Subset Fixes

The first fix was that clusters were being generated for every service resolver subset regardless of there being any service instances of the associated service in that dc. The previous logic didn’t care at all but now it will omit generating those clusters unless we also have service instances that should be proxied.

The second fix was to respect the DefaultSubset of a service resolver so that mesh-gateways would configure the endpoints of the unnamed subset cluster to only those endpoints matched by the default subsets filters.

* Refactor the gateway endpoint generation to be a little easier to read
2020-02-19 11:57:55 -05:00
Lars Lehtonen 6bcd596539
agent/proxycfg: fix dropped error in state.initWatchesMeshGateway() (#7267) 2020-02-18 14:41:01 +01:00
Matt Keeler 9e5fd7f925
OSS Changes for various config entry namespacing bugs (#7226) 2020-02-06 10:52:25 -05:00
Matt Keeler dfb0177dbc
Testing updates to support namespaced testing of the agent/xds… (#7185)
* Various testing updates to support namespaced testing of the agent/xds package

* agent/proxycfg package updates to support better namespace testing
2020-02-03 09:26:47 -05:00
Chris Piraino 401221de58
Allow users to configure either unstructured or JSON logging (#7130)
* hclog Allow users to choose between unstructured and JSON logging
2020-01-28 17:50:41 -06:00
Matt Keeler c09693e545
Updates to Config Entries and Connect for Namespaces (#7116) 2020-01-24 10:04:58 -05:00
Aestek ba8fd8296f Add support for dual stack IPv4/IPv6 network (#6640)
* Use consts for well known tagged adress keys

* Add ipv4 and ipv6 tagged addresses for node lan and wan

* Add ipv4 and ipv6 tagged addresses for service lan and wan

* Use IPv4 and IPv6 address in DNS
2020-01-17 09:54:17 -05:00
Matt Keeler 27f49eede9
Move where the service-resolver watch is done so that it happen… (#7025)
Before we were issuing 1 watch for every service in the services listing which would have caused the agent to process many more identical events simultaneously.
2020-01-10 10:30:13 -05:00
Matt Keeler 5934f803bf
Sync of OSS changes to support namespaces (#6909) 2019-12-09 21:26:41 -05:00
R.B. Boyer 2011f3d7dc
xds: mesh gateway CDS requests are now allowed to receive an empty CDS reply (#6787)
This is the rest of the fix for #6543 that was incompletely fixed in #6576.
2019-11-26 15:55:13 -06:00
R.B. Boyer 97aa050c20
agent: allow mesh gateways to initialize even if there are no connect services registered yet (#6576)
Fixes #6543

Also improved some of the proxycfg tests to cover snapshot validity
better.
2019-10-17 16:46:49 -05:00
R.B. Boyer 9566df524e
agent: cache notifications work after error if the underlying RPC returns index=1 (#6547)
Fixes #6521

Ensure that initial failures to fetch an agent cache entry using the
notify API where the underlying RPC returns a synthetic index of 1
correctly recovers when those RPCs resume working.

The bug in the Cache.notifyBlockingQuery used to incorrectly "fix" the
index for the next query from 0 to 1 for all queries, when it should
have not done so for queries that errored.

Also fixed some things that made debugging difficult:

- config entry read/list endpoints send back QueryMeta headers
- xds event loops don't swallow the cache notification errors
2019-09-26 10:42:17 -05:00
Freddy fdd10dd8b8
Expose HTTP-based paths through Connect proxy (#6446)
Fixes: #5396

This PR adds a proxy configuration stanza called expose. These flags register
listeners in Connect sidecar proxies to allow requests to specific HTTP paths from outside of the node. This allows services to protect themselves by only
listening on the loopback interface, while still accepting traffic from non
Connect-enabled services.

Under expose there is a boolean checks flag that would automatically expose all
registered HTTP and gRPC check paths.

This stanza also accepts a paths list to expose individual paths. The primary
use case for this functionality would be to expose paths for third parties like
Prometheus or the kubelet.

Listeners for requests to exposed paths are be configured dynamically at run
time. Any time a proxy, or check can be registered, a listener can also be
created.

In this initial implementation requests to these paths are not
authenticated/encrypted.
2019-09-25 20:55:52 -06:00
R.B. Boyer af01d397a5
connect: don't colon-hex-encode the AuthorityKeyId and SubjectKeyId fields in connect certs (#6492)
The fields in the certs are meant to hold the original binary
representation of this data, not some ascii-encoded version.

The only time we should be colon-hex-encoding fields is for display
purposes or marshaling through non-TLS mediums (like RPC).
2019-09-23 12:52:35 -05:00
R.B. Boyer dfcdc41ef8
connect: allow 'envoy_cluster_json' escape hatch to continue to function (#6378) 2019-08-22 15:11:56 -05:00
R.B. Boyer 561b2fe606
connect: generate the full SNI names for discovery targets in the compiler rather than in the xds package (#6340) 2019-08-19 13:03:03 -05:00
R.B. Boyer ae79cdab1b
connect: introduce ExternalSNI field on service-defaults (#6324)
Compiling this will set an optional SNI field on each DiscoveryTarget.
When set this value should be used for TLS connections to the instances
of the target. If not set the default should be used.

Setting ExternalSNI will disable mesh gateway use for that target. It also 
disables several service-resolver features that do not make sense for an 
external service.
2019-08-19 12:19:44 -05:00
Mike Morris 65be58703c
connect: remove managed proxies (#6220)
* connect: remove managed proxies implementation and all supporting config options and structs

* connect: remove deprecated ProxyDestination

* command: remove CONNECT_PROXY_TOKEN env var

* agent: remove entire proxyprocess proxy manager

* test: remove all managed proxy tests

* test: remove irrelevant managed proxy note from TestService_ServerTLSConfig

* test: update ContentHash to reflect managed proxy removal

* test: remove deprecated ProxyDestination test

* telemetry: remove managed proxy note

* http: remove /v1/agent/connect/proxy endpoint

* ci: remove deprecated test exclusion

* website: update managed proxies deprecation page to note removal

* website: remove managed proxy configuration API docs

* website: remove managed proxy note from built-in proxy config

* website: add note on removing proxy subdirectory of data_dir
2019-08-09 15:19:30 -04:00
R.B. Boyer 8e22d80e35
connect: fix failover through a mesh gateway to a remote datacenter (#6259)
Failover is pushed entirely down to the data plane by creating envoy
clusters and putting each successive destination in a different load
assignment priority band. For example this shows that normally requests
go to 1.2.3.4:8080 but when that fails they go to 6.7.8.9:8080:

- name: foo
  load_assignment:
    cluster_name: foo
    policy:
      overprovisioning_factor: 100000
    endpoints:
    - priority: 0
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 1.2.3.4
              port_value: 8080
    - priority: 1
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 6.7.8.9
              port_value: 8080

Mesh gateways route requests based solely on the SNI header tacked onto
the TLS layer. Envoy currently only lets you configure the outbound SNI
header at the cluster layer.

If you try to failover through a mesh gateway you ideally would
configure the SNI value per endpoint, but that's not possible in envoy
today.

This PR introduces a simpler way around the problem for now:

1. We identify any target of failover that will use mesh gateway mode local or
   remote and then further isolate any resolver node in the compiled discovery
   chain that has a failover destination set to one of those targets.

2. For each of these resolvers we will perform a small measurement of
   comparative healths of the endpoints that come back from the health API for the
   set of primary target and serial failover targets. We walk the list of targets
   in order and if any endpoint is healthy we return that target, otherwise we
   move on to the next target.

3. The CDS and EDS endpoints both perform the measurements in (2) for the
   affected resolver nodes.

4. For CDS this measurement selects which TLS SNI field to use for the cluster
   (note the cluster is always going to be named for the primary target)

5. For EDS this measurement selects which set of endpoints will populate the
   cluster. Priority tiered failover is ignored.

One of the big downsides to this approach to failover is that the failover
detection and correction is going to be controlled by consul rather than
deferring that entirely to the data plane as with the prior version. This also
means that we are bound to only failover using official health signals and
cannot make use of data plane signals like outlier detection to affect
failover.

In this specific scenario the lack of data plane signals is ok because the
effectiveness is already muted by the fact that the ultimate destination
endpoints will have their data plane signals scrambled when they pass through
the mesh gateway wrapper anyway so we're not losing much.

Another related fix is that we now use the endpoint health from the
underlying service, not the health of the gateway (regardless of
failover mode).
2019-08-05 13:30:35 -05:00
R.B. Boyer c395affc93
connect: expose an API endpoint to compile the discovery chain (#6248)
In addition to exposing compilation over the API cleaned up the structures that would be exchanged to be cleaner and easier to support and understand.

Also removed ability to configure the envoy OverprovisioningFactor.
2019-08-02 15:34:54 -05:00
R.B. Boyer f02924fafe
connect: simplify the compiled discovery chain data structures (#6242)
This should make them better for sending over RPC or the API.

Instead of a chain implemented explicitly like a linked list (nodes
holding pointers to other nodes) instead switch to a flat map of named
nodes with nodes linking other other nodes by name. The shipped
structure is just a map and a string to indicate which key to start
from.

Other changes:

* inline the compiler option InferDefaults as true

* introduce compiled target config to avoid needing to send back
  additional maps of Resolvers; future target-specific compiled state
  can go here

* move compiled MeshGateway out of the Resolver and into the
  TargetConfig where it makes more sense.
2019-08-01 22:44:05 -05:00
R.B. Boyer 6393edba53
connect: reconcile how upstream configuration works with discovery chains (#6225)
* connect: reconcile how upstream configuration works with discovery chains

The following upstream config fields for connect sidecars sanely
integrate into discovery chain resolution:

- Destination Namespace/Datacenter: Compilation occurs locally but using
different default values for namespaces and datacenters. The xDS
clusters that are created are named as they normally would be.

- Mesh Gateway Mode (single upstream): If set this value overrides any
value computed for any resolver for the entire discovery chain. The xDS
clusters that are created may be named differently (see below).

- Mesh Gateway Mode (whole sidecar): If set this value overrides any
value computed for any resolver for the entire discovery chain. If this
is specifically overridden for a single upstream this value is ignored
in that case. The xDS clusters that are created may be named differently
(see below).

- Protocol (in opaque config): If set this value overrides the value
computed when evaluating the entire discovery chain. If the normal chain
would be TCP or if this override is set to TCP then the result is that
we explicitly disable L7 Routing and Splitting. The xDS clusters that
are created may be named differently (see below).

- Connect Timeout (in opaque config): If set this value overrides the
value for any resolver in the entire discovery chain. The xDS clusters
that are created may be named differently (see below).

If any of the above overrides affect the actual result of compiling the
discovery chain (i.e. "tcp" becomes "grpc" instead of being a no-op
override to "tcp") then the relevant parameters are hashed and provided
to the xDS layer as a prefix for use in naming the Clusters. This is to
ensure that if one Upstream discovery chain has no overrides and
tangentially needs a cluster named "api.default.XXX", and another
Upstream does have overrides for "api.default.XXX" that they won't
cross-pollinate against the operator's wishes.

Fixes #6159
2019-08-01 22:03:34 -05:00
Matt Keeler fcc18c1675
Fix prepared query upstream endpoint generation (#6236)
Use the correct SNI value for prepared query upstreams
2019-07-29 11:15:55 -04:00
R.B. Boyer e039dfd7f8
connect: rework how the service resolver subset OnlyPassing flag works (#6173)
The main change is that we no longer filter service instances by health,
preferring instead to render all results down into EDS endpoints in
envoy and merely label the endpoints as HEALTHY or UNHEALTHY.

When OnlyPassing is set to true we will force consul checks in a
'warning' state to render as UNHEALTHY in envoy.

Fixes #6171
2019-07-23 20:20:24 -05:00
R.B. Boyer e8132b61c0
add test for discovery chain agent cache-type (#6130) 2019-07-15 10:09:52 -05:00
Matt Keeler 4728329aeb
Various Gateway Fixes (#6093)
* Ensure the mesh gateway configuration comes back in the api within each upstream

* Add a test for the MeshGatewayConfig in the ToAPI functions

* Ensure we don’t use gateways for dc local connections

* Update the svc kind index for deletions

* Replace the proxycfg.state cache with an interface for testing

Also start implementing proxycfg state testing.

* Update the state tests to verify some gateway watches for upstream-targets of a discovery chain.
2019-07-12 17:19:37 -04:00
R.B. Boyer bcd2de3a2e
implement some missing service-router features and add more xDS testing (#6065)
- also implement OnlyPassing filters for non-gateway clusters
2019-07-12 14:16:21 -05:00
Jack Pearkes e6f1b78efb Make cluster names SNI always (#6081)
* Make cluster names SNI always

* Update some tests

* Ensure we check for prepared query types

* Use sni for route cluster names

* Proper mesh gateway mode defaulting when the discovery chain is used

* Ignore service splits from PatchSliceOfMaps

* Update some xds golden files for proper test output

* Allow for grpc/http listeners/cluster configs with the disco chain

* Update stats expectation
2019-07-08 12:48:48 +01:00
Matt Keeler 3d562bee5c Fix Internal.ServiceDump blocking (#6076)
maxIndexWatchTxn was only watching the IndexEntry of the max index of all the entries. It needed to watch all of them regardless of which was the max.

Also plumbed the query source through in the proxy config to help better track requests.
2019-07-04 16:17:49 +01:00
hashicorp-ci 7a32c5a618 Merge Consul OSS branch 'master' at commit a58d8e91ac 2019-07-03 02:00:31 +00:00
Matt Keeler a8e2e866e3 Update xds/proxycfg tests to use the same looking trust domain as a normal system
This is to prevent confusion about what our SNI fields actually look like.
2019-07-02 10:29:37 -04:00
Matt Keeler a7421c160f Implement mesh gateway management of service subsets
Fixup some error handling
2019-07-02 10:29:37 -04:00
R.B. Boyer 4bdb690a25
activate most discovery chain features in xDS for envoy (#6024) 2019-07-01 22:10:51 -05:00
Matt Keeler bdebe62fd0
Fix some tests that I broke when refactoring the ConfigSnapshot (#6051)
* Fix some tests that I broke when refactoring the ConfigSnapshot

* Make sure the MeshGateway config is added to all the right api structs

* Fix some more tests
2019-07-01 19:47:58 -04:00
Pierre Souchay fd9237a1ff Bump timeout in TestManager_BasicLifecycle (#6030) 2019-07-01 17:02:00 -06:00
Matt Keeler 8d953f5840 Implement Mesh Gateways
This includes both ingress and egress functionality.
2019-07-01 16:28:30 -04:00
R.B. Boyer f7fdf18335
fix test that was failing after #6013 (#6026) 2019-06-27 09:31:19 -05:00