126 Commits

Author SHA1 Message Date
Freddy
168073c4dc Add flag for transparent proxies to dial individual instances (#10329) 2021-06-09 20:39:37 +00:00
Freddy
32e013c834 Merge pull request #10187 from hashicorp/fixup/ent-tproxy-test 2021-05-06 20:48:25 +00:00
Mark Anderson
42ff449d4f Merge pull request #9981 from hashicorp/ma/uds_upstreams
Unix Domain Socket support for upstreams and downstreams
2021-05-05 16:17:32 -04:00
Daniel Nephin
c1d1be2a4b Merge pull request #10155 from hashicorp/dnephin/config-entry-remove-fields
config-entry: remove Kind and Name field from Mesh config entry
2021-05-04 21:28:26 +00:00
Freddy
2d633ed804 Fixup discovery chain handling in transparent mode (#10168)
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

Previously we would associate the address of a discovery chain target
with the discovery chain's filter chain. This was broken for a few reasons:

- If the upstream is a virtual service, the client proxy has no way of
dialing it because virtual services are not targets of their discovery
chains. The targets are distinct services. This is addressed by watching
the endpoints of all upstream services, not just their discovery chain
targets.

- If multiple discovery chains resolve to the same target, that would
lead to multiple filter chains attempting to match on the target's
virtual IP. This is addressed by only matching on the upstream's virtual
IP.

NOTE: this implementation requires an intention to the redirecting
virtual service and not just to the final destination. This is how
we can know that the virtual service is an upstream to watch.

A later PR will look into traversing discovery chains when computing
upstreams so that intentions are only required to the discovery chain
targets.
2021-05-04 14:46:53 +00:00
R.B. Boyer
6a39b47448 Support Incremental xDS mode (#9855)
This adds support for the Incremental xDS protocol when using xDS v3. This is best reviewed commit-by-commit and will not be squashed when merged.

Union of all commit messages follows to give an overarching summary:

xds: exclusively support incremental xDS when using xDS v3

Attempts to use SoTW via v3 will fail, much like attempts to use incremental via v2 will fail.
Work around a strange older envoy behavior involving empty CDS responses over incremental xDS.
xds: various cleanups and refactors that don't strictly concern the addition of incremental xDS support

Dissolve the connectionInfo struct in favor of per-connection ResourceGenerators instead.
Do a better job of ensuring the xds code uses a well configured logger that accurately describes the connected client.
xds: pull out checkStreamACLs method in advance of a later commit

xds: rewrite SoTW xDS protocol tests to use protobufs rather than hand-rolled json strings

In the test we very lightly reuse some of the more boring protobuf construction helper code that is also technically under test. The important thing of the protocol tests is testing the protocol. The actual inputs and outputs are largely already handled by the xds golden output tests now so these protocol tests don't have to do double-duty.

This also updates the SoTW protocol test to exclusively use xDS v2 which is the only variant of SoTW that will be supported in Consul 1.10.

xds: default xds.Server.AuthCheckFrequency at use-time instead of construction-time
2021-04-29 18:54:53 +00:00
Freddy
c652580b5b Rename "cluster" config entry to "mesh" (#10127)
This config entry is being renamed primarily because in k8s the name
cluster could be confusing given that the config entry applies across
federated datacenters.

Additionally, this config entry will only apply to Consul as a service
mesh, so the more generic "cluster" name is not needed.
2021-04-28 22:14:03 +00:00
Daniel Nephin
798953f57d Merge pull request #10112 from hashicorp/dnephin/remove-streaming-from-cache
streaming: replace agent/cache with submatview.Store
2021-04-28 21:58:32 +00:00
Freddy
439a7fce2d
Split Upstream.Identifier() so non-empty namespace is always prepended in ent (#10031) 2021-04-15 13:54:40 -06:00
freddygv
8857195437 Fixup wildcard ent assertion 2021-04-12 17:04:33 -06:00
freddygv
7bd51ff536 Replace TransparentProxy bool with ProxyMode
This PR replaces the original boolean used to configure transparent
proxy mode. It was replaced with a string mode that can be set to:

- "": Empty string is the default for when the setting should be
defaulted from other configuration like config entries.
- "direct": Direct mode is how applications originally opted into the
mesh. Proxy listeners need to be dialed directly.
- "transparent": Transparent mode enables configuring Envoy as a
transparent proxy. Traffic must be captured and redirected to the
inbound and outbound listeners.

This PR also adds a struct for transparent proxy specific configuration.
Initially this is not stored as a pointer. Will revisit that decision
before GA.
2021-04-12 09:35:14 -06:00
freddygv
b21224a4c8 PR comments 2021-04-08 11:16:03 -06:00
freddygv
49a4a78fd5 Ensure mesh gateway mode override is set for upstreams for intentions 2021-04-07 09:32:48 -06:00
freddygv
5140c3e51f Finish resolving upstream defaults in proxycfg 2021-04-07 09:32:48 -06:00
R.B. Boyer
499fee73b3
connect: add toggle to globally disable wildcard outbound network access when transparent proxy is enabled (#9973)
This adds a new config entry kind "cluster" with a single special name "cluster" where this can be controlled.
2021-04-06 13:19:59 -05:00
freddygv
098b9af901 Fixup enterprise tests from tproxy changes 2021-03-17 23:05:00 -06:00
freddygv
eb1e0a1751 Cancel watch on all errors 2021-03-17 21:44:14 -06:00
freddygv
f4f45af6d0 Merge master and fix upstream config protocol defaulting 2021-03-17 21:13:40 -06:00
freddygv
0da8702f34 PR comments 2021-03-17 16:18:56 -06:00
freddygv
a54d6a9010 Update proxycfg for transparent proxy 2021-03-17 13:40:39 -06:00
Daniel Nephin
f40b76af2d proxycfg: use rpcclient/health.Client instead of passing around cache name
This should allow us to swap out the implementation with something other
than `agent/cache` without making further code changes.
2021-03-12 11:46:04 -05:00
Daniel Nephin
906834ce8e proxycfg: Use streaming in connect state 2021-03-12 11:35:42 -05:00
Freddy
82c269a7c5
Avoid potential proxycfg/xDS deadlock using non-blocking send 2021-02-08 16:14:06 -07:00
freddygv
ec5f75776b Update comments on avoiding proxycfg deadlock 2021-02-08 09:45:45 -07:00
R.B. Boyer
43193a35c6
xds: prevent LDS flaps in mesh gateways due to unstable datacenter lists (#9651)
Also fix a similar issue in Terminating Gateways that was masked by an overzealous test.
2021-02-08 10:19:57 -06:00
freddygv
6e443e5536 Retry send after timer fires, in case no updates occur 2021-02-05 18:00:59 -07:00
freddygv
95e7641faa Update proxycfg logging, labels were already attached 2021-02-05 15:14:49 -07:00
freddygv
5ba14ad41d Add trace logs to proxycfg state runner and xds srv 2021-02-02 12:26:38 -07:00
freddygv
37190c0d0d Avoid potential deadlock using non-blocking send
Deadlock scenario:
    1. Due to scheduling, the state runner sends one snapshot into
    snapCh and then attempts to send a second. The first send succeeds
    because the channel is buffered, but the second blocks.
    2. Separately, Manager.Watch is called by the xDS server after
    getting a discovery request from Envoy. This function acquires the
    manager lock and then blocks on receiving the CurrentSnapshot from
    the state runner.
    3. Separately, there is a Manager goroutine that reads the snapshots
    from the channel in step 1. These reads are done to notify proxy
    watchers, but they require holding the manager lock. This goroutine
    goes to acquire that lock, but can't because it is held by step 2.

Now, the goroutine from step 3 is waiting on the one from step 2 to
release the lock. The goroutine from step 2 won't release the lock until
the goroutine in step 1 advances. But the goroutine in step 1 is waiting
for the one in step 3. Deadlock.

By making this send non-blocking step 1 above can proceed. The coalesce
timer will be reset and a new valid snapshot will be delivered after it
elapses or when one is requested by xDS.
2021-02-02 11:31:14 -07:00
Daniel Nephin
b9e60c0775 testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
freddygv
856d5a25ee Fix text type assertion 2020-09-14 16:28:40 -06:00
freddygv
7fd518ff1d Merge master 2020-09-14 16:17:43 -06:00
freddygv
87541ab80a Fix type assertion 2020-09-14 16:12:21 -06:00
freddygv
768dbaa68d Add session flag to cookie config 2020-09-11 18:34:03 -06:00
freddygv
eab90ea9fa Revert EnvoyConfig nesting 2020-09-11 09:21:43 -06:00
freddygv
30ba080d25 Add explicit protocol overrides in tgw xds test cases 2020-09-03 08:57:48 -06:00
freddygv
f81fe6a1a1 Remove LB infix and move injection to xds 2020-09-02 15:13:50 -06:00
freddygv
63f79e5f9b Restructure structs and other PR comments 2020-09-02 09:10:50 -06:00
freddygv
28d0602fc1 Pass LB config to Envoy via xDS 2020-08-28 14:27:40 -06:00
R.B. Boyer
74d5df7c7a
xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569) 2020-08-27 12:20:58 -05:00
Matt Keeler
be01c4241d
Default Cache rate limiting options in New
Also get rid of the TestCache helper which was where these defaults were happening previously.
2020-07-28 12:34:35 -04:00
Pierre Souchay
505de6dc29
Added ratelimit to handle throtling cache (#8226)
This implements a solution for #7863

It does:

    Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
    Add cache.entry_fetch_max_burst size of rate limit (default value = 2)

The new configuration now supports the following syntax for instance to allow 1 query every 3s:

    command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
    in JSON

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 23:11:11 +02:00
Matt Keeler
12acdd7481
Disable background cache refresh for Connect Leaf Certs
The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) use the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself.

In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API which just like normal cache notification mechanism will cause the blocking fetch to be initiated and to get leaf certs as soon as needed.

If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond.

This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.
2020-07-21 12:19:25 -04:00
Daniel Nephin
010a609912 Fix a bunch of unparam lint issues 2020-06-24 13:00:14 -04:00
Freddy
5baa7b1b04
Always return a gateway cluster (#8158) 2020-06-19 13:31:39 -06:00
Daniel Nephin
5afcf5c1bc
Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4
ci: enable SA4006 staticcheck check and add ineffassign
2020-06-17 12:16:02 -04:00
Daniel Nephin
068b43df90 Enable gofmt simplify
Code changes done automatically with 'gofmt -s -w'
2020-06-16 13:21:11 -04:00
Daniel Nephin
cb050b280c ci: enable SA4006 staticcheck check
And fix the 'value not used' issues.

Many of these are not bugs, but a few are tests not checking errors, and
one appears to be a missed error in non-test code.
2020-06-16 13:10:11 -04:00
freddygv
19e3954603 Move compound service names to use ServiceName type 2020-06-12 13:47:43 -06:00
Freddy
166a8b2a58
Only pass one hostname via EDS and prefer healthy ones (#8084)
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Currently when passing hostname clusters to Envoy, we set each service instance registered with Consul as an LbEndpoint for the cluster.

However, Envoy can only handle one per cluster:
[2020-06-04 18:32:34.094][1][warning][config] [source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Cluster rejected: Error adding/updating cluster(s) dc2.internal.ddd90499-9b47-91c5-4616-c0cbf0fc358a.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint, server.dc2.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint

Envoy is currently handling this gracefully by only picking one of the endpoints. However, we should avoid passing multiple to avoid these warning logs.

This PR:

* Ensures we only pass one endpoint, which is tied to one service instance.
* We prefer sending an endpoint which is marked as Healthy by Consul.
* If no endpoints are healthy we emit a warning and skip the cluster.
* If multiple unique hostnames are spread across service instances we emit a warning and let the user know which will be resolved.
2020-06-12 13:46:17 -06:00