121 Commits

Author SHA1 Message Date
Daniel Nephin
b9e60c0775 testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
R.B. Boyer
7c7a3e5165
command: when generating envoy bootstrap configs use the datacenter returned from the agent services endpoint (#9229)
Fixes #9215
2020-11-19 15:27:31 -06:00
Freddy
fe728855ed
Add DC and NS support for Envoy metrics (#9207)
This PR updates the tags that we generate for Envoy stats.

Several of these come with breaking changes, since we can't keep two stats prefixes for a filter.
2020-11-16 16:37:19 -07:00
Mike Morris
6396042ba7
connect: switch the default gateway port from 443 to 8443 (#9116)
* test: update ingress gateway golden file to port 8443

* test: update Envoy flags_test to port 8443

Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-11-06 20:47:29 -05:00
R.B. Boyer
8baf158ea8
Revert "Add namespace support for metrics (OSS) (#9117)" (#9124)
This reverts commit 06b3b017d326853dbb53bc0ec08ce371265c5ce9.
2020-11-06 10:24:32 -06:00
Freddy
06b3b017d3
Add namespace support for metrics (OSS) (#9117) 2020-11-05 18:24:29 -07:00
R.B. Boyer
a2c50d3303
connect: add support for envoy 1.16.0, drop support for 1.12.x, and bump point releases as well (#8944)
Supported versions will be: "1.16.0", "1.15.2", "1.14.5", "1.13.6"
2020-10-22 13:46:19 -05:00
R.B. Boyer
9fbcb2e68d
command: remove conditional envoy bootstrap generation for versions <=1.10.0 since those are not supported (#8855) 2020-10-07 10:53:23 -05:00
R.B. Boyer
a2a8e9c783
connect: intentions are now managed as a new config entry kind "service-intentions" (#8834)
- Upgrade the ConfigEntry.ListAll RPC to be kind-aware so that older
copies of consul will not see new config entries it doesn't understand
replicate down.

- Add shim conversion code so that the old API/CLI method of interacting
with intentions will continue to work so long as none of these are
edited via config entry endpoints. Almost all of the read-only APIs will
continue to function indefinitely.

- Add new APIs that operate on individual intentions without IDs so that
the UI doesn't need to implement CAS operations.

- Add a new serf feature flag indicating support for
intentions-as-config-entries.

- The old line-item intentions way of interacting with the state store
will transparently flip between the legacy memdb table and the config
entry representations so that readers will never see a hiccup during
migration where the results are incomplete. It uses a piece of system
metadata to control the flip.

- The primary datacenter will begin migrating intentions into config
entries on startup once all servers in the datacenter are on a version
of Consul with the intentions-as-config-entries feature flag. When it is
complete the old state store representations will be cleared. We also
record a piece of system metadata indicating this has occurred. We use
this metadata to skip ALL of this code the next time the leader starts
up.

- The secondary datacenters continue to run the old intentions
replicator until all servers in the secondary DC and primary DC support
intentions-as-config-entries (via serf flag). Once this condition it met
the old intentions replicator ceases.

- The secondary datacenters replicate the new config entries as they are
migrated in the primary. When they detect that the primary has zeroed
it's old state store table it waits until all config entries up to that
point are replicated and then zeroes its own copy of the old state store
table. We also record a piece of system metadata indicating this has
occurred. We use this metadata to skip ALL of this code the next time
the leader starts up.
2020-10-06 13:24:05 -05:00
Tim Arenz
a1fe711390
Add support for -ca-path option in the connect envoy command (#8606)
* Add support for -ca-path option in the connect envoy command
* Adding changelog entry
2020-09-08 12:16:16 +02:00
Daniel Nephin
33c401a16e logging: Setup accept io.Writer instead of []io.Writer
Also accept a non-pointer Config, since the config is not modified
2020-08-19 13:20:41 -04:00
Daniel Nephin
d68edcecf4 testing: Remove all the defer os.Removeall
Now that testutil uses t.Cleanup to remove the directory the caller no longer has to manage
the removal
2020-08-14 19:58:53 -04:00
R.B. Boyer
397019d970
xds: revert setting set_node_on_first_message_only to true when generating envoy bootstrap config (#8440)
When consul is restarted and an envoy that had already sent
DiscoveryRequests to the previous consul process sends a request to the
new process it doesn't respect the setting and never populates
DiscoveryRequest.Node for the life of the new consul process due to this
bug: https://github.com/envoyproxy/envoy/issues/9682

Fixes #8430
2020-08-05 15:00:24 -05:00
Daniel Nephin
0420d91cdd Remove LogOutput from Agent
Now that it is no longer used, we can remove this unnecessary field. This is a pre-step in cleanup up RuntimeConfig->Consul.Config, which is a pre-step to adding a gRPCHandler component to Server for streaming.

Removing this field also allows us to remove one of the return values from logging.Setup.
2020-08-05 14:00:44 -04:00
R.B. Boyer
c599a2f5f4
xds: add support for envoy 1.15.0 and drop support for 1.11.x (#8424)
Related changes:

- hard-fail the xDS connection attempt if the envoy version is known to be too old to be supported
- remove the RouterMatchSafeRegex proxy feature since all supported envoy versions have it
- stop using --max-obj-name-len (due to: envoyproxy/envoy#11740)
2020-07-31 15:52:49 -05:00
Chris Piraino
7c4cc71131
Fix envoy bootstrap logic to not append multiple self_admin clusters (#8371)
Previously, the envoy bootstrap config would blindly copy the self_admin
cluster into the list of static clusters when configuring either
ReadyBindAddr, PrometheusBindAddr, or StatsBindAddr.

Since ingress gateways always configure the ReadyBindAddr property,
users ran into this case much more often than previously.
2020-07-23 13:12:08 -05:00
Hans Hasselberg
496fb5fc5b
add support for envoy 1.14.4, 1.13.4, 1.12.6 (#8216) 2020-07-13 15:44:44 -05:00
R.B. Boyer
1eef096dfe
xds: version sniff envoy and switch regular expressions from 'regex' to 'safe_regex' on newer envoy versions (#8222)
- cut down on extra node metadata transmission
- split the golden file generation to compare all envoy version
2020-07-09 17:04:51 -05:00
R.B. Boyer
462f0f37ed
connect: various changes to make namespaces for intentions work more like for other subsystems (#8194)
Highlights:

- add new endpoint to query for intentions by exact match

- using this endpoint from the CLI instead of the dump+filter approach

- enforcing that OSS can only read/write intentions with a SourceNS or
  DestinationNS field of "default".

- preexisting OSS intentions with now-invalid namespace fields will
  delete those intentions on initial election or for wildcard namespaces
  an attempt will be made to downgrade them to "default" unless one
  exists.

- also allow the '-namespace' CLI arg on all of the intention subcommands

- update lots of docs
2020-06-26 16:59:15 -05:00
Matt Keeler
3dbbd2d37d
Implement Client Agent Auto Config
There are a couple of things in here.

First, just like auto encrypt, any Cluster.AutoConfig RPC will implicitly use the less secure RPC mechanism.

This drastically modifies how the Consul Agent starts up and moves most of the responsibilities (other than signal handling) from the cli command and into the Agent.
2020-06-17 16:49:46 -04:00
Daniel Nephin
068b43df90 Enable gofmt simplify
Code changes done automatically with 'gofmt -s -w'
2020-06-16 13:21:11 -04:00
Hans Hasselberg
e62a43c6cf
Support envoy 1.14.2, 1.13.2, 1.12.4 (#8057) 2020-06-10 23:20:17 +02:00
Kyle Havlovitz
6fd3b25313 Fix a CLI test failure with namespaces in enterprise 2020-06-09 15:13:23 -07:00
Kyle Havlovitz
e3a725c4e0 Always allow updating the exposed service and differentiate by namespace 2020-06-09 11:09:53 -07:00
Kyle Havlovitz
edab5588d8 Add -host flag to expose command 2020-06-08 16:59:47 -07:00
Kyle Havlovitz
5958328552 Allow multiple listeners per service via expose command 2020-06-08 16:44:20 -07:00
Kyle Havlovitz
acae044df4 Document the namespace format for expose CLI command 2020-06-05 15:47:03 -07:00
Kyle Havlovitz
b874c8ef0c Add connect expose CLI command 2020-06-05 14:54:29 -07:00
Daniel Nephin
c88fae0aac ci: Add staticcheck and fix most errors
Three of the checks are temporarily disabled to limit the size of the
diff, and allow us to enable all the other checks in CI.

In a follow up we can fix the issues reported by the other checks one
at a time, and enable them.
2020-05-28 11:59:58 -04:00
Kyle Havlovitz
b14696e32a
Standardize support for Tagged and BindAddresses in Ingress Gateways (#7924)
* Standardize support for Tagged and BindAddresses in Ingress Gateways

This updates the TaggedAddresses and BindAddresses behavior for Ingress
to match Mesh/Terminating gateways. The `consul connect envoy` command
now also allows passing an address without a port for tagged/bind
addresses.

* Update command/connect/envoy/envoy.go

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* PR comments

* Check to see if address is an actual IP address

* Update agent/xds/listeners.go

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* fix whitespace

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2020-05-21 09:08:12 -05:00
Daniel Nephin
c662f0f0de Fix a number of problems found by staticcheck
Some of these problems are minor (unused vars), but others are real bugs (ignored errors).

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2020-05-19 16:50:14 -04:00
Freddy
ccd0822539
Use proxy-id in gateway auto-registration (#7845) 2020-05-13 11:56:53 -06:00
Chris Piraino
3d2de925d8
Add support for ingress-gateway in CLI command (#7618)
* Add support for ingress-gateway in CLI command

- Supports -register command
- Creates a static Envoy listener that exposes only the /ready API so
that we can register a TCP healthcheck against the ingress gateway
itself
- Updates ServiceAddressValue.String() to be more in line with Value()
2020-04-14 09:48:02 -05:00
Daniel Nephin
25b585d0bf Fix golden file for envoy tests
The envoy version was updated after the PR which added this test was opened, and
merged before the test was merged, so it ended up with the wrong version.
2020-04-13 12:58:02 -04:00
Daniel Nephin
6b860c926f
Merge pull request #7608 from hashicorp/dnephin/grpc-default-scheme
command/envoy: enable TLS when CONSUL_HTTP_ADDR=https://...
2020-04-13 12:30:26 -04:00
Hans Hasselberg
66415be90e
connect: support envoy 1.14.1 (#7624) 2020-04-09 20:58:22 +02:00
Daniel Nephin
8b6861518f Fix CONSUL_HTTP_ADDR=https not enabling TLS
Use the config instead of attempting to reparse the env var.
2020-04-07 18:16:53 -04:00
Daniel Nephin
0888c6575b Step 3: fix a bug in api.NewClient and fix the tests
The api client should never rever to HTTP if the user explicitly
requested TLS. This change broke some tests because the tests always use
an non-TLS http server, but some tests explicitly enable TLS.
2020-04-07 18:02:56 -04:00
Daniel Nephin
1a8ffec6a7 Step 2: extract the grpc address logic and a new type
The new grpcAddress function contains all of the logic to translate the
command line options into the values used in the template.

The new type has two advantages.

1. It introduces a logical grouping of values in the BootstrapTplArgs
   struct which is exceptionally large. This grouping makes the struct
   easier to understand because each set of nested values can be seen
   as a single entity.
2. It gives us a reasonable return value for this new function.
2020-04-07 16:36:51 -04:00
Daniel Nephin
830b4a15f6 Step 1: move all the grpcAddr logic into the same spot
There is no reason a reader should have to jump around to find this value. It is only
used in 1 place
2020-04-07 15:53:12 -04:00
Freddy
b61214ef24
Fix regression with gateway registration and update docs (#7582) 2020-04-02 12:52:11 -06:00
Daniel Nephin
475659a132 Remove name from NewTestAgent
Using:

git grep -l 'NewTestAgent(t, t.Name(),' | \
    xargs sed -i -e 's/NewTestAgent(t, t.Name(),/NewTestAgent(t,/g'
2020-03-31 16:13:44 -04:00
Daniel Nephin
09d0876b6c command: remove unused logOutput field 2020-03-30 14:11:27 -04:00
Freddy
18d356899c
Enable CLI to register terminating gateways (#7500)
* Enable CLI to register terminating gateways

* Centralize gateway proxy configuration
2020-03-26 10:20:56 -06:00
Daniel Nephin
e5d6273a48
command/envoy: Refactor flag parsing/validation (#7504) 2020-03-26 08:19:21 -06:00
Daniel Nephin
a95974cf79 Remove unnecessary methods
They call only a single method and add no additional functionality
2020-03-24 18:35:07 -04:00
Daniel Nephin
8df3746927 cmd: use env vars as defaults
Insted of setting them afterward in Run.

This change required a small re-ordering of the test to patch the
environment before calling New()
2020-03-24 18:34:46 -04:00
Daniel Nephin
6e10616b13 Fix tests failing on master
The default version was changed in https://github.com/hashicorp/consul/pull/7452
which caused these tests to fail.
2020-03-23 16:38:14 -04:00
Hans Hasselberg
d5f4b8c3a3
envoy: default to 1.13.1 (#7452) 2020-03-17 22:23:42 +01:00
R.B. Boyer
6adad71125
wan federation via mesh gateways (#6884)
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:

There are several distinct chunks of code that are affected:

* new flags and config options for the server

* retry join WAN is slightly different

* retry join code is shared to discover primary mesh gateways from secondary datacenters

* because retry join logic runs in the *agent* and the results of that
  operation for primary mesh gateways are needed in the *server* there are
  some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
  at multiple layers of abstraction just to pass the data down to the right
  layer.

* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers

* the function signature for RPC dialing picked up a new required field (the
  node name of the destination)

* several new RPCs for manipulating a FederationState object:
  `FederationState:{Apply,Get,List,ListMeshGateways}`

* 3 read-only internal APIs for debugging use to invoke those RPCs from curl

* raft and fsm changes to persist these FederationStates

* replication for FederationStates as they are canonically stored in the
  Primary and replicated to the Secondaries.

* a special derivative of anti-entropy that runs in secondaries to snapshot
  their local mesh gateway `CheckServiceNodes` and sync them into their upstream
  FederationState in the primary (this works in conjunction with the
  replication to distribute addresses for all mesh gateways in all DCs to all
  other DCs)

* a "gateway locator" convenience object to make use of this data to choose
  the addresses of gateways to use for any given RPC or gossip operation to a
  remote DC. This gets data from the "retry join" logic in the agent and also
  directly calls into the FSM.

* RPC (`:8300`) on the server sniffs the first byte of a new connection to
  determine if it's actually doing native TLS. If so it checks the ALPN header
  for protocol determination (just like how the existing system uses the
  type-byte marker).

* 2 new kinds of protocols are exclusively decoded via this native TLS
  mechanism: one for ferrying "packet" operations (udp-like) from the gossip
  layer and one for "stream" operations (tcp-like). The packet operations
  re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
  overhead.

* the server instances specially wrap the `memberlist.NetTransport` when running
  with gateway federation enabled (in a `wanfed.Transport`). The general gist is
  that if it tries to dial a node in the SAME datacenter (deduced by looking
  at the suffix of the node name) there is no change. If dialing a DIFFERENT
  datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
  gateways to eventually end up in a server's :8300 port.

* a new flag when launching a mesh gateway via `consul connect envoy` to
  indicate that the servers are to be exposed. This sets a special service
  meta when registering the gateway into the catalog.

* `proxycfg/xds` notice this metadata blob to activate additional watches for
  the FederationState objects as well as the location of all of the consul
  servers in that datacenter.

* `xds:` if the extra metadata is in place additional clusters are defined in a
  DC to bulk sink all traffic to another DC's gateways. For the current
  datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
  balances all servers as well as one mini-cluster per node
  (`<node>.server.<dc>.consul`)

* the `consul tls cert create` command got a new flag (`-node`) to help create
  an additional SAN in certs that can be used with this flavor of federation.
2020-03-09 15:59:02 -05:00