Commit Graph

2196 Commits

Author SHA1 Message Date
Kyle Havlovitz 844e9ffe16 Use mapstructure for decoding vault data 2020-10-07 16:40:27 -04:00
Kyle Havlovitz 449103411d Add a stop function to make sure the renewer is shut down on leader change 2020-10-07 16:40:27 -04:00
Kyle Havlovitz 2fc2b61b48 Add a test for token renewal 2020-10-07 16:40:27 -04:00
Kyle Havlovitz f416c1a8bd Automatically renew the token used by the Vault CA provider 2020-10-07 16:40:27 -04:00
Hans Hasselberg 780d2d79fb fix ent error (#8750) 2020-09-25 10:41:18 -05:00
Hans Hasselberg 010abda64c use service datacenter for dns name (#8704)
* Use args.Datacenter instead of configured datacenter
2020-09-25 10:41:02 -05:00
R.B. Boyer e05c30de1f agent: when enable_central_service_config is enabled ensure agent reload doesn't revert check state to critical (#8747)
Likely introduced when #7345 landed.
2020-09-24 21:24:51 +00:00
Alexander Mykolaichuk e039087adf added permission denied error message (#8044) 2020-09-22 18:36:36 +00:00
Hans Hasselberg abd8e605cf fix TestLeader_SecondaryCA_IntermediateRenew (#8702)
* fix lessThanHalfTime
* get lock for CAProvider()
* make a var to relate both vars
* rename to getCAProviderWithLock
* move CertificateTimeDriftBuffer to agent/connect/ca
2020-09-18 08:14:09 +00:00
Mike Morris dc50eca37f test: update tags for database service registrations and queries (#8693) 2020-09-16 18:21:49 +00:00
Daniel Nephin 3ee2aa1325 Merge pull request #8685 from pierresouchay/do_not_flood_logs_with_Non-server_in_server-only_area
[BUGFIX] Avoid GetDatacenter* methods to flood Consul servers logs
2020-09-15 21:58:29 +00:00
Kyle Havlovitz 2ed68b9f45 Merge pull request #8646 from hashicorp/common-intermediate-ttl
Move IntermediateCertTTL to common CA config
2020-09-15 19:04:27 +00:00
hashicorp-ci 2394439344
update bindata_assetfs.go 2020-09-11 03:06:11 +00:00
Hans Hasselberg 89b7e80478 secondaryIntermediateCertRenewalWatch abort on success (#8588)
secondaryIntermediateCertRenewalWatch was using `retryLoopBackoff` to
renew the intermediate certificate. Once it entered the inner loop and
started `retryLoopBackoff` it would never leave that.
`retryLoopBackoffAbortOnSuccess` will return when renewing is
successful, like it was intended originally.
2020-09-04 09:49:16 +00:00
R.B. Boyer 770fc0985a connect: all config entries pick up a meta field (#8596)
Fixes #8595
2020-09-02 19:22:37 +00:00
R.B. Boyer b8cfef599c agent: ensure that we normalize bootstrapped config entries (#8547) 2020-09-02 19:21:58 +00:00
R.B. Boyer c2a28ba268 connect: fix bug in preventing some namespaced config entry modifications (#8601)
Whenever an upsert/deletion of a config entry happens, within the open
state store transaction we speculatively test compile all discovery
chains that may be affected by the pending modification to verify that
the write would not create an erroneous scenario (such as splitting
traffic to a subset that did not exist).

If a single discovery chain evaluation references two config entries
with the same kind and name in different namespaces then sometimes the
upsert/deletion would be falsely rejected. It does not appear as though
this bug would've let invalid writes through to the state store so the
correction does not require a cleanup phase.
2020-09-02 15:47:53 +00:00
Matt Keeler 8036981dcb
Backport: #8523 (#8589)
auto-encrypt is now handled as a special case of auto-config.

This also is moving all the cert-monitor code into the auto-config package.
2020-08-31 16:46:37 -04:00
Daniel Nephin f8dbe4821d Merge pull request #8548 from edevil/fix_flake
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve
2020-08-28 19:11:24 +00:00
Daniel Nephin 607a494000 Merge pull request #8552 from pierresouchay/reload_cache_throttling_config
Ensure that Cache options are reloaded when `consul reload` is performed
2020-08-28 19:05:15 +00:00
Jack 295358044b Add http2 and grpc support to ingress gateways (#8458) 2020-08-27 15:41:39 -06:00
R.B. Boyer f5e62f1d1b
agent: expose the list of supported envoy versions on /v1/agent/self (#8566)
also backport of a portion of c599a2f5f4 from #8424
2020-08-27 11:33:33 -05:00
Matt Keeler fafc6cf7ff Move RPC router from Client/Server and into BaseDeps (#8559)
This will allow it to be a shared component which is needed for AutoConfig
2020-08-27 15:24:25 +00:00
Daniel Nephin 0bf7bc788e Merge pull request #8540 from hashicorp/dnephin/logging-setup-cleanup
logging: cleanup Setup and configuration
2020-08-26 17:16:15 -04:00
Daniel Nephin ec50628a39 Merge pull request #8511 from hashicorp/dnephin/agent-setup
agent: extract dependency creation from New
2020-08-26 17:15:12 -04:00
Daniel Nephin 6f93764548 Merge pull request #8528 from hashicorp/dnephin/move-node-name-validation
config: Move some config validation from Agent.Start to config.Builder.Validate
2020-08-26 17:13:11 -04:00
Daniel Nephin 1a5ba078a8 Merge pull request #8514 from hashicorp/dnephin/testing-improvements-1
testing: small improvements to TestSessionCreate and testutil.retry
2020-08-26 17:11:43 -04:00
Daniel Nephin 690d3f4f20 Merge pull request #8515 from hashicorp/dnephin/unexport-testing-shims
config: unexport fields and resolve TODOs in config.Builder
2020-08-26 17:09:46 -04:00
Daniel Nephin cbfae50854 Merge pull request #8473 from hashicorp/dnephin/unmethod-consul-config
agent: convert consulConfig method to a function
2020-08-26 17:06:32 -04:00
Daniel Nephin 298c4d7e66 Merge pull request #8463 from hashicorp/dnephin/unmethod-make-node-id
agent: convert NodeID methods to functions
2020-08-26 17:05:57 -04:00
Daniel Nephin 533c53b8ef Merge pull request #8461 from hashicorp/dnephin/remove-notify-shutdown
agent/consul: Remove NotifyShutdown
2020-08-26 17:04:03 -04:00
Daniel Nephin 81de78d131 Merge pull request #8500 from hashicorp/dnephin/auto-config-loader
auto-config: reduce awareness of config
2020-08-26 17:01:55 -04:00
Daniel Nephin 6dc6507abc Merge pull request #8469 from hashicorp/dnephin/config-source
config: make Source an interface to avoid the marshal/unmarshal cycle in auto-config
2020-08-26 17:00:51 -04:00
Daniel Nephin 56444e0405 Merge pull request #8546 from edevil/fix_vet
testing: Fix govet errors
2020-08-24 18:39:56 +00:00
Daniel Nephin 984318c2d8 Merge pull request #8537 from hashicorp/dnephin/fix-panic-on-connect-nil
Fix panic when decoding 'Connect: null'
2020-08-20 22:01:30 +00:00
Hans Hasselberg bc5e2ddfc3 add primary keys to list keyring (#8522)
During gossip encryption key rotation it would be nice to be able to see if all nodes are using the same key. This PR adds another field to the json response from `GET v1/operator/keyring` which lists the primary keys in use per dc. That way an operator can tell when a key was successfully setup as primary key.

Based on https://github.com/hashicorp/serf/pull/611 to add primary key to list keyring output:

```json
[
  {
    "WAN": true,
    "Datacenter": "dc2",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 6,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6
    },
    "NumNodes": 6
  },
  {
    "WAN": false,
    "Datacenter": "dc2",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 8,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "NumNodes": 8
  },
  {
    "WAN": false,
    "Datacenter": "dc1",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 3,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "NumNodes": 8
  }
]
```

I intentionally did not change the CLI output because I didn't find a good way of displaying this information. There are a couple of options that we could implement later:
* add a flag to show the primary keys
* add a flag to show json output

Fixes #3393.
2020-08-18 07:51:22 +00:00
Daniel Nephin d5179becf6 Merge pull request #8509 from hashicorp/dnephin/use-t.cleanup-in-testagent
testing: Use t.cleanup in TestAgent , and fix two flaky tests
2020-08-14 20:34:09 +00:00
R.B. Boyer 7983023acf
[backport/1.8.x] connect: use stronger validation that ingress gateways have compatible protocols defined for their upstreams (#8494)
Backport of #8470 to 1.8.x
2020-08-13 15:26:23 -05:00
Daniel Nephin 010a8eb515 Merge pull request #8365 from hashicorp/dnephin/fix-service-by-node-meta-flake
state: speed up tests that use watchLimit
2020-08-13 15:16:58 +00:00
hashicorp-ci 39b8ecb84a
update bindata_assetfs.go 2020-08-12 19:03:06 +00:00
Freddy 56993a0054 Notify alias checks when aliased service is [de]registered (#8456) 2020-08-12 15:48:23 +00:00
Hans Hasselberg 04fcbff24b Merge pull request #8471 from hashicorp/local_only
thread local-only through the layers
2020-08-12 06:56:10 +00:00
Kyle Havlovitz ae8aa66cd7 Backport catalog index fix to 1.8.x 2020-08-11 15:13:25 -07:00
Kyle Havlovitz eaf3a00b88 Fix a state store comment about version 2020-08-11 14:22:44 -07:00
Kyle Havlovitz 0b715f5f27 fsm: Fix snapshot bug with restoring node/service/check indexes 2020-08-11 14:22:42 -07:00
Mike Morris 99d683bfe5 changelog: Update for 1.8.2, 1.7.6, 1.7.5 and 1.6.7 (#8462)
* update bindata_assetfs.go

* Release v1.8.2

* Putting source back into Dev Mode

* changelog: add entries for 1.7.6, 1.7.5 and 1.6.7

Co-authored-by: hashicorp-ci <hashicorp-ci@users.noreply.github.com>
2020-08-07 19:00:23 -04:00
Daniel Nephin 62703e4426
Merge pull request #8438 from hashicorp/dnephin/1.8.x-backport-ineffassign
[1.8.x] Backport addition of  ineffassign linter and staticcheck
2020-08-07 13:04:56 -04:00
Matt Keeler 1c3c8c7804 Require token replication to be enabled in secondary dcs when ACLs are enabled with AutoConfig (#8451)
AutoConfig will generate local tokens for clients and the ability to use local tokens is gated off of token replication being enabled and being configured with a replication token. Therefore we already have a hard requirement on having token replication enabled, this commit just makes sure to surface that to the operator instead of having to discern what the issue is from RPC errors.
2020-08-07 14:20:57 +00:00
Hans Hasselberg ba495cd11b auto_config implies connect (#8433) 2020-08-07 10:02:30 +00:00
Hans Hasselberg 1a351d53bb Mark its own cluster as healthy when rebalancing. (#8406)
This code started as an optimization to avoid doing an RPC Ping to
itself. But in a single server cluster the rebalancing was led to
believe that there were no healthy servers because foundHealthyServer
was not set. Now this is being set properly.

Fixes #8401 and #8403.
2020-08-06 08:43:18 +00:00
Daniel Nephin 2bde91a2a0 Merge pull request #8404 from hashicorp/dnephin/remove-log-output-field
Use Logger consistently, instead of LogOutput
2020-08-05 18:32:16 +00:00
Daniel Nephin 4756cb6af4 Merge pull request #8437 from hashicorp/dnephin/fix-service-checks-cache-type
cache-type: Return nil value on error
2020-08-05 17:50:28 +00:00
Daniel Nephin 26e53053a9 Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4
ci: enable SA4006 staticcheck check and add ineffassign
2020-08-05 13:37:35 -04:00
freddygv b5e858d3e1 Avoid panics during shutdown routine 2020-07-30 11:13:40 -06:00
Matt Keeler c9b66157a1
Ensure certificates retrieved through the cache get persisted with auto-config (#8409) 2020-07-30 11:42:24 -04:00
Matt Keeler 4f98af0724 Allow setting verify_incoming* when using auto_encrypt or auto_config (#8394)
Ensure that enabling AutoConfig sets the tls configurator properly

This also refactors the TLS configurator a bit so the naming doesn’t imply only AutoEncrypt as the source of the automatically setup TLS cert info.
2020-07-30 14:16:15 +00:00
Hans Hasselberg ae2cbbce99 agent/cache test for cache throttling. (#8396) 2020-07-30 12:41:38 +00:00
Matt Keeler e813445e57 Agent Auto Config: Implement Certificate Generation (#8360)
Most of the groundwork was laid in previous PRs between adding the cert-monitor package to extracting the logic of signing certificates out of the connect_ca_endpoint.go code and into a method on the server.

This also refactors the auto-config package a bit to split things out into multiple files.
2020-07-28 19:32:22 +00:00
Matt Keeler 91ec880e07
Backport #8389 (#8392)
# Conflicts:
#	agent/cache-types/catalog_list_services_test.go
2020-07-28 14:22:29 -04:00
Pierre Souchay 678489d9d1 Added ratelimit to handle throtling cache (#8226)
This implements a solution for #7863

It does:

    Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
    Add cache.entry_fetch_max_burst size of rate limit (default value = 2)

The new configuration now supports the following syntax for instance to allow 1 query every 3s:

    command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
    in JSON

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 21:11:42 +00:00
Matt Keeler 0937f70ddf Move connect root retrieval and cert signing logic out of the RPC endpoints (#8364)
The code now lives on the Server type itself. This was done so that all of this could be shared with auto config certificate signing.
2020-07-24 14:01:58 +00:00
Matt Keeler 4d41ee3887 Move generation of the CA Configuration from the agent code into a method on the RuntimeConfig (#8363)
This allows this to be reused elsewhere.
2020-07-23 20:05:52 +00:00
Matt Keeler 56b46436c1
Backport: #8362 (#8366)
Refactoring of the agentpb package.

First move the whole thing to the top-level proto package name.

Secondly change some things around internally to have sub-packages.
# Conflicts:
#	agent/consul/state/acl.go
#	agent/consul/state/acl_test.go
2020-07-23 12:44:27 -04:00
Daniel Nephin 4205fdf1d6 Merge pull request #7948 from hashicorp/dnephin/buffer-test-logs
testutil: NewLogBuffer - buffer logs until a test fails
2020-07-21 19:22:29 +00:00
Matt Keeler a8d2e5a2c2
Disable background cache refresh for Connect Leaf Certs
The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) use the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself.

In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API which just like normal cache notification mechanism will cause the blocking fetch to be initiated and to get leaf certs as soon as needed.

If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond.

This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.

# Conflicts:
#	agent/cache/cache.go
2020-07-21 13:51:18 -04:00
Matt Keeler 24e11b511e
Fix issue with changing the agent token causing failure to renew the auto-encrypt certificate
The fallback method would still work but it would get into a state where it would let the certificate expire for 10s before getting a new one. And the new one used the less secure RPC endpoint.

This is also a pretty large refactoring of the auto encrypt code. I was going to write some tests around the certificate monitoring but it was going to be impossible to get a TestAgent configured in such a way that I could write a test that ran in less than an hour or two to exercise the functionality.

Moving the certificate monitoring into its own package will allow for dependency injection and in particular mocking the cache types to control how it hands back certificates and how long those certificates should live. This will allow for exercising the main loop more than would be possible with it coupled so tightly with the Agent.

# Conflicts:
#	agent/agent.go
2020-07-21 13:49:18 -04:00
Daniel Nephin 65566e2c98 Merge pull request #8290 from hashicorp/dnephin/watch-decode
watch: fix script watches with single arg
2020-07-20 18:41:48 +00:00
André 927e73d8db minor: fix docstring of DNSOnlyPassing (#8318)
In runtime.go it had "duration" but it is actually a boolean.
2020-07-16 13:48:07 +00:00
Matt Keeler 625055a556 Add ability for notifications when one of the agent tokens is updated (#8301)
Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-07-14 13:54:38 +00:00
Freddy 89af0212d3 Add api mod support for /catalog/gateway-services (#8278) 2020-07-10 19:02:09 +00:00
R.B. Boyer 2142a697ad
[backport: 1.8.x] xds: version sniff envoy and switch regular expressions from 'regex' to 'safe_regex' on newer envoy versions (#8265)
cherry-pick of #8222 onto origin/release/1.8.x

Fixes: #8205
2020-07-09 17:04:23 -05:00
Matt Keeler 38251ab0e8
Pass the Config and TLS Configurator into the AutoConfig constructor
This is instead of having the AutoConfigBackend interface provide functions for retrieving them.

NOTE: the config is not reloadable. For now this is fine as we don’t look at any reloadable fields. If that changes then we should provide a way to make it reloadable.
2020-07-09 10:38:29 -04:00
Matt Keeler f06595992a
Rename (*Server).forward to (*Server).ForwardRPC
Also get rid of the preexisting shim in server.go that existed before to have this name just call the unexported one.
2020-07-09 10:38:16 -04:00
Matt Keeler 977eb725a7
Refactor AutoConfig RPC to not have a direct dependency on the Server type
Instead it has an interface which can be mocked for better unit testing that is deterministic and not prone to flakiness.

# Conflicts:
#	agent/pool/pool.go
2020-07-09 10:37:55 -04:00
R.B. Boyer 8a5680aaf0
connect: upgrade github.com/envoyproxy/go-control-plane to v0.9.5 (#8247)
cherry-pick of #8165 onto origin/release/1.8.x
2020-07-07 16:22:30 -05:00
Chris Piraino cbf143844f Append port number to ingress host domain (#8190)
A port can be sent in the Host header as defined in the HTTP RFC, so we
take any hosts that we want to match traffic to and also add another
host with the listener port added.

Also fix an issue with envoy integration tests not running the
case-ingress-gateway-tls test.
2020-07-07 15:43:32 +00:00
Matt Keeler 9c64239db7 Merge pull request #8211 from hashicorp/bugfix/auto-encrypt-various 2020-07-02 13:51:34 +00:00
Matt Keeler d73d299848 Merge pull request #8218 from yurkeen/fix-dns-rcode 2020-07-01 13:13:55 +00:00
Matt Keeler c7a6c5c4f5 Merge pull request #8193 from hashicorp/feature/auto-config/suppress-config-warnings 2020-06-27 14:07:30 +00:00
Freddy be263d7885 Split up unused key validation for oss/ent (#8189)
Split up unused key validation in config entry decode for oss/ent.

This is needed so that we can return an informative error in OSS if namespaces are provided.
2020-06-26 12:02:56 +02:00
Matt Keeler 8853e38c72
Various go routine leak fixes 2020-06-25 09:36:14 -04:00
Chris Piraino 3da13af6b4 Merge pull request #7932 from hashicorp/ingress/internal-ui-endpoint-multiple-ports
Update gateway-services-nodes API endpoint to allow multiple addresses
2020-06-24 22:11:45 +00:00
Matt Keeler 1858153500 Don’t leak metrics go routines in tests (#8182) 2020-06-24 14:15:50 +00:00
gitforbit 657db029b2 agent-http: cleanup: return nil instead of err (#8043)
Since err is already checked, it should return `nil`
2020-06-24 12:29:48 +00:00
Freddy fc1baf2223 Merge pull request #8169 from hashicorp/config-entry-ns 2020-06-23 11:44:57 -06:00
Pierre Souchay 9df55f5995 Returns DNS Error NSDOMAIN when DC does not exists (#8103)
This will allow to increase cache value when DC is not valid (aka
return SOA to avoid too many consecutive requests) and will
distinguish DC being temporarily not available from DC not existing.

Implements https://github.com/hashicorp/consul/issues/8102
2020-06-22 13:02:47 +00:00
Matt Keeler 3f2fc48623 Require enabling TLS to enable Auto Config (#8159)
On the servers they must have a certificate.

On the clients they just have to set verify_outgoing to true to attempt TLS connections for RPCs.

Eventually we may relax these restrictions but right now all of the settings we push down (acl tokens, acl related settings, certificates, gossip key) are sensitive and shouldn’t be transmitted over an unencrypted connection. Our guides and docs should recoommend verify_server_hostname on the clients as well.

Another reason to do this is weird things happen when making an insecure RPC when TLS is not enabled. Basically it tries TLS anyways. We should probably fix that to make it clearer what is going on.
2020-06-19 20:38:38 +00:00
Freddy dce775d0d8 Always return a gateway cluster (#8158) 2020-06-19 19:32:24 +00:00
Matt Keeler 0736c42b72 Allow cancelling startup when performing auto-config (#8157)
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2020-06-19 19:16:20 +00:00
Matt Keeler fdef446e82 Change auto config authorizer to allow for future extension
The envisioned changes would allow extra settings to enable dynamically defined auth methods to be used instead of  or in addition to the statically defined one in the configuration.
2020-06-18 19:22:51 +00:00
Chris Piraino 8d72225d33 Remove ACLEnforceVersion8 from tests (#8138)
The field had been deprecated for a while and was recently removed,
however a PR which added these tests prior to removal was merged.
2020-06-18 18:15:43 +00:00
Matt Keeler 6375db7b4b Merge pull request #8086 from hashicorp/feature/auto-config/client-config-inject 2020-06-18 14:45:52 +00:00
Matt Keeler 9f37a218c5 Merge pull request #8035 from hashicorp/feature/auto-config/server-rpc 2020-06-17 20:08:17 +00:00
Daniel Nephin 058114e82e Merge pull request #7762 from hashicorp/dnephin/warn-on-unknown-service-file
config: warn if a config file is being skipped because of its file extension
2020-06-17 15:21:34 -04:00
Pierre Souchay 318495d1f8 gossip: Ensure that metadata of Consul Service is updated (#7903)
While upgrading servers to a new version, I saw that metadata of
existing servers are not upgraded, so the version and raft meta
is not up to date in catalog.

The only way to do it was to:
 * update Consul server
 * make it leave the cluster, then metadata is accurate

That's because the optimization to avoid updating catalog does
not take into account metadata, so no update on catalog is performed.
2020-06-17 10:17:33 +00:00
Matt Keeler c3b348bebb Agent Auto Configuration: Configuration Syntax Updates (#8003) 2020-06-16 19:03:59 +00:00
Matt Keeler 3c4413cbed ACL Node Identities (#7970)
A Node Identity is very similar to a service identity. Its main targeted use is to allow creating tokens for use by Consul agents that will grant the necessary permissions for all the typical agent operations (node registration, coordinate updates, anti-entropy).

Half of this commit is for golden file based tests of the acl token and role cli output. Another big updates was to refactor many of the tests in agent/consul/acl_endpoint_test.go to use the same style of tests and the same helpers. Besides being less boiler plate in the tests it also uses a common way of starting a test server with ACLs that should operate without any warnings regarding deprecated non-uuid master tokens etc.
2020-06-16 16:55:01 +00:00
Matt Keeler 64262d22d6 Make the Agent Cache more Context aware (#8092)
Blocking queries issues will still be uncancellable (that cannot be helped until we get rid of net/rpc). However this makes it so that if calling getWithIndex (like during a cache Notify go routine) we can cancell the outer routine. Previously it would keep issuing more blocking queries until the result state actually changed.
2020-06-15 15:43:32 +00:00
Freddy 2af14433be Merge pull request #8099 from hashicorp/gateway-services-endpoint 2020-06-12 21:15:25 +00:00
Freddy c9dbb6c51a Only pass one hostname via EDS and prefer healthy ones (#8084)
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Currently when passing hostname clusters to Envoy, we set each service instance registered with Consul as an LbEndpoint for the cluster.

However, Envoy can only handle one per cluster:
[2020-06-04 18:32:34.094][1][warning][config] [source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Cluster rejected: Error adding/updating cluster(s) dc2.internal.ddd90499-9b47-91c5-4616-c0cbf0fc358a.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint, server.dc2.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint

Envoy is currently handling this gracefully by only picking one of the endpoints. However, we should avoid passing multiple to avoid these warning logs.

This PR:

* Ensures we only pass one endpoint, which is tied to one service instance.
* We prefer sending an endpoint which is marked as Healthy by Consul.
* If no endpoints are healthy we emit a warning and skip the cluster.
* If multiple unique hostnames are spread across service instances we emit a warning and let the user know which will be resolved.
2020-06-12 19:46:51 +00:00