2096 Commits

Author SHA1 Message Date
Hans Hasselberg 1a351d53bb Mark its own cluster as healthy when rebalancing. (#8406)
This code started as an optimization to avoid doing an RPC Ping to
itself. But in a single server cluster the rebalancing was led to
believe that there were no healthy servers because foundHealthyServer
was not set. Now this is being set properly.

Fixes #8401 and #8403.
2020-08-06 08:43:18 +00:00
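The fix described in the commit above can be illustrated with a minimal sketch. Only `foundHealthyServer` comes from the commit message; the `server` type, the `findHealthy` function, and the `ping` callback are hypothetical names, and this is not the actual rebalancing code.

```go
package main

import "fmt"

// server is a hypothetical stand-in for a cluster member.
type server struct {
	Name string
}

// findHealthy reports whether at least one server is considered healthy.
// ping is a hypothetical health probe that is used for every server except
// the local one (self), which is assumed healthy without an RPC.
func findHealthy(servers []server, self string, ping func(server) bool) bool {
	foundHealthyServer := false
	for _, s := range servers {
		if s.Name == self {
			// Skip the RPC ping to ourselves, but still record that a
			// healthy server exists. Omitting this assignment is the kind
			// of bug the commit message describes.
			foundHealthyServer = true
			break
		}
		if ping(s) {
			foundHealthyServer = true
			break
		}
	}
	return foundHealthyServer
}

func main() {
	neverHealthy := func(server) bool { return false }
	single := []server{{Name: "node-1"}}
	// In a single-server cluster the only candidate is the local server.
	fmt.Println(findHealthy(single, "node-1", neverHealthy))
}
```

With a single-server cluster the loop only ever sees the local server, so the result depends entirely on the assignment in the `self` branch.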
Daniel Nephin 2bde91a2a0 Merge pull request #8404 from hashicorp/dnephin/remove-log-output-field
Use Logger consistently, instead of LogOutput
2020-08-05 18:32:16 +00:00
Daniel Nephin 4756cb6af4 Merge pull request #8437 from hashicorp/dnephin/fix-service-checks-cache-type
cache-type: Return nil value on error
2020-08-05 17:50:28 +00:00
freddygv b5e858d3e1 Avoid panics during shutdown routine 2020-07-30 11:13:40 -06:00
Matt Keeler c9b66157a1
Ensure certificates retrieved through the cache get persisted with auto-config (#8409) 2020-07-30 11:42:24 -04:00
Matt Keeler 4f98af0724 Allow setting verify_incoming* when using auto_encrypt or auto_config (#8394)
Ensure that enabling AutoConfig sets the tls configurator properly

This also refactors the TLS configurator a bit so the naming doesn’t imply only AutoEncrypt as the source of the automatically setup TLS cert info.
2020-07-30 14:16:15 +00:00
Hans Hasselberg ae2cbbce99 agent/cache test for cache throttling. (#8396) 2020-07-30 12:41:38 +00:00
Matt Keeler e813445e57 Agent Auto Config: Implement Certificate Generation (#8360)
Most of the groundwork was laid in previous PRs, from adding the cert-monitor package to extracting the logic of signing certificates out of the connect_ca_endpoint.go code and into a method on the server.

This also refactors the auto-config package a bit to split things out into multiple files.
2020-07-28 19:32:22 +00:00
Matt Keeler 91ec880e07
Backport #8389 (#8392)
# Conflicts:
#	agent/cache-types/catalog_list_services_test.go
2020-07-28 14:22:29 -04:00
Pierre Souchay 678489d9d1 Added ratelimit to handle throttling cache (#8226)
This implements a solution for #7863

It does:

 * Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry (default value = rate.Inf)
 * Add cache.entry_fetch_max_burst, the burst size of the rate limit (default value = 2)

For instance, to allow 1 query every 3s, the new configuration supports the following syntax:

 * command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
 * in JSON:

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 21:11:42 +00:00
Matt Keeler 0937f70ddf Move connect root retrieval and cert signing logic out of the RPC endpoints (#8364)
The code now lives on the Server type itself. This was done so that all of this could be shared with auto config certificate signing.
2020-07-24 14:01:58 +00:00
Matt Keeler 4d41ee3887 Move generation of the CA Configuration from the agent code into a method on the RuntimeConfig (#8363)
This allows this to be reused elsewhere.
2020-07-23 20:05:52 +00:00
Matt Keeler 56b46436c1
Backport: #8362 (#8366)
Refactoring of the agentpb package.

First move the whole thing to the top-level proto package name.

Secondly change some things around internally to have sub-packages.
# Conflicts:
#	agent/consul/state/acl.go
#	agent/consul/state/acl_test.go
2020-07-23 12:44:27 -04:00
Daniel Nephin 4205fdf1d6 Merge pull request #7948 from hashicorp/dnephin/buffer-test-logs
testutil: NewLogBuffer - buffer logs until a test fails
2020-07-21 19:22:29 +00:00
Matt Keeler a8d2e5a2c2
Disable background cache refresh for Connect Leaf Certs
The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) uses the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself.

In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API, which, just like the normal cache notification mechanism, will cause the blocking fetch to be initiated and leaf certs to be retrieved as soon as needed.

If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond.

This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.

# Conflicts:
#	agent/cache/cache.go
2020-07-21 13:51:18 -04:00
Matt Keeler 24e11b511e
Fix issue with changing the agent token causing failure to renew the auto-encrypt certificate
The fallback method would still work but it would get into a state where it would let the certificate expire for 10s before getting a new one. And the new one used the less secure RPC endpoint.

This is also a pretty large refactoring of the auto encrypt code. I was going to write some tests around the certificate monitoring but it was going to be impossible to get a TestAgent configured in such a way that I could write a test that ran in less than an hour or two to exercise the functionality.

Moving the certificate monitoring into its own package will allow for dependency injection and in particular mocking the cache types to control how it hands back certificates and how long those certificates should live. This will allow for exercising the main loop more than would be possible with it coupled so tightly with the Agent.

# Conflicts:
#	agent/agent.go
2020-07-21 13:49:18 -04:00
Daniel Nephin 65566e2c98 Merge pull request #8290 from hashicorp/dnephin/watch-decode
watch: fix script watches with single arg
2020-07-20 18:41:48 +00:00
André 927e73d8db minor: fix docstring of DNSOnlyPassing (#8318)
In runtime.go it had "duration" but it is actually a boolean.
2020-07-16 13:48:07 +00:00
Matt Keeler 625055a556 Add ability for notifications when one of the agent tokens is updated (#8301)
Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-07-14 13:54:38 +00:00
Freddy 89af0212d3 Add api mod support for /catalog/gateway-services (#8278) 2020-07-10 19:02:09 +00:00
R.B. Boyer 2142a697ad
[backport: 1.8.x] xds: version sniff envoy and switch regular expressions from 'regex' to 'safe_regex' on newer envoy versions (#8265)
cherry-pick of #8222 onto origin/release/1.8.x

Fixes: #8205
2020-07-09 17:04:23 -05:00
Matt Keeler 38251ab0e8
Pass the Config and TLS Configurator into the AutoConfig constructor
This is instead of having the AutoConfigBackend interface provide functions for retrieving them.

NOTE: the config is not reloadable. For now this is fine as we don’t look at any reloadable fields. If that changes then we should provide a way to make it reloadable.
2020-07-09 10:38:29 -04:00
Matt Keeler f06595992a
Rename (*Server).forward to (*Server).ForwardRPC
Also get rid of the preexisting shim in server.go, which existed only so that the exported name could call the unexported one.
2020-07-09 10:38:16 -04:00
Matt Keeler 977eb725a7
Refactor AutoConfig RPC to not have a direct dependency on the Server type
Instead it has an interface which can be mocked for better unit testing that is deterministic and not prone to flakiness.

# Conflicts:
#	agent/pool/pool.go
2020-07-09 10:37:55 -04:00
R.B. Boyer 8a5680aaf0
connect: upgrade github.com/envoyproxy/go-control-plane to v0.9.5 (#8247)
cherry-pick of #8165 onto origin/release/1.8.x
2020-07-07 16:22:30 -05:00
Chris Piraino cbf143844f Append port number to ingress host domain (#8190)
A port can be sent in the Host header as defined in the HTTP RFC, so we
take any hosts that we want to match traffic to and also add another
host with the listener port added.

Also fix an issue with envoy integration tests not running the
case-ingress-gateway-tls test.
2020-07-07 15:43:32 +00:00
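The host matching described in the commit above can be sketched as follows. The function name `hostsWithPort` and the example host are hypothetical, and the real change may treat special cases (such as wildcards) differently.

```go
package main

import "fmt"

// hostsWithPort returns every host twice: once as given and once with the
// listener port appended, so that a Host header of either form matches.
func hostsWithPort(hosts []string, port int) []string {
	out := make([]string, 0, len(hosts)*2)
	for _, h := range hosts {
		out = append(out, h, fmt.Sprintf("%s:%d", h, port))
	}
	return out
}

func main() {
	for _, h := range hostsWithPort([]string{"test.example.com"}, 8080) {
		fmt.Println(h)
	}
}
```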
Matt Keeler 9c64239db7 Merge pull request #8211 from hashicorp/bugfix/auto-encrypt-various 2020-07-02 13:51:34 +00:00
Matt Keeler d73d299848 Merge pull request #8218 from yurkeen/fix-dns-rcode 2020-07-01 13:13:55 +00:00
Matt Keeler c7a6c5c4f5 Merge pull request #8193 from hashicorp/feature/auto-config/suppress-config-warnings 2020-06-27 14:07:30 +00:00
Freddy be263d7885 Split up unused key validation for oss/ent (#8189)
Split up unused key validation in config entry decode for oss/ent.

This is needed so that we can return an informative error in OSS if namespaces are provided.
2020-06-26 12:02:56 +02:00
Matt Keeler 8853e38c72
Various go routine leak fixes 2020-06-25 09:36:14 -04:00
Chris Piraino 3da13af6b4 Merge pull request #7932 from hashicorp/ingress/internal-ui-endpoint-multiple-ports
Update gateway-services-nodes API endpoint to allow multiple addresses
2020-06-24 22:11:45 +00:00
Matt Keeler 1858153500 Don’t leak metrics go routines in tests (#8182) 2020-06-24 14:15:50 +00:00
gitforbit 657db029b2 agent-http: cleanup: return nil instead of err (#8043)
Since err is already checked, it should return `nil`
2020-06-24 12:29:48 +00:00
Freddy fc1baf2223 Merge pull request #8169 from hashicorp/config-entry-ns 2020-06-23 11:44:57 -06:00
Pierre Souchay 9df55f5995 Returns DNS Error NXDOMAIN when DC does not exist (#8103)
This allows increasing the cache value when the DC is not valid (i.e.
returning an SOA to avoid too many consecutive requests) and
distinguishes a DC that is temporarily unavailable from a DC that does not exist.

Implements https://github.com/hashicorp/consul/issues/8102
2020-06-22 13:02:47 +00:00
Matt Keeler 3f2fc48623 Require enabling TLS to enable Auto Config (#8159)
On the servers they must have a certificate.

On the clients they just have to set verify_outgoing to true to attempt TLS connections for RPCs.

Eventually we may relax these restrictions but right now all of the settings we push down (acl tokens, acl related settings, certificates, gossip key) are sensitive and shouldn’t be transmitted over an unencrypted connection. Our guides and docs should recommend verify_server_hostname on the clients as well.

Another reason to do this is weird things happen when making an insecure RPC when TLS is not enabled. Basically it tries TLS anyways. We should probably fix that to make it clearer what is going on.
2020-06-19 20:38:38 +00:00
Freddy dce775d0d8 Always return a gateway cluster (#8158) 2020-06-19 19:32:24 +00:00
Matt Keeler 0736c42b72 Allow cancelling startup when performing auto-config (#8157)
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2020-06-19 19:16:20 +00:00
Matt Keeler fdef446e82 Change auto config authorizer to allow for future extension
The envisioned changes would allow extra settings to enable dynamically defined auth methods to be used instead of or in addition to the statically defined one in the configuration.
2020-06-18 19:22:51 +00:00
Chris Piraino 8d72225d33 Remove ACLEnforceVersion8 from tests (#8138)
The field had been deprecated for a while and was recently removed,
however a PR which added these tests prior to removal was merged.
2020-06-18 18:15:43 +00:00
Matt Keeler 6375db7b4b Merge pull request #8086 from hashicorp/feature/auto-config/client-config-inject 2020-06-18 14:45:52 +00:00
Matt Keeler 9f37a218c5 Merge pull request #8035 from hashicorp/feature/auto-config/server-rpc 2020-06-17 20:08:17 +00:00
Daniel Nephin 058114e82e Merge pull request #7762 from hashicorp/dnephin/warn-on-unknown-service-file
config: warn if a config file is being skipped because of its file extension
2020-06-17 15:21:34 -04:00
Pierre Souchay 318495d1f8 gossip: Ensure that metadata of Consul Service is updated (#7903)
While upgrading servers to a new version, I saw that the metadata of
existing servers is not updated, so the version and raft meta
are not up to date in the catalog.

The only way to do it was to:
 * update Consul server
 * make it leave the cluster, then metadata is accurate

That's because the optimization to avoid updating catalog does
not take into account metadata, so no update on catalog is performed.
2020-06-17 10:17:33 +00:00
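The problem described in the commit above is an "is an update needed?" check that ignores metadata. A minimal sketch of a check that does take metadata into account, using the hypothetical names `member`, `metaEqual`, and `needsCatalogUpdate` rather than the actual Consul code:

```go
package main

import "fmt"

// member is a hypothetical, simplified view of a server's catalog entry.
type member struct {
	Address string
	Meta    map[string]string
}

// metaEqual reports whether two metadata maps hold the same key/value pairs.
func metaEqual(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// needsCatalogUpdate returns true when the registered entry differs from the
// current state, including differences that exist only in the metadata.
func needsCatalogUpdate(registered, current member) bool {
	return registered.Address != current.Address || !metaEqual(registered.Meta, current.Meta)
}

func main() {
	old := member{Address: "10.0.0.1", Meta: map[string]string{"version": "1.7.0"}}
	upgraded := member{Address: "10.0.0.1", Meta: map[string]string{"version": "1.8.0"}}
	fmt.Println(needsCatalogUpdate(old, upgraded))
}
```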
Matt Keeler c3b348bebb Agent Auto Configuration: Configuration Syntax Updates (#8003) 2020-06-16 19:03:59 +00:00
Matt Keeler 3c4413cbed ACL Node Identities (#7970)
A Node Identity is very similar to a service identity. Its main targeted use is to allow creating tokens for use by Consul agents that will grant the necessary permissions for all the typical agent operations (node registration, coordinate updates, anti-entropy).

Half of this commit is for golden file based tests of the acl token and role cli output. Another big update was to refactor many of the tests in agent/consul/acl_endpoint_test.go to use the same style of tests and the same helpers. Besides being less boilerplate in the tests it also uses a common way of starting a test server with ACLs that should operate without any warnings regarding deprecated non-uuid master tokens etc.
2020-06-16 16:55:01 +00:00
Matt Keeler 64262d22d6 Make the Agent Cache more Context aware (#8092)
Blocking queries that have been issued will still be uncancellable (that cannot be helped until we get rid of net/rpc). However this makes it so that if calling getWithIndex (like during a cache Notify go routine) we can cancel the outer routine. Previously it would keep issuing more blocking queries until the result state actually changed.
2020-06-15 15:43:32 +00:00
Freddy 2af14433be Merge pull request #8099 from hashicorp/gateway-services-endpoint 2020-06-12 21:15:25 +00:00
Freddy c9dbb6c51a Only pass one hostname via EDS and prefer healthy ones (#8084)
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Currently when passing hostname clusters to Envoy, we set each service instance registered with Consul as an LbEndpoint for the cluster.

However, Envoy can only handle one per cluster:
[2020-06-04 18:32:34.094][1][warning][config] [source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Cluster rejected: Error adding/updating cluster(s) dc2.internal.ddd90499-9b47-91c5-4616-c0cbf0fc358a.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint, server.dc2.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint

Envoy is currently handling this gracefully by only picking one of the endpoints. However, we should avoid passing multiple to avoid these warning logs.

This PR:

* Ensures we only pass one endpoint, which is tied to one service instance.
* We prefer sending an endpoint which is marked as Healthy by Consul.
* If no endpoints are healthy we emit a warning and skip the cluster.
* If multiple unique hostnames are spread across service instances we emit a warning and let the user know which will be resolved.
2020-06-12 19:46:51 +00:00
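The selection rules listed in the commit above can be sketched as follows. The `instance` type, the `pickEndpoint` function, and the warning texts are hypothetical; the sketch only mirrors the four bullet points and is not the actual xDS code.

```go
package main

import "fmt"

// instance is a hypothetical, simplified service instance.
type instance struct {
	Hostname string
	Healthy  bool
}

// pickEndpoint chooses at most one instance for a hostname cluster.
// It prefers the first healthy instance, reports ok=false when none is
// healthy (the caller would then skip the cluster), and returns a warning
// when the instances carry more than one unique hostname.
func pickEndpoint(instances []instance) (chosen instance, ok bool, warnings []string) {
	unique := map[string]struct{}{}
	for _, in := range instances {
		unique[in.Hostname] = struct{}{}
		if !ok && in.Healthy {
			chosen, ok = in, true
		}
	}
	if !ok {
		return instance{}, false, []string{"no healthy endpoints; skipping cluster"}
	}
	if len(unique) > 1 {
		warnings = append(warnings,
			fmt.Sprintf("multiple hostnames found; only %q will be resolved", chosen.Hostname))
	}
	return chosen, true, warnings
}

func main() {
	chosen, ok, warnings := pickEndpoint([]instance{
		{Hostname: "a.example.com", Healthy: false},
		{Hostname: "b.example.com", Healthy: true},
	})
	fmt.Println(chosen.Hostname, ok, len(warnings))
}
```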