Commit Graph

435 Commits

Author SHA1 Message Date
Daniel Nephin 1f9479603c
Add failures_before_warning to checks (#10969)
Signed-off-by: Jakub Sokołowski <jakub@status.im>

* agent: add failures_before_warning setting

The new setting allows users to specify the number of check failures
that have to happen before a service status us updated to be `warning`.
This allows for more visibility for detected issues without creating
alerts and pinging administrators. Unlike the previous behavior, which
caused the service status to not update until it reached the configured
`failures_before_critical` setting, now Consul updates the Web UI view
with the `warning` state and the output of the service check when
`failures_before_warning` is breached.

The default value of `FailuresBeforeWarning` is the same as the value of
`FailuresBeforeCritical`, which allows for retaining the previous default
behavior of not triggering a warning.

When `FailuresBeforeWarning` is set to a value higher than that of
`FailuresBeforeCritical it has no effect as `FailuresBeforeCritical`
takes precedence.

Resolves: https://github.com/hashicorp/consul/issues/10680

Signed-off-by: Jakub Sokołowski <jakub@status.im>

Co-authored-by: Jakub Sokołowski <jakub@status.im>
2021-09-14 12:47:52 -04:00
R.B. Boyer 097e1645e3
agent: ensure that most agent behavior correctly respects partition configuration (#10880) 2021-08-19 15:09:42 -05:00
Daniel Nephin 17841248dd config: remove ACLResolver settings from RuntimeConfig 2021-08-17 13:32:52 -04:00
Daniel Nephin 31e034215f acl: remove ACLResolver config fields from consul.Config 2021-08-17 13:32:52 -04:00
Daniel Nephin c85c62dffb
Merge pull request #10807 from hashicorp/dnephin/remove-acl-datacenter
config: remove ACLDatacenter
2021-08-17 13:07:09 -04:00
Daniel Nephin 0575498d0d proxycfg: Lookup the agent token as a default
When no ACL token is provided with the service registration.
2021-08-12 15:51:34 -04:00
Daniel Nephin 7160f7a614 acl: remove ACLDatacenter
This field has been unnecessary for a while now. It was always set to the same value
as PrimaryDatacenter. So we can remove the duplicate field and use PrimaryDatacenter
directly.

This change was made by GoLand refactor, which did most of the work for me.
2021-08-06 18:27:00 -04:00
Daniel Nephin 8c575445da telemetry: add a metric for agent TLS cert expiry 2021-08-04 13:51:44 -04:00
Daniel Nephin 242b3a2dc5 streaming: set a default timeout
The blocking query backend sets the default value on the server side.
The streaming backend does not using blocking queries, so we must set the timeout on
the client.
2021-07-28 17:50:00 -04:00
R.B. Boyer 188e8dc51f
agent/structs: add a bunch more EnterpriseMeta helper functions to help with partitioning (#10669) 2021-07-22 13:20:45 -05:00
Daniel Nephin 74fb650b6b
Merge pull request #10588 from hashicorp/dnephin/config-fix-ports-grpc
config: rename `ports.grpc` to `ports.xds`
2021-07-13 13:11:38 -04:00
Daniel Nephin be8c675942 config: remove misleading UseTLS field
This field was documented as enabling TLS for outgoing RPC, but that was not the case.
All this field did was set the use_tls serf tag.

Instead of setting this field in a place far from where it is used, move the logic to where
the serf tag is set, so that the code is much more obvious.
2021-07-09 19:01:45 -04:00
Daniel Nephin 70770db345 config: remove duplicate TLSConfig fields from agent/consul.Config
tlsutil.Config already presents an excellent structure for this
configuration. Copying the runtime config fields to agent/consul.Config
makes code harder to trace, and provides no advantage.

Instead of copying the fields around, use the tlsutil.Config struct
directly instead.

This is one small step in removing the many layers of duplicate
configuration.
2021-07-09 18:49:42 -04:00
Daniel Nephin 895bf9adec config: update GRPCPort and addr in runtime config 2021-07-09 12:31:53 -04:00
Daniel Nephin 7d73fd7ae5 rename GRPC->XDS where appropriate 2021-07-09 12:17:45 -04:00
Daniel Nephin c78391797d streaming: fix enable of streaming in the client
And add checks to all the tests that explicitly use streaming.
2021-06-28 17:23:14 -04:00
Matt Keeler da31e0449e Move some things around to allow for license updating via config reload
The bulk of this commit is moving the LeaderRoutineManager from the agent/consul package into its own package: lib/gort. It also got a renaming and its Start method now requires a context. Requiring that context required updating a whole bunch of other places in the code.
2021-05-25 09:57:50 -04:00
Matt Keeler caafc02449 hcs-1936: Prepare for adding license auto-retrieval to auto-config in enterprise 2021-05-24 13:20:30 -04:00
Matt Keeler 234d0a3c2a Preparation for changing where license management is done. 2021-05-24 10:19:31 -04:00
Iryna Shustava d7d44f6ae7
Save exposed ports in agent's store and expose them via API (#10173)
* Save exposed HTTP or GRPC ports to the agent's store
* Add those the health checks API so we can retrieve them from the API
* Change redirect-traffic command to also exclude those ports from inbound traffic redirection when expose.checks is set to true.
2021-05-12 13:51:39 -07:00
Paul Banks 3ad754ca7b
Make Raft trailing logs and snapshot timing reloadable (#10129)
* WIP reloadable raft config

* Pre-define new raft gauges

* Update go-metrics to change gauge reset behaviour

* Update raft to pull in new metric and reloadable config

* Add snapshot persistance timing and installSnapshot to our 'protected' list as they can be infrequent but are important

* Update telemetry docs

* Update config and telemetry docs

* Add note to oldestLogAge on when it is visible

* Add changelog entry

* Update website/content/docs/agent/options.mdx

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2021-05-04 15:36:53 +01:00
R.B. Boyer 71d45a3460
Support Incremental xDS mode (#9855)
This adds support for the Incremental xDS protocol when using xDS v3. This is best reviewed commit-by-commit and will not be squashed when merged.

Union of all commit messages follows to give an overarching summary:

xds: exclusively support incremental xDS when using xDS v3

Attempts to use SoTW via v3 will fail, much like attempts to use incremental via v2 will fail.
Work around a strange older envoy behavior involving empty CDS responses over incremental xDS.
xds: various cleanups and refactors that don't strictly concern the addition of incremental xDS support

Dissolve the connectionInfo struct in favor of per-connection ResourceGenerators instead.
Do a better job of ensuring the xds code uses a well configured logger that accurately describes the connected client.
xds: pull out checkStreamACLs method in advance of a later commit

xds: rewrite SoTW xDS protocol tests to use protobufs rather than hand-rolled json strings

In the test we very lightly reuse some of the more boring protobuf construction helper code that is also technically under test. The important thing of the protocol tests is testing the protocol. The actual inputs and outputs are largely already handled by the xds golden output tests now so these protocol tests don't have to do double-duty.

This also updates the SoTW protocol test to exclusively use xDS v2 which is the only variant of SoTW that will be supported in Consul 1.10.

xds: default xds.Server.AuthCheckFrequency at use-time instead of construction-time
2021-04-29 13:54:05 -05:00
Daniel Nephin 10ec9c2be3 rpcclient: close the grpc.ClientConn on shutdown 2021-04-27 19:03:16 -04:00
Daniel Nephin e229b877d8 health: create health.Client in Agent.New 2021-04-27 19:03:16 -04:00
Daniel Nephin 0558586dbd health: use blocking queries for near query parameter 2021-04-27 19:03:16 -04:00
Daniel Nephin 2a26085b2c connect: do not set QuerySource.Node
Setting this field to a value is equivalent to using the 'near' query paramter.
The intent is to sort the results by proximity to the node requesting
them. However with connect we send the results to envoy, which doesn't
care about the order, so setting this field is increasing the work
performed for no gain.

It is necessary to unset this field now because we would like connect
to use streaming, but streaming does not support sorting by proximity.
2021-04-27 19:03:16 -04:00
Matt Keeler bbf5993534
Move static token resolution into the ACLResolver (#10013) 2021-04-14 12:39:35 -04:00
Tara Tufano 9deb52e868
add http2 ping health checks (#8431)
* add http2 ping checks

* fix test issue

* add h2ping check to config resources

* add new test and docs for h2ping

* fix grammatical inconsistency in H2PING documentation

* resolve rebase conflicts, add test for h2ping tls verification failure

* api documentation for h2ping

* update test config data with H2PING

* add H2PING to protocol buffers and update changelog

* fix typo in changelog entry
2021-04-09 15:12:10 -04:00
R.B. Boyer e494313e7b
api: ensure v1/health/ingress/:service endpoint works properly when streaming is enabled (#9967)
The streaming cache type for service health has no way to handle v1/health/ingress/:service queries as there is no equivalent topic that would return the appropriate data.

Ensure that attempts to use this endpoint will use the old cache-type for now so that they return appropriate data when streaming is enabled.
2021-04-05 13:23:00 -05:00
freddygv f4f45af6d0 Merge master and fix upstream config protocol defaulting 2021-03-17 21:13:40 -06:00
freddygv a54d6a9010 Update proxycfg for transparent proxy 2021-03-17 13:40:39 -06:00
Christopher Broglie f0307c73e5 Add support for configuring TLS ServerName for health checks
Some TLS servers require SNI, but the Golang HTTP client doesn't
include it in the ClientHello when connecting to an IP address. This
change adds a new TLSServerName field to health check definitions to
optionally set it. This fixes #9473.
2021-03-16 18:16:44 -04:00
Daniel Nephin f40b76af2d proxycfg: use rpcclient/health.Client instead of passing around cache name
This should allow us to swap out the implementation with something other
than `agent/cache` without making further code changes.
2021-03-12 11:46:04 -05:00
Daniel Nephin 906834ce8e proxycfg: Use streaming in connect state 2021-03-12 11:35:42 -05:00
Daniel Nephin 1a764553c0 rpcclient: use streaming for connect health 2021-03-12 11:35:42 -05:00
freddygv acec711a6a Update server-side config resolution and client-side merging 2021-03-10 21:05:11 -07:00
Daniel Nephin 8c45f2c1fe agent: use the new lib/mutex for stateLock
Previously the ServiceManager had to run a separate goroutine so that it could block on a channel
send/receive instead of a lock. Using this mutex with TryLock allows us to cancel the lock when
the serviceConfigWatch is stopped.

Without this change removing the ServiceManager.Start goroutine would not be possible because
when AddService is called it acquires the stateLock. While that lock is held, if there are
existing watches for the service, the old watch will be stopped, and the goroutine holding the
lock will attempt to wait for that watcher goroutine to exit.

If the goroutine is handling an update (serviceConfigWatch.handleUpdate) then it can block on
acquiring the stateLock and deadlock the agent.  With this change the context is cancelled as
and the goroutine will exit instead of waiting on the stateLock.
2021-01-25 18:01:47 -05:00
Daniel Nephin 4e42b8f1d4 agent: remove ServiceManager goroutine
The ServiceManager.Start goroutine was used to serialize calls to
agent.addServiceInternal.

All the goroutines which sent events to the channel would block waiting
for a response from that same goroutine, which is effectively the same
as a synchronous call without any channels.

This commit removes the goroutine and channels, and instead calls
addServiceInternal directly. Since all of these goroutines will need to
take the agent.stateLock, the mutex handles the serializing of calls.
2021-01-25 18:01:47 -05:00
Daniel Nephin 8b67231e8c agent: update godoc for AddServiceRequest 2021-01-25 18:01:03 -05:00
Daniel Nephin f34703fc63 agent: move checkStateSnapshot
Move the field into the struct for addServiceLocked. Also don't require
setting a default value, so that the callers can leave it as nil if they
don't already have a snapshot.
2021-01-25 18:01:03 -05:00
Daniel Nephin 1495291054 agent: move two fields off of AddServiceRequest 2021-01-25 18:01:03 -05:00
Daniel Nephin 60c6a1c220 agent: Replace two fields on AddServiceRequest with a func field
The two previous fields were mutually exclusive. They can be represented
with a single function which provides the value.
2021-01-25 18:01:03 -05:00
Daniel Nephin 5f9930a9be agent: remove an a branch in the AddService flow
Handle the decision to use ServiceManager in a single place. Instead of
calling ServiceManager.AddService, then calling back into
addServiceInternal, only call ServiceManager.AddService if we are going
to use it.

This change removes some small duplication and removes a branch from the
AddService flow.
2021-01-25 18:01:03 -05:00
Daniel Nephin 4da0718c57 agent: use fields directly, not temp variables
The temprorary variables make it much harder to trace where and how struct
fields are used. If a field is only used a small number of times than
refer to the field directly.
2021-01-25 17:25:04 -05:00
Daniel Nephin 5b6f806f4f agent: addServiceIternalRequest
Move fields that are only relevant for addServiceInternal onto a new
struct and embed AddServiceRequest.
2021-01-25 17:25:04 -05:00
Daniel Nephin 3d39359bcb agent: move deprecated AddServiceFromSource to a test file
The method is only used in tests, and only exists for legacy calls.

There was one other package which used this method in tests. Export
the AddServiceRequest and a couple of its fields so the new function can
be used in those tests.
2021-01-25 17:25:03 -05:00
Daniel Nephin e44fd1cb92 agent: use a single method for Agent.AddService 2021-01-25 17:25:03 -05:00
Daniel Nephin 6757231b82 agent: rename AddService->AddServiceFromSource
In preparation for extracting a single AddService func that accepts a request struct.
2021-01-25 17:25:01 -05:00
Daniel Nephin e66af1a559 agent/consuk: Rename RPCRate -> RPCRateLimit
so that the field name is consistent across config structs.
2021-01-14 17:26:00 -05:00
Daniel Nephin 5684223e36 agent/consul: make Client/Server config reloading more obvious
I believe this commit also fixes a bug. Previously RPCMaxConnsPerClient was not being re-read from the RuntimeConfig, so passing it to Server.ReloadConfig was never changing the value.

Also improve the test runtime by not doing a lot of unnecessary work.
2021-01-14 17:21:10 -05:00