Commit Graph

568 Commits

Author SHA1 Message Date
Dhia Ayachi 83720e5737
add a rate limiter to config auto-reload (#12490)
* add config watcher to the config package

* add logging to watcher

* add test and refactor to add WatcherEvent.

* add all API calls and fix a bug with recreated files

* add tests for watcher

* remove the unnecessary use of context

* Add debug log and a test for file rename

* use inode to detect if the file is recreated/replaced and only listen to create events.

* tidy ups (#1535)

* tidy ups

* Add tests for inode reconcile

* fix linux vs windows syscall

* fix linux vs windows syscall

* fix windows compile error

* increase timeout

* use ctime ID

* remove remove/creation test as it's a use case that fail in linux

* fix linux/windows to use Ino/CreationTime

* fix the watcher to only overwrite current file id

* fix linter error

* fix remove/create test

* set reconcile loop to 200 Milliseconds

* fix watcher to not trigger event on remove, add more tests

* on a remove event try to add the file back to the watcher and trigger the handler if success

* fix race condition

* fix flaky test

* fix race conditions

* set level to info

* fix when file is removed and get an event for it after

* fix to trigger handler when we get a remove but re-add fail

* fix error message

* add tests for directory watch and fixes

* detect if a file is a symlink and return an error on Add

* rename Watcher to FileWatcher and remove symlink deref

* add fsnotify@v1.5.1

* fix go mod

* do not reset timer on errors, rename OS specific files

* rename New func

* events trigger on write and rename

* add missing test

* fix flaking tests

* fix flaky test

* check reconcile when removed

* delete invalid file

* fix test to create files with different mod time.

* back date file instead of sleeping

* add watching file in agent command.

* fix watcher call to use new API

* add configuration and stop watcher when server stop

* add certs as watched files

* move FileWatcher to the agent start instead of the command code

* stop watcher before replacing it

* save watched files in agent

* add add and remove interfaces to the file watcher

* fix remove to not return an error

* use `Add` and `Remove` to update certs files

* fix tests

* close events channel on the file watcher even when the context is done

* extract `NotAutoReloadableRuntimeConfig` is a separate struct

* fix linter errors

* add Ca configs and outgoing verify to the not auto reloadable config

* add some logs and fix to use background context

* add tests to auto-config reload

* remove stale test

* add tests to changes to config files

* add check to see if old cert files still trigger updates

* rename `NotAutoReloadableRuntimeConfig` to `StaticRuntimeConfig`

* fix to re add both key and cert file. Add test to cover this case.

* review suggestion

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add check to static runtime config changes

* fix test

* add changelog file

* fix review comments

* Apply suggestions from code review

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* update flag description

Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>

* fix compilation error

* add static runtime config support

* fix test

* fix review comments

* fix log test

* Update .changelog/12329.txt

Co-authored-by: Dan Upton <daniel@floppy.co>

* transfer tests to runtime_test.go

* fix filewatcher Replace to not deadlock.

* avoid having lingering locks

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* split ReloadConfig func

* fix warning message

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* convert `FileWatcher` into an interface

* fix compilation errors

* fix tests

* extract func for adding and removing files

* add a coalesceTimer with a very small timer

* extract coaelsce Timer and add a shim for testing

* add tests to coalesceTimer fix to send remaining events

* set `coalesceTimer` to 1 Second

* support symlink, fix a nil deref.

* fix compile error

* fix compile error

* refactor file watcher rate limiting to be a Watcher implementation

* fix linter issue

* fix runtime config

* fix runtime test

* fix flaky tests

* fix compile error

* Apply suggestions from code review

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* fix agent New to return an error if File watcher New return an error

* quit timer loop if ctx is canceled

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>

Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
Co-authored-by: Daniel Upton <daniel@floppy.co>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-04-04 11:31:39 -04:00
FFMMM 973d2d0f9a
mark disable_compat_1.9 to deprecate in 1.13, change default to true (#12675)
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-04-01 10:35:56 -07:00
Dhia Ayachi 16b19dd82d
auto-reload configuration when config files change (#12329)
* add config watcher to the config package

* add logging to watcher

* add test and refactor to add WatcherEvent.

* add all API calls and fix a bug with recreated files

* add tests for watcher

* remove the unnecessary use of context

* Add debug log and a test for file rename

* use inode to detect if the file is recreated/replaced and only listen to create events.

* tidy ups (#1535)

* tidy ups

* Add tests for inode reconcile

* fix linux vs windows syscall

* fix linux vs windows syscall

* fix windows compile error

* increase timeout

* use ctime ID

* remove remove/creation test as it's a use case that fail in linux

* fix linux/windows to use Ino/CreationTime

* fix the watcher to only overwrite current file id

* fix linter error

* fix remove/create test

* set reconcile loop to 200 Milliseconds

* fix watcher to not trigger event on remove, add more tests

* on a remove event try to add the file back to the watcher and trigger the handler if success

* fix race condition

* fix flaky test

* fix race conditions

* set level to info

* fix when file is removed and get an event for it after

* fix to trigger handler when we get a remove but re-add fail

* fix error message

* add tests for directory watch and fixes

* detect if a file is a symlink and return an error on Add

* rename Watcher to FileWatcher and remove symlink deref

* add fsnotify@v1.5.1

* fix go mod

* do not reset timer on errors, rename OS specific files

* rename New func

* events trigger on write and rename

* add missing test

* fix flaking tests

* fix flaky test

* check reconcile when removed

* delete invalid file

* fix test to create files with different mod time.

* back date file instead of sleeping

* add watching file in agent command.

* fix watcher call to use new API

* add configuration and stop watcher when server stop

* add certs as watched files

* move FileWatcher to the agent start instead of the command code

* stop watcher before replacing it

* save watched files in agent

* add add and remove interfaces to the file watcher

* fix remove to not return an error

* use `Add` and `Remove` to update certs files

* fix tests

* close events channel on the file watcher even when the context is done

* extract `NotAutoReloadableRuntimeConfig` is a separate struct

* fix linter errors

* add Ca configs and outgoing verify to the not auto reloadable config

* add some logs and fix to use background context

* add tests to auto-config reload

* remove stale test

* add tests to changes to config files

* add check to see if old cert files still trigger updates

* rename `NotAutoReloadableRuntimeConfig` to `StaticRuntimeConfig`

* fix to re add both key and cert file. Add test to cover this case.

* review suggestion

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add check to static runtime config changes

* fix test

* add changelog file

* fix review comments

* Apply suggestions from code review

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* update flag description

Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>

* fix compilation error

* add static runtime config support

* fix test

* fix review comments

* fix log test

* Update .changelog/12329.txt

Co-authored-by: Dan Upton <daniel@floppy.co>

* transfer tests to runtime_test.go

* fix filewatcher Replace to not deadlock.

* avoid having lingering locks

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* split ReloadConfig func

* fix warning message

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* convert `FileWatcher` into an interface

* fix compilation errors

* fix tests

* extract func for adding and removing files

Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
Co-authored-by: Daniel Upton <daniel@floppy.co>
2022-03-31 15:11:49 -04:00
Dan Upton 7298967070
Restructure gRPC server setup (#12586)
OSS sync of enterprise changes at 0b44395e
2022-03-22 12:40:24 +00:00
Dan Upton b36d4e16b6
Support per-listener TLS configuration ⚙️ (#12504)
Introduces the capability to configure TLS differently for Consul's
listeners/ports (i.e. HTTPS, gRPC, and the internal multiplexed RPC
port) which is useful in scenarios where you may want the HTTPS or
gRPC interfaces to present a certificate signed by a well-known/public
CA, rather than the certificate used for internal communication which
must have a SAN in the form `server.<dc>.consul`.
2022-03-18 10:46:58 +00:00
Eric cf3e517d0e Create and wire up the serverless patcher 2022-03-15 10:12:57 -04:00
Evan Culver 602e08ada7
checks: populate interval and timeout when registering services (#11138) 2022-02-18 12:05:33 -08:00
Dhia Ayachi 4f0a71d7b4
fix race when starting a service while the agent `serviceManager` is … (#12302)
* fix race when starting a service while the agent `serviceManager` is stopping

* add changelog
2022-02-10 13:30:49 -05:00
Daniel Nephin edca8d61a3 acl: remove ResolveTokenToIdentity
By exposing the AccessorID from the primary ResolveToken method we can
remove this duplication.
2022-01-22 14:47:59 -05:00
Daniel Nephin a5e8af79c3 acl: return a resposne from ResolveToken that includes the ACLIdentity
So that we can duplicate duplicate methods.
2022-01-22 14:33:09 -05:00
Dan Upton ca3aca92c4
[OSS] Remove remaining references to master (#11827) 2022-01-20 12:47:50 +00:00
Dan Upton 7fe81171d9
Rename `ACLMasterToken` => `ACLInitialManagementToken` (#11746) 2021-12-07 12:39:28 +00:00
Freddy e246defb6c
Merge pull request #11720 from hashicorp/bbolt 2021-12-03 14:44:36 -07:00
R.B. Boyer c46f9f9f31
agent: add variation of force-leave that exclusively works on the WAN (#11722)
Fixes #6548
2021-12-02 17:15:10 -06:00
Daniel Nephin e47cecc653 config: add NoFreelistSync option
# Conflicts:
#	agent/config/testdata/TestRuntimeConfig_Sanitize-enterprise.golden
#	agent/consul/server.go
2021-12-02 16:56:15 -05:00
R.B. Boyer dd4a59db8e
agent: purge service/check registration files for incorrect partitions on reload (#11607) 2021-11-18 14:44:20 -06:00
R.B. Boyer eb21649f82
partitions: various refactors to support partitioning the serf LAN pool (#11568) 2021-11-15 09:51:14 -06:00
R.B. Boyer 44c023a302
segments: ensure that the serf_lan_allowed_cidrs applies to network segments (#11495) 2021-11-04 17:17:19 -05:00
Mark Anderson 7e8228a20b
Remove some usage of md5 from the system (#11491)
* Remove some usage of md5 from the system

OSS side of https://github.com/hashicorp/consul-enterprise/pull/1253

This is a potential security issue because an attacker could conceivably manipulate inputs to cause persistence files to collide, effectively deleting the persistence file for one of the colliding elements.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-11-04 13:07:54 -07:00
Daniel Nephin 51d8417545
Merge pull request #10690 from tarat44/h2c-support-in-ping-checks
add support for h2c in h2 ping health checks
2021-11-02 13:53:06 -04:00
Daniel Nephin b57cae94de
Merge pull request #10771 from hashicorp/dnephin/emit-telemetry-metrics-immediately
telemetry: improve cert expiry metrics
2021-11-01 18:31:03 -04:00
R.B. Boyer af9ffc214d
agent: add a clone function for duplicating the serf lan configuration (#11443) 2021-10-28 16:11:26 -05:00
Daniel Nephin a8e2e1c365 agent: move agent tls metric monitor to a more appropriate place
And add a test for it
2021-10-27 16:26:09 -04:00
Daniel Nephin c92513ec16 telemetry: set cert expiry metrics to NaN on start
So that followers do not report 0, which would make alerting difficult.
2021-10-27 15:19:25 -04:00
freddygv 954d21c6ba Register the ExportingPartitions cache type 2021-10-27 11:15:25 -06:00
R.B. Boyer ef559dfdd4
agent: refactor the agent delegate interface to be partition friendly (#11429) 2021-10-26 15:08:55 -05:00
Chris S. Kim fa293362be
agent: Ensure partition is considered in agent endpoints (#11427) 2021-10-26 15:20:57 -04:00
tarat44 c1ed3a9a94 change config option to H2PingUseTLS 2021-10-05 00:12:21 -04:00
tarat44 3c9f5a73d9 add support for h2c in h2 ping health checks 2021-10-04 22:51:08 -04:00
Daniel Nephin d12dd48c61 acl: remove ACL upgrading from Clients
As part of removing the legacy ACL system ACL upgrading and the flag for
legacy ACLs is removed from Clients.

This commit also removes the 'acls' serf tag from client nodes. The tag is only ever read
from server nodes.

This commit also introduces a constant for the acl serf tag, to make it easier to track where
it is used.
2021-09-29 14:02:38 -04:00
Daniel Nephin cc310224aa command/envoy: stop using the DebugConfig from Self endpoint
The DebugConfig in the self endpoint can change at any time. It's not a stable API.

This commit adds the XDSPort to a stable part of the XDS api, and changes the envoy command to read
this new field.

It includes support for the old API as well, in case a newer CLI is used with an older API, and
adds a test for both cases.
2021-09-29 13:21:28 -04:00
Daniel Nephin 1502547e38 Revert "Merge pull request #10588 from hashicorp/dnephin/config-fix-ports-grpc"
This reverts commit 74fb650b6b, reversing
changes made to 58bd817336.
2021-09-29 12:28:41 -04:00
Chris S. Kim e3248c20c9
agent: Clean up unused built-in proxy config (#11165) 2021-09-28 11:29:10 -04:00
Daniel Nephin 1f9479603c
Add failures_before_warning to checks (#10969)
Signed-off-by: Jakub Sokołowski <jakub@status.im>

* agent: add failures_before_warning setting

The new setting allows users to specify the number of check failures
that have to happen before a service status us updated to be `warning`.
This allows for more visibility for detected issues without creating
alerts and pinging administrators. Unlike the previous behavior, which
caused the service status to not update until it reached the configured
`failures_before_critical` setting, now Consul updates the Web UI view
with the `warning` state and the output of the service check when
`failures_before_warning` is breached.

The default value of `FailuresBeforeWarning` is the same as the value of
`FailuresBeforeCritical`, which allows for retaining the previous default
behavior of not triggering a warning.

When `FailuresBeforeWarning` is set to a value higher than that of
`FailuresBeforeCritical it has no effect as `FailuresBeforeCritical`
takes precedence.

Resolves: https://github.com/hashicorp/consul/issues/10680

Signed-off-by: Jakub Sokołowski <jakub@status.im>

Co-authored-by: Jakub Sokołowski <jakub@status.im>
2021-09-14 12:47:52 -04:00
R.B. Boyer 097e1645e3
agent: ensure that most agent behavior correctly respects partition configuration (#10880) 2021-08-19 15:09:42 -05:00
Daniel Nephin 17841248dd config: remove ACLResolver settings from RuntimeConfig 2021-08-17 13:32:52 -04:00
Daniel Nephin 31e034215f acl: remove ACLResolver config fields from consul.Config 2021-08-17 13:32:52 -04:00
Daniel Nephin c85c62dffb
Merge pull request #10807 from hashicorp/dnephin/remove-acl-datacenter
config: remove ACLDatacenter
2021-08-17 13:07:09 -04:00
Daniel Nephin 0575498d0d proxycfg: Lookup the agent token as a default
When no ACL token is provided with the service registration.
2021-08-12 15:51:34 -04:00
Daniel Nephin 7160f7a614 acl: remove ACLDatacenter
This field has been unnecessary for a while now. It was always set to the same value
as PrimaryDatacenter. So we can remove the duplicate field and use PrimaryDatacenter
directly.

This change was made by GoLand refactor, which did most of the work for me.
2021-08-06 18:27:00 -04:00
Daniel Nephin 8c575445da telemetry: add a metric for agent TLS cert expiry 2021-08-04 13:51:44 -04:00
Daniel Nephin 242b3a2dc5 streaming: set a default timeout
The blocking query backend sets the default value on the server side.
The streaming backend does not using blocking queries, so we must set the timeout on
the client.
2021-07-28 17:50:00 -04:00
R.B. Boyer 188e8dc51f
agent/structs: add a bunch more EnterpriseMeta helper functions to help with partitioning (#10669) 2021-07-22 13:20:45 -05:00
Daniel Nephin 74fb650b6b
Merge pull request #10588 from hashicorp/dnephin/config-fix-ports-grpc
config: rename `ports.grpc` to `ports.xds`
2021-07-13 13:11:38 -04:00
Daniel Nephin be8c675942 config: remove misleading UseTLS field
This field was documented as enabling TLS for outgoing RPC, but that was not the case.
All this field did was set the use_tls serf tag.

Instead of setting this field in a place far from where it is used, move the logic to where
the serf tag is set, so that the code is much more obvious.
2021-07-09 19:01:45 -04:00
Daniel Nephin 70770db345 config: remove duplicate TLSConfig fields from agent/consul.Config
tlsutil.Config already presents an excellent structure for this
configuration. Copying the runtime config fields to agent/consul.Config
makes code harder to trace, and provides no advantage.

Instead of copying the fields around, use the tlsutil.Config struct
directly instead.

This is one small step in removing the many layers of duplicate
configuration.
2021-07-09 18:49:42 -04:00
Daniel Nephin 895bf9adec config: update GRPCPort and addr in runtime config 2021-07-09 12:31:53 -04:00
Daniel Nephin 7d73fd7ae5 rename GRPC->XDS where appropriate 2021-07-09 12:17:45 -04:00
Daniel Nephin c78391797d streaming: fix enable of streaming in the client
And add checks to all the tests that explicitly use streaming.
2021-06-28 17:23:14 -04:00
Matt Keeler da31e0449e Move some things around to allow for license updating via config reload
The bulk of this commit is moving the LeaderRoutineManager from the agent/consul package into its own package: lib/gort. It also got a renaming and its Start method now requires a context. Requiring that context required updating a whole bunch of other places in the code.
2021-05-25 09:57:50 -04:00
Matt Keeler caafc02449 hcs-1936: Prepare for adding license auto-retrieval to auto-config in enterprise 2021-05-24 13:20:30 -04:00
Matt Keeler 234d0a3c2a Preparation for changing where license management is done. 2021-05-24 10:19:31 -04:00
Iryna Shustava d7d44f6ae7
Save exposed ports in agent's store and expose them via API (#10173)
* Save exposed HTTP or GRPC ports to the agent's store
* Add those the health checks API so we can retrieve them from the API
* Change redirect-traffic command to also exclude those ports from inbound traffic redirection when expose.checks is set to true.
2021-05-12 13:51:39 -07:00
Paul Banks 3ad754ca7b
Make Raft trailing logs and snapshot timing reloadable (#10129)
* WIP reloadable raft config

* Pre-define new raft gauges

* Update go-metrics to change gauge reset behaviour

* Update raft to pull in new metric and reloadable config

* Add snapshot persistance timing and installSnapshot to our 'protected' list as they can be infrequent but are important

* Update telemetry docs

* Update config and telemetry docs

* Add note to oldestLogAge on when it is visible

* Add changelog entry

* Update website/content/docs/agent/options.mdx

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2021-05-04 15:36:53 +01:00
R.B. Boyer 71d45a3460
Support Incremental xDS mode (#9855)
This adds support for the Incremental xDS protocol when using xDS v3. This is best reviewed commit-by-commit and will not be squashed when merged.

Union of all commit messages follows to give an overarching summary:

xds: exclusively support incremental xDS when using xDS v3

Attempts to use SoTW via v3 will fail, much like attempts to use incremental via v2 will fail.
Work around a strange older envoy behavior involving empty CDS responses over incremental xDS.
xds: various cleanups and refactors that don't strictly concern the addition of incremental xDS support

Dissolve the connectionInfo struct in favor of per-connection ResourceGenerators instead.
Do a better job of ensuring the xds code uses a well configured logger that accurately describes the connected client.
xds: pull out checkStreamACLs method in advance of a later commit

xds: rewrite SoTW xDS protocol tests to use protobufs rather than hand-rolled json strings

In the test we very lightly reuse some of the more boring protobuf construction helper code that is also technically under test. The important thing of the protocol tests is testing the protocol. The actual inputs and outputs are largely already handled by the xds golden output tests now so these protocol tests don't have to do double-duty.

This also updates the SoTW protocol test to exclusively use xDS v2 which is the only variant of SoTW that will be supported in Consul 1.10.

xds: default xds.Server.AuthCheckFrequency at use-time instead of construction-time
2021-04-29 13:54:05 -05:00
Daniel Nephin 10ec9c2be3 rpcclient: close the grpc.ClientConn on shutdown 2021-04-27 19:03:16 -04:00
Daniel Nephin e229b877d8 health: create health.Client in Agent.New 2021-04-27 19:03:16 -04:00
Daniel Nephin 0558586dbd health: use blocking queries for near query parameter 2021-04-27 19:03:16 -04:00
Daniel Nephin 2a26085b2c connect: do not set QuerySource.Node
Setting this field to a value is equivalent to using the 'near' query paramter.
The intent is to sort the results by proximity to the node requesting
them. However with connect we send the results to envoy, which doesn't
care about the order, so setting this field is increasing the work
performed for no gain.

It is necessary to unset this field now because we would like connect
to use streaming, but streaming does not support sorting by proximity.
2021-04-27 19:03:16 -04:00
Matt Keeler bbf5993534
Move static token resolution into the ACLResolver (#10013) 2021-04-14 12:39:35 -04:00
Tara Tufano 9deb52e868
add http2 ping health checks (#8431)
* add http2 ping checks

* fix test issue

* add h2ping check to config resources

* add new test and docs for h2ping

* fix grammatical inconsistency in H2PING documentation

* resolve rebase conflicts, add test for h2ping tls verification failure

* api documentation for h2ping

* update test config data with H2PING

* add H2PING to protocol buffers and update changelog

* fix typo in changelog entry
2021-04-09 15:12:10 -04:00
R.B. Boyer e494313e7b
api: ensure v1/health/ingress/:service endpoint works properly when streaming is enabled (#9967)
The streaming cache type for service health has no way to handle v1/health/ingress/:service queries as there is no equivalent topic that would return the appropriate data.

Ensure that attempts to use this endpoint will use the old cache-type for now so that they return appropriate data when streaming is enabled.
2021-04-05 13:23:00 -05:00
freddygv f4f45af6d0 Merge master and fix upstream config protocol defaulting 2021-03-17 21:13:40 -06:00
freddygv a54d6a9010 Update proxycfg for transparent proxy 2021-03-17 13:40:39 -06:00
Christopher Broglie f0307c73e5 Add support for configuring TLS ServerName for health checks
Some TLS servers require SNI, but the Golang HTTP client doesn't
include it in the ClientHello when connecting to an IP address. This
change adds a new TLSServerName field to health check definitions to
optionally set it. This fixes #9473.
2021-03-16 18:16:44 -04:00
Daniel Nephin f40b76af2d proxycfg: use rpcclient/health.Client instead of passing around cache name
This should allow us to swap out the implementation with something other
than `agent/cache` without making further code changes.
2021-03-12 11:46:04 -05:00
Daniel Nephin 906834ce8e proxycfg: Use streaming in connect state 2021-03-12 11:35:42 -05:00
Daniel Nephin 1a764553c0 rpcclient: use streaming for connect health 2021-03-12 11:35:42 -05:00
freddygv acec711a6a Update server-side config resolution and client-side merging 2021-03-10 21:05:11 -07:00
Daniel Nephin 8c45f2c1fe agent: use the new lib/mutex for stateLock
Previously the ServiceManager had to run a separate goroutine so that it could block on a channel
send/receive instead of a lock. Using this mutex with TryLock allows us to cancel the lock when
the serviceConfigWatch is stopped.

Without this change removing the ServiceManager.Start goroutine would not be possible because
when AddService is called it acquires the stateLock. While that lock is held, if there are
existing watches for the service, the old watch will be stopped, and the goroutine holding the
lock will attempt to wait for that watcher goroutine to exit.

If the goroutine is handling an update (serviceConfigWatch.handleUpdate) then it can block on
acquiring the stateLock and deadlock the agent.  With this change the context is cancelled as
and the goroutine will exit instead of waiting on the stateLock.
2021-01-25 18:01:47 -05:00
Daniel Nephin 4e42b8f1d4 agent: remove ServiceManager goroutine
The ServiceManager.Start goroutine was used to serialize calls to
agent.addServiceInternal.

All the goroutines which sent events to the channel would block waiting
for a response from that same goroutine, which is effectively the same
as a synchronous call without any channels.

This commit removes the goroutine and channels, and instead calls
addServiceInternal directly. Since all of these goroutines will need to
take the agent.stateLock, the mutex handles the serializing of calls.
2021-01-25 18:01:47 -05:00
Daniel Nephin 8b67231e8c agent: update godoc for AddServiceRequest 2021-01-25 18:01:03 -05:00
Daniel Nephin f34703fc63 agent: move checkStateSnapshot
Move the field into the struct for addServiceLocked. Also don't require
setting a default value, so that the callers can leave it as nil if they
don't already have a snapshot.
2021-01-25 18:01:03 -05:00
Daniel Nephin 1495291054 agent: move two fields off of AddServiceRequest 2021-01-25 18:01:03 -05:00
Daniel Nephin 60c6a1c220 agent: Replace two fields on AddServiceRequest with a func field
The two previous fields were mutually exclusive. They can be represented
with a single function which provides the value.
2021-01-25 18:01:03 -05:00
Daniel Nephin 5f9930a9be agent: remove an a branch in the AddService flow
Handle the decision to use ServiceManager in a single place. Instead of
calling ServiceManager.AddService, then calling back into
addServiceInternal, only call ServiceManager.AddService if we are going
to use it.

This change removes some small duplication and removes a branch from the
AddService flow.
2021-01-25 18:01:03 -05:00
Daniel Nephin 4da0718c57 agent: use fields directly, not temp variables
The temprorary variables make it much harder to trace where and how struct
fields are used. If a field is only used a small number of times than
refer to the field directly.
2021-01-25 17:25:04 -05:00
Daniel Nephin 5b6f806f4f agent: addServiceIternalRequest
Move fields that are only relevant for addServiceInternal onto a new
struct and embed AddServiceRequest.
2021-01-25 17:25:04 -05:00
Daniel Nephin 3d39359bcb agent: move deprecated AddServiceFromSource to a test file
The method is only used in tests, and only exists for legacy calls.

There was one other package which used this method in tests. Export
the AddServiceRequest and a couple of its fields so the new function can
be used in those tests.
2021-01-25 17:25:03 -05:00
Daniel Nephin e44fd1cb92 agent: use a single method for Agent.AddService 2021-01-25 17:25:03 -05:00
Daniel Nephin 6757231b82 agent: rename AddService->AddServiceFromSource
In preparation for extracting a single AddService func that accepts a request struct.
2021-01-25 17:25:01 -05:00
Daniel Nephin e66af1a559 agent/consuk: Rename RPCRate -> RPCRateLimit
so that the field name is consistent across config structs.
2021-01-14 17:26:00 -05:00
Daniel Nephin 5684223e36 agent/consul: make Client/Server config reloading more obvious
I believe this commit also fixes a bug. Previously RPCMaxConnsPerClient was not being re-read from the RuntimeConfig, so passing it to Server.ReloadConfig was never changing the value.

Also improve the test runtime by not doing a lot of unnecessary work.
2021-01-14 17:21:10 -05:00
Daniel Nephin a04cefaa28 Remove an unnecessary else 2021-01-07 18:13:49 -05:00
Daniel Nephin 4b8b2a4291 xds: remove Server.Initialize
Requiring a call to initialize to set a single field is not really substantially different
from having to set that field to a value.
2021-01-07 18:13:48 -05:00
Daniel Nephin 375aed5ed6 xds: Pass in logger
small cleanup in tests
2021-01-07 18:13:48 -05:00
Daniel Nephin d64425d2e4
Merge pull request #9213 from hashicorp/dnephin/resolve-tokens-take-2
acl: Remove some unused things and document delegate method
2021-01-06 18:51:51 -05:00
Michael Montgomery e4f603dfae Merge branch 'master' into 6074-allow-config-MaxHeaderBytes 2020-12-30 14:14:05 -06:00
Kenia 27f6899ec8
Create consul version metric with version label (#9350)
* create consul version metric with version label

* agent/agent.go: add pre-release Version as well as label

Co-Authored-By: Radha13 <kumari.radha3@gmail.com>

* verion and pre-release version labels.

* hyphen/- breaks prometheus

* Add Prometheus gauge defintion for version metric

* Add new metric to telemetry docs

Co-authored-by: Radha Kumari <kumari.radha3@gmail.com>
Co-authored-by: Aestek <thib.gilles@gmail.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2020-12-09 09:16:53 -05:00
Michael Montgomery 585c84e9ff Merge branch 'master' into 6074-allow-config-MaxHeaderBytes 2020-11-20 07:43:53 -06:00
Daniel Nephin 738bf9efdc agent: fix bug with multiple listeners
Previously the listener was being passed to a closure in a loop without
capturing the loop variable. The result is only the last listener is
used, so the http/https servers only listen on one address.

This problem is fixed by capturing the variable by passing it into a
function.
2020-11-18 13:03:29 -05:00
Daniel Nephin 0ee86935f0 Remove two unused delegate methods 2020-11-17 18:16:26 -05:00
Matt Keeler 66fd23d67f
Refactor to call non-voting servers read replicas (#9191)
Co-authored-by: Kit Patella <kit@jepsen.io>
2020-11-17 10:53:57 -05:00
Michael Montgomery 5b6ac035ff Resolves #6074. Adds new option to configure HTTP Server's MaxHeaderBytes with option `-http-max-header-bytes`
Adds tests for behavior
2020-10-29 12:38:19 -05:00
Daniel Nephin bd44952c2e streaming: disable streaming when requesting connect events
Until the correct events are created for terminating gateways.
2020-10-26 11:55:49 -04:00
Daniel Nephin 853667e7d8 health: change the name of UseStreamingBackend config
Remove it from the cache section, and update the docs.
2020-10-23 17:47:01 -04:00
Daniel Nephin 3a55c30a05
Merge pull request #8924 from ShimmerGlass/fix-sidecar-deregister-after-restart
Fix: service LocallyRegisteredAsSidecar property is not persisted
2020-10-22 13:26:55 -04:00
Mathilde Gilles 1c8369b3c3 Fix: service LocallyRegisteredAsSidecar property is not persisted
When a service is deregistered, we check whever matching services were
registered as sidecar along with it and deregister them as well.
To determine if a service is indeed a sidecar we check the
structs.ServiceNode.LocallyRegisteredAsSidecar property. However, to
avoid interal API leakage, it is excluded from JSON serialization,
meaning it is not persisted to disk either.
When the agent is restarted, this property lost and sidecars are no
longer deregistered along with their parent service.
To fix this, we now specifically save this property in the persisted
service file.
2020-10-13 19:38:58 +02:00
Daniel Nephin ea77eccb14
Merge pull request #8825 from hashicorp/streaming/add-config
streaming: add config and docs
2020-10-09 14:33:58 -04:00
Daniel Nephin e7d505dc33 config: add field for enabling streaming in the client
agent: register the new streaming cache-type
2020-10-09 14:11:34 -04:00
Kit Patella adeabf2399
Merge pull request #8877 from hashicorp/mkcp/telemetry/consul.api.http
Add flag for disabling 1.9 metrics backwards compatibility and warnings when set to default
2020-10-08 13:22:37 -07:00
Matt Keeler 38f5ddce2a
Add per-agent reconnect timeouts (#8781)
This allows for client agent to be run in a more stateless manner where they may be abruptly terminated and not expected to come back. If advertising a per-agent reconnect timeout using the advertise_reconnect_timeout configuration when that agent leaves, other agents will wait only that amount of time for the agent to come back before reaping it.

This has the advantageous side effect of causing servers to deregister the node/services/checks for that agent sooner than if the global reconnect_timeout was used.
2020-10-08 15:02:19 -04:00
Daniel Nephin b93577c94f config: add field for enabling streaming RPC endpoint 2020-10-08 12:11:20 -04:00
Kit Patella 7fe2f80b4b add config flag to disable 1.9 metrics backwards compatibility. Add warnings on start and reload on default value 2020-10-07 17:12:52 -07:00
Daniel Nephin 529f252d5c rpcclient: Add health.Client and use it in http and dns
This new package provides a client agent implementation of an interface
for fetching the health of services.

This approach has a number of benefits:

1. It provides a much more explicit interface. Instead of everything
   dependency on `RPC()` and `Cache.Get()` for many unrelated things
   they can depend on a type that are named according to the behaviour
   it provides.

2. It gives us a single place to vary the behaviour and migrate to
   a new form of RPC (gRPC). The current implementation has two options
   (cache, or direct RPC), and in the future we will have more.
   It is also a great opporunity to start adding `context.Context` args
   to these operations, which in the future will allow us to cancel
   the operations.

3. As a concequence of the first, in the Server agent where we make
   these calls we can replace the current in-memory RPC calls with
   a thin adapter for the real method. This removes the `net/rpc`
   machinery from the call in places where it is not needed.

This new package is quite small right now, but I think we can expect it
to grow to a more reasonable size as other RPC calls are replaced.

This change also happens to replace two very similar implementations with
a single implementation.
2020-10-04 18:55:02 -04:00
Paul Banks e4db845246
Refactor uiserver to separate package, cleaner Reloading 2020-10-01 11:32:25 +01:00
Paul Banks f6d55e1d25
Fix reload test; address other PR feedback 2020-09-30 18:00:07 +01:00
Paul Banks 526bab6164
Add config changes for UI metrics 2020-09-30 17:59:16 +01:00
R.B. Boyer 7eef25daf5
agent: when enable_central_service_config is enabled ensure agent reload doesn't revert check state to critical (#8747)
Likely introduced when #7345 landed.
2020-09-24 16:24:04 -05:00
Daniel Nephin c18516ad7d
Merge pull request #8680 from hashicorp/dnephin/replace-consul-opts-with-base-deps
agent: Repalce ConsulOptions with a new struct from agent.BaseDeps
2020-09-24 12:45:54 -04:00
Daniel Nephin 282fbdfa75 api: rename HTTPServer to HTTPHandlers
Resolves a TODO about naming. This type is a set of handlers for an http.Server, it is not
itself a Server. It provides http.Handler functions.
2020-09-18 17:38:23 -04:00
Daniel Nephin cdd392d77f agent/consul: pass dependencies directly from agent
In an upcoming change we will need to pass a grpc.ClientConnPool from
BaseDeps into Server. While looking at that change I noticed all of the
existing consulOption fields are already on BaseDeps.

Instead of duplicating the fields, we can create a struct used by
agent/consul, and use that struct in BaseDeps. This allows us to pass
along dependencies without translating them into different
representations.

I also looked at moving all of BaseDeps in agent/consul, however that
created some circular imports. Resolving those cycles wouldn't be too
bad (it was only an error in agent/consul being imported from
cache-types), however this change seems a little better by starting to
introduce some structure to BaseDeps.

This change is also a small step in reducing the scope of Agent.

Also remove some constants that were only used by tests, and move the
relevant comment to where the live configuration is set.

Removed some validation from NewServer and NewClient, as these are not
really runtime errors. They would be code errors, which will cause a
panic anyway, so no reason to handle them specially here.
2020-09-15 17:29:32 -04:00
Daniel Nephin 4c9ed41eab
Merge pull request #8554 from hashicorp/dnephin/agent-setup-persisted-tokens
agent: move token persistence from agent into token.Store
2020-09-03 17:29:21 -04:00
Daniel Nephin 6ca45e1a61 agent: add apiServers type for managing HTTP servers
Remove Server field from HTTPServer. The field is no longer used.
2020-09-03 13:40:12 -04:00
Daniel Nephin 330be5b740 agent/token: Move token persistence out of agent
And into token.Store. This change isolates any awareness of token
persistence in a single place.

It is a small step in allowing Agent.New to accept its dependencies.
2020-08-31 15:00:34 -04:00
Matt Keeler 91d680b830
Merge of auto-config and auto-encrypt code (#8523)
auto-encrypt is now handled as a special case of auto-config.

This also is moving all the cert-monitor code into the auto-config package.
2020-08-31 13:12:17 -04:00
Daniel Nephin 72bf350069
Merge pull request #8552 from pierresouchay/reload_cache_throttling_config
Ensure that Cache options are reloaded when `consul reload` is performed
2020-08-28 15:04:42 -04:00
R.B. Boyer 74d5df7c7a
xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569) 2020-08-27 12:20:58 -05:00
Matt Keeler f97cc0445a
Move RPC router from Client/Server and into BaseDeps (#8559)
This will allow it to be a shared component which is needed for AutoConfig
2020-08-27 11:23:52 -04:00
Pierre Souchay d2be9d38da Ensure that Cache options are reloaded when `consul reload` is performed.
This will apply cache throttling parameters are properly applied:
 * cache.EntryFetchMaxBurst
 * cache.EntryFetchRate

When values are updated, a log is displayed in info.
2020-08-24 23:33:10 +02:00
Daniel Nephin e16375216d config: use logging.Config in RuntimeConfig
To add structure to RuntimeConfig, and remove the need to translate into a third type.
2020-08-19 13:21:00 -04:00
Daniel Nephin f2373a5575 logging: move init of grpclog
This line initializes global state. Moving it out of the constructor and closer to where logging
is setup helps keep related things together.
2020-08-19 13:21:00 -04:00
Daniel Nephin 63bad36de7 testing: disable global metrics sink in tests
This might be better handled by allowing configuration for the InMemSink interval and retail, and disabling
the global. For now this is a smaller change to remove the goroutine leak caused by tests because go-metrics
does not provide any way of shutting down the global goroutine.
2020-08-18 19:04:57 -04:00
Daniel Nephin 5d4df54296 agent: extract dependency creation from New
With this change, Agent.New() accepts many of the dependencies instead
of creating them in New. Accepting fully constructed dependencies from
a constructor makes the type easier to test, and easier to change.

There are still a number of dependencies created in Start() which can
be addressed in a follow up.
2020-08-18 19:04:55 -04:00
Daniel Nephin 35f1ecee0b config: Move remote-script-checks warning to config
Previously it was done in Agent.Start, but it can be done much earlier
2020-08-17 17:39:49 -04:00
Daniel Nephin 27b36bfc4e config: move NodeName validation to config validation
Previsouly it was done in Agent.Start, which is much later then it needs to be.

The new 'dns' package was required, because otherwise there would be an
import cycle. In the future we should move more of the dns server into
the dns package.
2020-08-17 17:25:02 -04:00
Daniel Nephin 399c77dfb6 agent: rename vars in newConsulConfig
'base' is a bit misleading, since it is the return value. Renamed to cfg.
2020-08-13 11:58:21 -04:00
Daniel Nephin 7b5b170a0d agent: Move setupKeyring functions to keyring.go
There are a couple reasons for this change:

1. agent.go is way too big. Smaller files makes code eaasier to read
   because tools that show usage also include filename which can give
   a lot more context to someone trying to understand which functions
   call other functions.
2. these two functions call into a large number of functions already in
   keyring.go.
2020-08-13 11:58:21 -04:00
Daniel Nephin 9919e5dfa5 agent: unmethod consulConfig
To allow us to move newConsulConfig out of Agent.
2020-08-13 11:58:21 -04:00
Daniel Nephin 8f596f5551 Fix conflict in merged PRs
One PR renamed the var from config->cfg, and another used the old name config, which caused the
build to fail on master.
2020-08-13 11:28:26 -04:00
Daniel Nephin 190fcc14a3
Merge pull request #8463 from hashicorp/dnephin/unmethod-make-node-id
agent: convert NodeID methods to functions
2020-08-13 11:18:11 -04:00
Daniel Nephin 37eacf8192 auto-config: reduce awareness of config
This is a small step to allowing Agent to accept its dependencies
instead of creating them in New.

There were two fields in autoconfig.Config that were used exclusively
to load config. These were replaced with a single function, allowing us
to move LoadConfig back to the config package.

Also removed the WithX functions for building a Config. Since these were
simple assignment, it appeared we were not getting much value from them.
2020-08-12 13:23:23 -04:00
Daniel Nephin 875d8bde42 agent: convert NodeID methods to functions
Making these functions allows us to cleanup how an agent is initialized. They only make use of a config and a logger, so they do not need to be agent methods.

Also cleanup the testing to use t.Run and require.
2020-08-12 13:05:10 -04:00
Daniel Nephin 0738eb8596 Extract nodeID functions to a different file
In preparation for turning them into functions.
To reduce the scope of Agent, and refactor how Agent is created and started.
2020-08-12 13:05:10 -04:00
Daniel Nephin 38980ebb4c config: Make Source an interface
This will allow us to accept config from auto-config without needing to
go through a serialziation cycle.
2020-08-10 12:46:28 -04:00
Daniel Nephin 3b82ad0955 Rename NewClient/NewServer
Now that duplicate constructors have been removed we can use the shorter names for the single constructor.
2020-08-05 14:00:55 -04:00
Daniel Nephin 0420d91cdd Remove LogOutput from Agent
Now that it is no longer used, we can remove this unnecessary field. This is a pre-step in cleanup up RuntimeConfig->Consul.Config, which is a pre-step to adding a gRPCHandler component to Server for streaming.

Removing this field also allows us to remove one of the return values from logging.Setup.
2020-08-05 14:00:44 -04:00
Daniel Nephin 5acf01ceeb Remove LogOutput from Server 2020-08-05 14:00:44 -04:00
Daniel Nephin e8ee2cf2f7 Pass a logger to ConnPool and yamux, instead of an io.Writer
Allowing us to remove the LogOutput field from config.
2020-08-05 13:25:08 -04:00
Daniel Nephin ed8210fe4d api: Use a Logger instead of an io.Writer in api.Watch
So that we can pass around only a Logger, not a LogOutput
2020-08-05 13:25:08 -04:00
Daniel Nephin 1e17a0c3e1 config: Remove unused field 2020-08-05 13:25:08 -04:00
Matt Keeler 1a78cf9b4c
Ensure certificates retrieved through the cache get persisted with auto-config (#8409) 2020-07-30 11:37:18 -04:00
Matt Keeler 34034b76f5
Agent Auto Config: Implement Certificate Generation (#8360)
Most of the groundwork was laid in previous PRs between adding the cert-monitor package to extracting the logic of signing certificates out of the connect_ca_endpoint.go code and into a method on the server.

This also refactors the auto-config package a bit to split things out into multiple files.
2020-07-28 15:31:48 -04:00
Pierre Souchay 505de6dc29
Added ratelimit to handle throtling cache (#8226)
This implements a solution for #7863

It does:

    Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
    Add cache.entry_fetch_max_burst size of rate limit (default value = 2)

The new configuration now supports the following syntax for instance to allow 1 query every 3s:

    command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
    in JSON

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 23:11:11 +02:00
Matt Keeler 2ee9fe0a4d
Move generation of the CA Configuration from the agent code into a method on the RuntimeConfig (#8363)
This allows this to be reused elsewhere.
2020-07-23 16:05:28 -04:00
Matt Keeler 9da8c51ac5
Fix issue with changing the agent token causing failure to renew the auto-encrypt certificate
The fallback method would still work but it would get into a state where it would let the certificate expire for 10s before getting a new one. And the new one used the less secure RPC endpoint.

This is also a pretty large refactoring of the auto encrypt code. I was going to write some tests around the certificate monitoring but it was going to be impossible to get a TestAgent configured in such a way that I could write a test that ran in less than an hour or two to exercise the functionality.

Moving the certificate monitoring into its own package will allow for dependency injection and in particular mocking the cache types to control how it hands back certificates and how long those certificates should live. This will allow for exercising the main loop more than would be possible with it coupled so tightly with the Agent.
2020-07-21 12:19:25 -04:00
Daniel Nephin 653c938edc watch: extract makeWatchPlan to facilitate testing
There is a bug in here now that slices in opaque config are unsliced. But to test that bug fix we
need a function that can be easily tested.
2020-07-10 13:33:45 -04:00
Daniel Nephin f22f3d300d
Merge pull request #8231 from hashicorp/dnephin/unembed-HTTPServer-Server
agent/http: un-embed the http.Server
2020-07-09 17:42:33 -04:00
Daniel Nephin df4088291c agent/http: Update TestSetupHTTPServer_HTTP2
To remove the need to store the http.Server. This will allow us to
remove the http.Server field from the HTTPServer struct.
2020-07-09 16:42:19 -04:00
Daniel Nephin 5247ef4c70 Remove ACLsEnabled from delegate interface
In all cases (oss/ent, client/server) this method was returning a value from config. Since the
value is consistent, it doesn't need to be part of the delegate interface.
2020-07-03 17:00:20 -04:00