Commit Graph

182 Commits

Author SHA1 Message Date
Dhia Ayachi 83720e5737
add a rate limiter to config auto-reload (#12490)
* add config watcher to the config package

* add logging to watcher

* add test and refactor to add WatcherEvent.

* add all API calls and fix a bug with recreated files

* add tests for watcher

* remove the unnecessary use of context

* Add debug log and a test for file rename

* use inode to detect if the file is recreated/replaced and only listen to create events.

* tidy ups (#1535)

* tidy ups

* Add tests for inode reconcile

* fix linux vs windows syscall

* fix linux vs windows syscall

* fix windows compile error

* increase timeout

* use ctime ID

* remove remove/creation test as it's a use case that fail in linux

* fix linux/windows to use Ino/CreationTime

* fix the watcher to only overwrite current file id

* fix linter error

* fix remove/create test

* set reconcile loop to 200 Milliseconds

* fix watcher to not trigger event on remove, add more tests

* on a remove event try to add the file back to the watcher and trigger the handler if success

* fix race condition

* fix flaky test

* fix race conditions

* set level to info

* fix when file is removed and get an event for it after

* fix to trigger handler when we get a remove but re-add fail

* fix error message

* add tests for directory watch and fixes

* detect if a file is a symlink and return an error on Add

* rename Watcher to FileWatcher and remove symlink deref

* add fsnotify@v1.5.1

* fix go mod

* do not reset timer on errors, rename OS specific files

* rename New func

* events trigger on write and rename

* add missing test

* fix flaking tests

* fix flaky test

* check reconcile when removed

* delete invalid file

* fix test to create files with different mod time.

* back date file instead of sleeping

* add watching file in agent command.

* fix watcher call to use new API

* add configuration and stop watcher when server stop

* add certs as watched files

* move FileWatcher to the agent start instead of the command code

* stop watcher before replacing it

* save watched files in agent

* add add and remove interfaces to the file watcher

* fix remove to not return an error

* use `Add` and `Remove` to update certs files

* fix tests

* close events channel on the file watcher even when the context is done

* extract `NotAutoReloadableRuntimeConfig` is a separate struct

* fix linter errors

* add Ca configs and outgoing verify to the not auto reloadable config

* add some logs and fix to use background context

* add tests to auto-config reload

* remove stale test

* add tests to changes to config files

* add check to see if old cert files still trigger updates

* rename `NotAutoReloadableRuntimeConfig` to `StaticRuntimeConfig`

* fix to re add both key and cert file. Add test to cover this case.

* review suggestion

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add check to static runtime config changes

* fix test

* add changelog file

* fix review comments

* Apply suggestions from code review

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* update flag description

Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>

* fix compilation error

* add static runtime config support

* fix test

* fix review comments

* fix log test

* Update .changelog/12329.txt

Co-authored-by: Dan Upton <daniel@floppy.co>

* transfer tests to runtime_test.go

* fix filewatcher Replace to not deadlock.

* avoid having lingering locks

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* split ReloadConfig func

* fix warning message

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* convert `FileWatcher` into an interface

* fix compilation errors

* fix tests

* extract func for adding and removing files

* add a coalesceTimer with a very small timer

* extract coaelsce Timer and add a shim for testing

* add tests to coalesceTimer fix to send remaining events

* set `coalesceTimer` to 1 Second

* support symlink, fix a nil deref.

* fix compile error

* fix compile error

* refactor file watcher rate limiting to be a Watcher implementation

* fix linter issue

* fix runtime config

* fix runtime test

* fix flaky tests

* fix compile error

* Apply suggestions from code review

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* fix agent New to return an error if File watcher New return an error

* quit timer loop if ctx is canceled

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>

Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
Co-authored-by: Daniel Upton <daniel@floppy.co>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-04-04 11:31:39 -04:00
Dhia Ayachi 16b19dd82d
auto-reload configuration when config files change (#12329)
* add config watcher to the config package

* add logging to watcher

* add test and refactor to add WatcherEvent.

* add all API calls and fix a bug with recreated files

* add tests for watcher

* remove the unnecessary use of context

* Add debug log and a test for file rename

* use inode to detect if the file is recreated/replaced and only listen to create events.

* tidy ups (#1535)

* tidy ups

* Add tests for inode reconcile

* fix linux vs windows syscall

* fix linux vs windows syscall

* fix windows compile error

* increase timeout

* use ctime ID

* remove remove/creation test as it's a use case that fail in linux

* fix linux/windows to use Ino/CreationTime

* fix the watcher to only overwrite current file id

* fix linter error

* fix remove/create test

* set reconcile loop to 200 Milliseconds

* fix watcher to not trigger event on remove, add more tests

* on a remove event try to add the file back to the watcher and trigger the handler if success

* fix race condition

* fix flaky test

* fix race conditions

* set level to info

* fix when file is removed and get an event for it after

* fix to trigger handler when we get a remove but re-add fail

* fix error message

* add tests for directory watch and fixes

* detect if a file is a symlink and return an error on Add

* rename Watcher to FileWatcher and remove symlink deref

* add fsnotify@v1.5.1

* fix go mod

* do not reset timer on errors, rename OS specific files

* rename New func

* events trigger on write and rename

* add missing test

* fix flaking tests

* fix flaky test

* check reconcile when removed

* delete invalid file

* fix test to create files with different mod time.

* back date file instead of sleeping

* add watching file in agent command.

* fix watcher call to use new API

* add configuration and stop watcher when server stop

* add certs as watched files

* move FileWatcher to the agent start instead of the command code

* stop watcher before replacing it

* save watched files in agent

* add add and remove interfaces to the file watcher

* fix remove to not return an error

* use `Add` and `Remove` to update certs files

* fix tests

* close events channel on the file watcher even when the context is done

* extract `NotAutoReloadableRuntimeConfig` is a separate struct

* fix linter errors

* add Ca configs and outgoing verify to the not auto reloadable config

* add some logs and fix to use background context

* add tests to auto-config reload

* remove stale test

* add tests to changes to config files

* add check to see if old cert files still trigger updates

* rename `NotAutoReloadableRuntimeConfig` to `StaticRuntimeConfig`

* fix to re add both key and cert file. Add test to cover this case.

* review suggestion

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add check to static runtime config changes

* fix test

* add changelog file

* fix review comments

* Apply suggestions from code review

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* update flag description

Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>

* fix compilation error

* add static runtime config support

* fix test

* fix review comments

* fix log test

* Update .changelog/12329.txt

Co-authored-by: Dan Upton <daniel@floppy.co>

* transfer tests to runtime_test.go

* fix filewatcher Replace to not deadlock.

* avoid having lingering locks

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* split ReloadConfig func

* fix warning message

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* convert `FileWatcher` into an interface

* fix compilation errors

* fix tests

* extract func for adding and removing files

Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
Co-authored-by: Daniel Upton <daniel@floppy.co>
2022-03-31 15:11:49 -04:00
Evan Culver 602e08ada7
checks: populate interval and timeout when registering services (#11138) 2022-02-18 12:05:33 -08:00
R.B. Boyer b60d89e7ef bulk rewrite using this script
set -euo pipefail

    unset CDPATH

    cd "$(dirname "$0")"

    for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
        echo "=== require: $f ==="
        sed -i '/require := require.New(t)/d' $f
        # require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
        # require.XXX(tblah) but not require.XXX(t, blah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
        # require.XXX(rblah) but not require.XXX(r, blah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
        gofmt -s -w $f
    done

    for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
        echo "=== assert: $f ==="
        sed -i '/assert := assert.New(t)/d' $f
        # assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
        # assert.XXX(tblah) but not assert.XXX(t, blah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
        # assert.XXX(rblah) but not assert.XXX(r, blah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
        gofmt -s -w $f
    done
2022-01-20 10:46:23 -06:00
Dan Upton 205ce9a69d
Remove references to "master" ACL tokens in tests (#11751) 2021-12-07 12:48:50 +00:00
Dan Upton 3a91815169
agent/token: rename `agent_master` to `agent_recovery` (internally) (#11744) 2021-12-07 12:12:47 +00:00
Daniel Nephin 81afb208ac
Merge pull request #11677 from hashicorp/dnephin/freeport-interface
sdk: use t.Cleanup in freeport and remove unnecessary calls
2021-12-02 15:58:41 -05:00
R.B. Boyer db91cbf484
auto-config: ensure the feature works properly with partitions (#11699) 2021-12-01 13:32:34 -06:00
Daniel Nephin 4185045a7f sdk/freeport: rename Port to GetOne
For better consistency with GetN
2021-11-30 17:32:41 -05:00
Daniel Nephin e8312d6b5a testing: remove unnecessary calls to freeport
Previously we believe it was necessary for all code that required ports
to use freeport to prevent conflicts.

https://github.com/dnephin/freeport-test shows that it is actually save
to use port 0 (`127.0.0.1:0`) as long as it is passed directly to
`net.Listen`, and the listener holds the port for as long as it is
needed.

This works because freeport explicitly avoids the ephemeral port range,
and port 0 always uses that range. As you can see from the test output
of https://github.com/dnephin/freeport-test, the two systems never use
overlapping ports.

This commit converts all uses of freeport that were being passed
directly to a net.Listen to use port 0 instead. This allows us to remove
a bit of wrapping we had around httptest, in a couple places.
2021-11-29 12:19:43 -05:00
Daniel Nephin d795a73f78 testing: use the new freeport interfaces 2021-11-27 15:39:46 -05:00
Daniel Nephin 8c7475d95e sdk: add freeport functions that use t.Cleanup 2021-11-27 15:04:43 -05:00
Mark Anderson 7e8228a20b
Remove some usage of md5 from the system (#11491)
* Remove some usage of md5 from the system

OSS side of https://github.com/hashicorp/consul-enterprise/pull/1253

This is a potential security issue because an attacker could conceivably manipulate inputs to cause persistence files to collide, effectively deleting the persistence file for one of the colliding elements.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-11-04 13:07:54 -07:00
Daniel Nephin 51d8417545
Merge pull request #10690 from tarat44/h2c-support-in-ping-checks
add support for h2c in h2 ping health checks
2021-11-02 13:53:06 -04:00
Dhia Ayachi 2801785710
regenerate expired certs (#11462)
* regenerate expired certs

* add documentation to generate tests certificates
2021-11-01 11:40:16 -04:00
R.B. Boyer ef559dfdd4
agent: refactor the agent delegate interface to be partition friendly (#11429) 2021-10-26 15:08:55 -05:00
tarat44 1e8e44d442 fix formatting 2021-10-05 00:12:23 -04:00
tarat44 c1ed3a9a94 change config option to H2PingUseTLS 2021-10-05 00:12:21 -04:00
tarat44 3c9f5a73d9 add support for h2c in h2 ping health checks 2021-10-04 22:51:08 -04:00
R.B. Boyer 5b6d96d27d
grpc: ensure that streaming gRPC requests work over mesh gateway based wan federation (#10838)
Fixes #10796
2021-08-24 16:28:44 -05:00
Evan Culver 710bd90ef7
checks: Add Interval and Timeout to API response (#10717) 2021-08-03 15:26:49 -07:00
R.B. Boyer fc9b1a277d
sync changes to oss files made in enterprise (#10670) 2021-07-22 13:58:08 -05:00
R.B. Boyer 188e8dc51f
agent/structs: add a bunch more EnterpriseMeta helper functions to help with partitioning (#10669) 2021-07-22 13:20:45 -05:00
Daniel Nephin 970f5d78ec agent: fix two data race in agent tests
The LogOutput io.Writer used by TestAgent must allow concurrent reads and writes, and a
bytes.Buffer does not allow this. The bytes.Buffer must be wrapped with a lock to make this safe.
2021-07-14 18:58:16 -04:00
Daniel Nephin 3a045cca8d ca: remove unused RotationPeriod field
This field was never used. Since it is persisted as part of a map[string]interface{} it
is pretty easy to remove it.
2021-07-05 19:15:44 -04:00
Daniel Nephin 8b365f8271 Remove a racy and failing test
This test is super racy (it's not just a single line).

This test also starts failing once streaming is enabled, because the
cache rate limit no longer applies to the requests in the test. The
queries use streaming instead of the cache.

This test is no longer valid, and the functionality is already well
tested by TestCacheThrottle.  Instead of spending time rewriting this
test, let's remove it.

```
WARNING: DATA RACE
Read at 0x00c01de410fc by goroutine 735:
  github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
      /home/daniel/pers/code/consul/agent/agent_test.go:1024 +0x9af
  github.com/hashicorp/consul/testrpc.WaitForTestAgent()
      /home/daniel/pers/code/consul/testrpc/wait.go:99 +0x209
  github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
      /home/daniel/pers/code/consul/agent/agent_test.go:966 +0x1ad
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202

Previous write at 0x00c01de410fc by goroutine 605:
  github.com/hashicorp/consul/agent.TestCacheRateLimit.func1.2()
      /home/daniel/pers/code/consul/agent/agent_test.go:998 +0xe9

Goroutine 735 (running) created at:
  testing.(*T).Run()
      /usr/lib/go/src/testing/testing.go:1238 +0x5d7
  github.com/hashicorp/consul/agent.TestCacheRateLimit()
      /home/daniel/pers/code/consul/agent/agent_test.go:961 +0x375
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202

Goroutine 605 (finished) created at:
  github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
      /home/daniel/pers/code/consul/agent/agent_test.go:1022 +0x91e
  github.com/hashicorp/consul/testrpc.WaitForTestAgent()
      /home/daniel/pers/code/consul/testrpc/wait.go:99 +0x209
  github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
      /home/daniel/pers/code/consul/agent/agent_test.go:966 +0x1ad
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202
```
2021-06-28 17:23:13 -04:00
R.B. Boyer a2876453a5
connect/ca: cease including the common name field in generated certs (#10424)
As part of this change, we ensure that the SAN extensions are marked as
critical when the subject is empty so that AWS PCA tolerates the loss of
common names well and continues to function as a Connect CA provider.

Parts of this currently hack around a bug in crypto/x509 and can be
removed after https://go-review.googlesource.com/c/go/+/329129 lands in
a Go release.

Note: the AWS PCA tests do not run automatically, but the following
passed locally for me:

    ENABLE_AWS_PCA_TESTS=1 go test ./agent/connect/ca -run TestAWS
2021-06-25 13:00:00 -05:00
Matt Keeler 234d0a3c2a Preparation for changing where license management is done. 2021-05-24 10:19:31 -04:00
Iryna Shustava d7d44f6ae7
Save exposed ports in agent's store and expose them via API (#10173)
* Save exposed HTTP or GRPC ports to the agent's store
* Add those the health checks API so we can retrieve them from the API
* Change redirect-traffic command to also exclude those ports from inbound traffic redirection when expose.checks is set to true.
2021-05-12 13:51:39 -07:00
Daniel Nephin e229b877d8 health: create health.Client in Agent.New 2021-04-27 19:03:16 -04:00
Tara Tufano 9deb52e868
add http2 ping health checks (#8431)
* add http2 ping checks

* fix test issue

* add h2ping check to config resources

* add new test and docs for h2ping

* fix grammatical inconsistency in H2PING documentation

* resolve rebase conflicts, add test for h2ping tls verification failure

* api documentation for h2ping

* update test config data with H2PING

* add H2PING to protocol buffers and update changelog

* fix typo in changelog entry
2021-04-09 15:12:10 -04:00
Daniel Nephin f36d77ee4a
Merge pull request #9301 from hashicorp/dnephin/add-service-2
agent: reduce AddService 2
2021-01-26 12:01:34 -05:00
R.B. Boyer c608dc0d60
server: initialize mgw-wanfed to use local gateways more on startup (#9528)
Fixes #9342
2021-01-25 17:30:38 -06:00
Daniel Nephin 1495291054 agent: move two fields off of AddServiceRequest 2021-01-25 18:01:03 -05:00
Daniel Nephin 3d39359bcb agent: move deprecated AddServiceFromSource to a test file
The method is only used in tests, and only exists for legacy calls.

There was one other package which used this method in tests. Export
the AddServiceRequest and a couple of its fields so the new function can
be used in those tests.
2021-01-25 17:25:03 -05:00
Daniel Nephin 6757231b82 agent: rename AddService->AddServiceFromSource
In preparation for extracting a single AddService func that accepts a request struct.
2021-01-25 17:25:01 -05:00
Daniel Nephin d113f0e690 structs: Fix printing of IDs
These types are used as values (not pointers) in other structs. Using a pointer receiver causes
problems when the value is printed. fmt will not call the String method if it is passed a value
and the String method has a pointer receiver. By using a value receiver the correct string is printed.

Also remove some unused methods.
2021-01-07 18:47:38 -05:00
Michael Montgomery e4f603dfae Merge branch 'master' into 6074-allow-config-MaxHeaderBytes 2020-12-30 14:14:05 -06:00
Michael Montgomery 817903b925 Fixed failing tests
Removed use of `NewTestAgent`, per review comment
Removed CLI flag, per review comment
Updated website documentation
Added changelog entry
2020-12-30 14:09:50 -06:00
Daniel Nephin b9e60c0775 testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
Michael Montgomery 585c84e9ff Merge branch 'master' into 6074-allow-config-MaxHeaderBytes 2020-11-20 07:43:53 -06:00
Daniel Nephin 79963be559 Use freeport
To prevent other tests which already use freeport from flaking when port 0 steals their reserved port.
2020-11-18 16:07:34 -05:00
Daniel Nephin 738bf9efdc agent: fix bug with multiple listeners
Previously the listener was being passed to a closure in a loop without
capturing the loop variable. The result is only the last listener is
used, so the http/https servers only listen on one address.

This problem is fixed by capturing the variable by passing it into a
function.
2020-11-18 13:03:29 -05:00
Michael Montgomery 5b6ac035ff Resolves #6074. Adds new option to configure HTTP Server's MaxHeaderBytes with option `-http-max-header-bytes`
Adds tests for behavior
2020-10-29 12:38:19 -05:00
Mathilde Gilles 1c8369b3c3 Fix: service LocallyRegisteredAsSidecar property is not persisted
When a service is deregistered, we check whever matching services were
registered as sidecar along with it and deregister them as well.
To determine if a service is indeed a sidecar we check the
structs.ServiceNode.LocallyRegisteredAsSidecar property. However, to
avoid interal API leakage, it is excluded from JSON serialization,
meaning it is not persisted to disk either.
When the agent is restarted, this property lost and sidecars are no
longer deregistered along with their parent service.
To fix this, we now specifically save this property in the persisted
service file.
2020-10-13 19:38:58 +02:00
R.B. Boyer 9801ef8eb1
agent: enable enable_central_service_config by default (#8746) 2020-10-01 09:19:14 -05:00
R.B. Boyer 7eef25daf5
agent: when enable_central_service_config is enabled ensure agent reload doesn't revert check state to critical (#8747)
Likely introduced when #7345 landed.
2020-09-24 16:24:04 -05:00
Daniel Nephin c18516ad7d
Merge pull request #8680 from hashicorp/dnephin/replace-consul-opts-with-base-deps
agent: Repalce ConsulOptions with a new struct from agent.BaseDeps
2020-09-24 12:45:54 -04:00
Daniel Nephin cdd392d77f agent/consul: pass dependencies directly from agent
In an upcoming change we will need to pass a grpc.ClientConnPool from
BaseDeps into Server. While looking at that change I noticed all of the
existing consulOption fields are already on BaseDeps.

Instead of duplicating the fields, we can create a struct used by
agent/consul, and use that struct in BaseDeps. This allows us to pass
along dependencies without translating them into different
representations.

I also looked at moving all of BaseDeps in agent/consul, however that
created some circular imports. Resolving those cycles wouldn't be too
bad (it was only an error in agent/consul being imported from
cache-types), however this change seems a little better by starting to
introduce some structure to BaseDeps.

This change is also a small step in reducing the scope of Agent.

Also remove some constants that were only used by tests, and move the
relevant comment to where the live configuration is set.

Removed some validation from NewServer and NewClient, as these are not
really runtime errors. They would be code errors, which will cause a
panic anyway, so no reason to handle them specially here.
2020-09-15 17:29:32 -04:00
Daniel Nephin 0bb9c318b7 http: fix tests incorrectly using HTTPAddr to get the address of the
https server.

In #8234 I changed a few tests to use TestAgent.HTTPAddr() to find the
addr used in the test. Due to the way HTTPAddr() was implemented these
tests were passing, but I think the pass was incidental. HTTPAddr() was
not matching any servers, and was instead returning the last server,
which happened to be the one these tests wanted.

This commit fixes the implementation of HTTPAddr to panic if no match
was found. The tests which require an HTTPS server are changed to use
a new firstAddr() to look up the correct address.
2020-09-04 15:29:17 -04:00