* server: add placeholder glue for rate limit handler
This commit adds a no-op implementation of the rate-limit handler and
adds it to the `consul.Server` struct and setup code.
This allows us to start working on the net/rpc and gRPC interceptors and
config logic.
* Add handler errors
* Set the global read and write limits
* fixing multilimiter moving packages
* Fix typo
* Simplify globalLimit usage
* add multilimiter and tests
* exporting LimitedEntity
* Apply suggestions from code review
Co-authored-by: John Murret <john.murret@hashicorp.com>
* add config update and rename config params
* add doc string and split config
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* use timer to avoid go routine leak and change the interface
* add comments to tests
* fix failing test
* add prefix with config edge, refactor tests
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* refactor to apply configs for limiters under a prefix
* add fuzz tests and fix bugs found. Refactor reconcile loop to have a simpler logic
* make KeyType an exported type
* split the config and limiter trees to fix race conditions in config update
* rename variables
* fix race in test and remove dead code
* fix reconcile loop to not create a timer on each loop
* add extra benchmark tests and fix tests
* fix benchmark test to pass value to func
* server: add placeholder glue for rate limit handler
This commit adds a no-op implementation of the rate-limit handler and
adds it to the `consul.Server` struct and setup code.
This allows us to start working on the net/rpc and gRPC interceptors and
config logic.
* Set the global read and write limits
* fixing multilimiter moving packages
* add server configuration for global rate limiting.
* remove agent test
* remove added stuff from handler
* remove added stuff from multilimiter
* removing unnecessary TODOs
* Removing TODO comment from handler
* adding in defaulting to infinite
* add disabled status in there
* adding in documentation for disabled mode.
* make disabled the default.
* Add mock and agent test
* addig documentation and missing mock file.
* Fixing test TestLoad_IntegrationWithFlags
* updating docs based on PR feedback.
* Updating Request Limits mode to use int based on PR feedback.
* Adding RequestLimits struct so we have a nested struct in ReloadableConfig.
* fixing linting references
* Update agent/consul/rate/handler.go
Co-authored-by: Dan Upton <daniel@floppy.co>
* Update agent/consul/config.go
Co-authored-by: Dan Upton <daniel@floppy.co>
* removing the ignore of the request limits in JSON. addingbuilder logic to convert any read rate or write rate less than 0 to rate.Inf
* added conversion function to convert request limits object to handler config.
* Updating docs to reflect gRPC and RPC are rate limit and as a result, HTTP requests are as well.
* Updating values for TestLoad_FullConfig() so that they were different and discernable.
* Updating TestRuntimeConfig_Sanitize
* Fixing TestLoad_IntegrationWithFlags test
* putting nil check in place
* fixing rebase
* removing change for missing error checks. will put in another PR
* Rebasing after default multilimiter config change
* resolving rebase issues
* updating reference for incomingRPCLimiter to use interface
* updating interface
* Updating interfaces
* Fixing mock reference
Co-authored-by: Daniel Upton <daniel@floppy.co>
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
* integ-test: test consul upgrade from the snapshot of a running cluster
* use Target version as default
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
* auto-config: relax node name validation for JWT authorization
This changes the JWT authorization logic to allow all non-whitespace,
non-quote characters when validating node names. Consul had previously
allowed these characters in node names, until this validation was added
to fix a security vulnerability with whitespace/quotes being passed to
the `bexpr` library. This unintentionally broke node names with
characters like `.` which aren't related to this vulnerability.
* Update website/content/docs/agent/config/cli-flags.mdx
Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
Prevent serving TLS via ports.grpc
We remove the ability to run the ports.grpc in TLS mode to avoid
confusion and to simplify configuration. This breaking change
ensures that any user currently using ports.grpc in an encrypted
mode will receive an error message indicating that ports.grpc_tls
must be explicitly used.
The suggested action for these users is to simply swap their ports.grpc
to ports.grpc_tls in the configuration file. If both ports are defined,
or if the user has not configured TLS for grpc, then the error message
will not be printed.
* update go version to 1.18 for api and sdk, go mod tidy
* removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward.
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
This continues the work done in #14908 where a crude solution to prevent a
goroutine leak was implemented. The former code would launch a perpetual
goroutine family every iteration (+1 +1) and the fixed code simply caused a
new goroutine family to first cancel the prior one to prevent the
leak (-1 +1 == 0).
This PR refactors this code completely to:
- make it more understandable
- remove the recursion-via-goroutine strangeness
- prevent unnecessary RPC fetches when the prior one has errored.
The core issue arose from a conflation of the entry.Fetching field to mean:
- there is an RPC (blocking query) in flight right now
- there is a goroutine running to manage the RPC fetch retry loop
The problem is that the goroutine-leak-avoidance check would treat
Fetching like (2), but within the body of a goroutine it would flip that
boolean back to false before the retry sleep. This would cause a new
chain of goroutines to launch which #14908 would correct crudely.
The refactored code uses a plain for-loop and changes the semantics
to track state for "is there a goroutine associated with this cache entry"
instead of the former.
We use a uint64 unique identity per goroutine instead of a boolean so
that any orphaned goroutines can tell when they've been replaced when
the expiry loop deletes a cache entry while the goroutine is still running
and is later replaced.
Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created.
Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts.
Adds a user-configurable rate limiter to proxycfg snapshot delivery,
with a default limit of 250 updates per second.
This addresses a problem observed in our load testing of Consul
Dataplane where updating a "global" resource such as a wildcard
intention or the proxy-defaults config entry could starve the Raft or
Memberlist goroutines of CPU time, causing general cluster instability.
Preivously the TLS configurator would default to presenting auto TLS
certificates as client certificates.
Server agents should not have this behavior and should instead present
the manually configured certs. The autoTLS certs for servers are
exclusively used for peering and should not be used as the default for
outbound communication.
To ease the transition for users, the original gRPC
port can still operate in a deprecated mode as either
plain-text or TLS mode. This behavior should be removed
in a future release whenever we no longer support this.
The resulting behavior from this commit is:
`ports.grpc > 0 && ports.grpc_tls > 0` spawns both plain-text and tls ports.
`ports.grpc > 0 && grpc.tls == undefined` spawns a single plain-text port.
`ports.grpc > 0 && grpc.tls != undefined` spawns a single tls port (backwards compat mode).
* defaulting to false because peering will be released as beta
* Ignore peering disabled error in bundles cachetype
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: freddygv <freddy@hashicorp.com>
Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
- Introduce a new telemetry configurable parameter retry_failed_connection. User can set the value to true to let consul agent continue its start process on failed connection to datadog server. When set to false, agent will stop on failed start. The default behavior is true.
Co-authored-by: Dan Upton <daniel@floppy.co>
Co-authored-by: Evan Culver <eculver@users.noreply.github.com>
Changes to how the version string was handled created small regression with the release of consul 1.12.0 enterprise.
Many tools use the Config:Version field reported by the agent/self resource to determine whether Consul is an enterprise or OSS instance, expect something like 1.12.0+ent for enterprise and simply 1.12.0 for OSS. This was accidentally broken during the runup to 1.12.x
This work fixes the value returned by both the self endpoint in ["Config"]["Version"] and the metrics consul.version field.
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* Support vault namespaces in connect CA
Follow on to some missed items from #12655
From an internal ticket "Support standard "Vault namespace in the
path" semantics for Connect Vault CA Provider"
Vault allows the namespace to be specified as a prefix in the path of
a PKI definition, but our usage of the Vault API includes calls that
don't support a namespaced key. In particular the sys.* family of
calls simply appends the key, instead of prefixing the namespace in
front of the path.
Unfortunately it is difficult to reliably parse a path with a
namespace; only vault knows what namespaces are present, and the '/'
separator can be inside a key name, as well as separating path
elements. This is in use in the wild; for example
'dc1/intermediate-key' is a relatively common naming schema.
Instead we add two new fields: RootPKINamespace and
IntermediatePKINamespace, which are the absolute namespace paths
'prefixed' in front of the respective PKI Paths.
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* add config watcher to the config package
* add logging to watcher
* add test and refactor to add WatcherEvent.
* add all API calls and fix a bug with recreated files
* add tests for watcher
* remove the unnecessary use of context
* Add debug log and a test for file rename
* use inode to detect if the file is recreated/replaced and only listen to create events.
* tidy ups (#1535)
* tidy ups
* Add tests for inode reconcile
* fix linux vs windows syscall
* fix linux vs windows syscall
* fix windows compile error
* increase timeout
* use ctime ID
* remove remove/creation test as it's a use case that fail in linux
* fix linux/windows to use Ino/CreationTime
* fix the watcher to only overwrite current file id
* fix linter error
* fix remove/create test
* set reconcile loop to 200 Milliseconds
* fix watcher to not trigger event on remove, add more tests
* on a remove event try to add the file back to the watcher and trigger the handler if success
* fix race condition
* fix flaky test
* fix race conditions
* set level to info
* fix when file is removed and get an event for it after
* fix to trigger handler when we get a remove but re-add fail
* fix error message
* add tests for directory watch and fixes
* detect if a file is a symlink and return an error on Add
* rename Watcher to FileWatcher and remove symlink deref
* add fsnotify@v1.5.1
* fix go mod
* do not reset timer on errors, rename OS specific files
* rename New func
* events trigger on write and rename
* add missing test
* fix flaking tests
* fix flaky test
* check reconcile when removed
* delete invalid file
* fix test to create files with different mod time.
* back date file instead of sleeping
* add watching file in agent command.
* fix watcher call to use new API
* add configuration and stop watcher when server stop
* add certs as watched files
* move FileWatcher to the agent start instead of the command code
* stop watcher before replacing it
* save watched files in agent
* add add and remove interfaces to the file watcher
* fix remove to not return an error
* use `Add` and `Remove` to update certs files
* fix tests
* close events channel on the file watcher even when the context is done
* extract `NotAutoReloadableRuntimeConfig` is a separate struct
* fix linter errors
* add Ca configs and outgoing verify to the not auto reloadable config
* add some logs and fix to use background context
* add tests to auto-config reload
* remove stale test
* add tests to changes to config files
* add check to see if old cert files still trigger updates
* rename `NotAutoReloadableRuntimeConfig` to `StaticRuntimeConfig`
* fix to re add both key and cert file. Add test to cover this case.
* review suggestion
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* add check to static runtime config changes
* fix test
* add changelog file
* fix review comments
* Apply suggestions from code review
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* update flag description
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
* fix compilation error
* add static runtime config support
* fix test
* fix review comments
* fix log test
* Update .changelog/12329.txt
Co-authored-by: Dan Upton <daniel@floppy.co>
* transfer tests to runtime_test.go
* fix filewatcher Replace to not deadlock.
* avoid having lingering locks
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* split ReloadConfig func
* fix warning message
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* convert `FileWatcher` into an interface
* fix compilation errors
* fix tests
* extract func for adding and removing files
* add a coalesceTimer with a very small timer
* extract coaelsce Timer and add a shim for testing
* add tests to coalesceTimer fix to send remaining events
* set `coalesceTimer` to 1 Second
* support symlink, fix a nil deref.
* fix compile error
* fix compile error
* refactor file watcher rate limiting to be a Watcher implementation
* fix linter issue
* fix runtime config
* fix runtime test
* fix flaky tests
* fix compile error
* Apply suggestions from code review
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* fix agent New to return an error if File watcher New return an error
* quit timer loop if ctx is canceled
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
Co-authored-by: Daniel Upton <daniel@floppy.co>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>