* update go-control-plane envoy dependency to 0.12.0
* add changelog
* go mod tidy
* fix linting issues
* add agent/grpc-internal to the list of SA1019 ignores
Currently, when a client starts a blocking query and an ACL token expires within
that time, Consul will return ACL not found error with a 403 status code. However,
sometimes if an ACL token is invalidated at the same time as the query's deadline is reached,
Consul will instead return an empty response with a 200 status code.
This is because of the events being executed.
1. Client issues a blocking query request with timeout `t`.
2. ACL is deleted.
3. Server detects a change in ACLs and force closes the gRPC stream.
4. Client resubscribes with the same token and resets its state (view).
5. Client sees "ACL not found" error.
If ACL is deleted before step 4, the client is unaware that the stream was closed due to
an ACL error and will return an empty view (from the reset state) with the 200 status code.
To fix this problem, we introduce another state to the subsciption to indicate when a change
to ACLs has occured. If the server sees that there was an error due to ACL change, it will
re-authenticate the request and return an error if the token is no longer valid.
Fixes#20790
* Implement In-Process gRPC for use by controller caching/indexing
This replaces the pipe base listener implementation we were previously using. The new style CAN avoid cloning resources which our controller caching/indexing is taking advantage of to not duplicate resource objects in memory.
To maintain safety for controllers and for them to be able to modify data they get back from the cache and the resource service, the client they are presented in their runtime will be wrapped with an autogenerated client which clones request and response messages as they pass through the client.
Another sizable change in this PR is to consolidate how server specific gRPC services get registered and managed. Before this was in a bunch of different methods and it was difficult to track down how gRPC services were registered. Now its all in one place.
* Fix race in tests
* Ensure the resource service is registered to the multiplexed handler for forwarding from client agents
* Expose peer streaming on the internal handler
* Adding explicit MPL license for sub-package
This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository.
* Adding explicit MPL license for sub-package
This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository.
* Updating the license from MPL to Business Source License
Going forward, this project will be licensed under the Business Source License v1.1. Please see our blog post for more details at <Blog URL>, FAQ at www.hashicorp.com/licensing-faq, and details of the license at www.hashicorp.com/bsl.
* add missing license headers
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
* Update copyright file headers to BUSL-1.1
---------
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
* bump testcontainers-go from 0.22.0 and remove pinned go version in integ test
* go mod tidy
* Replace deprecated target.Authority with target.URL.Host
TLDR with many modules the versions included in each diverged quite a bit. Attempting to use Go Workspaces produces a bunch of errors.
This commit:
1. Fixes envoy-library-references.sh to work again
2. Ensures we are pulling in go-control-plane@v0.11.0 everywhere (previously it was at that version in some modules and others were much older)
3. Remove one usage of golang/protobuf that caused us to have a direct dependency on it.
4. Remove deprecated usage of the Endpoint field in the grpc resolver.Target struct. The current version of grpc (v1.55.0) has removed that field and recommended replacement with URL.Opaque and calls to the Endpoint() func when needing to consume the previous field.
4. `go work init <all the paths to go.mod files>` && `go work sync`. This syncrhonized versions of dependencies from the main workspace/root module to all submodules
5. Updated .gitignore to ignore the go.work and go.work.sum files. This seems to be standard practice at the moment.
6. Update doc comments in protoc-gen-consul-rate-limit to be go fmt compatible
7. Upgraded makefile infra to perform linting, testing and go mod tidy on all modules in a flexible manner.
8. Updated linter rules to prevent usage of golang/protobuf
9. Updated a leader peering test to account for an extra colon in a grpc error message.
The grpc resolver implementation is fed from changes to the
router.Router. Within the router there is a map of various areas storing
the addressing information for servers in those areas. All map entries
are of the WAN variety except a single special entry for the LAN.
Addressing information in the LAN "area" are local addresses intended
for use when making a client-to-server or server-to-server request.
The client agent correctly updates this LAN area when receiving lan serf
events, so by extension the grpc resolver works fine in that scenario.
The server agent only initially populates a single entry in the LAN area
(for itself) on startup, and then never mutates that area map again.
For normal RPCs a different structure is used for LAN routing.
Additionally when selecting a server to contact in the local datacenter
it will randomly select addresses from either the LAN or WAN addressed
entries in the map.
Unfortunately this means that the grpc resolver stack as it exists on
server agents is either broken or only accidentally functions by having
servers dial each other over the WAN-accessible address. If the operator
disables the serf wan port completely likely this incidental functioning
would break.
This PR enforces that local requests for servers (both for stale reads
or leader forwarded requests) exclusively use the LAN "area" information
and also fixes it so that servers keep that area up to date in the
router.
A test for the grpc resolver logic was added, as well as a higher level
full-stack test to ensure the externally perceived bug does not return.
Registering gRPC balancers is thread-unsafe because they are stored in a
global map variable that is accessed without holding a lock. Therefore,
it's expected that balancers are registered _once_ at the beginning of
your program (e.g. in a package `init` function) and certainly not after
you've started dialing connections, etc.
> NOTE: this function must only be called during initialization time
> (i.e. in an init() function), and is not thread-safe.
While this is fine for us in production, it's challenging for tests that
spin up multiple agents in-memory. We currently register a balancer per-
agent which holds agent-specific state that cannot safely be shared.
This commit introduces our own registry that _is_ thread-safe, and
implements the Builder interface such that we can call gRPC's `Register`
method once, on start-up. It uses the same pattern as our resolver
registry where we use the dial target's host (aka "authority"), which is
unique per-agent, to determine which builder to use.
Protobuf Refactoring for Multi-Module Cleanliness
This commit includes the following:
Moves all packages that were within proto/ to proto/private
Rewrites imports to account for the packages being moved
Adds in buf.work.yaml to enable buf workspaces
Names the proto-public buf module so that we can override the Go package imports within proto/buf.yaml
Bumps the buf version dependency to 1.14.0 (I was trying out the version to see if it would get around an issue - it didn't but it also doesn't break things and it seemed best to keep up with the toolchain changes)
Why:
In the future we will need to consume other protobuf dependencies such as the Google HTTP annotations for openapi generation or grpc-gateway usage.
There were some recent changes to have our own ratelimiting annotations.
The two combined were not working when I was trying to use them together (attempting to rebase another branch)
Buf workspaces should be the solution to the problem
Buf workspaces means that each module will have generated Go code that embeds proto file names relative to the proto dir and not the top level repo root.
This resulted in proto file name conflicts in the Go global protobuf type registry.
The solution to that was to add in a private/ directory into the path within the proto/ directory.
That then required rewriting all the imports.
Is this safe?
AFAICT yes
The gRPC wire protocol doesn't seem to care about the proto file names (although the Go grpc code does tack on the proto file name as Metadata in the ServiceDesc)
Other than imports, there were no changes to any generated code as a result of this.
* remove legacy tokens
* Update test comment
Co-authored-by: Paul Glass <pglass@hashicorp.com>
* fix imports
* update docs for additional CLI changes
* add test case for anonymous token
* set deprecated api fields to json ignore and fix patch errors
* update changelog to breaking-change
* fix import
* update api docs to remove legacy reference
* fix docs nav data
---------
Co-authored-by: Paul Glass <pglass@hashicorp.com>
* Protobuf Modernization
Remove direct usage of golang/protobuf in favor of google.golang.org/protobuf
Marshallers (protobuf and json) needed some changes to account for different APIs.
Moved to using the google.golang.org/protobuf/types/known/* for the well known types including replacing some custom Struct manipulation with whats available in the structpb well known type package.
This also updates our devtools script to install protoc-gen-go from the right location so that files it generates conform to the correct interfaces.
* Fix go-mod-tidy make target to work on all modules
This is the OSS portion of enterprise PR 3822.
Adds a custom gRPC balancer that replicates the router's server cycling
behavior. Also enables automatic retries for RESOURCE_EXHAUSTED errors,
which we now get for free.
Adds automation for generating the map of `gRPC Method Name → Rate Limit Type`
used by the middleware introduced in #15550, and will ensure we don't forget
to add new endpoints.
Engineers must annotate their RPCs in the proto file like so:
```
rpc Foo(FooRequest) returns (FooResponse) {
option (consul.internal.ratelimit.spec) = {
operation_type: READ,
};
}
```
When they run `make proto` a protoc plugin `protoc-gen-consul-rate-limit` will
be installed that writes rate-limit specs as a JSON array to a file called
`.ratelimit.tmp` (one per protobuf package/directory).
After running Buf, `make proto` will execute a post-process script that will
ingest all of the `.ratelimit.tmp` files and generate a Go file containing the
mappings in the `agent/grpc-middleware` package. In the enterprise repository,
it will write an additional file with the enterprise-only endpoints.
If an engineer forgets to add the annotation to a new RPC, the plugin will
return an error like so:
```
RPC Foo is missing rate-limit specification, fix it with:
import "proto-public/annotations/ratelimit/ratelimit.proto";
service Bar {
rpc Foo(...) returns (...) {
option (hashicorp.consul.internal.ratelimit.spec) = {
operation_type: OPERATION_READ | OPERATION_WRITE | OPERATION_EXEMPT,
};
}
}
```
In the future, this annotation can be extended to support rate-limit
category (e.g. KV vs Catalog) and to determine the retry policy.
* Rate limiting handler - ensure configuration has changed before modifying limiters
* Updating test to validate arguments to UpdateConfig
* Removing duplicate test. Updating mock.
* Renaming NullRateLimiter to NullRequestLimitsHandler
* Rate Limit Handler - ensure rate limiting is not in the code path when not configured
* Update agent/consul/rate/handler.go
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
* formatting handler.go
* Rate limiting handler - ensure configuration has changed before modifying limiters
* Updating test to validate arguments to UpdateConfig
* Removing duplicate test. Updating mock.
* adding logging for when UpdateConfig is called but the config has not changed.
* Update agent/consul/rate/handler.go
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
* Update agent/consul/rate/handler_test.go
Co-authored-by: Dan Upton <daniel@floppy.co>
* modifying existing variable name based on pr feedback
* updating a broken merge conflict;
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
Co-authored-by: Dan Upton <daniel@floppy.co>
The new balancer is a patched version of gRPC's default pick_first balancer
which removes the behavior of preserving the active subconnection if
a list of new addresses contains the currently active address.
* server: add placeholder glue for rate limit handler
This commit adds a no-op implementation of the rate-limit handler and
adds it to the `consul.Server` struct and setup code.
This allows us to start working on the net/rpc and gRPC interceptors and
config logic.
* Add handler errors
* Set the global read and write limits
* fixing multilimiter moving packages
* Fix typo
* Simplify globalLimit usage
* add multilimiter and tests
* exporting LimitedEntity
* Apply suggestions from code review
Co-authored-by: John Murret <john.murret@hashicorp.com>
* add config update and rename config params
* add doc string and split config
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* use timer to avoid go routine leak and change the interface
* add comments to tests
* fix failing test
* add prefix with config edge, refactor tests
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* refactor to apply configs for limiters under a prefix
* add fuzz tests and fix bugs found. Refactor reconcile loop to have a simpler logic
* make KeyType an exported type
* split the config and limiter trees to fix race conditions in config update
* rename variables
* fix race in test and remove dead code
* fix reconcile loop to not create a timer on each loop
* add extra benchmark tests and fix tests
* fix benchmark test to pass value to func
* server: add placeholder glue for rate limit handler
This commit adds a no-op implementation of the rate-limit handler and
adds it to the `consul.Server` struct and setup code.
This allows us to start working on the net/rpc and gRPC interceptors and
config logic.
* Set the global read and write limits
* fixing multilimiter moving packages
* add server configuration for global rate limiting.
* remove agent test
* remove added stuff from handler
* remove added stuff from multilimiter
* removing unnecessary TODOs
* Removing TODO comment from handler
* adding in defaulting to infinite
* add disabled status in there
* adding in documentation for disabled mode.
* make disabled the default.
* Add mock and agent test
* addig documentation and missing mock file.
* Fixing test TestLoad_IntegrationWithFlags
* updating docs based on PR feedback.
* Updating Request Limits mode to use int based on PR feedback.
* Adding RequestLimits struct so we have a nested struct in ReloadableConfig.
* fixing linting references
* Update agent/consul/rate/handler.go
Co-authored-by: Dan Upton <daniel@floppy.co>
* Update agent/consul/config.go
Co-authored-by: Dan Upton <daniel@floppy.co>
* removing the ignore of the request limits in JSON. addingbuilder logic to convert any read rate or write rate less than 0 to rate.Inf
* added conversion function to convert request limits object to handler config.
* Updating docs to reflect gRPC and RPC are rate limit and as a result, HTTP requests are as well.
* Updating values for TestLoad_FullConfig() so that they were different and discernable.
* Updating TestRuntimeConfig_Sanitize
* Fixing TestLoad_IntegrationWithFlags test
* putting nil check in place
* fixing rebase
* removing change for missing error checks. will put in another PR
* Rebasing after default multilimiter config change
* resolving rebase issues
* updating reference for incomingRPCLimiter to use interface
* updating interface
* Updating interfaces
* Fixing mock reference
Co-authored-by: Daniel Upton <daniel@floppy.co>
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
Implements the gRPC middleware for rate-limiting as a tap.ServerInHandle
function (executed before the request is unmarshaled).
Mappings between gRPC methods and their operation type are generated by
a protoc plugin introduced by #15564.
* Move stats.go from grpc-internal to grpc-middleware
* Update grpc server metrics with server type label
* Add stats test to grpc-external
* Remove global metrics instance from grpc server tests
Previously, public referred to gRPC services that are both exposed on
the dedicated gRPC port and have their definitions in the proto-public
directory (so were considered usable by 3rd parties). Whereas private
referred to services on the multiplexed server port that are only usable
by agents and other servers.
Now, we're splitting these definitions, such that external/internal
refers to the port and public/private refers to whether they can be used
by 3rd parties.
This is necessary because the peering replication API needs to be
exposed on the dedicated port, but is not (yet) suitable for use by 3rd
parties.