These functions are moved to the one place they are called to improve code locality.
They are being moved out of agent/consul/acl.go in preparation for moving
ACLResolver to an acl package.
These functions are used in only one place. Move the functions next to their one caller
to improve code locality.
This change is being made in preparation for moving the ACLResolver into an
acl package. The moved functions were previously in the same file as the ACLResolver.
By moving them out of that file we may be able to move the entire file
with fewer modifications.
* defer setting the state before returning to avoid being stuck in `INITIALIZING` state
* add changelog
* move comment with the right if statement
* ca: report state transition error from setSTate
* update comment to reflect state transition
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Follow up to: https://github.com/hashicorp/consul/pull/10738#discussion_r680190210
Previously we were passing an Authorizer that would always allow the
operation, then later checking the authorization using vetServiceTxnOp.
On the surface this seemed strange, but I think it was actually masking
a bug as well. Over time `servicePreApply` was changed to add additional
authorization for `service.Proxy.DestinationServiceName`, but because
we were passing a nil Authorizer, that authorization was not handled on
the txn_endpoint.
`TxnServiceOp.FillAuthzContext` has some special handling in enterprise,
so we need to make sure to continue to use that from the Txn endpoint.
This commit removes the `vetServiceTxnOp` function, and passes in the
`FillAuthzContext` function so that `servicePreApply` can be used by
both the catalog and txn endpoints. This should be much less error prone
and prevent bugs like this in the future.
Follow up to https://github.com/hashicorp/consul/pull/10737#discussion_r680134445
Move the check for the Intention.DestinationName into the Authorizer to remove the
need to check what kind of Authorizer is being used.
It sounds like this check is only for legacy ACLs, so is probably just a safeguard
.
1. do not emit the metric if Query fails
2. properly check for PrimaryUsersIntermediate, the logic was inverted
Also improve the logging by including the metric name in the log message
* fix state index for `CAOpSetRootsAndConfig` op
* add changelog
* Update changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* remove the change log as it's not needed
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
These checks were a bit more involved. They were previously skipping some code paths
when the authorizer was nil. After looking through these it seems correct to remove the
authz == nil check, since it will never evaluate to true.
These case are already impossible conditions, because most of these functions already start
with a check for ACLs being disabled. So the code path being removed could never be reached.
The one other case (ConnectAuthorized) was already changed in a previous commit. This commit
removes an impossible branch because authz == nil can never be true.
These methods are no longer used. Remove the methods, and update the
tests to use actual method used by production code.
Also removes the 'authz == nil' check is no longer a possible code path
now that we are returning a non-nil acl.Authorizer when ACLs are disabled.
The blocking query backend sets the default value on the server side.
The streaming backend does not using blocking queries, so we must set the timeout on
the client.
Now that we have at least one endpoint that uses context for cancellation we can
encounter this scenario where the returned error is a context.Cancelled or
context.DeadlineExceeded.
If the request.Context().Err() is not nil, then we know the request itself was cancelled, so
we can log a different message at Info level, instad of the error.
Knowing that blocking queries are firing does not provide much
information on its own. If we know the correlation IDs we can
piece together which parts of the snapshot have been populated.
Some of these responses might be empty from the blocking
query timing out. But if they're returning quickly I think we
can reasonably assume they contain data.
* return an error when the index is not valid
* check response as bool when applying `CAOpSetConfig`
* remove check for bool response
* fix error message and add check to test
* fix comment
* add changelog
If multiple instances of a service are co-located on the same node then
their proxies will all share a cache entry for their resolved service
configuration. This is because the cache key contains the name of the
watched service but does not take into account the ID of the watching
proxies.
This means that there will be multiple agent service manager watches
that can wake up on the same cache update. These watchers then
concurrently modify the value in the cache when merging the resolved
config into the local proxy definitions.
To avoid this concurrent map write we will only delete the key from
opaque config in the local proxy definition after the merge, rather
than from the cached value before the merge.
This change adds a new `dns_config.recursor_strategy` option which
controls how Consul queries DNS resolvers listed in the `recursors`
config option. The supported options are `sequential` (default), and
`random`.
Closes#8807
Co-authored-by: Blake Covarrubias <blake@covarrubi.as>
Co-authored-by: Priyanka Sengupta <psengupta@flatiron.com>
A previous commit used SetHash on two of the cases to fix a data race. This commit applies
that change to all cases. Using SetHash in this test helper should ensure that the
test helper behaves closer to production.
These changes ensure that the identity of services dialed is
cryptographically verified.
For all upstreams we validate against SPIFFE IDs in the format used by
Consul's service mesh:
spiffe://<trust-domain>/ns/<namespace>/dc/<datacenter>/svc/<service>
The dnsConfig pulled from the atomic.Value is a pointer, so modifying it in place
creates a data race. Use the exported ReloadConfig interface instead.
The LogOutput io.Writer used by TestAgent must allow concurrent reads and writes, and a
bytes.Buffer does not allow this. The bytes.Buffer must be wrapped with a lock to make this safe.
Some global variables are patched to shorter values in these tests. But the goroutines that read
them can outlive the test because nothing waited for them to exit.
This commit adds a Wait() method to the routine manager, so that tests can wait for the goroutines
to exit. This prevents the data race because the 'reset to original value' can happen
after all other goroutines have stopped.
A previous commit started using QueryRefuced, but that is not correct. QueryRefuced refers to
the OpCode, not the query type.
Instead use errNoAnswer because we have no records for that query type.
Now that trimDNSResponse is handled by the caller we don't need to pass this value
around. We can remove it from both the serviceLookup struct, and two functions.
The dispatch function was called from a single place and did nothing but add a default value.
Removing it makes code easier to trace by removing an unnecessary hop.
The compatv2 integration tests were failing because they use an older CLI version with a newer
HTTP API. This commit restores the GRPCPort field to the DebugConfig output to allow older
CIs to continue to fetch the port.
The DebugConfig in the self endpoint can change at any time. It's not a stable API.
With the previous change to rename GRPCPort to XDSPort this command would have broken.
This commit adds the XDSPort to a stable part of the XDS api, and changes the envoy command to read
this new field.
It includes support for the old API as well, in case a newer CLI is used with an older API, and
adds a test for both cases.
* ca: move provider creation into CAManager
This further decouples the CAManager from Server. It reduces the interface between them and
removes the need for the SetLogger method on providers.
* ca: move SignCertificate to CAManager
To reduce the scope of Server, and keep all the CA logic together
* ca: move SignCertificate to the file where it is used
* auto-config: move autoConfigBackend impl off of Server
Most of these methods are used exclusively for the AutoConfig RPC
endpoint. This PR uses a pattern that we've used in other places as an
incremental step to reducing the scope of Server.
* fix linter issues
* check error when `raftApplyMsgpack`
* ca: move SignCertificate to CAManager
To reduce the scope of Server, and keep all the CA logic together
* check expiry date of the intermediate before using it to sign a leaf
* fix typo in comment
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
* Fix test name
* do not check cert start date
* wrap error to mention it is the intermediate expired
* Fix failing test
* update comment
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use shim to avoid sleep in test
* add root cert validation
* remove duplicate code
* Revert "fix linter issues"
This reverts commit 6356302b54f06c8f2dee8e59740409d49e84ef24.
* fix import issue
* gofmt leader_connect_ca
* add changelog entry
* update error message
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* fix error message in test
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Most of these methods are used exclusively for the AutoConfig RPC
endpoint. This PR uses a pattern that we've used in other places as an
incremental step to reducing the scope of Server.