* agent: remove agent cache dependency from service mesh leaf certificate management
This extracts the leaf cert management from within the agent cache.
This code was produced by the following process:
1. All tests in agent/cache, agent/cache-types, agent/auto-config,
agent/consul/servercert were run at each stage.
- The tests in agent matching .*Leaf were run at each stage.
- The tests in agent/leafcert were run at each stage after they
existed.
2. The former leaf cert Fetch implementation was extracted into a new
package behind a "fake RPC" endpoint to make it look almost like all
other cache type internals.
3. The old cache type was shimmed to use the fake RPC endpoint and
generally cleaned up.
4. I selectively duplicated all of Get/Notify/NotifyCallback/Prepopulate
from the agent/cache.Cache implementation over into the new package.
This was renamed as leafcert.Manager.
- Code that was irrelevant to the leaf cert type was deleted
(inlining blocking=true, refresh=false)
5. Everything that used the leaf cert cache type (including proxycfg
stuff) was shifted to use the leafcert.Manager instead.
6. agent/cache-types tests were moved and gently replumbed to execute
as-is against a leafcert.Manager.
7. Inspired by some of the locking changes from derek's branch I split
the fat lock into N+1 locks.
8. The waiter chan struct{} was eventually replaced with a
singleflight.Group around cache updates, which was likely the biggest
net structural change.
9. The awkward two layers or logic produced as a byproduct of marrying
the agent cache management code with the leaf cert type code was
slowly coalesced and flattened to remove confusion.
10. The .*Leaf tests from the agent package were copied and made to work
directly against a leafcert.Manager to increase direct coverage.
I have done a best effort attempt to port the previous leaf-cert cache
type's tests over in spirit, as well as to take the e2e-ish tests in the
agent package with Leaf in the test name and copy those into the
agent/leafcert package to get more direct coverage, rather than coverage
tangled up in the agent logic.
There is no net-new test coverage, just coverage that was pushed around
from elsewhere.
Protobuf Refactoring for Multi-Module Cleanliness
This commit includes the following:
Moves all packages that were within proto/ to proto/private
Rewrites imports to account for the packages being moved
Adds in buf.work.yaml to enable buf workspaces
Names the proto-public buf module so that we can override the Go package imports within proto/buf.yaml
Bumps the buf version dependency to 1.14.0 (I was trying out the version to see if it would get around an issue - it didn't but it also doesn't break things and it seemed best to keep up with the toolchain changes)
Why:
In the future we will need to consume other protobuf dependencies such as the Google HTTP annotations for openapi generation or grpc-gateway usage.
There were some recent changes to have our own ratelimiting annotations.
The two combined were not working when I was trying to use them together (attempting to rebase another branch)
Buf workspaces should be the solution to the problem
Buf workspaces means that each module will have generated Go code that embeds proto file names relative to the proto dir and not the top level repo root.
This resulted in proto file name conflicts in the Go global protobuf type registry.
The solution to that was to add in a private/ directory into the path within the proto/ directory.
That then required rewriting all the imports.
Is this safe?
AFAICT yes
The gRPC wire protocol doesn't seem to care about the proto file names (although the Go grpc code does tack on the proto file name as Metadata in the ServiceDesc)
Other than imports, there were no changes to any generated code as a result of this.
* Add Peer field to service-defaults upstream overrides.
* add api changes, compat mode for service default overrides
* Fixes based on testing
---------
Co-authored-by: DanStough <dan.stough@hashicorp.com>
Previously, we'd begin a session with the xDS concurrency limiter
regardless of whether the proxy was registered in the catalog or in
the server's local agent state.
This caused problems for users who run `consul connect envoy` directly
against a server rather than a client agent, as the server's locally
registered proxies wouldn't be included in the limiter's capacity.
Now, the `ConfigSource` is responsible for beginning the session and we
only do so for services in the catalog.
Fixes: https://github.com/hashicorp/consul/issues/15753
In practice this was masked by #14956 and was only uncovered fixing the
other bug.
go test ./agent -run TestAgentConnectCALeafCert_goodNotLocal
would fail when only #14956 was fixed.
Adds another datasource for proxycfg.HTTPChecks, for use on server agents. Typically these checks are performed by local client agents and there is no equivalent of this in agentless (where servers configure consul-dataplane proxies).
Hence, the data source is mostly a no-op on servers but in the case where the service is present within the local state, it delegates to the cache data source.
* defaulting to false because peering will be released as beta
* Ignore peering disabled error in bundles cachetype
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: freddygv <freddy@hashicorp.com>
Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
For initial cluster peering TProxy support we consider all imported services of a partition to be potential upstreams.
We leverage the VirtualIP table because it stores plain service names (e.g. "api", not "api-sidecar-proxy").
This is only configured in xDS when a service with an L7 protocol is
exported.
They also load any relevant trust bundles for the peered services to
eventually use for L7 SPIFFE validation during mTLS termination.
* update gateway-services table with endpoints
* fix failing test
* remove unneeded config in test
* rename "endpoint" to "destination"
* more endpoint renaming to destination in tests
* update isDestination based on service-defaults config entry creation
* use a 3 state kind to be able to set the kind to unknown (when neither a service or a destination exist)
* set unknown state to empty to avoid modifying alot of tests
* fix logic to set the kind correctly on CRUD
* fix failing tests
* add missing tests and fix service delete
* fix failing test
* Apply suggestions from code review
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
* fix a bug with kind and add relevant test
* fix compile error
* fix failing tests
* add kind to clone
* fix failing tests
* fix failing tests in catalog endpoint
* fix service dump test
* Apply suggestions from code review
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
* remove duplicate tests
* first draft of destinations intention in connect proxy
* remove ServiceDestinationList
* fix failing tests
* fix agent/consul failing tests
* change to filter intentions in the state store instead of adding a field.
* fix failing tests
* fix comment
* fix comments
* store service kind destination and add relevant tests
* changes based on review
* filter on destinations when querying source match
* change state store API to get an IntentionTarget parameter
* add intentions tests
* add destination upstream endpoint
* fix failing test
* fix failing test and a bug with wildcard intentions
* fix failing test
* Apply suggestions from code review
Co-authored-by: alex <8968914+acpana@users.noreply.github.com>
* add missing test and clarify doc
* fix style
* gofmt intention.go
* fix merge introduced issue
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
Co-authored-by: alex <8968914+acpana@users.noreply.github.com>
Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>
Mesh gateways will now enable tcp connections with SNI names including peering information so that those connections may be proxied.
Note: this does not change the callers to use these mesh gateways.
There are a handful of changes in this commit:
* When querying trust bundles for a service we need to be able to
specify the namespace of the service.
* The endpoint needs to track the index because the cache watches use
it.
* Extracted bulk of the endpoint's logic to a state store function
so that index tracking could be tested more easily.
* Removed check for service existence, deferring that sort of work to ACL authz
* Added the cache type
For mTLS to work between two proxies in peered clusters with different root CAs,
proxies need to configure their outbound listener to use different root certificates
for validation.
Up until peering was introduced proxies would only ever use one set of root certificates
to validate all mesh traffic, both inbound and outbound. Now an upstream proxy
may have a leaf certificate signed by a CA that's different from the dialing proxy's.
This PR makes changes to proxycfg and xds so that the upstream TLS validation
uses different root certificates depending on which cluster is being dialed.
set -euo pipefail
unset CDPATH
cd "$(dirname "$0")"
for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
echo "=== require: $f ==="
sed -i '/require := require.New(t)/d' $f
# require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
# require.XXX(tblah) but not require.XXX(t, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
# require.XXX(rblah) but not require.XXX(r, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
gofmt -s -w $f
done
for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
echo "=== assert: $f ==="
sed -i '/assert := assert.New(t)/d' $f
# assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
# assert.XXX(tblah) but not assert.XXX(t, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
# assert.XXX(rblah) but not assert.XXX(r, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
gofmt -s -w $f
done
As part of this change, we ensure that the SAN extensions are marked as
critical when the subject is empty so that AWS PCA tolerates the loss of
common names well and continues to function as a Connect CA provider.
Parts of this currently hack around a bug in crypto/x509 and can be
removed after https://go-review.googlesource.com/c/go/+/329129 lands in
a Go release.
Note: the AWS PCA tests do not run automatically, but the following
passed locally for me:
ENABLE_AWS_PCA_TESTS=1 go test ./agent/connect/ca -run TestAWS
Remove View.Result error return value, it was always nil, and seems like it will likely always remain nill
since it is simply reading a stored value.
Also replace some cache types with local types.
I added this recently without realizing that the method already existed and was named
NamespaceOrEmpty. Replace all calls to GetNamespace with NamespaceOrEmpty or NamespaceOrDefault
as appropriate.
So that all the client side filtering is in the same place. Previously
only the bexpr filter was in the cache-entry.
Also makes a small change to the filtering so that instead of rebuilding
slices of items, the filtering can return a bool to determine if the
event payload is saved or not.