This commit fixes an issue where the partition was not properly set
on the peering query failover target created from sameness-groups.
Before this change, it was always empty, meaning that the data
would be queried with respect to the default partition always. This
resulted in a situation where a PQ that was attempting to use a
sameness-group for failover would select peers from the default
partition, rather than the partition of the sameness-group itself.
* cli: Deprecate the `-admin-access-log-path` flag from `consul connect envoy` command in favor of: `-admin-access-log-config`.
* fix changelog
* add in documentation change.
* updating usage of http2_protocol_options and access_log_path
* add changelog
* update template for AdminAccessLogConfig
* remove mucking with AdminAccessLogConfig
* add a hash to config entries when normalizing
* add GetHash and implement comparing hashes
* only update if the Hash is different
* only update if the Hash is different and not 0
* fix proto to include the Hash
* fix proto gen
* buf format
* add SetHash and fix tests
* fix config load tests
* fix state test and config test
* recalculate hash when restoring config entries
* fix snapshot restore test
* add changelog
* fix missing normalize, fix proto indexes and add normalize test
When a large number of upstreams are configured on a single envoy
proxy, there was a chance that it would timeout when waiting for
ClusterLoadAssignments. While this doesn't always immediately cause
issues, consul-dataplane instances appear to consistently drop
endpoints from their configurations after an xDS connection is
re-established (the server dies, random disconnect, etc).
This commit adds an `xds_fetch_timeout_ms` config to service registrations
so that users can set the value higher for large instances that have
many upstreams. The timeout can be disabled by setting a value of `0`.
This configuration was introduced to reduce the risk of causing a
breaking change for users if there is ever a scenario where endpoints
would never be received. Rather than just always blocking indefinitely
or for a significantly longer period of time, this config will affect
only the service instance associated with it.
This fixes the following race condition:
- Send update endpoints
- Send update cluster
- Recv ACK endpoints
- Recv ACK cluster
Prior to this fix, it would have resulted in the endpoints NOT existing in
Envoy. This occurred because the cluster update implicitly clears the endpoints
in Envoy, but we would never re-send the endpoint data to compensate for the
loss, because we would incorrectly ACK the invalid old endpoint hash. Since the
endpoint's hash did not actually change, they would not be resent.
The fix for this is to effectively clear out the invalid pending ACKs for child
resources whenever the parent changes. This ensures that we do not store the
child's hash as accepted when the race occurs.
An escape-hatch environment variable `XDS_PROTOCOL_LEGACY_CHILD_RESEND` was
added so that users can revert back to the old legacy behavior in the event
that this produces unknown side-effects. Visit the following thread for some
extra context on why certainty around these race conditions is difficult:
https://github.com/envoyproxy/envoy/issues/13009
This bug report and fix was mostly implemented by @ksmiley with some minor
tweaks.
Co-authored-by: Keith Smiley <ksmiley@salesforce.com>
* Add CE version of gateway-upstream-disambiguation
* Use NamespaceOrDefault and PartitionOrDefault
* Add Changelog entry
* Remove the unneeded reassignment
* Use c.ID()
* Adding cli command to list exported services to a peer
* Changelog added
* Addressing docs comments
* Adding test case for no exported services scenario
The client.rpc metric now excludes internal retries for consistency
with client.rpc.exceeded and client.rpc.failed. All of these metrics
now increment at most once per RPC method call, allowing for
accurate calculation of failure / rate limit application occurrence.
Additionally, if an RPC fails because no servers are present,
client.rpc.failed is now incremented.
* Upgrade hcp-sdk-go to latest version v0.73
Changes:
- go get github.com/hashicorp/hcp-sdk-go
- go mod tidy
* From upgrade: regenerate protobufs for upgrade from 1.30 to 1.31
Ran: `make proto`
Slack: https://hashicorp.slack.com/archives/C0253EQ5B40/p1701105418579429
* From upgrade: fix mock interface implementation
After upgrading, there is the following compile error:
cannot use &mockHCPCfg{} (value of type *mockHCPCfg) as "github.com/hashicorp/hcp-sdk-go/config".HCPConfig value in return statement: *mockHCPCfg does not implement "github.com/hashicorp/hcp-sdk-go/config".HCPConfig (missing method Logout)
Solution: update the mock to have the missing Logout method
* From upgrade: Lint: remove usage of deprecated req.ServerState.TLS
Due to upgrade, linting is erroring due to usage of a newly deprecated field
22:47:56 [consul]: make lint
--> Running golangci-lint (.)
agent/hcp/testing.go:157:24: SA1019: req.ServerState.TLS is deprecated: use server_tls.internal_rpc instead. (staticcheck)
time.Until(time.Time(req.ServerState.TLS.CertExpiry)).Hours()/24,
^
* From upgrade: adjust oidc error message
From the upgrade, this test started failing:
=== FAIL: internal/go-sso/oidcauth TestOIDC_ClaimsFromAuthCode/failed_code_exchange (re-run 2) (0.01s)
oidc_test.go:393: unexpected error: Provider login failed: Error exchanging oidc code: oauth2: "invalid_grant" "unexpected auth code"
Prior to the upgrade, the error returned was:
```
Provider login failed: Error exchanging oidc code: oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"invalid_grant\",\"error_description\":\"unexpected auth code\"}\n
```
Now the error returned is as below and does not contain "cannot fetch token"
```
Provider login failed: Error exchanging oidc code: oauth2: "invalid_grant" "unexpected auth code"
```
* Update AgentPushServerState structs with new fields
HCP-side changes for the new fields are in:
https://github.com/hashicorp/cloud-global-network-manager-service/pull/1195/files
* Minor refactor for hcpServerStatus to abstract tlsInfo into struct
This will make it easier to set the same tls-info information to both
- status.TLS (deprecated field)
- status.ServerTLSMetadata (new field to use instead)
* Update hcpServerStatus to parse out information for new fields
Changes:
- Improve error message and handling (encountered some issues and was confused)
- Set new field TLSInfo.CertIssuer
- Collect certificate authority metadata and set on TLSInfo.CertificateAuthorities
- Set TLSInfo on both server.TLS and server.ServerTLSMetadata.InternalRPC
* Update serverStatusToHCP to convert new fields to GNM rpc
* Add changelog
* Feedback: connect.ParseCert, caCerts
* Feedback: refactor and unit test server status
* Feedback: test to use expected struct
* Feedback: certificate with intermediate
* Feedback: catch no leaf, remove expectedErr
* Feedback: update todos with jira ticket
* Feedback: mock tlsConfigurator
security: Bump github.com/golang-jwt/jwt/v4 to 4.5.0
This version is accepted by Prisma/Twistlock, resolving scan results for
issue PRISMA-2022-0270. Chosen over later versions to avoid a major
version with breaking changes that is otherwise unnecessary.
Note that in practice this is a false positive (see
https://github.com/golang-jwt/jwt/issues/258), but we should update the
version to aid customers relying on scanners that flag it.
* Set default of 1m for StatsFlushInterval when the collector is setup
* Add documentation on the stats_flush_interval value
* Do not default in two conditions 1) preconfigured sinks exist 2) preconfigured flush interval exists
* Fix wording of docs
* Add changelog
* Fix docs
* Initial work for sidenav
* Use HDS::Text
* Add resolution for ember-element-helper
* WIP dc selector
* Update HCP Home link
* DC selector
* Hook up remaining selectors
* Fix settings and tutorial links
* Remove comments
* Remove skip-links
* Replace auth with new dropdown
* Use href-to helper for sidenav links
* Changelog
* Add description to NavSelector
* Wrap version in footer and role
* Fix login tests
* Add data-test selectors for namespaces
* Fix datacenter disclosure menu test
* Stop rendering auth dialog if acls are disabled
* Update disabled selector state and token selector
* Fix logic in ACL selector
* Fix HCP Home integration test
* Remove toggling the sidenav in tests
* Add sidenav to eng docs
* Re-add debug navigation for eng docs
* Remove ember-in-viewport
* Remove unused styles
* Upgrade @hashicorp/design-system-componentseee
* Add translations for side-nav
* Only show back to hcp link if url is present
* Disable responsive due to a11y-dialog issue
Fix issue with wanfed lan ip conflicts.
Prior to this commit, the connection pools were unaware which datacenter the
connection was associated with. This meant that any time servers with
overlapping LAN IP addresses and node shortnames existed, they would be
incorrectly co-located in the same pool. Whenever this occurred, the servers
would get stuck in an infinite loop of forwarding RPCs to themselves (rather
than the intended remote DC) until they eventually run out of memory.
Most notably, this issue can occur whenever wan federation through mesh
gateways is enabled.
This fix adds extra metadata to specify which DC the connection is associated
with in the pool.
Prior to the introduction of this configuration, grpc keepalive messages were
sent after 2 hours of inactivity on the stream. This posed issues in various
scenarios where the server-side xds connection balancing was unaware that envoy
instances were uncleanly killed / force-closed, since the connections would
only be cleaned up after ~5 minutes of TCP timeouts occurred. Setting this
config to a 30 second interval with a 20 second timeout ensures that at most,
it should take up to 50 seconds for a dead xds connection to be closed.
This PR fixes an issue where upstreams did not correctly inherit the proper
namespace / partition from the parent service when attempting to fetch the
upstream protocol due to inconsistent normalization.
Some of the merge-service-configuration logic would normalize to default, while
some of the proxycfg logic would normalize to match the parent service. Due to
this mismatch in logic, an incorrect service-defaults configuration entry would
be fetched and have its protocol applied to the upstream.
* Add InboundPeerTrustBundle maps to Terminating Gateway
* Add notify and cancelation of watch for inbound peer trust bundles
* Pass peer trust bundles to the RBAC creation function
* Regenerate Golden Files
* add changelog, also adds another spot that needed peeredTrustBundles
* Add basic test for terminating gateway with peer trust bundle
* Add intention to cluster peered golden test
* rerun codegen
* update changelog
* really update the changelog
---------
Co-authored-by: Melisa Griffin <melisa.griffin@hashicorp.com>
Ongoing work to support Nomad Workload Identity for authenticating with Consul
will mean that Nomad's service registration sync with Consul will want to use
Consul tokens scoped to individual workloads for registering services and
checks. The `CheckRegister` method in the API doesn't have an option to pass the
token in, which prevent us from sharing the same Consul connection for all
workloads. Add a `CheckRegisterOpts` to match the behavior of
`ServiceRegisterOpts`.
Ongoing work to support Nomad Workload Identity for authenticating with Consul
will mean that Nomad's service registration sync with Consul will want to use
Consul tokens scoped to individual workloads for registering services and
checks. The `ServiceRegisterOpts` type in the API doesn't have an option to pass
the token in, which prevent us from sharing the same Consul connection for all
workloads. Add a `Token` field to match the behavior of `ServiceDeregisterOpts`.
* dns token
fix whitespace for docs and comments
fix test cases
fix test cases
remove tabs in help text
Add changelog
Peering dns test
Peering dns test
Partial implementation of Peered DNS test
Swap to new topology lib
expose dns port for integration tests on client
remove partial test implementation
remove extra port exposure
remove changelog from the ent pr
Add dns token to set-agent-token switch
Add enterprise golden file
Use builtin/dns template in tests
Update ent dns policy
Update ent dns template test
remove local gen certs
fix templated policy specs
* add changelog
* go mod tidy
Configure Envoy to use the same HTTP protocol version used by the
downstream caller when forwarding requests to a local application that
is configured with the protocol set to either `http2` or `grpc`.
This allows upstream applications that support both HTTP/1.1 and
HTTP/2 on a single port to receive requests using either protocol. This
is beneficial when the application primarily communicates using HTTP/2,
but also needs to support HTTP/1.1, such as to respond to Kubernetes
HTTP readiness/liveness probes.
Co-authored-by: Derek Menteer <derek.menteer@hashicorp.com>
* debug since
* fix docs
* chagelog added
* fix go mod
* debug test fix
* fix test
* tabs test fix
* Update .changelog/18797.txt
Co-authored-by: Ganesh S <ganesh.seetharaman@hashicorp.com>
---------
Co-authored-by: Ganesh S <ganesh.seetharaman@hashicorp.com>
* Add response header filters to http-route config entry definitions
* Map response header filters from config entry when constructing route destination
* Support response header modifiers at the service level as well
* Update protobuf definitions
* Update existing unit tests
* Add response filters to route consolidation logic
* Make existing unit tests more robust
* Add missing docstring
* Add changelog entry
* Add response filter modifiers to existing integration test
* Add more robust testing for response header modifiers in the discovery chain
* Add more robust testing for request header modifiers in the discovery chain
* Modify test to verify that service filter modifiers take precedence over rule filter modifiers
* [NET-5325] ACL templated policies support in tokens and roles
- Add API support for creating tokens/roles with templated-policies
- Add CLI support for creating tokens/roles with templated-policies
* adding changelog
This PR enables the GetEnvoyBootstrapParams endpoint to construct envoy bootstrap parameters from v2 catalog and mesh resources.
* Make bootstrap request and response parameters less specific to services so that we can re-use them for workloads or service instances.
* Remove ServiceKind from bootstrap params response. This value was unused previously and is not needed for V2.
* Make access logs generation generic so that we can generate them using v1 or v2 resources.
Add support for querying tokens by service name
The consul-k8s endpoints controller has a workflow where it fetches all tokens.
This is not performant for large clusters, where there may be a sizable number
of tokens. This commit attempts to alleviate that problem and introduces a new
way to query by the token's service name.
Fix issue where agentless endpoints would fail to populate after snapshot restore.
Fixes an issue that was introduced in #17775. This issue happens because
a long-lived pointer to the state store is held, which is unsafe to do.
Snapshot restorations will swap out this state store, meaning that the
proxycfg watches would break for agentless.
* init
* tests added and few fixes
* revert arg message
* changelog added
* removed var declaration
* fix CI
* fix test
* added node name and status
* updated save.mdx
* added example
* fix tense
* fix description
* fix for #18406 , non presence of consul-version meta
* removed redundant checks
* updated mock-api to mimic api response for synthetic nodes
* added test to test getDistinctConsulVersions method with synthetic-node case
* updated typo in comments
* added change log
* Added oss config entries for Policy and JWT on APIGW
* Updated structs for config entry
* Updated comments, ran deep-copy
* Move JWT configuration into OSS file
* Add in the config entry OSS file for jwts
* Added changelog
* fixing proto spacing
* Moved to using manually written deep copy method
* Use pointers for override/default fields in apigw config entries
* Run gen scripts for changed types
* Add logging to locality policy application
In OSS, this is currently a no-op.
* Inherit locality when registering sidecars
When sidecar locality is not explicitly configured, inherit locality
from the proxied service.
* OTElExporter now uses an EndpointProvider to discover the endpoint
* OTELSink uses a ConfigProvider to obtain filters and labels configuration
* improve tests for otel_sink
* Regex logic is moved into client for a method on the TelemetryConfig object
* Create a telemetry_config_provider and update deps to use it
* Fix conversion
* fix import newline
* Add logger to hcp client and move telemetry_config out of the client.go file
* Add a telemetry_config.go to refactor client.go
* Update deps
* update hcp deps test
* Modify telemetry_config_providers
* Check for nil filters
* PR review updates
* Fix comments and move around pieces
* Fix comments
* Remove context from client struct
* Moved ctx out of sink struct and fixed filters, added a test
* Remove named imports, use errors.New if not fformatting
* Remove HCP dependencies in telemetry package
* Add success metric and move lock only to grab the t.cfgHahs
* Update hash
* fix nits
* Create an equals method and add tests
* Improve telemetry_config_provider.go tests
* Add race test
* Add missing godoc
* Remove mock for MetricsClient
* Avoid goroutine test panics
* trying to kick CI lint issues by upgrading mod
* imprve test code and add hasher for testing
* Use structure logging for filters, fix error constants, and default to allow all regex
* removed hashin and modify logic to simplify
* Improve race test and fix PR feedback by removing hash equals and avoid testing the timer.Ticker logic, and instead unit test
* Ran make go-mod-tidy
* Use errtypes in the test
* Add changelog
* add safety check for exporter endpoint
* remove require.Contains by using error types, fix structure logging, and fix success metric typo in exporter
* Fixed race test to have changing config values
* Send success metric before modifying config
* Avoid the defer and move the success metric under
* [CC-5719] Add support for builtin global-read-only policy
* Add changelog
* Add read-only to docs
* Fix some minor issues.
* Change from ReplaceAll to Sprintf
* Change IsValidPolicy name to return an error instead of bool
* Fix PolicyList test
* Fix other tests
* Apply suggestions from code review
Co-authored-by: Paul Glass <pglass@hashicorp.com>
* Fix state store test for policy list.
* Fix naming issues
* Update acl/validation.go
Co-authored-by: Chris Thain <32781396+cthain@users.noreply.github.com>
* Update agent/consul/acl_endpoint.go
---------
Co-authored-by: Paul Glass <pglass@hashicorp.com>
Co-authored-by: Chris Thain <32781396+cthain@users.noreply.github.com>
Prevent partial application of Envoy extensions
Ensure that non-required extensions do not change xDS resources before
exiting on failure by cloning proto messages prior to applying each
extension.
To support this change, also move `CanApply` checks up a layer and make
them prior to attempting extension application, s.t. we avoid
unnecessary copies where extensions can't be applied.
Last, ensure that we do not allow panics from `CanApply` or `Extend`
checks to escape the attempted extension application.
* Fix topoloy intention with mixed connect-native/normal services.
If a service is registered twice, once with connect-native and once
without, the topology views would prune the existing intentions. This
change brings the code more in line with the transparent proxy behavior.
* Dedupe nodes in the ServiceTopology ui endpoint (like done with tags).
* Consider a service connect-native as soon as one instance is.
* api-gateway: subscribe to bound-api-gateway only after receiving api-gateway
This fixes a race condition due to our dependency on having the listener(s) from the api-gateway config entry in order to fully and properly process the resources on the bound-api-gateway config entry.
* Apply suggestions from code review
* Add changelog entry
Backfill changelog entry for c2bbe67 and 7402d06
Add a changelog entry for the follow-up PR since it was specific to the
fix and references the original change.
Update Go version to 1.20.6
This resolves [CVE-2023-29406]
(https://nvd.nist.gov/vuln/detail/CVE-2023-29406) for uses of the
`net/http` standard library.
Note that until the follow-up to #18124 is done, the version of Go used
in those impacted tests will need to remain on 1.20.5.
Bump golang.org/x/net to 0.12.0
While not necessary to directly address CVE-2023-29406 (which should be
handled by using a patched version of Go when building), an
accompanying change to HTTP/2 error handling does impact agent code.
See https://go-review.googlesource.com/c/net/+/506995 for the HTTP/2
change.
Bump this dependency across our submodules as well for the sake of
potential indirect consumers of `x/net/http`.
Updating RootPKIPath but not IntermediatePKIPath would not update
leaf signing certs with the new root. Unsure if this happens in practice
but manual testing showed it is a bug that would break mesh and agent
connections once the old root is pruned.
* update UINodes and UINodeInfo response with consul-version info added as NodeMeta, fetched from serf members
* update test cases TestUINodes, TestUINodeInfo
* added nil check for map
* add consul-version in local agent node metadata
* get consul version from serf member and add this as node meta in catalog register request
* updated ui mock response to include consul versions as node meta
* updated ui trans and added version as query param to node list route
* updates in ui templates to display consul version with filter and sorts
* updates in ui - model class, serializers,comparators,predicates for consul version feature
* added change log for Consul Version Feature
* updated to get version from consul service, if for some reason not available from serf
* updated changelog text
* updated dependent testcases
* multiselection version filter
* Update agent/consul/state/catalog.go
comments updated
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
---------
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
* fix(cli): remove failing check from 'connect envoy' registration for api gateway
* test(integration): add tests to check catalog statsus of gateways on startup
* remove extra sleep comment
* Update test/integration/consul-container/libs/assert/service.go
* changelog
This PR fixes a bug that was introduced in:
https://github.com/hashicorp/consul/pull/16021
A user setting a protocol in proxy-defaults would cause tproxy implicit
upstreams to not honor the upstream service's protocol set in its
`ServiceDefaults.Protocol` field, and would instead always use the
proxy-defaults value.
Due to the fact that upstreams configured with "tcp" can successfully contact
upstream "http" services, this issue was not recognized until recently (a
proxy-defaults with "tcp" and a listening service with "http" would make
successful requests, but not the opposite).
As a temporary work-around, users experiencing this issue can explicitly set
the protocol on the `ServiceDefaults.UpstreamConfig.Overrides`, which should
take precedence.
The fix in this PR removes the proxy-defaults protocol from the wildcard
upstream that tproxy uses to configure implicit upstreams. When the protocol
was included, it would always overwrite the value during discovery chain
compilation, which was not correct. The discovery chain compiler also consumes
proxy defaults to determine the protocol, so simply excluding it from the
wildcard upstream config map resolves the issue.
* # This is a combination of 9 commits.
# This is the 1st commit message:
init without tests
# This is the commit message #2:
change log
# This is the commit message #3:
fix tests
# This is the commit message #4:
fix tests
# This is the commit message #5:
added tests
# This is the commit message #6:
change log breaking change
# This is the commit message #7:
removed breaking change
# This is the commit message #8:
fix test
# This is the commit message #9:
keeping the test behaviour same
* # This is a combination of 12 commits.
# This is the 1st commit message:
init without tests
# This is the commit message #2:
change log
# This is the commit message #3:
fix tests
# This is the commit message #4:
fix tests
# This is the commit message #5:
added tests
# This is the commit message #6:
change log breaking change
# This is the commit message #7:
removed breaking change
# This is the commit message #8:
fix test
# This is the commit message #9:
keeping the test behaviour same
# This is the commit message #10:
made enable debug atomic bool
# This is the commit message #11:
fix lint
# This is the commit message #12:
fix test true enable debug
* parent 10f500e895d92cc3691ade7b74a33db755d22039
author absolutelightning <ashesh.vidyut@hashicorp.com> 1687352587 +0530
committer absolutelightning <ashesh.vidyut@hashicorp.com> 1687352592 +0530
init without tests
change log
fix tests
fix tests
added tests
change log breaking change
removed breaking change
fix test
keeping the test behaviour same
made enable debug atomic bool
fix lint
fix test true enable debug
using enable debug in agent as atomic bool
test fixes
fix tests
fix tests
added update on correct locaiton
fix tests
fix reloadable config enable debug
fix tests
fix init and acl 403
* revert commit
* Ensure RSA keys are at least 2048 bits in length
* Add changelog
* update key length check for FIPS compliance
* Fix no new variables error and failing to return when error exists from
validating
* clean up code for better readability
* actually return value
* Fix a bug that wrongly trims domains when there is an overlap with DC name
Before this change, when DC name and domain/alt-domain overlap, the domain name incorrectly trimmed from the query.
Example:
Given: datacenter = dc-test, alt-domain = test.consul.
Querying for "test-node.node.dc-test.consul" will faile, because the
code was trimming "test.consul" instead of just ".consul"
This change, fixes the issue by adding dot (.) before trimming
* trimDomain: ensure domain trimmed without modyfing original domains
* update changelog
---------
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
Update CA provider docs
Clarify that providers can differ between
primary and secondary datacenters
Provide a comparison chart for consul vs
vault CA providers
Loosen Vault CA provider validation for RootPKIPath
Update Vault CA provider documentation
* Reject inbound Prop Override patch with Services
Services filtering is only supported for outbound TrafficDirection patches.
* Improve Prop Override unexpected type validation
- Guard against additional invalid parent and target types
- Add specific error handling for Any fields (unsupported)
Fix issue with streaming service health watches.
This commit fixes an issue where the health streams were unaware of service
export changes. Whenever an exported-services config entry is modified, it is
effectively an ACL change.
The bug would be triggered by the following situation:
- no services are exported
- an upstream watch to service X is spawned
- the streaming backend filters out data for service X (due to lack of exports)
- service X is finally exported
In the situation above, the streaming backend does not trigger a refresh of its
data. This means that any events that were supposed to have been received prior
to the export are NOT backfilled, and the watches never see service X spawning.
We currently have decided to not trigger a stream refresh in this situation due
to the potential for a thundering herd effect (touching exports would cause a
re-fetch of all watches for that partition, potentially). Therefore, a local
blocking-query approach was added by this commit for agentless.
It's also worth noting that the streaming subscription is currently bypassed
most of the time with agentful, because proxycfg has a `req.Source.Node != ""`
which prevents the `streamingEnabled` check from passing. This means that while
agents should technically have this same issue, they don't experience it with
mesh health watches.
Note that this is a temporary fix that solves the issue for proxycfg, but not
service-discovery use cases.
* agent: remove agent cache dependency from service mesh leaf certificate management
This extracts the leaf cert management from within the agent cache.
This code was produced by the following process:
1. All tests in agent/cache, agent/cache-types, agent/auto-config,
agent/consul/servercert were run at each stage.
- The tests in agent matching .*Leaf were run at each stage.
- The tests in agent/leafcert were run at each stage after they
existed.
2. The former leaf cert Fetch implementation was extracted into a new
package behind a "fake RPC" endpoint to make it look almost like all
other cache type internals.
3. The old cache type was shimmed to use the fake RPC endpoint and
generally cleaned up.
4. I selectively duplicated all of Get/Notify/NotifyCallback/Prepopulate
from the agent/cache.Cache implementation over into the new package.
This was renamed as leafcert.Manager.
- Code that was irrelevant to the leaf cert type was deleted
(inlining blocking=true, refresh=false)
5. Everything that used the leaf cert cache type (including proxycfg
stuff) was shifted to use the leafcert.Manager instead.
6. agent/cache-types tests were moved and gently replumbed to execute
as-is against a leafcert.Manager.
7. Inspired by some of the locking changes from derek's branch I split
the fat lock into N+1 locks.
8. The waiter chan struct{} was eventually replaced with a
singleflight.Group around cache updates, which was likely the biggest
net structural change.
9. The awkward two layers or logic produced as a byproduct of marrying
the agent cache management code with the leaf cert type code was
slowly coalesced and flattened to remove confusion.
10. The .*Leaf tests from the agent package were copied and made to work
directly against a leafcert.Manager to increase direct coverage.
I have done a best effort attempt to port the previous leaf-cert cache
type's tests over in spirit, as well as to take the e2e-ish tests in the
agent package with Leaf in the test name and copy those into the
agent/leafcert package to get more direct coverage, rather than coverage
tangled up in the agent logic.
There is no net-new test coverage, just coverage that was pushed around
from elsewhere.
* backport ent changes to oss
* Update .changelog/_5669.txt
Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
---------
Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>