Ultimately we will have to rectify wan federation with v2 catalog adjacent
experiments, but for now blanket prevent usage of the resource-apis,
v2dns, and v2tenancy experiments in secondary datacenters.
This add a fix to properly verify the gateway mode before creating a watch specific to mesh gateways. This watch have a high performance cost and when mesh gateways are not used is not used.
This also adds an optimization to only return the nodes when watching the Internal.ServiceDump RPC to avoid unnecessary disco chain compilation. As watches in proxy config only need the nodes.
* Upgrade Go to 1.21
* ci: detect Go backwards compatibility test version automatically
For our submodules and other places we choose to test against previous
Go versions, detect this version automatically from the current one
rather than hard-coding it.
* NET-6945 - Replace usage of deprecated Envoy field envoy.config.core.v3.HeaderValueOption.append
* update proto for v2 and then update xds v2 logic
* add changelog
* Update 20078.txt to be consistent with existing changelog entries
* swap enum values tomatch envoy.
* NET-6946 - Replace usage of deprecated Envoy field envoy.config.route.v3.HeaderMatcher.safe_regex_match
* removing unrelated changes
* update golden files
* do not set engine type
This commit fixes an issue where the partition was not properly set
on the peering query failover target created from sameness-groups.
Before this change, it was always empty, meaning that the data
would be queried with respect to the default partition always. This
resulted in a situation where a PQ that was attempting to use a
sameness-group for failover would select peers from the default
partition, rather than the partition of the sameness-group itself.
* cli: Deprecate the `-admin-access-log-path` flag from `consul connect envoy` command in favor of: `-admin-access-log-config`.
* fix changelog
* add in documentation change.
* updating usage of http2_protocol_options and access_log_path
* add changelog
* update template for AdminAccessLogConfig
* remove mucking with AdminAccessLogConfig
* add a hash to config entries when normalizing
* add GetHash and implement comparing hashes
* only update if the Hash is different
* only update if the Hash is different and not 0
* fix proto to include the Hash
* fix proto gen
* buf format
* add SetHash and fix tests
* fix config load tests
* fix state test and config test
* recalculate hash when restoring config entries
* fix snapshot restore test
* add changelog
* fix missing normalize, fix proto indexes and add normalize test
When a large number of upstreams are configured on a single envoy
proxy, there was a chance that it would timeout when waiting for
ClusterLoadAssignments. While this doesn't always immediately cause
issues, consul-dataplane instances appear to consistently drop
endpoints from their configurations after an xDS connection is
re-established (the server dies, random disconnect, etc).
This commit adds an `xds_fetch_timeout_ms` config to service registrations
so that users can set the value higher for large instances that have
many upstreams. The timeout can be disabled by setting a value of `0`.
This configuration was introduced to reduce the risk of causing a
breaking change for users if there is ever a scenario where endpoints
would never be received. Rather than just always blocking indefinitely
or for a significantly longer period of time, this config will affect
only the service instance associated with it.
This fixes the following race condition:
- Send update endpoints
- Send update cluster
- Recv ACK endpoints
- Recv ACK cluster
Prior to this fix, it would have resulted in the endpoints NOT existing in
Envoy. This occurred because the cluster update implicitly clears the endpoints
in Envoy, but we would never re-send the endpoint data to compensate for the
loss, because we would incorrectly ACK the invalid old endpoint hash. Since the
endpoint's hash did not actually change, they would not be resent.
The fix for this is to effectively clear out the invalid pending ACKs for child
resources whenever the parent changes. This ensures that we do not store the
child's hash as accepted when the race occurs.
An escape-hatch environment variable `XDS_PROTOCOL_LEGACY_CHILD_RESEND` was
added so that users can revert back to the old legacy behavior in the event
that this produces unknown side-effects. Visit the following thread for some
extra context on why certainty around these race conditions is difficult:
https://github.com/envoyproxy/envoy/issues/13009
This bug report and fix was mostly implemented by @ksmiley with some minor
tweaks.
Co-authored-by: Keith Smiley <ksmiley@salesforce.com>
* Add CE version of gateway-upstream-disambiguation
* Use NamespaceOrDefault and PartitionOrDefault
* Add Changelog entry
* Remove the unneeded reassignment
* Use c.ID()
* Adding cli command to list exported services to a peer
* Changelog added
* Addressing docs comments
* Adding test case for no exported services scenario
The client.rpc metric now excludes internal retries for consistency
with client.rpc.exceeded and client.rpc.failed. All of these metrics
now increment at most once per RPC method call, allowing for
accurate calculation of failure / rate limit application occurrence.
Additionally, if an RPC fails because no servers are present,
client.rpc.failed is now incremented.
* Upgrade hcp-sdk-go to latest version v0.73
Changes:
- go get github.com/hashicorp/hcp-sdk-go
- go mod tidy
* From upgrade: regenerate protobufs for upgrade from 1.30 to 1.31
Ran: `make proto`
Slack: https://hashicorp.slack.com/archives/C0253EQ5B40/p1701105418579429
* From upgrade: fix mock interface implementation
After upgrading, there is the following compile error:
cannot use &mockHCPCfg{} (value of type *mockHCPCfg) as "github.com/hashicorp/hcp-sdk-go/config".HCPConfig value in return statement: *mockHCPCfg does not implement "github.com/hashicorp/hcp-sdk-go/config".HCPConfig (missing method Logout)
Solution: update the mock to have the missing Logout method
* From upgrade: Lint: remove usage of deprecated req.ServerState.TLS
Due to upgrade, linting is erroring due to usage of a newly deprecated field
22:47:56 [consul]: make lint
--> Running golangci-lint (.)
agent/hcp/testing.go:157:24: SA1019: req.ServerState.TLS is deprecated: use server_tls.internal_rpc instead. (staticcheck)
time.Until(time.Time(req.ServerState.TLS.CertExpiry)).Hours()/24,
^
* From upgrade: adjust oidc error message
From the upgrade, this test started failing:
=== FAIL: internal/go-sso/oidcauth TestOIDC_ClaimsFromAuthCode/failed_code_exchange (re-run 2) (0.01s)
oidc_test.go:393: unexpected error: Provider login failed: Error exchanging oidc code: oauth2: "invalid_grant" "unexpected auth code"
Prior to the upgrade, the error returned was:
```
Provider login failed: Error exchanging oidc code: oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"invalid_grant\",\"error_description\":\"unexpected auth code\"}\n
```
Now the error returned is as below and does not contain "cannot fetch token"
```
Provider login failed: Error exchanging oidc code: oauth2: "invalid_grant" "unexpected auth code"
```
* Update AgentPushServerState structs with new fields
HCP-side changes for the new fields are in:
https://github.com/hashicorp/cloud-global-network-manager-service/pull/1195/files
* Minor refactor for hcpServerStatus to abstract tlsInfo into struct
This will make it easier to set the same tls-info information to both
- status.TLS (deprecated field)
- status.ServerTLSMetadata (new field to use instead)
* Update hcpServerStatus to parse out information for new fields
Changes:
- Improve error message and handling (encountered some issues and was confused)
- Set new field TLSInfo.CertIssuer
- Collect certificate authority metadata and set on TLSInfo.CertificateAuthorities
- Set TLSInfo on both server.TLS and server.ServerTLSMetadata.InternalRPC
* Update serverStatusToHCP to convert new fields to GNM rpc
* Add changelog
* Feedback: connect.ParseCert, caCerts
* Feedback: refactor and unit test server status
* Feedback: test to use expected struct
* Feedback: certificate with intermediate
* Feedback: catch no leaf, remove expectedErr
* Feedback: update todos with jira ticket
* Feedback: mock tlsConfigurator
security: Bump github.com/golang-jwt/jwt/v4 to 4.5.0
This version is accepted by Prisma/Twistlock, resolving scan results for
issue PRISMA-2022-0270. Chosen over later versions to avoid a major
version with breaking changes that is otherwise unnecessary.
Note that in practice this is a false positive (see
https://github.com/golang-jwt/jwt/issues/258), but we should update the
version to aid customers relying on scanners that flag it.
* Set default of 1m for StatsFlushInterval when the collector is setup
* Add documentation on the stats_flush_interval value
* Do not default in two conditions 1) preconfigured sinks exist 2) preconfigured flush interval exists
* Fix wording of docs
* Add changelog
* Fix docs
* Initial work for sidenav
* Use HDS::Text
* Add resolution for ember-element-helper
* WIP dc selector
* Update HCP Home link
* DC selector
* Hook up remaining selectors
* Fix settings and tutorial links
* Remove comments
* Remove skip-links
* Replace auth with new dropdown
* Use href-to helper for sidenav links
* Changelog
* Add description to NavSelector
* Wrap version in footer and role
* Fix login tests
* Add data-test selectors for namespaces
* Fix datacenter disclosure menu test
* Stop rendering auth dialog if acls are disabled
* Update disabled selector state and token selector
* Fix logic in ACL selector
* Fix HCP Home integration test
* Remove toggling the sidenav in tests
* Add sidenav to eng docs
* Re-add debug navigation for eng docs
* Remove ember-in-viewport
* Remove unused styles
* Upgrade @hashicorp/design-system-componentseee
* Add translations for side-nav
* Only show back to hcp link if url is present
* Disable responsive due to a11y-dialog issue
Fix issue with wanfed lan ip conflicts.
Prior to this commit, the connection pools were unaware which datacenter the
connection was associated with. This meant that any time servers with
overlapping LAN IP addresses and node shortnames existed, they would be
incorrectly co-located in the same pool. Whenever this occurred, the servers
would get stuck in an infinite loop of forwarding RPCs to themselves (rather
than the intended remote DC) until they eventually run out of memory.
Most notably, this issue can occur whenever wan federation through mesh
gateways is enabled.
This fix adds extra metadata to specify which DC the connection is associated
with in the pool.
Prior to the introduction of this configuration, grpc keepalive messages were
sent after 2 hours of inactivity on the stream. This posed issues in various
scenarios where the server-side xds connection balancing was unaware that envoy
instances were uncleanly killed / force-closed, since the connections would
only be cleaned up after ~5 minutes of TCP timeouts occurred. Setting this
config to a 30 second interval with a 20 second timeout ensures that at most,
it should take up to 50 seconds for a dead xds connection to be closed.
This PR fixes an issue where upstreams did not correctly inherit the proper
namespace / partition from the parent service when attempting to fetch the
upstream protocol due to inconsistent normalization.
Some of the merge-service-configuration logic would normalize to default, while
some of the proxycfg logic would normalize to match the parent service. Due to
this mismatch in logic, an incorrect service-defaults configuration entry would
be fetched and have its protocol applied to the upstream.
* Add InboundPeerTrustBundle maps to Terminating Gateway
* Add notify and cancelation of watch for inbound peer trust bundles
* Pass peer trust bundles to the RBAC creation function
* Regenerate Golden Files
* add changelog, also adds another spot that needed peeredTrustBundles
* Add basic test for terminating gateway with peer trust bundle
* Add intention to cluster peered golden test
* rerun codegen
* update changelog
* really update the changelog
---------
Co-authored-by: Melisa Griffin <melisa.griffin@hashicorp.com>
Ongoing work to support Nomad Workload Identity for authenticating with Consul
will mean that Nomad's service registration sync with Consul will want to use
Consul tokens scoped to individual workloads for registering services and
checks. The `CheckRegister` method in the API doesn't have an option to pass the
token in, which prevent us from sharing the same Consul connection for all
workloads. Add a `CheckRegisterOpts` to match the behavior of
`ServiceRegisterOpts`.
Ongoing work to support Nomad Workload Identity for authenticating with Consul
will mean that Nomad's service registration sync with Consul will want to use
Consul tokens scoped to individual workloads for registering services and
checks. The `ServiceRegisterOpts` type in the API doesn't have an option to pass
the token in, which prevent us from sharing the same Consul connection for all
workloads. Add a `Token` field to match the behavior of `ServiceDeregisterOpts`.
* dns token
fix whitespace for docs and comments
fix test cases
fix test cases
remove tabs in help text
Add changelog
Peering dns test
Peering dns test
Partial implementation of Peered DNS test
Swap to new topology lib
expose dns port for integration tests on client
remove partial test implementation
remove extra port exposure
remove changelog from the ent pr
Add dns token to set-agent-token switch
Add enterprise golden file
Use builtin/dns template in tests
Update ent dns policy
Update ent dns template test
remove local gen certs
fix templated policy specs
* add changelog
* go mod tidy
Configure Envoy to use the same HTTP protocol version used by the
downstream caller when forwarding requests to a local application that
is configured with the protocol set to either `http2` or `grpc`.
This allows upstream applications that support both HTTP/1.1 and
HTTP/2 on a single port to receive requests using either protocol. This
is beneficial when the application primarily communicates using HTTP/2,
but also needs to support HTTP/1.1, such as to respond to Kubernetes
HTTP readiness/liveness probes.
Co-authored-by: Derek Menteer <derek.menteer@hashicorp.com>
* debug since
* fix docs
* chagelog added
* fix go mod
* debug test fix
* fix test
* tabs test fix
* Update .changelog/18797.txt
Co-authored-by: Ganesh S <ganesh.seetharaman@hashicorp.com>
---------
Co-authored-by: Ganesh S <ganesh.seetharaman@hashicorp.com>
* Add response header filters to http-route config entry definitions
* Map response header filters from config entry when constructing route destination
* Support response header modifiers at the service level as well
* Update protobuf definitions
* Update existing unit tests
* Add response filters to route consolidation logic
* Make existing unit tests more robust
* Add missing docstring
* Add changelog entry
* Add response filter modifiers to existing integration test
* Add more robust testing for response header modifiers in the discovery chain
* Add more robust testing for request header modifiers in the discovery chain
* Modify test to verify that service filter modifiers take precedence over rule filter modifiers
* [NET-5325] ACL templated policies support in tokens and roles
- Add API support for creating tokens/roles with templated-policies
- Add CLI support for creating tokens/roles with templated-policies
* adding changelog
This PR enables the GetEnvoyBootstrapParams endpoint to construct envoy bootstrap parameters from v2 catalog and mesh resources.
* Make bootstrap request and response parameters less specific to services so that we can re-use them for workloads or service instances.
* Remove ServiceKind from bootstrap params response. This value was unused previously and is not needed for V2.
* Make access logs generation generic so that we can generate them using v1 or v2 resources.
Add support for querying tokens by service name
The consul-k8s endpoints controller has a workflow where it fetches all tokens.
This is not performant for large clusters, where there may be a sizable number
of tokens. This commit attempts to alleviate that problem and introduces a new
way to query by the token's service name.
Fix issue where agentless endpoints would fail to populate after snapshot restore.
Fixes an issue that was introduced in #17775. This issue happens because
a long-lived pointer to the state store is held, which is unsafe to do.
Snapshot restorations will swap out this state store, meaning that the
proxycfg watches would break for agentless.
* init
* tests added and few fixes
* revert arg message
* changelog added
* removed var declaration
* fix CI
* fix test
* added node name and status
* updated save.mdx
* added example
* fix tense
* fix description
* fix for #18406 , non presence of consul-version meta
* removed redundant checks
* updated mock-api to mimic api response for synthetic nodes
* added test to test getDistinctConsulVersions method with synthetic-node case
* updated typo in comments
* added change log
* Added oss config entries for Policy and JWT on APIGW
* Updated structs for config entry
* Updated comments, ran deep-copy
* Move JWT configuration into OSS file
* Add in the config entry OSS file for jwts
* Added changelog
* fixing proto spacing
* Moved to using manually written deep copy method
* Use pointers for override/default fields in apigw config entries
* Run gen scripts for changed types
* Add logging to locality policy application
In OSS, this is currently a no-op.
* Inherit locality when registering sidecars
When sidecar locality is not explicitly configured, inherit locality
from the proxied service.