This PR fixes a bug that was introduced in:
https://github.com/hashicorp/consul/pull/16021
A user setting a protocol in proxy-defaults would cause tproxy implicit
upstreams to not honor the upstream service's protocol set in its
`ServiceDefaults.Protocol` field, and would instead always use the
proxy-defaults value.
Due to the fact that upstreams configured with "tcp" can successfully contact
upstream "http" services, this issue was not recognized until recently (a
proxy-defaults with "tcp" and a listening service with "http" would make
successful requests, but not the opposite).
As a temporary work-around, users experiencing this issue can explicitly set
the protocol on the `ServiceDefaults.UpstreamConfig.Overrides`, which should
take precedence.
The fix in this PR removes the proxy-defaults protocol from the wildcard
upstream that tproxy uses to configure implicit upstreams. When the protocol
was included, it would always overwrite the value during discovery chain
compilation, which was not correct. The discovery chain compiler also consumes
proxy defaults to determine the protocol, so simply excluding it from the
wildcard upstream config map resolves the issue.
* # This is a combination of 9 commits.
# This is the 1st commit message:
init without tests
# This is the commit message #2:
change log
# This is the commit message #3:
fix tests
# This is the commit message #4:
fix tests
# This is the commit message #5:
added tests
# This is the commit message #6:
change log breaking change
# This is the commit message #7:
removed breaking change
# This is the commit message #8:
fix test
# This is the commit message #9:
keeping the test behaviour same
* # This is a combination of 12 commits.
# This is the 1st commit message:
init without tests
# This is the commit message #2:
change log
# This is the commit message #3:
fix tests
# This is the commit message #4:
fix tests
# This is the commit message #5:
added tests
# This is the commit message #6:
change log breaking change
# This is the commit message #7:
removed breaking change
# This is the commit message #8:
fix test
# This is the commit message #9:
keeping the test behaviour same
# This is the commit message #10:
made enable debug atomic bool
# This is the commit message #11:
fix lint
# This is the commit message #12:
fix test true enable debug
* parent 10f500e895d92cc3691ade7b74a33db755d22039
author absolutelightning <ashesh.vidyut@hashicorp.com> 1687352587 +0530
committer absolutelightning <ashesh.vidyut@hashicorp.com> 1687352592 +0530
init without tests
change log
fix tests
fix tests
added tests
change log breaking change
removed breaking change
fix test
keeping the test behaviour same
made enable debug atomic bool
fix lint
fix test true enable debug
using enable debug in agent as atomic bool
test fixes
fix tests
fix tests
added update on correct locaiton
fix tests
fix reloadable config enable debug
fix tests
fix init and acl 403
* revert commit
* Ensure RSA keys are at least 2048 bits in length
* Add changelog
* update key length check for FIPS compliance
* Fix no new variables error and failing to return when error exists from
validating
* clean up code for better readability
* actually return value
* Fix a bug that wrongly trims domains when there is an overlap with DC name
Before this change, when DC name and domain/alt-domain overlap, the domain name incorrectly trimmed from the query.
Example:
Given: datacenter = dc-test, alt-domain = test.consul.
Querying for "test-node.node.dc-test.consul" will faile, because the
code was trimming "test.consul" instead of just ".consul"
This change, fixes the issue by adding dot (.) before trimming
* trimDomain: ensure domain trimmed without modyfing original domains
* update changelog
---------
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
For consistency, resource type names must follow these rules:
- `Group` must be snake case, and in most cases a single word.
- `GroupVersion` must be lowercase, start with a "v" and end with a number.
- `Kind` must be pascal case.
These were chosen because they map to our protobuf type naming
conventions.
Update CA provider docs
Clarify that providers can differ between
primary and secondary datacenters
Provide a comparison chart for consul vs
vault CA providers
Loosen Vault CA provider validation for RootPKIPath
Update Vault CA provider documentation
* Reject inbound Prop Override patch with Services
Services filtering is only supported for outbound TrafficDirection patches.
* Improve Prop Override unexpected type validation
- Guard against additional invalid parent and target types
- Add specific error handling for Any fields (unsupported)
Fix issue with streaming service health watches.
This commit fixes an issue where the health streams were unaware of service
export changes. Whenever an exported-services config entry is modified, it is
effectively an ACL change.
The bug would be triggered by the following situation:
- no services are exported
- an upstream watch to service X is spawned
- the streaming backend filters out data for service X (due to lack of exports)
- service X is finally exported
In the situation above, the streaming backend does not trigger a refresh of its
data. This means that any events that were supposed to have been received prior
to the export are NOT backfilled, and the watches never see service X spawning.
We currently have decided to not trigger a stream refresh in this situation due
to the potential for a thundering herd effect (touching exports would cause a
re-fetch of all watches for that partition, potentially). Therefore, a local
blocking-query approach was added by this commit for agentless.
It's also worth noting that the streaming subscription is currently bypassed
most of the time with agentful, because proxycfg has a `req.Source.Node != ""`
which prevents the `streamingEnabled` check from passing. This means that while
agents should technically have this same issue, they don't experience it with
mesh health watches.
Note that this is a temporary fix that solves the issue for proxycfg, but not
service-discovery use cases.
* agent: remove agent cache dependency from service mesh leaf certificate management
This extracts the leaf cert management from within the agent cache.
This code was produced by the following process:
1. All tests in agent/cache, agent/cache-types, agent/auto-config,
agent/consul/servercert were run at each stage.
- The tests in agent matching .*Leaf were run at each stage.
- The tests in agent/leafcert were run at each stage after they
existed.
2. The former leaf cert Fetch implementation was extracted into a new
package behind a "fake RPC" endpoint to make it look almost like all
other cache type internals.
3. The old cache type was shimmed to use the fake RPC endpoint and
generally cleaned up.
4. I selectively duplicated all of Get/Notify/NotifyCallback/Prepopulate
from the agent/cache.Cache implementation over into the new package.
This was renamed as leafcert.Manager.
- Code that was irrelevant to the leaf cert type was deleted
(inlining blocking=true, refresh=false)
5. Everything that used the leaf cert cache type (including proxycfg
stuff) was shifted to use the leafcert.Manager instead.
6. agent/cache-types tests were moved and gently replumbed to execute
as-is against a leafcert.Manager.
7. Inspired by some of the locking changes from derek's branch I split
the fat lock into N+1 locks.
8. The waiter chan struct{} was eventually replaced with a
singleflight.Group around cache updates, which was likely the biggest
net structural change.
9. The awkward two layers or logic produced as a byproduct of marrying
the agent cache management code with the leaf cert type code was
slowly coalesced and flattened to remove confusion.
10. The .*Leaf tests from the agent package were copied and made to work
directly against a leafcert.Manager to increase direct coverage.
I have done a best effort attempt to port the previous leaf-cert cache
type's tests over in spirit, as well as to take the e2e-ish tests in the
agent package with Leaf in the test name and copy those into the
agent/leafcert package to get more direct coverage, rather than coverage
tangled up in the agent logic.
There is no net-new test coverage, just coverage that was pushed around
from elsewhere.
This includes prioritize by localities on disco chain targets rather than
resolvers, allowing different targets within the same partition to have
different policies.
* Add header filter to api-gateway xDS golden test
* Stop adding all header filters to virtual host when generating xDS for api-gateway
* Regenerate xDS golden file for api-gateway w/ header filter
Ensure that the embedded api struct is properly parsed when
deserializing config containing a set ResourceFilter.Services field.
Also enhance existing integration test to guard against bugs and
exercise this field.
TLDR with many modules the versions included in each diverged quite a bit. Attempting to use Go Workspaces produces a bunch of errors.
This commit:
1. Fixes envoy-library-references.sh to work again
2. Ensures we are pulling in go-control-plane@v0.11.0 everywhere (previously it was at that version in some modules and others were much older)
3. Remove one usage of golang/protobuf that caused us to have a direct dependency on it.
4. Remove deprecated usage of the Endpoint field in the grpc resolver.Target struct. The current version of grpc (v1.55.0) has removed that field and recommended replacement with URL.Opaque and calls to the Endpoint() func when needing to consume the previous field.
4. `go work init <all the paths to go.mod files>` && `go work sync`. This syncrhonized versions of dependencies from the main workspace/root module to all submodules
5. Updated .gitignore to ignore the go.work and go.work.sum files. This seems to be standard practice at the moment.
6. Update doc comments in protoc-gen-consul-rate-limit to be go fmt compatible
7. Upgraded makefile infra to perform linting, testing and go mod tidy on all modules in a flexible manner.
8. Updated linter rules to prevent usage of golang/protobuf
9. Updated a leader peering test to account for an extra colon in a grpc error message.
When UpstreamEnvoyExtender was introduced, some code was left duplicated
between it and BasicEnvoyExtender. One path in that code panics when a
TProxy listener patch is attempted due to no upstream data in
RuntimeConfig matching the local service (which would only happen in
rare cases).
Instead, we can remove the special handling of upstream VIPs from
BasicEnvoyExtender entirely, greatly simplifying the listener filter
patch code and avoiding the panic. UpstreamEnvoyExtender, which needs
this code to function, is modified to ensure a panic does not occur.
This also fixes a second regression in which the Lua extension was not
applied to TProxy outbound listeners.