Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Previously we would associate the address of a discovery chain target
with the discovery chain's filter chain. This was broken for a few reasons:
- If the upstream is a virtual service, the client proxy has no way of
dialing it because virtual services are not targets of their discovery
chains. The targets are distinct services. This is addressed by watching
the endpoints of all upstream services, not just their discovery chain
targets.
- If multiple discovery chains resolve to the same target, that would
lead to multiple filter chains attempting to match on the target's
virtual IP. This is addressed by only matching on the upstream's virtual
IP.
NOTE: this implementation requires an intention to the redirecting
virtual service and not just to the final destination. This is how
we can know that the virtual service is an upstream to watch.
A later PR will look into traversing discovery chains when computing
upstreams so that intentions are only required to the discovery chain
targets.
* WIP reloadable raft config
* Pre-define new raft gauges
* Update go-metrics to change gauge reset behaviour
* Update raft to pull in new metric and reloadable config
* Add snapshot persistance timing and installSnapshot to our 'protected' list as they can be infrequent but are important
* Update telemetry docs
* Update config and telemetry docs
* Add note to oldestLogAge on when it is visible
* Add changelog entry
* Update website/content/docs/agent/options.mdx
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
* Give descriptive error if auth method not found
Previously during a `consul login -method=blah`, if the auth method was not found, the
error returned would be "ACL not found". This is potentially confusing
because there may be many different ACLs involved in a login: the ACL of
the Consul client, perhaps the binding rule or the auth method.
Now the error will be "auth method blah not found", which is much easier
to debug.
Initially we were loading every potential upstream address into Envoy
and then routing traffic to the logical upstream service. The downside
of this behavior is that traffic meant to go to a specific instance
would be load balanced across ALL instances.
Traffic to specific instance IPs should be forwarded to the original
destination and if it's a destination in the mesh then we should ensure
the appropriate certificates are used.
This PR makes transparent proxying a Kubernetes-only feature for now
since support for other environments requires generating virtual IPs,
and Consul does not do that at the moment.
If an automatic backport fails to more than one release branch we need
to crate a PR to backport it. So far we've had to create a backport PR
for each target release branch.
With this change, we may be able to create only a single PR, and then
run the backport automation to cherry-pick it into other release
branches.
The idea is that if a change introduced in version n-1 caused a
conflict, and there are no other changes, then the backport automation
should be able to use the same commit for version n-2 and n-3.
* Update header logo and inline icon
* Update full logos + layout on loading screen
* Update favicon assets and strategy
- Switches to serve an ico file alongside an SVG file
- Introduces an apple-touch-icon
* Removes unused favicon/meta assets
* Changelog item for ui
* Create component for logo
* Simplify logo component, set brand color
* Fix docs loading state CSS issue
No config entry needs a Kind field. It is only used to determine the Go type to
target. As we introduce new config entries (like this one) we can remove the kind field
and have the GetKind method return the single supported value.
In this case (similar to proxy-defaults) the Name field is also unnecessary. We always
use the same value. So we can omit the name field entirely.
The only thing that needed fixing up pertained to this section of the 1.18.x release notes:
> grpc_stats: the default value for stats_for_all_methods is switched from true to false, in order to avoid possible memory exhaustion due to an untrusted downstream sending a large number of unique method names. The previous default value was deprecated in version 1.14.0. This only changes the behavior when the value is not set. The previous behavior can be used by setting the value to true. This behavior change by be overridden by setting runtime feature envoy.deprecated_features.grpc_stats_filter_enable_stats_for_all_methods_by_default.
For now to maintain status-quo I'm explicitly setting `stats_for_all_methods=true` in all versions to avoid relying upon the default.
Additionally the naming of the emitted metrics for these gRPC requests changed slightly so the integration test assertions for `case-grpc` needed adjusting.
This ensures that if someone does include some extension Consul does not currently make use of, that extension is actually usable. Without linking these envoy protobufs into the main binary it can't round trip the escape hatches to send them down to envoy.
Whenenver the go-control-plane library is upgraded next we just have to re-run 'make envoy-library'.
This adds support for the Incremental xDS protocol when using xDS v3. This is best reviewed commit-by-commit and will not be squashed when merged.
Union of all commit messages follows to give an overarching summary:
xds: exclusively support incremental xDS when using xDS v3
Attempts to use SoTW via v3 will fail, much like attempts to use incremental via v2 will fail.
Work around a strange older envoy behavior involving empty CDS responses over incremental xDS.
xds: various cleanups and refactors that don't strictly concern the addition of incremental xDS support
Dissolve the connectionInfo struct in favor of per-connection ResourceGenerators instead.
Do a better job of ensuring the xds code uses a well configured logger that accurately describes the connected client.
xds: pull out checkStreamACLs method in advance of a later commit
xds: rewrite SoTW xDS protocol tests to use protobufs rather than hand-rolled json strings
In the test we very lightly reuse some of the more boring protobuf construction helper code that is also technically under test. The important thing of the protocol tests is testing the protocol. The actual inputs and outputs are largely already handled by the xds golden output tests now so these protocol tests don't have to do double-duty.
This also updates the SoTW protocol test to exclusively use xDS v2 which is the only variant of SoTW that will be supported in Consul 1.10.
xds: default xds.Server.AuthCheckFrequency at use-time instead of construction-time
* Update website/content/docs/discovery/services.mdx with address field behavior.
Co-authored-by: Jono Sosulska <42216911+jsosulska@users.noreply.github.com>
Co-authored-by: Jono Sosulska <42216911+jsosulska@users.noreply.github.com>
* Use proxy outbound port from TransparentProxyConfig if provided
* If -proxy-id is provided to the redirect-traffic command, exclude any listener ports
from inbound traffic redirection. This includes envoy_prometheus_bind_addr,
envoy_stats_bind_addr, and the ListenerPort from the Expose configuration.
* Allow users to provide additional inbound and outbound ports, outbound CIDRs
and additional user IDs to be excluded from traffic redirection.
This affects both the traffic-redirect command and the iptables SDK package.
This config entry is being renamed primarily because in k8s the name
cluster could be confusing given that the config entry applies across
federated datacenters.
Additionally, this config entry will only apply to Consul as a service
mesh, so the more generic "cluster" name is not needed.
We noticed that TestUpstreamListener would deadlock sometimes when run
with the race detector. While debugging this issue I found and fixed the
following problems.
1. the net.Listener was not being closed properly when Listener.Stop was
called. This caused the Listener.Serve goroutine to run forever.
Fixed by storing a reference to net.Listener and closing it properly
when Listener.Stop is called.
2. call connWG.Add in the correct place. WaitGroup.Add must be called
before starting a goroutine, not from inside the goroutine.
3. Set metrics config EnableRuntimeMetrics to `false` so that we don't
start a background goroutine in each test for no reason. There is no
way to shutdown this goroutine, and it was an added distraction while
debugging these timeouts.
5. two tests were calling require.NoError from a goroutine.
require.NoError calls t.FailNow, which MUST be called from the main
test goroutine. Instead use t.Errorf, which can be called from other
goroutines and will still fail the test.
6. `assertCurrentGaugeValue` wass breaking out of a for loop, which
would cause the `RWMutex.RUnlock` to be missed. Fixed by calling
unlock before `break`.
The core issue of a deadlock was fixed by https://github.com/armon/go-metrics/pull/124.
Previous getFromView would call view.Result when the result may not have been returned
(because the index is updated past the minIndex. This would allocate a slice and sort it
for no reason, because the values would never be returned.
Fix this by re-ordering the operations in getFromView.
The test changes in this commit were an attempt to cover the case where
an update is received but the index does not exceed the minIndex.
Also rename it to readEntry now that it doesn't return the entire entry. Based on feedback
in PR review, the full entry is not used by the caller, and accessing the fields wouldn't be
safe outside the lock, so it is safer to return only the Materializer
grpclog.SetLoggerV2 is meant to be called only once before any gRPC requests are received, but
each test that uses TestAgent will call NewBaseDeps again. Use a sync.Once to prevent the grpc
logging from being re-initialized by each test.
This will mean that a test can't use a fake logger to capture logs from the gRPC server.
These tests can flake when we get a notification for an earlier event.
Retry the read from update channel a few times to make sure we get the
event we expect.
Split the TestStreamingClient into the two logical components the real
client uses. This allows us to test multiple clients properly.
Previously writing of ctx from multiple Subscribe calls was showing a
data race.
Once this was fixed a test started to fail because the request had to be
made with a greater index, so that the store.Get call did not return
immediately.
The idleTTL was being written and read concurrently. Instead move the idleTTL to a struct
field so that when one test patches the TTL it does not impact others.
The background goroutines for the store can outlive a test because context cancellation
is async.