Ensure that the peer stream replication rpc can successfully be used with TLS activated.
Also:
- If key material is configured for the gRPC port but HTTPS is not
enabled now TLS will still be activated for the gRPC port.
- peerstream replication stream opened by the establishing-side will now
ignore grpc.WithBlock so that TLS errors will bubble up instead of
being awkwardly delayed or suppressed
Peer replication is intended to be between separate Consul installs and
effectively should be considered "external". This PR moves the peer
stream replication bidirectional RPC endpoint to the external gRPC
server and ensures that things continue to function.
Require use of mesh gateways in order for service mesh data plane
traffic to flow between peers.
This also adds plumbing for envoy integration tests involving peers, and
one starter peering test.
* tidy code and add some doc strings
* add doc strings to tests
* add partitions tests, need to adapt to run in both oss and ent
* split oss and enterprise versions
* remove parallel tests
* add error
* fix queryBackend in test
* revert unneeded change
* fix failing tests
* add a sample
* Consul cluster test
* add build dockerfile
* add tests to cover mixed versions tests
* use flag to pass docker image name
* remove default config and rely on flags to inject the right image to test
* add cluster abstraction
* fix imports and remove old files
* fix imports and remove old files
* fix dockerIgnore
* make a `Node interface` and encapsulate ConsulContainer
* fix a test bug where we only check the leader against a single node.
* add upgrade tests to CI
* fix yaml alignment
* fix alignment take 2
* fix flag naming
* fix image to build
* fix test run and go mod tidy
* add a debug command
* run without RYUK
* fix parallel run
* add skip reaper code
* make tempdir in local dir
* chmod the temp dir to 0777
* chmod the right dir name
* change executor to use machine instead of docker
* add docker layer caching
* remove setup docker
* add gotestsum
* install go version
* use variable for GO installed version
* add environment
* add environment in the right place
* do not disable RYUK in CI
* add service check to tests
* assertions outside routines
* add queryBackend to the api query meta.
* check if we are using the right backend for those tests (streaming)
* change the tested endpoint to use one that have streaming.
* refactor to test multiple scenarios for streaming
* Fix dockerfile
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
* rename Clients to clients
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
* check if cluster have 0 node
* tidy code and add some doc strings
* use uuid instead of random string
* add doc strings to tests
* add queryBackend to the api query meta.
* add a changelog
* fix for api backend query
* add missing require
* fix q.QueryBackend
* Revert "fix q.QueryBackend"
This reverts commit cd0e5f7b1a1730e191673d624f8e89b591871c05.
* fix circle ci config
* tidy go mod after merging main
* rename package and fix test scenario
* update go download url
* address review comments
* rename flag in CI
* add readme to the upgrade tests
* fix golang download url
* fix golang arch downloaded
* fix AddNodes to handle an empty cluster case
* use `parseBool`
* rename circle job and add comment
* update testcontainer to 0.13
* fix circle ci config
* remove build docker file and use `make dev-docker` instead
* Apply suggestions from code review
Co-authored-by: Dan Upton <daniel@floppy.co>
* fix a typo
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
Co-authored-by: Dan Upton <daniel@floppy.co>
* Add partition fields to targets like service route destinations
* Update validation to prevent cross-DC + cross-partition references
* Handle partitions when reading config entries for disco chain
* Encode partition in compiled targets
The secondary DC now takes longer to populate the MGW snapshot because
it needs to wait for the secondary CA to be initialized before it can
receive roots and generate xDS config.
Previously MGWs could receive empty roots before the CA was
initialized. This wasn't necessarily a problem since the cluster ID in
the trust domain isn't verified.
We launch one container as part of the test with --pid=host but
apparently within that container it launches a copy of "tini" as a
process supervisor that prefers to be PID 1.
Because it's not PID 1 it logs a warning message about this to the envoy
integration test logs that can lead to thinking somehow that a test
failure is related when in fact it's completely unrelated.
Adding this environment variable avoids the warning.
The only thing that needed fixing up pertained to this section of the 1.18.x release notes:
> grpc_stats: the default value for stats_for_all_methods is switched from true to false, in order to avoid possible memory exhaustion due to an untrusted downstream sending a large number of unique method names. The previous default value was deprecated in version 1.14.0. This only changes the behavior when the value is not set. The previous behavior can be used by setting the value to true. This behavior change by be overridden by setting runtime feature envoy.deprecated_features.grpc_stats_filter_enable_stats_for_all_methods_by_default.
For now to maintain status-quo I'm explicitly setting `stats_for_all_methods=true` in all versions to avoid relying upon the default.
Additionally the naming of the emitted metrics for these gRPC requests changed slightly so the integration test assertions for `case-grpc` needed adjusting.
This adds support for the Incremental xDS protocol when using xDS v3. This is best reviewed commit-by-commit and will not be squashed when merged.
Union of all commit messages follows to give an overarching summary:
xds: exclusively support incremental xDS when using xDS v3
Attempts to use SoTW via v3 will fail, much like attempts to use incremental via v2 will fail.
Work around a strange older envoy behavior involving empty CDS responses over incremental xDS.
xds: various cleanups and refactors that don't strictly concern the addition of incremental xDS support
Dissolve the connectionInfo struct in favor of per-connection ResourceGenerators instead.
Do a better job of ensuring the xds code uses a well configured logger that accurately describes the connected client.
xds: pull out checkStreamACLs method in advance of a later commit
xds: rewrite SoTW xDS protocol tests to use protobufs rather than hand-rolled json strings
In the test we very lightly reuse some of the more boring protobuf construction helper code that is also technically under test. The important thing of the protocol tests is testing the protocol. The actual inputs and outputs are largely already handled by the xds golden output tests now so these protocol tests don't have to do double-duty.
This also updates the SoTW protocol test to exclusively use xDS v2 which is the only variant of SoTW that will be supported in Consul 1.10.
xds: default xds.Server.AuthCheckFrequency at use-time instead of construction-time
Note that this does NOT upgrade to xDS v3. That will come in a future PR.
Additionally:
- Ignored staticcheck warnings about how github.com/golang/protobuf is deprecated.
- Shuffled some agent/xds imports in advance of a later xDS v3 upgrade.
- Remove support for envoy 1.13.x but don't add in 1.17.x yet. We have to wait until the xDS v3 support is added in a follow-up PR.
Fixes#8425
To fix failing integration tests. The latest version (`1.7.4.0-r0`)
appears to not be catting all the bytes, so the expected metrics are
missing in the output.
This PR updates the tags that we generate for Envoy stats.
Several of these come with breaking changes, since we can't keep two stats prefixes for a filter.
This has the biggest impact on enterprise test cases that use namespaced
registrations, which prior to this change sometimes failed the initial
registration because the namespace was not yet created.
Added a new option `ui_config.metrics_proxy.path_allowlist`. This defaults to `["/api/v1/query", "/api/v1/query_range"]` when the metrics provider is set to `prometheus`.
Requests that do not use one of the allow-listed paths (via exact match) get a 403 Forbidden response instead.
The key thing here is to use `curl --no-keepalive` so that envoy
pre-1.15 tests will reliably use the latest listener every time.
Extra:
- Switched away from editing line-item intentions the legacy way.
- Removed some teardown scripts, as we don't share anything between cases anyway
- Removed unnecessary use of `run` in some places.
There is a delay between an intentions change being made, and it being
reflected in the Envoy runtime configuration. Now that the enforcement
happens inside of Envoy instead of over in the agent, our tests need to
explicitly wait until the xDS reconfiguration is complete before
attempting to assert intentions worked.
Also remove a few double retry loops.
This speeds up individual envoy integration test runs from ~23m to ~14m.
It's also a pre-req for possibly switching to doing the tests entirely within Go (no shell-outs).
Extend Consul’s intentions model to allow for request-based access control enforcement for HTTP-like protocols in addition to the existing connection-based enforcement for unspecified protocols (e.g. tcp).
Related changes:
- hard-fail the xDS connection attempt if the envoy version is known to be too old to be supported
- remove the RouterMatchSafeRegex proxy feature since all supported envoy versions have it
- stop using --max-obj-name-len (due to: envoyproxy/envoy#11740)
A port can be sent in the Host header as defined in the HTTP RFC, so we
take any hosts that we want to match traffic to and also add another
host with the listener port added.
Also fix an issue with envoy integration tests not running the
case-ingress-gateway-tls test.
Previously, we did not require the 'service-name.*' host header value
when on a single http service was exposed. However, this allows a user
to get into a situation where, if they add another service to the
listener, suddenly the previous service's traffic might not be routed
correctly. Thus, we always require the Host header, even if there is
only 1 service.
Also, we add the make the default domain matching more restrictive by
matching "service-name.ingress.*" by default. This lines up better with
the namespace case and more accurately matches the Consul DNS value we
expect people to use in this case.
The DNS resolution will be handled by Envoy and defaults to LOGICAL_DNS. This discovery type can be overridden on a per-gateway basis with the envoy_dns_discovery_type Gateway Option.
If a service contains an instance with a hostname as an address we set the Envoy cluster to use DNS as the discovery type rather than EDS. Since both mesh gateways and terminating gateways route to clusters using SNI, whenever there is a mix of hostnames and IP addresses associated with a service we use the hostname + CDS rather than the IPs + EDS.
Note that we detect hostnames by attempting to parse the service instance's address as an IP. If it is not a valid IP we assume it is a hostname.
The previous change, which moved test running to Go, appears to have
broken log capturing. I am not entirely sure why, but the run_tests
function seems to exit on the first error.
This change moves test teardown and log capturing out of run_test, and
has the go test runner call them when necessary.