Note that this does NOT upgrade to xDS v3. That will come in a future PR.
Additionally:
- Ignored staticcheck warnings about how github.com/golang/protobuf is deprecated.
- Shuffled some agent/xds imports in advance of a later xDS v3 upgrade.
- Remove support for envoy 1.13.x but don't add in 1.17.x yet. We have to wait until the xDS v3 support is added in a follow-up PR.
Fixes#8425
* Allow setting the mesh gateway mode for an upstream in config files
* Add envoy integration test for mesh gateways
This necessitated many supporting changes in most of the other test cases.
Add remote mode mesh gateways integration test
Additionally:
- wait for bootstrap config entries to be applied
- run the verify container in the host's PID namespace so we can kill
envoys without mounting the docker socket
* assert that we actually send HEALTHY and UNHEALTHY endpoints down in EDS during failover
When the envoy healthy panic threshold was explicitly disabled as part
of L7 traffic management it changed how envoy decided to load balance to
endpoints in a cluster. This only matters when envoy is in "panic mode"
aka "when you have a bunch of unhealthy endpoints". Panic mode sends
traffic to unhealthy instances in certain circumstances.
Note: Prior to explicitly disabling the healthy panic threshold, the
default value is 50%.
What was happening is that the test harness was bringing up consul the
sidecars, and the service instances all at once and sometimes the
proxies wouldn't have time to be checked by consul to be labeled as
'passing' in the catalog before a round of EDS happened.
The xDS server in consul effectively queries /v1/health/connect/s2 and
gets 1 result, but that one result has a 'critical' check so the xDS
server sends back that endpoint labeled as UNHEALTHY.
Envoy sees that 100% of the endpoints in the cluster are unhealthy and
would enter panic mode and still send traffic to s2. This is why the
test suites PRIOR to disabling the healthy panic threshold worked. They
were _incorrectly_ passing.
When the healthy panic threshol is disabled, envoy never enters panic
mode in this situation and thus the cluster has zero healthy endpoints
so load balancing goes nowhere and the tests fail.
Why does this only affect the test suites for envoy 1.8.0? My guess is
that https://github.com/envoyproxy/envoy/pull/4442 was merged into the
1.9.x series and somehow that plays a role.
This PR modifies the bats scripts to explicitly wait until the upstream
sidecar is healthy as measured by /v1/health/connect/s2?passing BEFORE
trying to interrogate envoy which should make the tests less racy.
* Make exec test assert Envoy version - it was not rebuilding before and so often ran against wrong version. This makes 1.10 fail consistenty.
* Switch Envoy exec to use a named pipe rather than FD magic since Envoy 1.10 doesn't support that.
* Refactor to use an internal shim command for piping the bootstrap through.
* Fmt. So sad that vscode golang fails so often these days.
* go mod tidy
* revert go mod tidy changes
* Revert "ignore consul-exec tests until fixed (#5986)"
This reverts commit 683262a686.
* Review cleanups
* Add support for HTTP proxy listeners
* Add customizable bootstrap configuration options
* Debug logging for xDS AuthZ
* Add Envoy Integration test suite with basic test coverage
* Add envoy command tests to cover new cases
* Add tracing integration test
* Add gRPC support WIP
* Merged changes from master Docker. get CI integration to work with same Dockerfile now
* Make docker build optional for integration
* Enable integration tests again!
* http2 and grpc integration tests and fixes
* Fix up command config tests
* Store all container logs as artifacts in circle on fail
* Add retries to outer part of stats measurements as we keep missing them in CI
* Only dump logs on failing cases
* Fix typos from code review
* Review tidying and make tests pass again
* Add debug logs to exec test.
* Fix legit test failure caused by upstream rename in envoy config
* Attempt to reduce cases of bad TLS handshake in CI integration tests
* bring up the right service
* Add prometheus integration test
* Add test for denied AuthZ both HTTP and TCP
* Try ANSI term for Circle