consul/test/integration/connect/envoy/windows-troubleshooting.md

91 lines
6.6 KiB
Markdown
Raw Permalink Normal View History

Envoy Integration Test Windows (#18007) * [CONSUL-395] Update check_hostport and Usage (#40) * [CONSUL-397] Copy envoy binary from Image (#41) * [CONSUL-382] Support openssl in unique test dockerfile (#43) * [CONSUL-405] Add bats to single container (#44) * [CONSUL-414] Run Prometheus Test Cases and Validate Changes (#46) * [CONSUL-410] Run Jaeger in Single container (#45) * [CONSUL-412] Run test-sds-server in single container (#48) * [CONSUL-408] Clean containers (#47) * [CONSUL-384] Rebase and sync fork (#50) * [CONSUL-415] Create Scenarios Troubleshooting Docs (#49) * [CONSUL-417] Update Docs Single Container (#51) * [CONSUL-428] Add Socat to single container (#54) * [CONSUL-424] Replace pkill in kill_envoy function (#52) * [CONSUL-434] Modify Docker run functions in Helper script (#53) * [CONSUL-435] Replace docker run in set_ttl_check_state & wait_for_agent_service_register functions (#55) * [CONSUL-438] Add netcat (nc) in the Single container Dockerfile (#56) * [CONSUL-429] Replace Docker run with Docker exec (#57) * [CONSUL-436] Curl timeout and run tests (#58) * [CONSUL-443] Create dogstatsd Function (#59) * [CONSUL-431] Update Docs Netcat (#60) * [CONSUL-439] Parse nc Command in function (#61) * [CONSUL-463] Review curl Exec and get_ca_root Func (#63) * [CONSUL-453] Docker hostname in Helper functions (#64) * [CONSUL-461] Test wipe volumes without extra cont (#66) * [CONSUL-454] Check ports in the Server and Agent containers (#65) * [CONSUL-441] Update windows dockerfile with version (#62) * [CONSUL-466] Review case-grpc Failing Test (#67) * [CONSUL-494] Review case-cfg-resolver-svc-failover (#68) * [CONSUL-496] Replace docker_wget & docker_curl (#69) * [CONSUL-499] Cleanup Scripts - Remove nanoserver (#70) * [CONSUL-500] Update Troubleshooting Docs (#72) * [CONSUL-502] Pull & Tag Envoy Windows Image (#73) * [CONSUL-504] Replace docker run in docker_consul (#76) * [CONSUL-505] Change admin_bind * [CONSUL-399] Update envoy to 1.23.1 (#78) * [CONSUL-510] Support case-wanfed-gw on Windows (#79) * [CONSUL-506] Update troubleshooting Documentation (#80) * [CONSUL-512] Review debug_dump_volumes Function (#81) * [CONSUL-514] Add zipkin to Docker Image (#82) * [CONSUL-515] Update Documentation (#83) * [CONSUL-529] Support case-consul-exec (#86) * [CONSUL-530] Update Documentation (#87) * [CONSUL-530] Update default consul version 1.13.3 * [CONSUL-539] Cleanup (#91) * [CONSUL-546] Scripts Clean-up (#92) * [CONSUL-491] Support admin_access_log_path value for Windows (#71) * [CONSUL-519] Implement mkfifo Alternative (#84) * [CONSUL-542] Create OS Specific Files for Envoy Package (#88) * [CONSUL-543] Create exec_supported.go (#89) * [CONSUL-544] Test and Build Changes (#90) * Implement os.DevNull * using mmap instead of disk files * fix import in exec-unix * fix nmap open too many arguemtn * go fmt on file * changelog file * fix go mod * Update .changelog/17694.txt Co-authored-by: Dhia Ayachi <dhia@hashicorp.com> * different mmap library * fix bootstrap json * some fixes * chocolatey version fix and image fix * using different library * fix Map funciton call * fix mmap call * fix tcp dump * fix tcp dump * windows tcp dump * Fix docker run * fix tests * fix go mod * fix version 16.0 * fix version * fix version dev * sleep to debug * fix sleep * fix permission issue * fix permission issue * fix permission issue * fix command * fix command * fix funciton * fix assert config entry status command not found * fix command not found assert_cert_has_cn * fix command not found assert_upstream_missing * fix command not found assert_upstream_missing_once * fix command not found get_upstream_endpoint * fix command not found get_envoy_public_listener_once * fix command not found * fix test cases * windows integration test workflow github * made code similar to unix using npipe * fix go.mod * fix dialing of npipe * dont wait * check size of written json * fix undefined n * running * fix dep * fix syntax error * fix workflow file * windows runner * fix runner * fix from json * fix runs on * merge connect envoy * fix cin path * build * fix file name * fix file name * fix dev build * remove unwanted code * fix upload * fix bin name * fix path * checkout current branch * fix path * fix tests * fix shell bash for windows sh files * fix permission of run-test.sh * removed docker dev * added shell bash for tests * fix tag * fix win=true * fix cd * added dev * fix variable undefined * removed failing tests * fix tcp dump image * fix curl * fix curl * tcp dump path * fix tcpdump path * fix curl * fix curl install * stop removing intermediate containers * fix tcpdump docker image * revert -rm * --rm=false * makeing docker image before * fix tcpdump * removed case consul exec * removed terminating gateway simple * comment case wasm * removed data dog * comment out upload coverage * uncomment case-consul-exec * comment case consul exec * if always * logs * using consul 1.17.0 * fix quotes * revert quotes * redirect to dev null * Revert version * revert consul connect * fix version * removed envoy connect * not using function * change log * docker logs * fix logs * restructure bad authz * rmeoved dev null * output * fix file descriptor * fix cacert * fix cacert * fix ca cert * cacert does not work in windows curl * fix func * removed docker logs * added sleep * fix tls * commented case-consul-exec * removed echo * retry docker consul * fix upload bin * uncomment consul exec * copying consul.exe to docker image * copy fix * fix paths * fix path * github workspace path * latest version * Revert "latest version" This reverts commit 5a7d7b82d9e7553bcb01b02557ec8969f9deba1d. * commented consul exec * added ssl revoke best effort * revert best effort * removed unused files * rename var name and change dir * windows runner * permission * needs setup fix * swtich to github runner * fix file path * fix path * fix path * fix path * fix path * fix path * fix build paths * fix tag * nightly runs * added matrix in github workflow, renamed files * fix job * fix matrix * removed brackes * from json * without using job matrix * fix quotes * revert job matrix * fix workflow * fix comment * added comment * nightly runs * removed datadog ci as it is already measured in linux one * running test * Revert "running test" This reverts commit 7013d15a23732179d18ec5d17336e16b26fab5d4. * pr comment fixes * running test now * running subset of test * running subset of test * job matrix * shell bash * removed bash shell * linux machine for job matrix * fix output * added cat to debug * using ubuntu latest * fix job matrix * fix win true * fix go test * revert job matrix --------- Co-authored-by: Jose Ignacio Lorenzo <74208929+joselo85@users.noreply.github.com> Co-authored-by: Franco Bruno Lavayen <cocolavayen@gmail.com> Co-authored-by: Ivan K Berlot <ivanberlot@gmail.com> Co-authored-by: Ezequiel Fernández Ponce <20102608+ezfepo@users.noreply.github.com> Co-authored-by: joselo85 <joseignaciolorenzo85@gmail.com> Co-authored-by: Ezequiel Fernández Ponce <ezequiel.fernandez@southworks.com> Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
2023-07-21 14:56:00 +00:00
# Envoy Integration Tests on Windows
## Index
- [About this Guide](#about-this-guide)
- [Prerequisites](#prerequisites)
- [Running the Tests](#running-the-tests)
- [Troubleshooting](#troubleshooting)
- [About Envoy Integration Tests on Windows](#about-envoy-integration-tests-on-windows)
- [Common Errors](#common-errors)
- [Windows Scripts Changes](#windows-scripts-changes)
- [Volume Issues](#volume-issues)
## About this Guide
On this guide you will find all the information required to run the Envoy integration tests on Windows.
## Prerequisites
To run the integration tests yo will need to have the following installed on your System:
- GO v1.18(or later).
- Gotestsum library [installation](https://pkg.go.dev/gotest.tools/gotestsum).
- Docker.
Before running the tests, you will need to build the required Docker images, to do so, you can use the script provided [here](../../../../build-support-windows/build-images.sh):
- Build Images Script Execution
- From a Bash console (GitBash or WSL) execute: `./build-images.sh`
## Running the Tests
To execute the tests you need to run the following command depending on the shell you are using:
**On Powershell**:
`go test -v -timeout=30m -tags integration ./test/integration/connect/envoy -run="TestEnvoy/<TEST CASE>" -win=true`
Where **TEST CASE** is the individual test case we want to execute (e.g. case-badauthz).
**On Git Bash**:
`ENVOY_VERSION=<ENVOY VERSION> go test -v -timeout=30m -tags integration ./test/integration/connect/envoy -run="TestEnvoy/<TEST CASE>" -win=true`
Where **TEST CASE** is the individual test case we want to execute (e.g. case-badauthz), and **ENVOY VERSION** is the version which you are currently testing.
> [!TIP]
> When executing the integration tests using **Powershell** you may need to set the ENVOY_VERSION value manually in line 20 of the [run-tests.windows.sh](run-tests.windows.sh) file.
> [!WARNING]
> When executing the integration tests for Windows environments, the **End of Line Sequence** of every related file and/or script will be changed from **LF** to **CRLF**.
### About Envoy Integration Tests on Windows
Integration tests on Linux run a multi-container architecture that take advantage of the Host Network Docker feature, using this feature means that the container's network stack is not isolated from the Docker host (the container shares the hosts networking namespace), and the container does not get its own IP-address allocated (read more about this [here](https://docs.docker.com/network/host/)). This feature is only available for Linux, which made migrating the tests to Windows challenging, since replicating the same architecture created more issues, that's why a **single container** architecture was chosen to run the Envoy integration tests.
Using a single container architecture meant that we could use the same tests as on linux, moreover we were able to speed-up their execution by replacing *docker run* commands which started utility containers, for *docker exec* commands.
### Common errors
If the tests are executed without docker running, the following error will be seen:
```powershell
error during connect: This error may indicate that the docker daemon is not running.: Post "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.24/build?buildargs=%7B%7D&cachefrom=%5B%5D&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile-bats-windows&labels=%7B%7D&memory=0&memswap=0&networkmode=default&rm=1&shmsize=0&t=bats-verify&target=&ulimits=null&version=1": open //./pipe/docker_engine: The system cannot find the file specified.
```
If any of the docker images does not exist or is mistagged, an error similar to the following will be displayed:
```powershell
Error response from daemon: No such container: envoy_workdir_1
```
If you run the Windows tests from WSL you will get the following error message:
```bash
main_test.go:34: command failed: exec: "cmd": executable file not found in $PATH
```
## Windows Scripts Changes
- The "http-addr", "grpc-addr" and "admin-access-log-path" flags were added to the creation of the Envoy Bootstrap files.
- To execute commands sh was replaced by bash on our Windows container.
- All paths were updated to use Windows format.
- Created *stop_and_copy_files* function to copy files into the shared volume (see [volume issues](#volume-issues)).
- Changed the *-admin-bind* value from `0.0.0.0` to `127.0.0.1` when generating the Envoy Bootstrap files.
- Removed the *&&* from the *common_run_container_service's* docker exec command and replaced it with *\*.
- Removed *docker_wget* and *docker_curl* functions from [helpers.windows.bash](helpers.windows.bash) file and replaced them with **docker_consul_exec**, this way we avoid starting intermediate containers when capturing logs.
- The function *wipe_volumes* uses a `docker exec` command instead of the original `docker run`, this way we speed up test execution by avoiding to start a new container just to delete volume content before each test run.
- For **case-grpc** we increased the `envoy_stats_flush_interval` value from 1s to 5s, on Windows, the original value caused the test to pass or fail randomly.
- For **case-wanfed-gw** a new script was created: **global-setup-windows.sh**, this file replaces global-setup.sh when running this test in Windows. The new script uses the windows/consul:local Docker image to generate the required TLS files and copies them into host's workdir directory.
- To use the **debug_dump_volumes** function, you need to use it via Powershell and execute the following command: `bash run-tests.windows.sh debug_dump_volumes` Make sure to be positioned with your terminal in the correct directory.
- For **case-consul-exec** this case can only be run when using the consul-dev Docker image on this repository, since it relies on features implemented only here. These features are: Windows valid default value for "-admin-access-log-path" and `consul connect envoy` command starts Envoy. This features have also been submitted in [PR#15114](https://github.com/hashicorp/consul/pull/15114).
## Volume Issues
Another difference that arose when migrating the tests from Linux to Windows, is that file system operations can't be executed while Windows containers are running. Currently, when running the tests a **named volume** is created and all of the required files are copied into that volume. Because of the constraint mentioned before, the workaround we implemented was creating a function (**stop_and_copy_files**) that stops the *kubernetes/pause* container and executes a script to copy the required files and finally starts the container again.