Now, STORAGEDOCKERIMAGE is:
- logosstorage/logos-storage-nim:latest-dist-tests for workflow_dispatch on a branch
- logosstorage/logos-storage-nim:v0.1.8-dist-tests for a v0.1.8 tag push
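The tag selection above can be sketched as a small function. This is a minimal Python sketch, assuming the workflow has access to GitHub's event name and the full `refs/...` ref string. The helper name `storage_docker_image` is hypothetical and is not part of the workflow.

```python
def storage_docker_image(event_name: str, ref: str) -> str:
    """Return the STORAGEDOCKERIMAGE value for a run (hypothetical helper)."""
    repo = "logosstorage/logos-storage-nim"
    tag_prefix = "refs/tags/"
    if event_name == "push" and ref.startswith(tag_prefix):
        # Tag push, e.g. refs/tags/v0.1.8 -> v0.1.8-dist-tests
        version = ref[len(tag_prefix):]
        return f"{repo}:{version}-dist-tests"
    if event_name == "workflow_dispatch":
        # Manual run on a branch -> latest-dist-tests
        return f"{repo}:latest-dist-tests"
    raise ValueError(f"unsupported trigger: {event_name} {ref}")
```

For example, `storage_docker_image("push", "refs/tags/v0.1.8")` returns `logosstorage/logos-storage-nim:v0.1.8-dist-tests`.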
- Configure runners-ci node pool inline in the cluster resource instead
of using remove_default_node_pool=true, eliminating the
provision-then-delete cycle that added ~5 min to terraform apply
- Remove the separate infra pool; runners-ci is now the only pool on
the critical path of cluster creation
- Set tests-pods pool min_node_count=0 so no node is provisioned at
apply time — nodes scale up only when test pods are scheduled
- Enable spot instances on the tests-pods pool for ~60-91% cost savings
- Add a 60-minute job timeout to release-tests to bound the cost of a hung cluster
- Add Terraform plugin cache keyed on the lock file to skip provider
re-downloads on subsequent runs (~30-60s saved)
- Install gke-gcloud-auth-plugin via setup-gcloud to fix kubectl auth
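The lock-file-keyed plugin cache can be illustrated with a short Python sketch: the key changes only when the lock file's contents change, so runs with unchanged provider pins can reuse cached plugins. The `terraform-plugins-<os>-<sha256>` key format and the `plugin_cache_key` name are hypothetical; GitHub Actions' `hashFiles()` computes its hash differently, so this is an analogue rather than the workflow's exact key.

```python
import hashlib


def plugin_cache_key(lock_contents: bytes, runner_os: str = "Linux") -> str:
    """Derive a cache key from Terraform lock file contents (hypothetical key format)."""
    digest = hashlib.sha256(lock_contents).hexdigest()
    # Identical lock files give identical keys, so cached providers are reused;
    # any provider version or checksum change in the lock file yields a new key.
    return f"terraform-plugins-{runner_os}-{digest}"
```

In practice the argument would be the bytes of `.terraform.lock.hcl`, e.g. `Path(".terraform.lock.hcl").read_bytes()`.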
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switch cluster and all node pools from regional to zonal (`europe-west4-b`) to avoid the 40+ minute provisioning time of a regional (multi-zone) cluster. Adds a `zone` variable to the GKE module and cluster config, and updates the workflow's `gcloud get-credentials` call to use `--zone` instead of `--region`.
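`gcloud container clusters get-credentials` locates a cluster by either `--zone` or `--region`, so a zonal cluster must be addressed with `--zone`. The Python sketch below only builds the argument list; the cluster and project names are hypothetical placeholders, not the values used by the workflow.

```python
from typing import List


def get_credentials_cmd(cluster: str, zone: str, project: str) -> List[str]:
    """Build the get-credentials command for a zonal GKE cluster (sketch)."""
    # A zonal cluster is looked up with --zone; a regional one would need --region.
    return [
        "gcloud", "container", "clusters", "get-credentials", cluster,
        "--zone", zone,
        "--project", project,
    ]
```

The list can be passed straight to `subprocess.run(..., check=True)`, which avoids shell quoting issues.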
Replace all "Codex" branding in the release test workflow and supporting
files: rename the K8s cluster, Terraform state key, secret, log paths,
env var (CODEXDOCKERIMAGE → STORAGEDOCKERIMAGE), and test runner image
(cs-codex-dist-tests → logos-storage-dist-tests) to align with the
already-updated logos-storage-nim-cs-dist-tests repo (see
https://github.com/logos-storage/logos-storage-nim-cs-dist-tests/pull/124). Also fix the
dotnet test path to point at the correct Tests/LogosStorageReleaseTests directory.
Replace the use of doctl as a credential manager for k8s calls with a freshly created bearer token (which expires after 2h). This avoids passing a DigitalOcean personal access token to the cs-dist-tests runner pod.
The cs-dist-tests image name was changed to test whether installing doctl in the image would fix the release tests being unable to authenticate to the cluster. The authentication failure stems mainly from the kubeconfig now being generated and stored in a DO secret, rather than being a static kubeconfig for a permanent cluster as before.
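A kubeconfig that authenticates with a static bearer token, instead of invoking doctl through an exec credential plugin, can be sketched as a plain dict. This is a minimal Python sketch: the server URL, CA data and token are assumed to come from the freshly created cluster, how the 2h token is minted is not shown, and the `release-tests` entry name is hypothetical.

```python
from typing import Any, Dict


def token_kubeconfig(server: str, ca_data: str, token: str,
                     namespace: str = "default") -> Dict[str, Any]:
    """Build a kubeconfig that uses a bearer token (no doctl exec plugin)."""
    name = "release-tests"  # hypothetical cluster/user/context name
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{"name": name, "cluster": {
            "server": server,
            "certificate-authority-data": ca_data,  # base64-encoded CA cert
        }}],
        # The user entry carries only the token, so the runner pod needs
        # neither doctl nor a DigitalOcean personal access token.
        "users": [{"name": name, "user": {"token": token}}],
        "contexts": [{"name": name, "context": {
            "cluster": name, "user": name, "namespace": namespace,
        }}],
        "current-context": name,
    }
```

Serialising the result with `json.dumps` yields a file kubectl can read, since JSON is valid YAML.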
IMPORTANT:
The image tag should be changed back to 'latest'!
- remove RUNNER_IMAGE because the cs-dist-tests image is dumb: it clones the cs-dist-tests repo, checks out the branch in BRANCH and then runs the release tests. So instead, always use the :latest image (which is built when there are commits to master)
- add the BRANCH workflow input so you can test cs-dist-test changes in the runner if needed
- remove COMMAND arg; it's always going to be 'dotnet test Tests/CodexReleaseTests'
- remove NAMESPACE env variable and just use 'default'. The cluster is ephemeral, so all deployed resources belong to the release tests and no namespaces are needed.
By default, the logosstorage/logos-storage-nim-cs-dist-tests:latest image is used for the test runner in the release tests. However, if developers want to run the release tests against changes to the runner (e.g. changes in the logos-storage-nim-cs-dist-tests repo), they can push those changes to a branch and manually run the `docker-runner` workflow in the logos-storage-nim-cs-dist-tests repo. This creates an image such as logosstorage/logos-storage-nim-cs-dist-tests:sha-c0465a5, which can then be passed as the 'cs-dist-tests runner image' input to the release tests workflow.
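The image resolution described above can be sketched as a fallback from the workflow input to the `:latest` default. This is a minimal Python sketch; the `runner_image` helper is hypothetical, and treating a whitespace-only input as empty is an assumption.

```python
from typing import Optional

# Default runner image, rebuilt on commits to master.
DEFAULT_RUNNER_IMAGE = "logosstorage/logos-storage-nim-cs-dist-tests:latest"


def runner_image(workflow_input: Optional[str]) -> str:
    """Pick the test-runner image: the workflow input if set, otherwise :latest."""
    if workflow_input is None or not workflow_input.strip():
        return DEFAULT_RUNNER_IMAGE
    return workflow_input.strip()
```

Passing `logosstorage/logos-storage-nim-cs-dist-tests:sha-c0465a5` returns that image unchanged, while an empty input falls back to `:latest`.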
Adds a workflow for release tests:
- builds a docker image for launching nodes in the tests (basically has additional nimflags set)
- creates a K8s cluster in DigitalOcean
- one pod in the cluster is dedicated as the test runner (uses the logos-storage-nim-cs-dist-tests:latest image)
- the release will fail if the docker image build or the release tests fail
- the K8s cluster is torn down after the tests finish (failure or not)
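The pass/fail and teardown rules above can be sketched with `try`/`finally`. This is a minimal Python sketch, assuming the steps run sequentially; the step callables are hypothetical stand-ins for the workflow's jobs, not real functions in it.

```python
from typing import Callable


def run_release_tests(build_image: Callable[[], bool],
                      create_cluster: Callable[[], None],
                      run_tests: Callable[[], bool],
                      teardown: Callable[[], None]) -> bool:
    """Return True only if the image build and the tests both succeed (sketch)."""
    if not build_image():
        # A failed image build fails the release; no cluster is created.
        return False
    try:
        # Cluster creation sits inside the try so a partially created
        # cluster is torn down too.
        create_cluster()
        return run_tests()
    finally:
        # Runs whether the tests passed, failed or raised.
        teardown()
```

A hung test run is not handled here; in the workflow that case is bounded by the job timeout rather than by this control flow.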
Because Nix on macOS lacks the filesystem namespacing it has on
Linux, /tmp inside a build is the machine's real /tmp.
This can cause permission issues when different Nix build users create the same directories there:
> cannot create directory: /tmp/nim/koch_d
Signed-off-by: Jakub Sokołowski <jakub@status.im>