# Running Release Tests
This guide covers running the release tests both **locally** (against a local Kubernetes cluster) and **on Digital Ocean** (or any remote Kubernetes cluster).

---
## 1. Running Locally (Docker Desktop Kubernetes)
### Prerequisites
1. Install [.NET 10.0+](https://dotnet.microsoft.com/download). (If you install a newer version, update all `net10.0` references in the `.csproj` files to match.)
2. Install [Docker Desktop](https://www.docker.com/products/docker-desktop/).
3. Enable Kubernetes in Docker Desktop: **Settings → Kubernetes → Enable Kubernetes (kubeadm) → Apply & Restart**. This may take a few minutes.
> **Note on Kubernetes client compatibility:** The `KubernetesClient` package version in the `KubernetesWorkflow` project [must be compatible](https://github.com/kubernetes-client/csharp#version-compatibility) with the Kubernetes version that kubeadm exposes. For example, if kubeadm exposes Kubernetes `1.34.1`, then `KubernetesClient` must be version `18.x`.
### How it works
When you run `dotnet test` from your machine, the framework detects it is running **outside** the cluster (by checking whether the `KUBERNETES_PORT` and `KUBERNETES_SERVICE_HOST` environment variables are set). In this mode it connects to the cluster via your local `~/.kube/config`, which Docker Desktop automatically configures.
Each test creates its own isolated namespace in the cluster, starts the required Storage nodes as pods, runs the test, then tears everything down.
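The external/in-cluster check described above can be sketched in shell (a simplified illustration; the framework performs the equivalent check in C#):

```bash
# Inside a pod, Kubernetes injects these variables automatically;
# on a workstation they are normally absent.
if [ -n "$KUBERNETES_PORT" ] && [ -n "$KUBERNETES_SERVICE_HOST" ]; then
  echo "InternalToCluster"   # use in-cluster credentials
else
  echo "ExternalToCluster"   # fall back to ~/.kube/config
fi
```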
### Verify the cluster is working
Before running tests, confirm Docker Desktop's Kubernetes context is active:
```bash
kubectl config current-context # should show "docker-desktop"
kubectl get nodes # should show a Ready node
```
If the current context is not `docker-desktop` (e.g. it points to a remote cluster), switch it:
```bash
kubectl config use-context docker-desktop
```
### Run the tests
Most IDEs let you run individual tests or test fixtures directly from the code file. To run from the command line:
```bash
cd /path/to/logos-storage-nim-cs-dist-tests
# Run all release tests
dotnet test Tests/LogosStorageReleaseTests
# Run a specific test by name
dotnet test Tests/LogosStorageReleaseTests --filter=OneClientTest
# Run with verbose output
dotnet test Tests/LogosStorageReleaseTests --logger="console;verbosity=detailed"
```
### Useful environment variables
| Variable | Default | Description |
|---|---|---|
| `STORAGEDOCKERIMAGE` | `logosstorage/logos-storage-nim:latest-dist-tests` | Storage node image to test |
| `KUBECONFIG` | `~/.kube/config` | Path to kubeconfig file (optional when using Docker Desktop) |
| `LOGPATH` | `LogosStorageTestLogs` (relative) | Directory for test logs |
| `DATAFILEPATH` | `TestDataFiles` (relative) | Directory for test data files |
| `ALWAYS_LOGS` | _(unset)_ | Set to any non-empty value to always download container logs (not just on failure) |
| `TEST_TYPE` | _(unset)_ | Set to `release-tests` only when running inside the cluster (see §2c). Do **not** set this for local runs — it activates long in-cluster timeouts. |
Example — run against a specific Storage image:
```bash
STORAGEDOCKERIMAGE=logosstorage/logos-storage-nim:v0.1.8 dotnet test Tests/LogosStorageReleaseTests
```
### Troubleshooting
**`NullReferenceException` at `DistTest..ctor()`**
If every test fails immediately with a stack trace ending at `DistTest.GetWebCallTimeSet()` or `DistTest.GetK8sTimeSet()`, check whether `TEST_TYPE` is set to `release-tests` in your shell:
```bash
echo $TEST_TYPE
```
If it is, unset it before running locally:
```bash
unset TEST_TYPE
dotnet test Tests/LogosStorageReleaseTests
```
Setting `TEST_TYPE=release-tests` activates the in-cluster code path, which tries to log a message through an object that has not been constructed yet, causing the crash.
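If this bites you often, a small preflight guard in your local run script can catch it early (a sketch; adjust the message and command to your setup):

```bash
# Refuse to run the local test command while the in-cluster flag is set.
if [ "$TEST_TYPE" = "release-tests" ]; then
  echo "TEST_TYPE=release-tests is set; unset it before running locally." >&2
  exit 1
fi
dotnet test Tests/LogosStorageReleaseTests
```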
**Tests fail with `kubectl` errors / wrong cluster**
Ensure the active context is `docker-desktop`, not a remote cluster:
```bash
kubectl config current-context # confirm "docker-desktop"
kubectl config use-context docker-desktop # switch if needed
kubectl get nodes # should show a Ready node
```
**Image not found**
The image named by `STORAGEDOCKERIMAGE` must be pullable by Docker Desktop. Verify by pulling the published image directly:
```bash
docker pull logosstorage/logos-storage-nim:latest-dist-tests
```
---
## 2. Running on Digital Ocean (Remote Kubernetes)
### Overview
On a remote cluster the test runner itself must run **inside** the cluster, because the framework needs direct pod-to-pod networking. The CI workflow does this automatically by creating a Kubernetes Job that runs the test runner image. You can also do it manually.
### Prerequisites
- A running Digital Ocean Kubernetes (DOKS) cluster
- `kubectl` configured to talk to that cluster
- The cluster must be pre-configured (see below)
### 2a. Cluster pre-configuration
Do these steps once per cluster.
**1. Label the worker nodes that will run the test runner pod**
```bash
kubectl label node <node-name> workload-type=tests-runners-ci
```
The job manifest uses a `nodeSelector` for this label, so at least one node must have it.
**2. Create the kubeconfig secret for the test runner**
The test runner pod itself needs a kubeconfig to manage pods inside the cluster. Use a static service-account-based kubeconfig — avoid copying your local `~/.kube/config` directly, as it likely uses an exec credential plugin (e.g. `doctl`) that won't be available inside the pod.
```bash
# Create a service account and grant it cluster-admin access
kubectl create serviceaccount dist-tests-app
kubectl create clusterrolebinding dist-tests-app \
  --clusterrole=cluster-admin \
  --serviceaccount=default:dist-tests-app
# Create a long-lived static token
kubectl create token dist-tests-app --duration=8760h > /tmp/sa-token.txt
# Build a static kubeconfig using the token
CLUSTER=$(kubectl config view --minify -o jsonpath='{.clusters[0].name}')
SERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
kubectl config view --minify --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 --decode > /tmp/ca.crt
kubectl --kubeconfig=/tmp/static-kubeconfig.yaml config set-cluster "$CLUSTER" \
  --server="$SERVER" \
  --certificate-authority=/tmp/ca.crt \
  --embed-certs=true
kubectl --kubeconfig=/tmp/static-kubeconfig.yaml config set-credentials dist-tests-app \
  --token="$(cat /tmp/sa-token.txt)"
kubectl --kubeconfig=/tmp/static-kubeconfig.yaml config set-context default \
  --cluster="$CLUSTER" --user=dist-tests-app
kubectl --kubeconfig=/tmp/static-kubeconfig.yaml config use-context default
# Store it as a secret
kubectl create secret generic storage-dist-tests-app-kubeconfig \
  --from-file=kubeconfig.yaml=/tmp/static-kubeconfig.yaml \
  -n default
```
The pod mounts this at `/opt/kubeconfig.yaml` and passes it via `KUBECONFIG`.
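The static token is a JWT, so you can later confirm when it expires by decoding its payload (the second dot-separated, base64url-encoded segment). A sketch with a synthetic token; for a real check read the token from `/tmp/sa-token.txt` instead:

```bash
# Synthetic token whose payload decodes to {"exp":1924905600};
# real tokens may also need: tr '_-' '/+' before decoding.
token='header.eyJleHAiOjE5MjQ5MDU2MDB9.signature'
payload=$(printf '%s' "$token" | cut -d. -f2)
# Restore the base64 padding that JWT encoding strips
case $((${#payload} % 4)) in
  2) payload="${payload}==" ;;
  3) payload="${payload}=" ;;
esac
printf '%s' "$payload" | base64 --decode
# prints: {"exp":1924905600}
```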
**3. (If not already present) Create the `system-node-critical` priority class**
The job manifest requests `priorityClassName: system-node-critical`. On most clusters this exists already; check with:
```bash
kubectl get priorityclass system-node-critical
```
If missing, create it or change the priority class name in [docker/job-release-tests.yaml](../docker/job-release-tests.yaml).
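For orientation, the scheduling- and kubeconfig-related parts of the job manifest look roughly like this (a paraphrased sketch, not the authoritative file; see [docker/job-release-tests.yaml](../docker/job-release-tests.yaml) for the real values):

```yaml
# Hypothetical excerpt illustrating the fields discussed above
spec:
  template:
    spec:
      priorityClassName: system-node-critical
      nodeSelector:
        workload-type: tests-runners-ci
      containers:
        - name: release-tests
          env:
            - name: KUBECONFIG
              value: /opt/kubeconfig.yaml
          volumeMounts:
            - name: kubeconfig
              mountPath: /opt/kubeconfig.yaml
              subPath: kubeconfig.yaml
      volumes:
        - name: kubeconfig
          secret:
            secretName: storage-dist-tests-app-kubeconfig
```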
### 2b. Running via GitHub Actions (recommended)
This is the standard automated path.
**Required GitHub secret:**
| Secret | Value |
|---|---|
| `KUBE_CONFIG` | Base64-encoded kubeconfig with permissions to create Jobs/Pods in the cluster |
Encode your kubeconfig:
```bash
base64 -i ~/.kube/do-cluster-config | pbcopy # macOS
base64 -w0 ~/.kube/do-cluster-config # Linux
```
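Before saving the secret, it is worth confirming the encoding round-trips cleanly, since stray newlines or wrapping can corrupt the value. A self-contained check with placeholder content standing in for your kubeconfig:

```bash
# Placeholder for the kubeconfig file contents
original='apiVersion: v1'
encoded=$(printf '%s' "$original" | base64)
decoded=$(printf '%s' "$encoded" | base64 --decode)
[ "$decoded" = "$original" ] && echo "round-trip OK"
```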
**Trigger the workflow:**
Go to **Actions → Run Release Tests → Run workflow** and provide:
| Input | Description |
|---|---|
| `storagedockerimage` | Image to test, e.g. `logosstorage/logos-storage-nim:v0.1.8-dist-tests` |
Optional inputs (for non-default branches/repos):
| Input | Description |
|---|---|
| `source` | Repository URL (defaults to current repo) |
| `branch` | Branch to clone for running tests (defaults to current branch) |
**What happens:**
1. GitHub Actions decodes `KUBE_CONFIG` and runs `kubectl`
2. Creates a Kubernetes Job from [docker/job-release-tests.yaml](../docker/job-release-tests.yaml) with substituted variables
3. The Job runs the `logosstorage/logos-storage-dist-tests:latest` image
4. The entrypoint clones this repo, runs `dotnet test Tests/LogosStorageReleaseTests`
5. The workflow streams pod logs and fails if the Job does not complete successfully
### 2c. Running manually with kubectl
Useful for debugging or one-off runs without CI.
**Set the required variables:**
```bash
export NAMESPACE=default
export NAMEPREFIX=r-tests-manual
export RUNID=$(date +%Y%m%d-%H%M%S)
export TESTID=$(git rev-parse --short HEAD)
export TEST_TYPE=release-tests
export SOURCE=https://github.com/logos-storage/logos-storage-nim-cs-dist-tests.git
export BRANCH=master
export STORAGEDOCKERIMAGE=logosstorage/logos-storage-nim:latest-dist-tests
export COMMAND='["dotnet","test","Tests/LogosStorageReleaseTests"]'
```
**Apply the job:**
```bash
envsubst < docker/job-release-tests.yaml | kubectl apply -f -
```
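`envsubst` replaces `${VAR}` references in the manifest with the exported values. The effect can be previewed on a one-line fragment using plain shell heredoc expansion (a sketch; the real manifest is [docker/job-release-tests.yaml](../docker/job-release-tests.yaml)):

```bash
export NAMEPREFIX=r-tests-manual
# Heredoc expansion mimics what envsubst does to the manifest
cat <<EOF
metadata:
  name: ${NAMEPREFIX}
EOF
# prints:
# metadata:
#   name: r-tests-manual
```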
**Follow the logs:**
```bash
# Wait for pod to start
kubectl get pod --selector job-name=$NAMEPREFIX -w
# Stream logs
POD=$(kubectl get pod --selector job-name=$NAMEPREFIX -o jsonpath='{.items[0].metadata.name}')
kubectl logs $POD -f
# Check final job status
kubectl get job $NAMEPREFIX -o jsonpath='{.status.conditions[0].type}'
# Should print "Complete"
```
**Cleanup:**
Jobs are auto-deleted after 24 hours (TTL configured in the manifest). To delete immediately:
```bash
kubectl delete job $NAMEPREFIX
```
### Key differences: local vs. remote
| | Local (Docker Desktop) | Remote (Digital Ocean) |
|---|---|---|
| Runner location | Your machine (external to cluster) | Inside a pod in the cluster |
| Kubeconfig | `~/.kube/config` (auto) | Mounted secret `storage-dist-tests-app-kubeconfig` |
| Network access to pods | Via `kubectl port-forward` / node IP | Direct pod-to-pod |
| `RUNNERLOCATION` detection | `ExternalToCluster` (automatic) | `InternalToCluster` (automatic inside pod) |
| How to run | `dotnet test` on your machine | Kubernetes Job |
| Image required | No (builds from source) | `logosstorage/logos-storage-dist-tests:latest` |