mirror of
https://github.com/logos-blockchain/logos-blockchain-testing.git
synced 2026-01-29 02:23:08 +00:00
# Operations

Operational readiness focuses on prerequisites, environment fit, and clear
signals:

- **Prerequisites**:
  - **`versions.env` file** at repository root (required by helper scripts; defines `VERSION`, `NOMOS_NODE_REV`, `NOMOS_BUNDLE_VERSION`)
  - Keep a sibling `nomos-node` checkout available, or use `scripts/run-examples.sh`, which clones/builds on demand
  - Ensure the chosen runner's platform needs are met (Docker for compose, cluster access for k8s)
  - CI uses prebuilt binary artifacts from the `build-binaries` workflow
- **Artifacts**: DA scenarios require KZG parameters (circuit assets) located at
  `testing-framework/assets/stack/kzgrs_test_params`. Fetch them via
  `scripts/setup-nomos-circuits.sh` or override the path with `NOMOS_KZGRS_PARAMS_PATH`.
- **Environment flags**: `POL_PROOF_DEV_MODE=true` is **required for all runners**
  (local, compose, k8s) unless you want expensive Groth16 proof generation that
  will cause tests to time out. Configure logging via `NOMOS_LOG_DIR`, `NOMOS_LOG_LEVEL`,
  and `NOMOS_LOG_FILTER` (see [Logging and Observability](#logging-and-observability)
  for details). Note that nodes ignore `RUST_LOG` and only respond to `NOMOS_*` variables.
- **Readiness checks**: verify runners report node readiness before starting
  workloads; this avoids false negatives from starting too early.
- **Failure triage**: map failures to missing prerequisites (wallet seeding,
  node control availability), runner platform issues, or unmet expectations.
  Start with liveness signals, then dive into workload-specific assertions.

Treat operational hygiene (assets present, prerequisites satisfied, observability
reachable) as the first step to reliable scenario outcomes.

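The `versions.env` prerequisite can be checked before any runner starts. The following is a minimal sketch, not one of the repository's scripts: the helper name `check_versions_env` is hypothetical, and it assumes `versions.env` uses plain `KEY=value` lines, which is not specified here.

```bash
# Hypothetical preflight helper; not part of the repository's scripts.
# Assumes versions.env contains plain KEY=value lines (optionally "export KEY=value").
check_versions_env() {
  local file="${1:-versions.env}"
  local key missing=0
  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 1
  fi
  for key in VERSION NOMOS_NODE_REV NOMOS_BUNDLE_VERSION; do
    # Require the key at the start of a line with a non-empty value.
    if ! grep -Eq "^(export[[:space:]]+)?${key}=.+" "$file"; then
      echo "missing or empty key: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run from the repository root, `check_versions_env && echo "versions.env looks complete"` reports any of the three keys that are absent.
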
## CI Usage

Both **LocalDeployer** and **ComposeDeployer** work in CI environments:

**LocalDeployer in CI:**
- Faster (no Docker overhead)
- Good for quick smoke tests
- **Trade-off:** Less isolation (processes share host)

**ComposeDeployer in CI (recommended):**
- Better isolation (containerized)
- Reproducible environment
- Includes Prometheus/observability
- **Trade-off:** Slower startup (Docker image build)
- **Trade-off:** Requires Docker daemon

See `.github/workflows/lint.yml` (jobs: `host_smoke`, `compose_smoke`) for CI examples running the demo scenarios.

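The trade-offs above can be turned into a simple selection rule for a CI job: use compose when a Docker daemon is reachable, otherwise fall back to host mode. This is a sketch under that assumption, not what the repository's workflows do; `choose_ci_mode` is a hypothetical helper, and it relies only on `docker info` failing when no daemon is reachable.

```bash
# Hypothetical helper: pick a runner mode for a CI job.
# Prints "compose" when the Docker CLI exists and its daemon responds,
# otherwise prints "host". The command name is a parameter so the
# function can be exercised without Docker installed.
choose_ci_mode() {
  local docker_cmd="${1:-docker}"
  if command -v "$docker_cmd" >/dev/null 2>&1 \
    && "$docker_cmd" info >/dev/null 2>&1; then
    echo compose
  else
    echo host
  fi
}

# Usage, from the repository root:
#   scripts/run-examples.sh -t 60 -v 1 -e 1 "$(choose_ci_mode)"
```
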
## Running Examples

The framework provides three runner modes: **host** (local processes), **compose** (Docker Compose), and **k8s** (Kubernetes).

**Recommended:** Use `scripts/run-examples.sh` for all modes:

```bash
# Host mode (local processes)
scripts/run-examples.sh -t 60 -v 1 -e 1 host

# Compose mode (Docker Compose)
scripts/run-examples.sh -t 60 -v 1 -e 1 compose

# K8s mode (Kubernetes)
scripts/run-examples.sh -t 60 -v 1 -e 1 k8s
```

This script handles circuit setup, binary building/bundling, image building, and execution.

**Environment overrides:**
- `VERSION=v0.3.1`: Circuit version
- `NOMOS_NODE_REV=<commit>`: nomos-node git revision
- `NOMOS_BINARIES_TAR=path/to/bundle.tar.gz`: Use prebuilt bundle
- `NOMOS_SKIP_IMAGE_BUILD=1`: Skip image rebuild (compose/k8s)
- `NOMOS_BUNDLE_DOCKER_PLATFORM=linux/arm64|linux/amd64`: Docker platform used when building a Linux bundle on non-Linux hosts (macOS/Windows)
- `COMPOSE_CIRCUITS_PLATFORM=linux-aarch64|linux-x86_64`: Circuits platform used when building the compose/k8s image (defaults based on host arch)
- `SLOW_TEST_ENV=true`: Doubles built-in readiness timeouts (useful in slower CI / constrained laptops)
- `TESTNET_PRINT_ENDPOINTS=1`: Print `TESTNET_ENDPOINTS` / `TESTNET_PPROF` lines during deploy (set automatically by `scripts/run-examples.sh`)
- `COMPOSE_RUNNER_HTTP_TIMEOUT_SECS=<secs>`: Override compose node HTTP readiness timeout
- `K8S_RUNNER_DEPLOYMENT_TIMEOUT_SECS=<secs>`: Override k8s deployment readiness timeout
- `K8S_RUNNER_HTTP_TIMEOUT_SECS=<secs>`: Override k8s HTTP readiness timeout for port-forwards
- `K8S_RUNNER_HTTP_PROBE_TIMEOUT_SECS=<secs>`: Override k8s HTTP readiness timeout for NodePort probes
- `K8S_RUNNER_PROMETHEUS_HTTP_TIMEOUT_SECS=<secs>`: Override k8s Prometheus readiness timeout
- `K8S_RUNNER_PROMETHEUS_HTTP_PROBE_TIMEOUT_SECS=<secs>`: Override k8s Prometheus NodePort probe timeout

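To make the readiness timeouts concrete, here is a small shell sketch of the idea: a base timeout that doubles under `SLOW_TEST_ENV=true`, used by a retry loop that waits for a probe to succeed. It is illustrative only; the framework implements its own readiness logic, and both `effective_timeout` and `wait_until` are hypothetical names.

```bash
# Illustrative only: the framework has its own readiness implementation.

# Print the effective timeout in seconds: doubled when SLOW_TEST_ENV=true.
effective_timeout() {
  local base="$1"
  if [ "${SLOW_TEST_ENV:-false}" = "true" ]; then
    echo $((base * 2))
  else
    echo "$base"
  fi
}

# Retry a command once per second until it succeeds or the budget is spent.
# The budget counts sleep seconds only, not time spent inside the command.
wait_until() {
  local timeout elapsed=0
  timeout="$(effective_timeout "$1")"
  shift
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

A manual probe could then look like `wait_until 60 curl -fsS "$NODE_URL"`, where `NODE_URL` is a placeholder for whichever node endpoint you want to check.
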
### Cleanup Helper

If you hit Docker build failures, mysterious I/O errors, or are running out of disk space:

```bash
scripts/clean
```

For extra Docker cache cleanup:

```bash
scripts/clean --docker
```

### Host Runner (Direct Cargo Run)

For manual control, you can run the `local_runner` binary directly:

```bash
POL_PROOF_DEV_MODE=true \
NOMOS_NODE_BIN=/path/to/nomos-node \
NOMOS_EXECUTOR_BIN=/path/to/nomos-executor \
cargo run -p runner-examples --bin local_runner
```

**Environment variables:**
- `NOMOS_DEMO_VALIDATORS=3`: Number of validators (default: 1, or use legacy `LOCAL_DEMO_VALIDATORS`)
- `NOMOS_DEMO_EXECUTORS=2`: Number of executors (default: 1, or use legacy `LOCAL_DEMO_EXECUTORS`)
- `NOMOS_DEMO_RUN_SECS=120`: Run duration in seconds (default: 60, or use legacy `LOCAL_DEMO_RUN_SECS`)
- `NOMOS_NODE_BIN` / `NOMOS_EXECUTOR_BIN`: Paths to binaries (required for direct run)
- `NOMOS_LOG_DIR=/tmp/logs`: Directory for per-node log files (works across runners)
- `NOMOS_TESTS_KEEP_LOGS=1`: Keep per-run temporary directories (useful for debugging/CI artifacts)
- `NOMOS_TESTS_TRACING=true`: Enable the debug tracing preset (optional; combine with `NOMOS_LOG_DIR` unless you have external tracing backends configured)
- `NOMOS_LOG_LEVEL=debug`: Set log level (default: info)
- `NOMOS_LOG_FILTER=consensus=trace,da=debug`: Fine-grained module filtering

**Note:** Requires circuit assets and host binaries. Use `scripts/run-examples.sh host` to handle setup automatically.

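The defaults and legacy names listed above can be expressed with nested parameter expansion. In this sketch the precedence (current `NOMOS_DEMO_*` name first, then the legacy `LOCAL_DEMO_*` name, then the default) is an assumption rather than something the source confirms, and `resolve_demo_settings` is a hypothetical helper.

```bash
# Sketch of the documented fallbacks. Assumed precedence: NOMOS_DEMO_* first,
# then the legacy LOCAL_DEMO_* name, then the documented default.
resolve_demo_settings() {
  validators="${NOMOS_DEMO_VALIDATORS:-${LOCAL_DEMO_VALIDATORS:-1}}"
  executors="${NOMOS_DEMO_EXECUTORS:-${LOCAL_DEMO_EXECUTORS:-1}}"
  run_secs="${NOMOS_DEMO_RUN_SECS:-${LOCAL_DEMO_RUN_SECS:-60}}"
}

resolve_demo_settings
echo "validators=$validators executors=$executors run_secs=$run_secs"
```

Printing the resolved values before a run makes it easier to spot a stale legacy variable left in the environment.
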
### Compose Runner (Direct Cargo Run)

For manual control, you can run the `compose_runner` binary directly. Compose requires a Docker image with embedded assets.

**Recommended setup:** Use a prebuilt bundle:

```bash
# Build a Linux bundle (includes binaries + circuits)
scripts/build-bundle.sh --platform linux
# Creates .tmp/nomos-binaries-linux-v0.3.1.tar.gz

# Build image (embeds bundle assets)
export NOMOS_BINARIES_TAR=.tmp/nomos-binaries-linux-v0.3.1.tar.gz
testing-framework/assets/stack/scripts/build_test_image.sh

# Run
NOMOS_TESTNET_IMAGE=logos-blockchain-testing:local \
POL_PROOF_DEV_MODE=true \
cargo run -p runner-examples --bin compose_runner
```

**Platform note (macOS / Apple silicon):**
- Docker Desktop runs a `linux/arm64` engine. If Linux bundle builds are slow/unstable when producing `.tmp/nomos-binaries-linux-*.tar.gz`, prefer `NOMOS_BUNDLE_DOCKER_PLATFORM=linux/arm64` for local compose/k8s runs.
- If you need amd64 images/binaries specifically (e.g., deploying to amd64-only environments), set `NOMOS_BUNDLE_DOCKER_PLATFORM=linux/amd64` and expect slower builds via emulation.

**Alternative:** Manual circuit/image setup (rebuilds during image build):

```bash
# Fetch and copy circuits
scripts/setup-nomos-circuits.sh v0.3.1 /tmp/nomos-circuits
cp -r /tmp/nomos-circuits/* testing-framework/assets/stack/kzgrs_test_params/

# Build image
testing-framework/assets/stack/scripts/build_test_image.sh

# Run
NOMOS_TESTNET_IMAGE=logos-blockchain-testing:local \
POL_PROOF_DEV_MODE=true \
cargo run -p runner-examples --bin compose_runner
```

**Environment variables:**
- `NOMOS_TESTNET_IMAGE=logos-blockchain-testing:local`: Image tag (required, must match built image)
- `POL_PROOF_DEV_MODE=true`: **Required** for all runners
- `NOMOS_DEMO_VALIDATORS=3` / `NOMOS_DEMO_EXECUTORS=2` / `NOMOS_DEMO_RUN_SECS=120`: Topology overrides
- `COMPOSE_NODE_PAIRS=1x1`: Alternative topology format: "validators×executors"
- `TEST_FRAMEWORK_PROMETHEUS_PORT=9091`: Override Prometheus port (default: 9090)
- `COMPOSE_RUNNER_HOST=127.0.0.1`: Host address for port mappings
- `COMPOSE_RUNNER_PRESERVE=1`: Keep containers running after test
- `NOMOS_LOG_DIR=/tmp/compose-logs`: Write logs to files inside containers

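The `COMPOSE_NODE_PAIRS` format is easy to mistype, so it can help to validate a value before launching. The parser below is a hypothetical illustration of the `<validators>x<executors>` shape shown above (for example `3x2`); it is not the runner's own parsing code.

```bash
# Hypothetical parser for the "<validators>x<executors>" format, e.g. "3x2".
# Prints "<validators> <executors>" or fails on malformed input.
parse_node_pairs() {
  local spec="$1" validators executors n
  case "$spec" in
    *x*) ;;
    *) echo "expected <validators>x<executors>, got: $spec" >&2; return 1 ;;
  esac
  validators="${spec%%x*}"   # text before the first "x"
  executors="${spec#*x}"     # text after the first "x"
  for n in "$validators" "$executors"; do
    case "$n" in
      ''|*[!0-9]*) echo "counts must be non-negative integers: $spec" >&2; return 1 ;;
    esac
  done
  echo "$validators $executors"
}
```

For example, `parse_node_pairs "$COMPOSE_NODE_PAIRS" >/dev/null || exit 1` stops a wrapper script early when the value is malformed.
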
**Compose-specific features:**
- **Node control support**: The only runner that supports chaos testing (`.enable_node_control()` + chaos workloads)
- **Prometheus observability**: Metrics at `http://localhost:9090`

**Important:**
- Containers expect KZG parameters at `/kzgrs_test_params/kzgrs_test_params` (note the repeated filename)
- Use `scripts/run-examples.sh compose` to handle all setup automatically

### K8s Runner (Direct Cargo Run)

For manual control, you can run the `k8s_runner` binary directly. K8s requires the same image setup as Compose.

**Prerequisites:**
1. **Kubernetes cluster** with `kubectl` configured
2. **Test image built** (same as Compose, preferably with prebuilt bundle)
3. **Image available in cluster** (loaded or pushed to registry)

**Build and load image:**
```bash
# Build image with bundle (recommended)
scripts/build-bundle.sh --platform linux
export NOMOS_BINARIES_TAR=.tmp/nomos-binaries-linux-v0.3.1.tar.gz
testing-framework/assets/stack/scripts/build_test_image.sh

# Load into cluster
export NOMOS_TESTNET_IMAGE=logos-blockchain-testing:local
kind load docker-image logos-blockchain-testing:local  # For kind
# OR: minikube image load logos-blockchain-testing:local  # For minikube
# OR: docker push your-registry/logos-blockchain-testing:local  # For remote
```

**Run the example:**
```bash
export NOMOS_TESTNET_IMAGE=logos-blockchain-testing:local
export POL_PROOF_DEV_MODE=true
cargo run -p runner-examples --bin k8s_runner
```

**Environment variables:**
- `NOMOS_TESTNET_IMAGE`: Image tag (required)
- `POL_PROOF_DEV_MODE=true`: **Required** for all runners
- `NOMOS_DEMO_VALIDATORS` / `NOMOS_DEMO_EXECUTORS` / `NOMOS_DEMO_RUN_SECS`: Topology overrides

**Important:**
- K8s runner mounts `testing-framework/assets/stack/kzgrs_test_params` as a hostPath volume with file `/kzgrs_test_params/kzgrs_test_params` inside pods
- **No node control support yet**: Chaos workloads (`.enable_node_control()`) will fail
- Use `scripts/run-examples.sh k8s` to handle all setup automatically

## Circuit Assets (KZG Parameters)

DA workloads require KZG cryptographic parameters for polynomial commitment schemes.

### Asset Location

**Default path:** `testing-framework/assets/stack/kzgrs_test_params/kzgrs_test_params`

Note the repeated filename: the directory `kzgrs_test_params/` contains a file named `kzgrs_test_params`. This is the actual proving key file.

**Container path** (compose/k8s): `/kzgrs_test_params/kzgrs_test_params`

**Override:** Set `NOMOS_KZGRS_PARAMS_PATH` to use a custom location (must point to the file):
```bash
NOMOS_KZGRS_PARAMS_PATH=/path/to/custom/params cargo run -p runner-examples --bin local_runner
```

### Directory vs File (KZG)

The system uses KZG assets in two distinct ways:

| Concept | Used by | Meaning |
|--------|---------|---------|
| **KZG directory** | deployers/scripts | A directory that contains the KZG file (and related artifacts). Defaults to `testing-framework/assets/stack/kzgrs_test_params` and is controlled by `NOMOS_KZG_DIR_REL` (relative to the workspace root). |
| **KZG file path** | node processes | A single file path passed to nodes via `NOMOS_KZGRS_PARAMS_PATH` (inside containers/pods this is typically `/kzgrs_test_params/kzgrs_test_params`). |

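Because the directory and the file share a name, pointing `NOMOS_KZGRS_PARAMS_PATH` at the directory is an easy mistake. The sketch below derives the file path from a KZG directory and checks that it is a regular file; `kzg_file_from_dir` is a hypothetical helper, and its messages are illustrative rather than the framework's own.

```bash
# Hypothetical helper: resolve the KZG file from a KZG directory and verify
# that the result is a regular file, not a directory.
kzg_file_from_dir() {
  local dir="${1:-testing-framework/assets/stack/kzgrs_test_params}"
  local file="$dir/kzgrs_test_params"
  if [ -d "$file" ]; then
    echo "expected a file but found a directory: $file" >&2
    return 1
  fi
  if [ ! -f "$file" ]; then
    echo "missing KZG parameters at $file" >&2
    return 1
  fi
  echo "$file"
}

# Usage, from the repository root (the runner only starts if the check passes):
#   params="$(kzg_file_from_dir)" &&
#     NOMOS_KZGRS_PARAMS_PATH="$params" cargo run -p runner-examples --bin local_runner
```
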
### Getting Circuit Assets

**Option 1: Use helper script** (recommended):
```bash
# From the repository root
chmod +x scripts/setup-nomos-circuits.sh
scripts/setup-nomos-circuits.sh v0.3.1 /tmp/nomos-circuits

# Copy to default location
cp -r /tmp/nomos-circuits/* testing-framework/assets/stack/kzgrs_test_params/
```

**Option 2: Build locally** (advanced):
```bash
# This repository does not provide a `make kzgrs_test_params` target.
# If you need to regenerate KZG params from source, follow upstream tooling
# instructions (unspecified here) or use the helper scripts above to fetch a
# known-good bundle.
```

### CI Workflow

The CI automatically fetches and places assets:
```yaml
- name: Install circuits for host build
  run: |
    scripts/setup-nomos-circuits.sh v0.3.1 "$TMPDIR/nomos-circuits"
    cp -a "$TMPDIR/nomos-circuits"/. testing-framework/assets/stack/kzgrs_test_params/
```

### When Are Assets Needed?

| Runner | When Required |
|--------|---------------|
| **Local** | Always (for DA workloads) |
| **Compose** | During image build (baked into `NOMOS_TESTNET_IMAGE`) |
| **K8s** | During image build + deployed to cluster via hostPath volume |

**Error without assets:**
```
Error: missing KZG parameters at testing-framework/assets/stack/kzgrs_test_params/kzgrs_test_params
```

If you see this error, the file `kzgrs_test_params` is missing from the directory. Use `scripts/run-examples.sh` or `scripts/setup-nomos-circuits.sh` to fetch it.

## Logging and Observability

### Node Logging vs Framework Logging

**Critical distinction:** Node logs and framework logs use different configuration mechanisms.

| Component | Controlled By | Purpose |
|-----------|--------------|---------|
| **Framework binaries** (`cargo run -p runner-examples --bin local_runner`) | `RUST_LOG` | Runner orchestration, deployment logs |
| **Node processes** (validators, executors spawned by runner) | `NOMOS_LOG_LEVEL`, `NOMOS_LOG_FILTER`, `NOMOS_LOG_DIR` | Consensus, DA, mempool, network logs |

**Common mistake:** Setting `RUST_LOG=debug` only increases verbosity of the runner binary itself. Node logs remain at their default level unless you also set `NOMOS_LOG_LEVEL=debug`.

**Example:**
```bash
# This only makes the RUNNER verbose, not the nodes:
RUST_LOG=debug cargo run -p runner-examples --bin local_runner

# This makes the NODES verbose:
NOMOS_LOG_LEVEL=debug cargo run -p runner-examples --bin local_runner

# Both verbose (typically not needed):
RUST_LOG=debug NOMOS_LOG_LEVEL=debug cargo run -p runner-examples --bin local_runner
```

### Logging Environment Variables

| Variable | Default | Effect |
|----------|---------|--------|
| `NOMOS_LOG_DIR` | None (console only) | Directory for per-node log files. If unset, logs go to stdout/stderr. |
| `NOMOS_LOG_LEVEL` | `info` | Global log level: `error`, `warn`, `info`, `debug`, `trace` |
| `NOMOS_LOG_FILTER` | None | Fine-grained target filtering (e.g., `consensus=trace,da=debug`) |
| `NOMOS_TESTS_TRACING` | `false` | Enable the debug tracing preset (optional; combine with `NOMOS_LOG_DIR` unless you have external tracing backends configured) |
| `NOMOS_OTLP_ENDPOINT` | None | OTLP trace endpoint (optional, disables OTLP noise if unset) |
| `NOMOS_OTLP_METRICS_ENDPOINT` | None | OTLP metrics endpoint (optional) |

**Example:** Full debug logging to files:
```bash
NOMOS_TESTS_TRACING=true \
NOMOS_LOG_DIR=/tmp/test-logs \
NOMOS_LOG_LEVEL=debug \
NOMOS_LOG_FILTER="nomos_consensus=trace,nomos_da_sampling=debug" \
POL_PROOF_DEV_MODE=true \
cargo run -p runner-examples --bin local_runner
```

### Per-Node Log Files

When `NOMOS_LOG_DIR` is set, each node writes logs to separate files:

**File naming pattern:**
- **Validators**: Prefix `nomos-node-0`, `nomos-node-1`, etc. (may include timestamp suffix)
- **Executors**: Prefix `nomos-executor-0`, `nomos-executor-1`, etc. (may include timestamp suffix)

**Local runner note:** The local runner uses per-run temporary directories under the current working directory and removes them after the run unless `NOMOS_TESTS_KEEP_LOGS=1`. Use `NOMOS_LOG_DIR=/path/to/logs` to write per-node log files to a stable location.

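Once per-node files exist, a quick first triage step is to count error lines per node. The helper below is a hypothetical sketch: it matches files by the documented prefixes only, and it assumes plain-text logs in which the level appears as the word `ERROR`, which the source does not specify.

```bash
# Hypothetical helper: count lines containing "ERROR" in each per-node log.
# Assumes plain-text logs where the level appears as the word ERROR.
summarize_node_logs() {
  local dir="$1" file count
  for file in "$dir"/nomos-node-* "$dir"/nomos-executor-*; do
    [ -f "$file" ] || continue   # skip glob patterns that matched nothing
    # grep -c prints 0 and exits non-zero when nothing matches.
    count="$(grep -c 'ERROR' "$file" || true)"
    printf '%s %s\n' "$(basename "$file")" "$count"
  done
}
```

For example, `summarize_node_logs /tmp/test-logs` prints one `<file> <count>` line per node, validators first.
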
### Filter Target Names

Common target prefixes for `NOMOS_LOG_FILTER`:

| Target Prefix | Subsystem |
|---------------|-----------|
| `nomos_consensus` | Consensus (Cryptarchia) |
| `nomos_da_sampling` | DA sampling service |
| `nomos_da_dispersal` | DA dispersal service |
| `nomos_da_verifier` | DA verification |
| `nomos_mempool` | Transaction mempool |
| `nomos_blend` | Mix network/privacy layer |
| `chain_network` | P2P networking |
| `chain_leader` | Leader election |

**Example filter:**
```bash
NOMOS_LOG_FILTER="nomos_consensus=trace,nomos_da_sampling=debug,chain_network=info"
```

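Filter strings are comma-separated `target=level` pairs, so a typo in a level name is easy to miss. As a sketch, a hypothetical `build_log_filter` helper can assemble the value and reject levels outside the five documented ones; it does not check target names, and it is not part of the framework.

```bash
# Hypothetical helper: join target=level pairs into a NOMOS_LOG_FILTER value,
# rejecting levels outside error|warn|info|debug|trace.
build_log_filter() {
  local pair level out=""
  for pair in "$@"; do
    case "$pair" in
      ?*=?*) ;;
      *) echo "expected target=level, got: $pair" >&2; return 1 ;;
    esac
    level="${pair#*=}"   # text after the first "="
    case "$level" in
      error|warn|info|debug|trace) ;;
      *) echo "unknown level in: $pair" >&2; return 1 ;;
    esac
    out="${out:+$out,}$pair"   # add a comma only between entries
  done
  echo "$out"
}
```

Usage: `NOMOS_LOG_FILTER="$(build_log_filter nomos_consensus=trace chain_network=info)"` yields the same comma-separated form as the example filter.
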
### Accessing Logs Per Runner

#### Local Runner

**Default (temporary directories, auto-cleanup):**
```bash
POL_PROOF_DEV_MODE=true cargo run -p runner-examples --bin local_runner
# Logs written to temporary directories in working directory
# Automatically cleaned up after test completes
```

**Persistent file output:**
```bash
NOMOS_LOG_DIR=/tmp/local-logs \
POL_PROOF_DEV_MODE=true \
cargo run -p runner-examples --bin local_runner

# After test completes:
ls /tmp/local-logs/
# Files with prefix: nomos-node-0*, nomos-node-1*, nomos-executor-0*
# May include timestamps in filename
```

**Tip:** Use `NOMOS_LOG_DIR` for persistent per-node log files, and `NOMOS_TESTS_KEEP_LOGS=1` if you want to keep the per-run temporary directories (configs/state) for post-mortem inspection.

#### Compose Runner

**Via Docker logs (default, recommended):**
```bash
# List containers (note the UUID prefix in names)
docker ps --filter "name=nomos-compose-"

# Stream logs from specific container
docker logs -f <container-id-or-name>

# Or use name pattern matching:
docker logs -f "$(docker ps --filter "name=nomos-compose-.*-validator-0" -q | head -1)"
```

**Via file collection (advanced):**

Setting `NOMOS_LOG_DIR` writes files **inside the container**. To access them, you must either:

1. **Copy files out after the run:**

   ```bash
   NOMOS_LOG_DIR=/logs \
   NOMOS_TESTNET_IMAGE=logos-blockchain-testing:local \
   POL_PROOF_DEV_MODE=true \
   cargo run -p runner-examples --bin compose_runner

   # After test, copy the log directory out of a container
   # (docker cp does not expand wildcards in container paths):
   docker ps --filter "name=nomos-compose-"
   docker cp <container-id>:/logs /tmp/compose-logs
   ```

2. **Mount a host volume** (requires modifying compose template):

   ```yaml
   volumes:
     - /tmp/host-logs:/logs  # Add to docker-compose.yml.tera
   ```

**Recommendation:** Use `docker logs` by default. File collection inside containers is complex and rarely needed.

**Keep containers for debugging:**
```bash
COMPOSE_RUNNER_PRESERVE=1 \
NOMOS_TESTNET_IMAGE=logos-blockchain-testing:local \
cargo run -p runner-examples --bin compose_runner
# Containers remain running after test; inspect with docker logs or docker exec
```

**Compose networking/debug knobs:**
- `COMPOSE_RUNNER_HOST=127.0.0.1`: host used for readiness probes (override for remote Docker daemons / VM networking)
- `COMPOSE_RUNNER_HOST_GATEWAY=host.docker.internal:host-gateway`: controls the `extra_hosts` entry injected into compose (set to `disable` to omit)
- `TESTNET_RUNNER_PRESERVE=1`: alias for `COMPOSE_RUNNER_PRESERVE=1`
- `COMPOSE_GRAFANA_PORT=<port>`: pin Grafana to a fixed host port instead of ephemeral assignment
- `COMPOSE_RUNNER_HTTP_TIMEOUT_SECS=<secs>`: override compose node HTTP readiness timeout

**Note:** Container names follow pattern `nomos-compose-{uuid}-validator-{index}-1` where `{uuid}` changes per run.

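Since the `{uuid}` part changes per run, scripts that post-process `docker ps` output usually need to split container names into their parts. The following is a sketch with a hypothetical `parse_container_name` helper. Only the validator pattern is documented above; accepting `executor` in the same position is an assumption.

```bash
# Hypothetical helper: split a compose container name into "<uuid> <role> <index>".
# Only the validator pattern is documented; "executor" in the same position
# is an assumption.
parse_container_name() {
  local name="$1" parsed
  parsed="$(printf '%s\n' "$name" |
    sed -nE 's/^nomos-compose-(.+)-(validator|executor)-([0-9]+)-1$/\1 \2 \3/p')"
  if [ -z "$parsed" ]; then
    echo "unrecognized container name: $name" >&2
    return 1
  fi
  echo "$parsed"
}
```

Names can be fed in from `docker ps --filter "name=nomos-compose-" --format '{{.Names}}'`, one per call.
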
#### K8s Runner

**Via kubectl logs (use label selectors):**
```bash
# List pods
kubectl get pods

# Stream logs using label selectors (recommended)
# Helm chart labels:
# - nomos/logical-role=validator|executor
# - nomos/validator-index / nomos/executor-index
kubectl logs -l nomos/logical-role=validator -f
kubectl logs -l nomos/logical-role=executor -f

# Stream logs from specific pod
kubectl logs -f nomos-validator-0

# Previous logs from crashed pods
kubectl logs --previous -l nomos/logical-role=validator
```

**Download logs for offline analysis:**
```bash
# Using label selectors
kubectl logs -l nomos/logical-role=validator --tail=1000 > all-validators.log
kubectl logs -l nomos/logical-role=executor --tail=1000 > all-executors.log

# Specific pods
kubectl logs nomos-validator-0 > validator-0.log
kubectl logs nomos-executor-1 > executor-1.log
```

**K8s environment notes:**
- The k8s runner is optimized for local clusters (Docker Desktop Kubernetes / minikube / kind):
  - The default image `logos-blockchain-testing:local` must be available on the cluster’s nodes (Docker Desktop shares the local daemon; kind/minikube often requires an explicit image load step).
  - The Helm chart mounts KZG params via a `hostPath` to your workspace path; this typically won’t work on remote/managed clusters without replacing it with a PV/CSI volume or baking the params into an image.
- Debug helpers:
  - `K8S_RUNNER_DEBUG=1`: logs Helm stdout/stderr for install commands.
  - `K8S_RUNNER_PRESERVE=1`: keep the namespace/release after the run.
  - `K8S_RUNNER_NODE_HOST=<ip|hostname>`: override NodePort host resolution for non-local clusters.
  - `K8S_RUNNER_NAMESPACE=<name>` / `K8S_RUNNER_RELEASE=<name>`: pin namespace/release instead of random IDs (useful for debugging)

**Specify namespace (if not using default):**
```bash
kubectl logs -n my-namespace -l nomos/logical-role=validator -f
```

### OTLP and Telemetry

**OTLP exporters are optional.** If you see errors about unreachable OTLP endpoints, it's safe to ignore them unless you're actively collecting traces/metrics.

**To enable OTLP:**
```bash
NOMOS_OTLP_ENDPOINT=http://localhost:4317 \
NOMOS_OTLP_METRICS_ENDPOINT=http://localhost:4318 \
cargo run -p runner-examples --bin local_runner
```

**To silence OTLP errors:** Simply leave these variables unset (the default).

### Observability: Prometheus and Node APIs

Runners expose metrics and node HTTP endpoints for expectation code and debugging:

**Prometheus (Compose only):**
- Default: `http://localhost:9090`
- Override: `TEST_FRAMEWORK_PROMETHEUS_PORT=9091`
- Note: the host port can vary if `9090` is unavailable; prefer the printed `TESTNET_ENDPOINTS` line as the source of truth.
- Access from expectations: `ctx.telemetry().prometheus().map(|p| p.base_url())`

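For manual debugging, the Prometheus instance can also be queried from the shell through the standard Prometheus HTTP API (`/api/v1/query`). The helpers below are a sketch with hypothetical names; the base URL defaults to the documented `http://localhost:9090`, and no metric names exported by the nodes are assumed.

```bash
# Hypothetical helpers for ad-hoc queries against the compose Prometheus.

# Print the instant-query endpoint for a base URL (default: documented port).
prom_api_url() {
  local base="${1:-http://localhost:9090}"
  echo "${base%/}/api/v1/query"   # tolerate a trailing slash in the base URL
}

# Run an instant query; curl URL-encodes the PromQL expression.
prom_query() {
  local query="$1" base="${2:-}"
  curl -fsS -G "$(prom_api_url "$base")" --data-urlencode "query=$query"
}
```

For example, `prom_query up` returns the scrape status of every target as JSON (`up` is generated by Prometheus itself), and `prom_query up http://127.0.0.1:9091` targets an overridden port.
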
**Node APIs:**
- Access from expectations: `ctx.node_clients().validator_clients().get(0)`
- Endpoints: consensus info, network info, DA membership, etc.
- See `testing-framework/core/src/nodes/api_client.rs` for available methods

```mermaid
flowchart TD
    Expose[Runner exposes endpoints/ports] --> Collect[Runtime collects block/health signals]
    Collect --> Consume[Expectations consume signals<br/>decide pass/fail]
    Consume --> Inspect[Operators inspect logs/metrics<br/>when failures arise]
```