andrussal 222436ed8d Reorganize scripts into subdirectories
Move helper scripts under scripts/{run,build,setup,ops,lib} and update all references across docs, CI, Docker, and Rust call sites.
2025-12-18 17:26:02 +01:00


Operations & Deployment Overview

Operational readiness focuses on prerequisites, environment fit, and clear signals that ensure your test scenarios run reliably across different deployment targets.

Core Principles

  • Prerequisites First: Ensure all required files, binaries, and assets are in place before attempting to run scenarios
  • Environment Fit: Choose the right deployment target (host, compose, k8s) based on your isolation, reproducibility, and resource needs
  • Clear Signals: Confirm that runners report node readiness before workloads start, to avoid false negatives caused by nodes that were not yet up
  • Failure Triage: Map each failure to a specific cause (missing prerequisites, platform issues, or unmet expectations)
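
The triage principle can be made explicit by mapping each concrete error onto one of the three buckets. The sketch below assumes a hypothetical `RunError` type; its variant names are illustrative and are not the framework's actual error types.

```rust
use std::path::PathBuf;

/// The three triage buckets named in the principles above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureCause {
    MissingPrerequisite,
    PlatformIssue,
    UnmetExpectation,
}

/// Hypothetical error type a scenario run might surface (illustrative only).
#[derive(Debug)]
pub enum RunError {
    MissingFile(PathBuf),
    MissingBinary(String),
    DockerUnavailable,
    ClusterUnreachable,
    ExpectationFailed(String),
}

/// Map each concrete error onto the bucket that says what to fix first.
pub fn triage(err: &RunError) -> FailureCause {
    match err {
        RunError::MissingFile(_) | RunError::MissingBinary(_) => FailureCause::MissingPrerequisite,
        RunError::DockerUnavailable | RunError::ClusterUnreachable => FailureCause::PlatformIssue,
        RunError::ExpectationFailed(_) => FailureCause::UnmetExpectation,
    }
}
```

Using an exhaustive `match` means that adding a new error variant forces a decision about which bucket it belongs to.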

Key Operational Concerns

Prerequisites:

  • versions.env file at repository root (required by helper scripts)
  • Node binaries (nomos-node, nomos-executor) available or built on demand
  • Platform requirements met (Docker for compose, cluster access for k8s)
  • Circuit assets for DA workloads
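
A minimal preflight sketch, assuming the prerequisites reduce to a `versions.env` file at the repository root and the node binaries being on `PATH`. The helper names `find_on_path` and `missing_prerequisites` are hypothetical. The repository's own helper scripts may perform these checks differently, for example by building binaries on demand instead of reporting them as missing.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Look up an executable name in the directories listed in PATH.
/// Sketch only: checks that a regular file exists, not that it is executable,
/// and does not append Windows extensions such as `.exe`.
pub fn find_on_path(name: &str) -> Option<PathBuf> {
    let path_var = env::var_os("PATH")?;
    env::split_paths(&path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Return a human-readable list of missing prerequisites; empty means ready.
pub fn missing_prerequisites(repo_root: &Path, binaries: &[&str]) -> Vec<String> {
    let mut missing = Vec::new();
    let versions = repo_root.join("versions.env");
    if !versions.is_file() {
        missing.push(format!("{} not found", versions.display()));
    }
    for bin in binaries {
        if find_on_path(bin).is_none() {
            missing.push(format!("binary `{bin}` not found on PATH"));
        }
    }
    missing
}
```

Called from the repository root as `missing_prerequisites(Path::new("."), &["nomos-node", "nomos-executor"])`, it returns an empty list when `versions.env` exists and both binaries are on `PATH`.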

Artifacts:

  • KZG parameters (circuit assets) for Data Availability scenarios
  • Docker images for compose/k8s deployments
  • Binary bundles for reproducible builds

Environment Configuration:

  • POL_PROOF_DEV_MODE=true is REQUIRED for all runners to avoid expensive proof generation
  • Logging configured via NOMOS_LOG_* variables
  • Observability endpoints (Prometheus, Grafana) optional but useful
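
The dev-mode requirement is easy to enforce before any runner starts. This sketch assumes the check should accept exactly `POL_PROOF_DEV_MODE=true` (other spellings are rejected). It also lists whichever `NOMOS_LOG_*` variables are set, so they can be recorded alongside a run. Both `check_dev_mode` and `nomos_log_vars` are hypothetical helpers.

```rust
use std::env;

/// Fail fast unless POL_PROOF_DEV_MODE is exactly "true".
/// `get` abstracts the variable lookup so the check can be tested
/// without mutating the real process environment.
pub fn check_dev_mode<F>(get: F) -> Result<(), String>
where
    F: Fn(&str) -> Option<String>,
{
    match get("POL_PROOF_DEV_MODE").as_deref() {
        Some("true") => Ok(()),
        Some(other) => Err(format!("POL_PROOF_DEV_MODE={other}; expected `true`")),
        None => Err("POL_PROOF_DEV_MODE is unset; expected `true`".to_string()),
    }
}

/// Collect all NOMOS_LOG_* variables currently set, sorted by name.
/// Variables whose name or value is not valid Unicode are skipped.
pub fn nomos_log_vars() -> Vec<(String, String)> {
    let mut vars: Vec<(String, String)> = env::vars_os()
        .filter_map(|(k, v)| Some((k.into_string().ok()?, v.into_string().ok()?)))
        .filter(|(k, _)| k.starts_with("NOMOS_LOG_"))
        .collect();
    vars.sort();
    vars
}
```

In a real binary, pass `|k| std::env::var(k).ok()` as the lookup. Taking a closure keeps the check testable without touching the process environment.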

Readiness & Health:

  • Runners verify node readiness before starting workloads
  • Health checks prevent premature workload execution
  • Consensus liveness expectations validate basic operation
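
Readiness gating amounts to polling until the nodes report ready or a deadline passes. In the sketch below, `probe` is a hypothetical stand-in for whatever per-node check a runner performs. The timeout and interval are chosen by the caller and are not framework defaults.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Poll `probe` until it returns true or `timeout` elapses.
/// Returns the number of attempts on success, or an error message on timeout.
pub fn wait_until_ready<F>(mut probe: F, timeout: Duration, interval: Duration) -> Result<u32, String>
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    let mut attempts = 0;
    loop {
        attempts += 1;
        if probe() {
            return Ok(attempts);
        }
        if Instant::now() >= deadline {
            return Err(format!("nodes not ready after {attempts} attempts ({timeout:?})"));
        }
        thread::sleep(interval);
    }
}
```

Returning the attempt count rather than `()` makes slow startups visible in logs, which helps separate a readiness timeout from a genuine expectation failure during triage.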

Operational Workflow

```mermaid
flowchart LR
    Setup[Prerequisites & Setup] --> Run[Run Scenarios]
    Run --> Monitor[Monitor & Observe]
    Monitor --> Debug{Success?}
    Debug -->|No| Triage[Failure Triage]
    Triage --> Setup
    Debug -->|Yes| Done[Complete]
```

  1. Setup: Verify prerequisites, configure the environment, and prepare assets
  2. Run: Execute scenarios with the appropriate runner (host/compose/k8s)
  3. Monitor: Collect logs, metrics, and observability signals
  4. Triage: When a run fails, map the failure to its root cause, fix it, and return to Setup
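
The loop in the workflow can be modeled as a small state machine. This sketch is illustrative only: `Stage`, `next`, and `trace` are hypothetical names, and each run's success is supplied as a boolean instead of coming from real monitoring.

```rust
/// Stages of the operational workflow.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Setup,
    Run,
    Monitor,
    Triage,
    Done,
}

/// Advance one step; `succeeded` is only consulted when leaving Monitor.
pub fn next(stage: Stage, succeeded: bool) -> Stage {
    match stage {
        Stage::Setup => Stage::Run,
        Stage::Run => Stage::Monitor,
        Stage::Monitor if succeeded => Stage::Done,
        Stage::Monitor => Stage::Triage,
        Stage::Triage => Stage::Setup,
        Stage::Done => Stage::Done,
    }
}

/// Walk the workflow given the observed outcome of each run and return the
/// visited stages. Stops at Done, or at the first Monitor stage for which
/// no outcome remains.
pub fn trace(outcomes: &[bool]) -> Vec<Stage> {
    let mut path = vec![Stage::Setup];
    let mut stage = Stage::Setup;
    let mut runs = outcomes.iter();
    while stage != Stage::Done {
        let ok = if stage == Stage::Monitor {
            match runs.next() {
                Some(&ok) => ok,
                None => break,
            }
        } else {
            false
        };
        stage = next(stage, ok);
        path.push(stage);
    }
    path
}
```

For example, `trace(&[false, true])` visits Setup, Run, Monitor, Triage, then Setup, Run, Monitor, Done: one failed run that was triaged, followed by a successful retry.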

Documentation Structure

This Operations & Deployment section covers:

Philosophy: Treat operational hygiene (assets present, prerequisites satisfied, observability reachable) as the first step toward reliable scenario outcomes.