diff --git a/book/src/architecture-overview.md b/book/src/architecture-overview.md index eac3bf8..dff7118 100644 --- a/book/src/architecture-overview.md +++ b/book/src/architecture-overview.md @@ -13,6 +13,114 @@ flowchart LR E --> F(Expectations
verify outcomes) ``` +## Crate Architecture + +```mermaid +flowchart TB + subgraph Examples["Runner Examples"] + LocalBin[local_runner.rs] + ComposeBin[compose_runner.rs] + K8sBin[k8s_runner.rs] + CucumberBin[cucumber_*.rs] + end + + subgraph Workflows["Workflows (Batteries Included)"] + DSL[ScenarioBuilderExt
Fluent API] + TxWorkload[Transaction Workload] + DAWorkload[DA Workload] + ChaosWorkload[Chaos Workload] + Expectations[Built-in Expectations] + end + + subgraph Core["Core Framework"] + ScenarioModel[Scenario Model] + Traits[Deployer + Runner Traits] + BlockFeed[BlockFeed] + NodeClients[Node Clients] + Topology[Topology Generation] + end + + subgraph Deployers["Runner Implementations"] + LocalDeployer[LocalDeployer] + ComposeDeployer[ComposeDeployer] + K8sDeployer[K8sDeployer] + end + + subgraph Support["Supporting Crates"] + Configs[Configs & Topology] + Nodes[Node API Clients] + Cucumber[Cucumber Extensions] + end + + Examples --> Workflows + Examples --> Deployers + Workflows --> Core + Deployers --> Core + Deployers --> Support + Core --> Support + Workflows --> Support + + style Examples fill:#e1f5ff + style Workflows fill:#e1ffe1 + style Core fill:#fff4e1 + style Deployers fill:#ffe1f5 + style Support fill:#f0f0f0 +``` + +### Layer Responsibilities + +**Runner Examples (Entry Points)** +- Executable binaries that demonstrate framework usage +- Wire together deployers, scenarios, and execution +- Provide CLI interfaces for different modes + +**Workflows (High-Level API)** +- `ScenarioBuilderExt` trait provides fluent DSL +- Built-in workloads (transactions, DA, chaos) +- Common expectations (liveness, inclusion) +- Simplifies scenario authoring + +**Core Framework (Foundation)** +- `Scenario` model and lifecycle orchestration +- `Deployer` and `Runner` traits (extension points) +- `BlockFeed` for real-time block observation +- `RunContext` providing node clients and metrics +- Topology generation and validation + +**Runner Implementations** +- `LocalDeployer` - spawns processes on host +- `ComposeDeployer` - orchestrates Docker Compose +- `K8sDeployer` - deploys to Kubernetes cluster +- Each implements `Deployer` trait + +**Supporting Crates** +- `configs` - Topology configuration and generation +- `nodes` - HTTP/RPC client for node APIs +- `cucumber` - BDD/Gherkin integration + +### Extension Points + +```mermaid +flowchart LR + Custom[Your Code] -.implements.-> Workload[Workload Trait] + Custom -.implements.-> Expectation[Expectation Trait] + Custom -.implements.-> Deployer[Deployer Trait] + + Workload --> Core[Core Framework] + Expectation --> Core + Deployer --> Core + + style Custom fill:#ffe1f5 + style Core fill:#fff4e1 +``` + +**Extend by implementing:** +- `Workload` - Custom traffic generation patterns +- `Expectation` - Custom success criteria +- `Deployer` - Support for new deployment targets + +See [Extending the Framework](extending.md) for details. + ### Components - **Topology** describes the cluster: how many nodes, their roles, and the high-level network and data-availability parameters they should follow. diff --git a/book/src/examples-advanced.md b/book/src/examples-advanced.md index 8bff59c..490ea2b 100644 --- a/book/src/examples-advanced.md +++ b/book/src/examples-advanced.md @@ -3,9 +3,9 @@ Realistic advanced scenarios demonstrating framework capabilities for production testing. 
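+Advanced runs often need checks beyond the built-in expectations, using the `Expectation` extension point described in the architecture overview. The sketch below shows the general two-phase shape such a check can take (snapshot a baseline, then judge the outcome after the run); the type and method names are illustrative placeholders, not the framework's actual trait signatures, so consult the core crate and the complete sources linked below before adapting it.
+
+```rust
+// Illustrative sketch only: names below are placeholders, not the real API.
+use anyhow::{bail, Result};
+
+/// Stand-in for the framework's run context, which exposes node clients
+/// and metrics during a run (the real accessors will differ).
+struct RunContext;
+
+impl RunContext {
+    /// Assumed helper: highest block height observed across the cluster.
+    async fn best_block_height(&self) -> Result<u64> {
+        Ok(0) // placeholder
+    }
+}
+
+/// Custom check: the chain must grow by at least `min_blocks` between
+/// the baseline capture and the post-run evaluation.
+struct MinGrowth {
+    min_blocks: u64,
+    baseline: u64,
+}
+
+impl MinGrowth {
+    /// Capture phase: snapshot state before workloads start.
+    async fn capture(&mut self, ctx: &RunContext) -> Result<()> {
+        self.baseline = ctx.best_block_height().await?;
+        Ok(())
+    }
+
+    /// Evaluation phase: decide pass or fail once workloads stop.
+    async fn evaluate(&self, ctx: &RunContext) -> Result<()> {
+        let height = ctx.best_block_height().await?;
+        let grown = height.saturating_sub(self.baseline);
+        if grown < self.min_blocks {
+            bail!("chain grew by {grown} blocks, expected at least {}", self.min_blocks);
+        }
+        Ok(())
+    }
+}
+```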
**Adapt from Complete Source:** -- [compose_runner.rs](https://github.com/logos-co/nomos-node/blob/master/testnet/testing-framework/runner-examples/src/bin/compose_runner.rs) — Compose examples with workloads -- [k8s_runner.rs](https://github.com/logos-co/nomos-node/blob/master/testnet/testing-framework/runner-examples/src/bin/k8s_runner.rs) — K8s production patterns -- [Chaos testing patterns](https://github.com/logos-co/nomos-node/blob/master/testnet/testing-framework/workflows/src/chaos.rs) — Node control implementation +- [compose_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/compose_runner.rs) — Compose examples with workloads +- [k8s_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/k8s_runner.rs) — K8s production patterns +- [Chaos testing patterns](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/testing-framework/workflows/src/workloads/chaos.rs) — Node control implementation ## Summary diff --git a/book/src/examples.md b/book/src/examples.md index 966e112..e315388 100644 --- a/book/src/examples.md +++ b/book/src/examples.md @@ -4,9 +4,9 @@ Concrete scenario shapes that illustrate how to combine topologies, workloads, and expectations. **View Complete Source Code:** -- [local_runner.rs](https://github.com/logos-co/nomos-node/blob/master/testnet/testing-framework/runner-examples/src/bin/local_runner.rs) — Host processes (local) -- [compose_runner.rs](https://github.com/logos-co/nomos-node/blob/master/testnet/testing-framework/runner-examples/src/bin/compose_runner.rs) — Docker Compose -- [k8s_runner.rs](https://github.com/logos-co/nomos-node/blob/master/testnet/testing-framework/runner-examples/src/bin/k8s_runner.rs) — Kubernetes +- [local_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/local_runner.rs) — Host processes (local) +- [compose_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/compose_runner.rs) — Docker Compose +- [k8s_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/k8s_runner.rs) — Kubernetes **Runnable examples:** The repo includes complete binaries in `examples/src/bin/`: - `local_runner.rs` — Host processes (local) diff --git a/book/src/project-context-primer.md b/book/src/project-context-primer.md index 14a844c..79fb892 100644 --- a/book/src/project-context-primer.md +++ b/book/src/project-context-primer.md @@ -1,16 +1,143 @@ -# Project Context Primer +# Nomos Testing Framework -This book focuses on the Nomos Testing Framework. It assumes familiarity with -the Nomos architecture, but for completeness, here is a short primer. +**Declarative, multi-node blockchain testing for the Logos network** -- **Nomos** is a modular blockchain protocol composed of validators, executors, - and a data-availability (DA) subsystem. -- **Validators** participate in consensus and produce blocks. -- **Executors** are validators with the DA dispersal service enabled. They perform - all validator functions plus submit blob data to the DA network. -- **Data Availability (DA)** ensures that blob data submitted via channel operations - in transactions is published and retrievable by the network. +The Nomos Testing Framework enables you to test consensus, data availability, and transaction workloads across local processes, Docker Compose, and Kubernetes deployments—all with a unified scenario API. 
-These roles interact tightly, which is why meaningful testing must be performed
-in multi-node environments that include real networking, timing, and DA
-interaction.
+[**Get Started**](quickstart.md)
+
+---
+
+## How It Works
+
+```mermaid
+flowchart LR
+    Build[Define Scenario] --> Deploy[Deploy Topology]
+    Deploy --> Execute[Run Workloads]
+    Execute --> Evaluate[Check Expectations]
+
+    style Build fill:#e1f5ff
+    style Deploy fill:#fff4e1
+    style Execute fill:#ffe1f5
+    style Evaluate fill:#e1ffe1
+```
+
+1. **Define Scenario** — Describe your test: topology, workloads, and success criteria
+2. **Deploy Topology** — Launch validators and executors using host, compose, or k8s runners
+3. **Run Workloads** — Drive transactions, DA traffic, and chaos operations
+4. **Check Expectations** — Verify consensus liveness, inclusion, and system health
+
+---
+
+## Key Features
+
+**Declarative API**
+- Express scenarios as topology + workloads + expectations
+- Reuse the same test definition across different deployment targets
+- Compose complex tests from modular components
+
+**Multiple Deployment Modes**
+- **Host Runner**: Local processes for fast iteration
+- **Compose Runner**: Containerized environments with node control
+- **Kubernetes Runner**: Production-like cluster testing
+
+**Built-in Workloads**
+- Transaction submission with configurable rates
+- Data availability (DA) blob dispersal and sampling
+- Chaos testing with controlled node restarts
+
+**Comprehensive Observability**
+- Real-time block feed for monitoring consensus progress
+- Prometheus/Grafana integration for metrics
+- Per-node log collection and debugging
+
+---
+
+## Quick Example
+
+```rust
+use std::time::Duration;
+
+use testing_framework_core::scenario::ScenarioBuilder;
+use testing_framework_runner_local::LocalDeployer;
+use testing_framework_workflows::ScenarioBuilderExt;
+
+#[tokio::main]
+async fn main() -> anyhow::Result<()> {
+    // Topology: a star network with three validators and one executor.
+    let mut scenario = ScenarioBuilder::topology_with(|t| {
+        t.network_star()
+            .validators(3)
+            .executors(1)
+    })
+    // Workload and success criteria, bounded by a one-minute run window.
+    .transactions_with(|tx| tx.rate(10.0).users(5))
+    .expect_consensus_liveness()
+    .with_run_duration(Duration::from_secs(60))
+    .build();
+
+    // Deploy on local host processes, then drive the scenario to completion.
+    let deployer = LocalDeployer::default();
+    let runner = deployer.deploy(&scenario).await?;
+    runner.run(&mut scenario).await?;
+
+    Ok(())
+}
+```
+
+[View complete examples](examples.md)
+
+---
+
+## Choose Your Path
+
+### New to the Framework?
+
+Start with the **[Quickstart Guide](quickstart.md)** for a hands-on introduction that gets you running tests in minutes.
+
+### Ready to Write Tests?
+
+Explore the **[User Guide](part-ii.md)** to learn about authoring scenarios, workloads, expectations, and deployment strategies.
+
+### Setting Up CI/CD?
+
+Jump to **[Operations & Deployment](part-v.md)** for prerequisites, environment configuration, and continuous integration patterns.
+
+### Extending the Framework?
+
+Check the **[Developer Reference](part-iii.md)** to implement custom workloads, expectations, and runners.
+
+---
+
+## Project Context
+
+**Logos** is a modular blockchain protocol composed of validators, executors, and a data-availability (DA) subsystem:
+
+- **Validators** participate in consensus and produce blocks
+- **Executors** are validators with the DA dispersal service enabled.
They perform all validator functions plus submit blob data to the DA network +- **Data Availability (DA)** ensures that blob data submitted via channel operations in transactions is published and retrievable by the network + +These roles interact tightly, which is why meaningful testing must be performed in multi-node environments that include real networking, timing, and DA interaction. + +The Nomos Testing Framework provides the infrastructure to orchestrate these multi-node scenarios reliably across development, CI, and production-like environments. + +--- + +## Documentation Structure + +| Section | Description | +|---------|-------------| +| **[Foundations](part-i.md)** | Architecture, philosophy, and design principles | +| **[User Guide](part-ii.md)** | Writing and running scenarios, workloads, and expectations | +| **[Developer Reference](part-iii.md)** | Extending the framework with custom components | +| **[Operations & Deployment](part-v.md)** | Setup, CI integration, and environment configuration | +| **[Appendix](part-vi.md)** | Quick reference, troubleshooting, FAQ, and glossary | + +--- + +## Quick Links + +- **[What You Will Learn](what-you-will-learn.md)** — Overview of book contents and learning path +- **[Quickstart](quickstart.md)** — Get up and running in 10 minutes +- **[Examples](examples.md)** — Concrete scenario patterns +- **[Troubleshooting](troubleshooting.md)** — Common issues and solutions +- **[Environment Variables](environment-variables.md)** — Complete configuration reference + +--- + +**Ready to start?** Head to the **[Quickstart](quickstart.md)** diff --git a/book/src/runners.md b/book/src/runners.md index 4ea2df3..695efae 100644 --- a/book/src/runners.md +++ b/book/src/runners.md @@ -44,10 +44,106 @@ environment and operational considerations, see [Operations Overview](operations - Environment flags can relax timeouts or increase tracing when diagnostics are needed. 
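+Because every runner implements the same `Deployer` trait, switching backends mostly means constructing a different deployer for an unchanged scenario plan. A minimal sketch using the local runner; the compose and k8s deployers mentioned in the closing comment are assumptions modeled on the local crate, so confirm the real module paths in the repository:
+
+```rust
+use std::time::Duration;
+
+use testing_framework_core::scenario::ScenarioBuilder;
+use testing_framework_runner_local::LocalDeployer;
+use testing_framework_workflows::ScenarioBuilderExt;
+
+async fn smoke_test() -> anyhow::Result<()> {
+    // The plan is backend-agnostic: topology + workloads + expectations.
+    let mut scenario = ScenarioBuilder::topology_with(|t| {
+        t.network_star().validators(3).executors(1)
+    })
+    .transactions_with(|tx| tx.rate(5.0).users(2))
+    .expect_consensus_liveness()
+    .with_run_duration(Duration::from_secs(30))
+    .build();
+
+    // Host runner: fastest iteration, but no node control / chaos support.
+    let runner = LocalDeployer::default().deploy(&scenario).await?;
+    runner.run(&mut scenario).await?;
+
+    // To target Docker Compose or Kubernetes instead, only the deployer line
+    // changes (e.g. a ComposeDeployer or K8sDeployer from their respective
+    // runner crates; exact paths are not shown here and should be checked).
+    Ok(())
+}
+```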
+## Runner Comparison + +```mermaid +flowchart TB + subgraph Host["Host Runner (Local)"] + H1["Speed: Fast"] + H2["Isolation: Shared host"] + H3["Setup: Minimal"] + H4["Chaos: Not supported"] + H5["CI: Quick smoke tests"] + end + + subgraph Compose["Compose Runner (Docker)"] + C1["Speed: Medium"] + C2["Isolation: Containerized"] + C3["Setup: Image build required"] + C4["Chaos: Supported"] + C5["CI: Recommended"] + end + + subgraph K8s["K8s Runner (Cluster)"] + K1["Speed: Slower"] + K2["Isolation: Pod-level"] + K3["Setup: Cluster + image"] + K4["Chaos: Not yet supported"] + K5["CI: Large-scale tests"] + end + + Decision{Choose Based On} + Decision -->|Fast iteration| Host + Decision -->|Reproducibility| Compose + Decision -->|Production-like| K8s + + style Host fill:#e1f5ff + style Compose fill:#e1ffe1 + style K8s fill:#ffe1f5 +``` + +## Detailed Feature Matrix + +| Feature | Host | Compose | K8s | +|---------|------|---------|-----| +| **Speed** | Fastest | Medium | Slowest | +| **Setup Time** | < 1 min | 2-5 min | 5-10 min | +| **Isolation** | Process-level | Container | Pod + namespace | +| **Node Control** | No | Yes | Not yet | +| **Observability** | Basic | External stack | Cluster-wide | +| **CI Integration** | Smoke tests | Recommended | Heavy tests | +| **Resource Usage** | Low | Medium | High | +| **Reproducibility** | Environment-dependent | High | Highest | +| **Network Fidelity** | Localhost only | Virtual network | Real cluster | +| **Parallel Runs** | Port conflicts | Isolated | Namespace isolation | + +## Decision Guide + ```mermaid flowchart TD - Plan[Scenario Plan] --> RunSel[Runner
host, compose, or k8s] - RunSel --> Provision[Provision & readiness] - Provision --> Runtime[Runtime + observability] - Runtime --> Exec[Workloads & Expectations execute] + Start[Need to run tests?] --> Q1{Local development?} + Q1 -->|Yes| Q2{Testing chaos?} + Q1 -->|No| Q5{Have cluster access?} + + Q2 -->|Yes| UseCompose[Use Compose] + Q2 -->|No| Q3{Need isolation?} + + Q3 -->|Yes| UseCompose + Q3 -->|No| UseHost[Use Host] + + Q5 -->|Yes| Q6{Large topology?} + Q5 -->|No| Q7{CI pipeline?} + + Q6 -->|Yes| UseK8s[Use K8s] + Q6 -->|No| UseCompose + + Q7 -->|Yes| Q8{Docker available?} + Q7 -->|No| UseHost + + Q8 -->|Yes| UseCompose + Q8 -->|No| UseHost + + style UseHost fill:#e1f5ff + style UseCompose fill:#e1ffe1 + style UseK8s fill:#ffe1f5 ``` + +### Quick Recommendations + +**Use Host Runner when:** +- Iterating rapidly during development +- Running quick smoke tests +- Testing on a laptop with limited resources +- Don't need chaos testing + +**Use Compose Runner when:** +- Need reproducible test environments +- Testing chaos scenarios (node restarts) +- Running in CI pipelines +- Want containerized isolation + +**Use K8s Runner when:** +- Testing large-scale topologies (10+ nodes) +- Need production-like environment +- Have cluster access in CI +- Testing cluster-specific behaviors diff --git a/book/src/scenario-lifecycle.md b/book/src/scenario-lifecycle.md index 839fbd5..8f98f70 100644 --- a/book/src/scenario-lifecycle.md +++ b/book/src/scenario-lifecycle.md @@ -1,18 +1,133 @@ # Scenario Lifecycle -1. **Build the plan**: Declare a topology, attach workloads and expectations, and set the run window. The plan is the single source of truth for what will happen. -2. **Deploy**: Hand the plan to a deployer. It provisions the environment on the chosen backend, waits for nodes to signal readiness, and returns a runner. -3. **Drive workloads**: The runner starts traffic and behaviors (transactions, data-availability activity, restarts) for the planned duration. -4. **Observe blocks and signals**: Track block progression and other high-level metrics during or after the run window to ground assertions in protocol time. -5. **Evaluate expectations**: Once activity stops (and optional cooldown completes), the runner checks liveness and workload-specific outcomes to decide pass or fail. -6. **Cleanup**: Tear down resources so successive runs start fresh and do not inherit leaked state. +A scenario progresses through six distinct phases, each with a specific responsibility: ```mermaid -flowchart LR - P[Plan
topology + workloads + expectations] --> D[Deploy
deployer provisions] - D --> R[Runner
orchestrates execution] - R --> W[Drive Workloads] - W --> O[Observe
blocks/metrics] - O --> E[Evaluate Expectations] - E --> C[Cleanup] +flowchart TB + subgraph Phase1["1. Build Phase"] + Build[Define Scenario] + BuildDetails["• Declare topology
• Attach workloads
• Add expectations
• Set run duration"] + Build --> BuildDetails + end + + subgraph Phase2["2. Deploy Phase"] + Deploy[Provision Environment] + DeployDetails["• Launch nodes
• Wait for readiness
• Establish connectivity
• Return Runner"] + Deploy --> DeployDetails + end + + subgraph Phase3["3. Capture Phase"] + Capture[Baseline Metrics] + CaptureDetails["• Snapshot initial state
• Start BlockFeed
• Initialize expectations"] + Capture --> CaptureDetails + end + + subgraph Phase4["4. Execution Phase"] + Execute[Drive Workloads] + ExecuteDetails["• Submit transactions
• Disperse DA blobs
• Trigger chaos events
• Run for duration"] + Execute --> ExecuteDetails + end + + subgraph Phase5["5. Evaluation Phase"] + Evaluate[Check Expectations] + EvaluateDetails["• Verify liveness
• Check inclusion
• Validate outcomes
• Aggregate results"] + Evaluate --> EvaluateDetails + end + + subgraph Phase6["6. Cleanup Phase"] + Cleanup[Teardown] + CleanupDetails["• Stop nodes
• Remove containers
• Collect logs
• Release resources"] + Cleanup --> CleanupDetails + end + + Phase1 --> Phase2 + Phase2 --> Phase3 + Phase3 --> Phase4 + Phase4 --> Phase5 + Phase5 --> Phase6 + + style Phase1 fill:#e1f5ff + style Phase2 fill:#fff4e1 + style Phase3 fill:#f0ffe1 + style Phase4 fill:#ffe1f5 + style Phase5 fill:#e1ffe1 + style Phase6 fill:#ffe1e1 ``` + +## Phase Details + +### 1. Build the Plan + +Declare a topology, attach workloads and expectations, and set the run window. The plan is the single source of truth for what will happen. + +**Key actions:** +- Define cluster shape (validators, executors, network topology) +- Configure workloads (transaction rate, DA traffic, chaos patterns) +- Attach expectations (liveness, inclusion, custom checks) +- Set timing parameters (run duration, cooldown period) + +**Output:** Immutable `Scenario` plan + +### 2. Deploy + +Hand the plan to a deployer. It provisions the environment on the chosen backend, waits for nodes to signal readiness, and returns a runner. + +**Key actions:** +- Provision infrastructure (processes, containers, or pods) +- Launch validator and executor nodes +- Wait for readiness probes (HTTP endpoints respond) +- Establish node connectivity and metrics endpoints +- Spawn BlockFeed for real-time block observation + +**Output:** `Runner` + `RunContext` (with node clients, metrics, control handles) + +### 3. Capture Baseline + +Expectations snapshot initial state before workloads begin. + +**Key actions:** +- Record starting block height +- Initialize counters and trackers +- Subscribe to BlockFeed +- Capture baseline metrics + +**Output:** Captured state for later comparison + +### 4. Drive Workloads + +The runner starts traffic and behaviors for the planned duration. + +**Key actions:** +- Submit transactions at configured rates +- Disperse and sample DA blobs +- Trigger chaos events (node restarts, network partitions) +- Run concurrently for the specified duration +- Observe blocks and metrics in real-time + +**Duration:** Controlled by `with_run_duration()` + +### 5. Evaluate Expectations + +Once activity stops (and optional cooldown completes), the runner checks liveness and workload-specific outcomes. + +**Key actions:** +- Verify consensus liveness (minimum block production) +- Check transaction inclusion rates +- Validate DA dispersal and sampling +- Assess system recovery after chaos events +- Aggregate pass/fail results + +**Output:** Success or detailed failure report + +### 6. Cleanup + +Tear down resources so successive runs start fresh and do not inherit leaked state. + +**Key actions:** +- Stop all node processes/containers/pods +- Remove temporary directories and volumes +- Collect and archive logs (if `NOMOS_TESTS_KEEP_LOGS=1`) +- Release ports and network resources +- Cleanup observability stack (if spawned) + +**Guarantee:** Runs even on panic via `CleanupGuard`
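+
+## Lifecycle in Code
+
+Tying the phases back to the user-facing API: the build and deploy phases are explicit calls in a runner binary, while baseline capture, workload execution, expectation evaluation, and cleanup are handled once the runner takes over. The sketch below reuses the quick-example API from the introduction and maps the phases onto it in comments; the mapping follows this page rather than a verified call graph, so treat it as orientation, not reference.
+
+```rust
+use std::time::Duration;
+
+use testing_framework_core::scenario::ScenarioBuilder;
+use testing_framework_runner_local::LocalDeployer;
+use testing_framework_workflows::ScenarioBuilderExt;
+
+#[tokio::main]
+async fn main() -> anyhow::Result<()> {
+    // Phase 1 (Build): topology + workloads + expectations + run window.
+    let mut scenario = ScenarioBuilder::topology_with(|t| {
+        t.network_star().validators(3).executors(1)
+    })
+    .transactions_with(|tx| tx.rate(10.0).users(5))
+    .expect_consensus_liveness()
+    .with_run_duration(Duration::from_secs(60))
+    .build();
+
+    // Phase 2 (Deploy): provision nodes, wait for readiness, get a runner.
+    let deployer = LocalDeployer::default();
+    let runner = deployer.deploy(&scenario).await?;
+
+    // Phases 3-5 (Capture, Execute, Evaluate): expectations snapshot a
+    // baseline, workloads run for the configured duration, then outcomes
+    // are checked and aggregated into a pass/fail result.
+    runner.run(&mut scenario).await?;
+
+    // Phase 6 (Cleanup): nodes and temporary state are torn down even on
+    // panic; set NOMOS_TESTS_KEEP_LOGS=1 to keep logs for inspection.
+    Ok(())
+}
+```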