docs(book): update docs

This commit is contained in:
andrussal 2025-12-18 18:36:38 +01:00
parent 027fd598a7
commit 972daa451a
6 changed files with 482 additions and 36 deletions

@@ -13,6 +13,114 @@ flowchart LR
E --> F(Expectations<br/>verify outcomes)
```
## Crate Architecture
```mermaid
flowchart TB
subgraph Examples["Runner Examples"]
LocalBin[local_runner.rs]
ComposeBin[compose_runner.rs]
K8sBin[k8s_runner.rs]
CucumberBin[cucumber_*.rs]
end
subgraph Workflows["Workflows (Batteries Included)"]
DSL[ScenarioBuilderExt<br/>Fluent API]
TxWorkload[Transaction Workload]
DAWorkload[DA Workload]
ChaosWorkload[Chaos Workload]
Expectations[Built-in Expectations]
end
subgraph Core["Core Framework"]
ScenarioModel[Scenario Model]
Traits[Deployer + Runner Traits]
BlockFeed[BlockFeed]
NodeClients[Node Clients]
Topology[Topology Generation]
end
subgraph Deployers["Runner Implementations"]
LocalDeployer[LocalDeployer]
ComposeDeployer[ComposeDeployer]
K8sDeployer[K8sDeployer]
end
subgraph Support["Supporting Crates"]
Configs[Configs & Topology]
Nodes[Node API Clients]
Cucumber[Cucumber Extensions]
end
Examples --> Workflows
Examples --> Deployers
Workflows --> Core
Deployers --> Core
Deployers --> Support
Core --> Support
Workflows --> Support
style Examples fill:#e1f5ff
style Workflows fill:#e1ffe1
style Core fill:#fff4e1
style Deployers fill:#ffe1f5
style Support fill:#f0f0f0
```
### Layer Responsibilities
**Runner Examples (Entry Points)**
- Executable binaries that demonstrate framework usage
- Wire together deployers, scenarios, and execution
- Provide CLI interfaces for different modes
**Workflows (High-Level API)**
- `ScenarioBuilderExt` trait provides fluent DSL
- Built-in workloads (transactions, DA, chaos)
- Common expectations (liveness, inclusion)
- Simplifies scenario authoring
**Core Framework (Foundation)**
- `Scenario` model and lifecycle orchestration
- `Deployer` and `Runner` traits (extension points)
- `BlockFeed` for real-time block observation
- `RunContext` providing node clients and metrics
- Topology generation and validation
**Runner Implementations**
- `LocalDeployer` - spawns processes on host
- `ComposeDeployer` - orchestrates Docker Compose
- `K8sDeployer` - deploys to Kubernetes cluster
- Each implements `Deployer` trait
**Supporting Crates**
- `configs` - Topology configuration and generation
- `nodes` - HTTP/RPC client for node APIs
- `cucumber` - BDD/Gherkin integration
### Extension Points
```mermaid
flowchart LR
Custom[Your Code] -.implements.-> Workload[Workload Trait]
Custom -.implements.-> Expectation[Expectation Trait]
Custom -.implements.-> Deployer[Deployer Trait]
Workload --> Core[Core Framework]
Expectation --> Core
Deployer --> Core
style Custom fill:#ffe1f5
style Core fill:#fff4e1
```
**Extend by implementing:**
- `Workload` - Custom traffic generation patterns
- `Expectation` - Custom success criteria
- `Deployer` - Support for new deployment targets
See [Extending the Framework](extending.md) for details.
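To give a flavor of what implementing these extension points feels like, here is a self-contained sketch using simplified, synchronous stand-ins: the real `Workload` and `Expectation` traits live in the core framework, are async, and have different signatures, so treat every name and method below as illustrative only.

```rust
// Hypothetical, simplified stand-ins for the framework's extension traits.
trait Workload {
    fn name(&self) -> &str;
    /// Drive traffic for `ticks` steps; returns how many actions ran.
    fn drive(&mut self, ticks: u32) -> u32;
}

trait Expectation {
    fn name(&self) -> &str;
    /// Decide pass/fail from what the run produced.
    fn evaluate(&self, observed: u32) -> Result<(), String>;
}

struct ConstantRateWorkload {
    per_tick: u32,
}

impl Workload for ConstantRateWorkload {
    fn name(&self) -> &str {
        "constant-rate"
    }
    fn drive(&mut self, ticks: u32) -> u32 {
        ticks * self.per_tick
    }
}

struct MinActions {
    min: u32,
}

impl Expectation for MinActions {
    fn name(&self) -> &str {
        "min-actions"
    }
    fn evaluate(&self, observed: u32) -> Result<(), String> {
        if observed >= self.min {
            Ok(())
        } else {
            Err(format!("saw {observed} actions, expected at least {}", self.min))
        }
    }
}

fn main() {
    let mut workload = ConstantRateWorkload { per_tick: 10 };
    let produced = workload.drive(6);
    let check = MinActions { min: 50 };
    println!("{} -> {:?}", check.name(), check.evaluate(produced));
}
```

The division of labor is the point: workloads only generate activity, expectations only judge outcomes, and the core framework wires them into the same run.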
### Components
- **Topology** describes the cluster: how many nodes, their roles, and the high-level network and data-availability parameters they should follow.

@@ -3,9 +3,9 @@
Realistic advanced scenarios demonstrating framework capabilities for production testing.
**Adapt from Complete Source:**
- [compose_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/compose_runner.rs) — Compose examples with workloads
- [k8s_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/k8s_runner.rs) — K8s production patterns
- [Chaos testing patterns](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/testing-framework/workflows/src/workloads/chaos.rs) — Node control implementation
## Summary

@@ -4,9 +4,9 @@ Concrete scenario shapes that illustrate how to combine topologies, workloads,
and expectations.
**View Complete Source Code:**
- [local_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/local_runner.rs) — Host processes (local)
- [compose_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/compose_runner.rs) — Docker Compose
- [k8s_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/k8s_runner.rs) — Kubernetes
**Runnable examples:** The repo includes complete binaries in `examples/src/bin/`:
- `local_runner.rs` — Host processes (local)

@@ -1,16 +1,143 @@
# Nomos Testing Framework

**Declarative, multi-node blockchain testing for the Logos network**

The Nomos Testing Framework enables you to test consensus, data availability, and transaction workloads across local processes, Docker Compose, and Kubernetes deployments—all with a unified scenario API.

[**Get Started**](quickstart.md)

---
## How It Works
```mermaid
flowchart LR
Build[Define Scenario] --> Deploy[Deploy Topology]
Deploy --> Execute[Run Workloads]
Execute --> Evaluate[Check Expectations]
style Build fill:#e1f5ff
style Deploy fill:#fff4e1
style Execute fill:#ffe1f5
style Evaluate fill:#e1ffe1
```
1. **Define Scenario** — Describe your test: topology, workloads, and success criteria
2. **Deploy Topology** — Launch validators and executors using host, compose, or k8s runners
3. **Run Workloads** — Drive transactions, DA traffic, and chaos operations
4. **Check Expectations** — Verify consensus liveness, inclusion, and system health
---
## Key Features
**Declarative API**
- Express scenarios as topology + workloads + expectations
- Reuse the same test definition across different deployment targets
- Compose complex tests from modular components
**Multiple Deployment Modes**
- **Host Runner**: Local processes for fast iteration
- **Compose Runner**: Containerized environments with node control
- **Kubernetes Runner**: Production-like cluster testing
**Built-in Workloads**
- Transaction submission with configurable rates
- Data availability (DA) blob dispersal and sampling
- Chaos testing with controlled node restarts
**Comprehensive Observability**
- Real-time block feed for monitoring consensus progress
- Prometheus/Grafana integration for metrics
- Per-node log collection and debugging
---
## Quick Example
```rust
use std::time::Duration;

use testing_framework_core::scenario::ScenarioBuilder;
use testing_framework_runner_local::LocalDeployer;
use testing_framework_workflows::ScenarioBuilderExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Plan: 3 validators + 1 executor in a star network,
    // 5 users submitting 10 tx/s, with a liveness check.
    let mut scenario = ScenarioBuilder::topology_with(|t| {
        t.network_star()
            .validators(3)
            .executors(1)
    })
    .transactions_with(|tx| tx.rate(10.0).users(5))
    .expect_consensus_liveness()
    .with_run_duration(Duration::from_secs(60))
    .build();

    // Deploy locally, run the scenario, and evaluate expectations.
    let deployer = LocalDeployer::default();
    let runner = deployer.deploy(&scenario).await?;
    runner.run(&mut scenario).await?;
    Ok(())
}
```
[View complete examples](examples.md)
---
## Choose Your Path
### New to the Framework?
Start with the **[Quickstart Guide](quickstart.md)** for a hands-on introduction that gets you running tests in minutes.
### Ready to Write Tests?
Explore the **[User Guide](part-ii.md)** to learn about authoring scenarios, workloads, expectations, and deployment strategies.
### Setting Up CI/CD?
Jump to **[Operations & Deployment](part-v.md)** for prerequisites, environment configuration, and continuous integration patterns.
### Extending the Framework?
Check the **[Developer Reference](part-iii.md)** to implement custom workloads, expectations, and runners.
---
## Project Context
**Logos** is a modular blockchain protocol composed of validators, executors, and a data-availability (DA) subsystem:
- **Validators** participate in consensus and produce blocks
- **Executors** are validators with the DA dispersal service enabled. They perform all validator functions plus submit blob data to the DA network
- **Data Availability (DA)** ensures that blob data submitted via channel operations in transactions is published and retrievable by the network
These roles interact tightly, which is why meaningful testing must be performed in multi-node environments that include real networking, timing, and DA interaction.
The Nomos Testing Framework provides the infrastructure to orchestrate these multi-node scenarios reliably across development, CI, and production-like environments.
---
## Documentation Structure
| Section | Description |
|---------|-------------|
| **[Foundations](part-i.md)** | Architecture, philosophy, and design principles |
| **[User Guide](part-ii.md)** | Writing and running scenarios, workloads, and expectations |
| **[Developer Reference](part-iii.md)** | Extending the framework with custom components |
| **[Operations & Deployment](part-v.md)** | Setup, CI integration, and environment configuration |
| **[Appendix](part-vi.md)** | Quick reference, troubleshooting, FAQ, and glossary |
---
## Quick Links
- **[What You Will Learn](what-you-will-learn.md)** — Overview of book contents and learning path
- **[Quickstart](quickstart.md)** — Get up and running in 10 minutes
- **[Examples](examples.md)** — Concrete scenario patterns
- **[Troubleshooting](troubleshooting.md)** — Common issues and solutions
- **[Environment Variables](environment-variables.md)** — Complete configuration reference
---
**Ready to start?** Head to the **[Quickstart](quickstart.md)**

@@ -44,10 +44,106 @@ environment and operational considerations, see [Operations Overview](operations
- Environment flags can relax timeouts or increase tracing when diagnostics are
needed.
## Runner Comparison
```mermaid
flowchart TB
subgraph Host["Host Runner (Local)"]
H1["Speed: Fast"]
H2["Isolation: Shared host"]
H3["Setup: Minimal"]
H4["Chaos: Not supported"]
H5["CI: Quick smoke tests"]
end
subgraph Compose["Compose Runner (Docker)"]
C1["Speed: Medium"]
C2["Isolation: Containerized"]
C3["Setup: Image build required"]
C4["Chaos: Supported"]
C5["CI: Recommended"]
end
subgraph K8s["K8s Runner (Cluster)"]
K1["Speed: Slower"]
K2["Isolation: Pod-level"]
K3["Setup: Cluster + image"]
K4["Chaos: Not yet supported"]
K5["CI: Large-scale tests"]
end
Decision{Choose Based On}
Decision -->|Fast iteration| Host
Decision -->|Reproducibility| Compose
Decision -->|Production-like| K8s
style Host fill:#e1f5ff
style Compose fill:#e1ffe1
style K8s fill:#ffe1f5
```
## Detailed Feature Matrix
| Feature | Host | Compose | K8s |
|---------|------|---------|-----|
| **Speed** | Fastest | Medium | Slowest |
| **Setup Time** | < 1 min | 2-5 min | 5-10 min |
| **Isolation** | Process-level | Container | Pod + namespace |
| **Node Control** | No | Yes | Not yet |
| **Observability** | Basic | External stack | Cluster-wide |
| **CI Integration** | Smoke tests | Recommended | Heavy tests |
| **Resource Usage** | Low | Medium | High |
| **Reproducibility** | Environment-dependent | High | Highest |
| **Network Fidelity** | Localhost only | Virtual network | Real cluster |
| **Parallel Runs** | Port conflicts | Isolated | Namespace isolation |
## Decision Guide
```mermaid
flowchart TD
Start[Need to run tests?] --> Q1{Local development?}
Q1 -->|Yes| Q2{Testing chaos?}
Q1 -->|No| Q5{Have cluster access?}
Q2 -->|Yes| UseCompose[Use Compose]
Q2 -->|No| Q3{Need isolation?}
Q3 -->|Yes| UseCompose
Q3 -->|No| UseHost[Use Host]
Q5 -->|Yes| Q6{Large topology?}
Q5 -->|No| Q7{CI pipeline?}
Q6 -->|Yes| UseK8s[Use K8s]
Q6 -->|No| UseCompose
Q7 -->|Yes| Q8{Docker available?}
Q7 -->|No| UseHost
Q8 -->|Yes| UseCompose
Q8 -->|No| UseHost
style UseHost fill:#e1f5ff
style UseCompose fill:#e1ffe1
style UseK8s fill:#ffe1f5
```
### Quick Recommendations
**Use Host Runner when:**
- Iterating rapidly during development
- Running quick smoke tests
- Testing on a laptop with limited resources
- Don't need chaos testing
**Use Compose Runner when:**
- Need reproducible test environments
- Testing chaos scenarios (node restarts)
- Running in CI pipelines
- Want containerized isolation
**Use K8s Runner when:**
- Testing large-scale topologies (10+ nodes)
- Need production-like environment
- Have cluster access in CI
- Testing cluster-specific behaviors
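The recommendations above can be folded into a small selection helper. This is a sketch under the assumption that the mode string comes from your own CLI flag or environment variable (the framework does not define one), and that wiring each mode to `LocalDeployer`, `ComposeDeployer`, or `K8sDeployer` happens elsewhere:

```rust
// Hypothetical runner-mode picker mirroring the decision guide above.
#[derive(Debug, PartialEq)]
enum RunnerMode {
    Host,
    Compose,
    K8s,
}

fn select_mode(raw: &str, chaos_needed: bool) -> RunnerMode {
    match raw.to_ascii_lowercase().as_str() {
        "compose" => RunnerMode::Compose,
        "k8s" | "kubernetes" => RunnerMode::K8s,
        // Host is fastest but cannot run chaos workloads, so fall
        // back to Compose whenever chaos is requested.
        _ if chaos_needed => RunnerMode::Compose,
        _ => RunnerMode::Host,
    }
}

fn main() {
    assert_eq!(select_mode("host", false), RunnerMode::Host);
    assert_eq!(select_mode("host", true), RunnerMode::Compose);
    assert_eq!(select_mode("K8S", false), RunnerMode::K8s);
    println!("mode selection ok");
}
```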

@@ -1,18 +1,133 @@
# Scenario Lifecycle

A scenario progresses through six distinct phases, each with a specific responsibility:

```mermaid
flowchart TB
subgraph Phase1["1. Build Phase"]
Build[Define Scenario]
BuildDetails["• Declare topology<br/>• Attach workloads<br/>• Add expectations<br/>• Set run duration"]
Build --> BuildDetails
end
subgraph Phase2["2. Deploy Phase"]
Deploy[Provision Environment]
DeployDetails["• Launch nodes<br/>• Wait for readiness<br/>• Establish connectivity<br/>• Return Runner"]
Deploy --> DeployDetails
end
subgraph Phase3["3. Capture Phase"]
Capture[Baseline Metrics]
CaptureDetails["• Snapshot initial state<br/>• Start BlockFeed<br/>• Initialize expectations"]
Capture --> CaptureDetails
end
subgraph Phase4["4. Execution Phase"]
Execute[Drive Workloads]
ExecuteDetails["• Submit transactions<br/>• Disperse DA blobs<br/>• Trigger chaos events<br/>• Run for duration"]
Execute --> ExecuteDetails
end
subgraph Phase5["5. Evaluation Phase"]
Evaluate[Check Expectations]
EvaluateDetails["• Verify liveness<br/>• Check inclusion<br/>• Validate outcomes<br/>• Aggregate results"]
Evaluate --> EvaluateDetails
end
subgraph Phase6["6. Cleanup Phase"]
Cleanup[Teardown]
CleanupDetails["• Stop nodes<br/>• Remove containers<br/>• Collect logs<br/>• Release resources"]
Cleanup --> CleanupDetails
end
Phase1 --> Phase2
Phase2 --> Phase3
Phase3 --> Phase4
Phase4 --> Phase5
Phase5 --> Phase6
style Phase1 fill:#e1f5ff
style Phase2 fill:#fff4e1
style Phase3 fill:#f0ffe1
style Phase4 fill:#ffe1f5
style Phase5 fill:#e1ffe1
style Phase6 fill:#ffe1e1
```
## Phase Details
### 1. Build the Plan
Declare a topology, attach workloads and expectations, and set the run window. The plan is the single source of truth for what will happen.
**Key actions:**
- Define cluster shape (validators, executors, network topology)
- Configure workloads (transaction rate, DA traffic, chaos patterns)
- Attach expectations (liveness, inclusion, custom checks)
- Set timing parameters (run duration, cooldown period)
**Output:** Immutable `Scenario` plan
### 2. Deploy
Hand the plan to a deployer. It provisions the environment on the chosen backend, waits for nodes to signal readiness, and returns a runner.
**Key actions:**
- Provision infrastructure (processes, containers, or pods)
- Launch validator and executor nodes
- Wait for readiness probes (HTTP endpoints respond)
- Establish node connectivity and metrics endpoints
- Spawn BlockFeed for real-time block observation
**Output:** `Runner` + `RunContext` (with node clients, metrics, control handles)
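Readiness waiting is essentially a bounded poll loop; the minimal, self-contained sketch below conveys the shape, with the probe standing in for an HTTP health check and the timeout and interval values purely illustrative, not the framework's defaults:

```rust
use std::time::{Duration, Instant};

// Poll a readiness check until it succeeds or the deadline passes.
// `probe` stands in for a call to a node's health endpoint.
fn wait_ready<F: FnMut() -> bool>(mut probe: F, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if probe() {
            return true;
        }
        std::thread::sleep(interval);
    }
    false
}

fn main() {
    // Simulate a node that becomes ready on the third probe.
    let mut attempts = 0;
    let ready = wait_ready(
        || {
            attempts += 1;
            attempts >= 3
        },
        Duration::from_millis(200),
        Duration::from_millis(10),
    );
    assert!(ready);
    println!("node ready after {attempts} probes");
}
```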
### 3. Capture Baseline
Expectations snapshot initial state before workloads begin.
**Key actions:**
- Record starting block height
- Initialize counters and trackers
- Subscribe to BlockFeed
- Capture baseline metrics
**Output:** Captured state for later comparison
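The baseline idea can be sketched in a few lines: snapshot a starting value before workloads run, then compare against it at evaluation time. The type and method names here are illustrative, not the framework's:

```rust
// Illustrative baseline snapshot for a liveness-style check: record the
// starting block height, then require a minimum number of new blocks.
struct Baseline {
    start_height: u64,
}

impl Baseline {
    fn capture(current_height: u64) -> Self {
        Baseline { start_height: current_height }
    }

    fn liveness_ok(&self, final_height: u64, min_new_blocks: u64) -> bool {
        final_height.saturating_sub(self.start_height) >= min_new_blocks
    }
}

fn main() {
    let baseline = Baseline::capture(100);
    assert!(baseline.liveness_ok(112, 10)); // 12 new blocks, needed 10
    assert!(!baseline.liveness_ok(105, 10)); // only 5 new blocks
    println!("baseline checks ok");
}
```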
### 4. Drive Workloads
The runner starts traffic and behaviors for the planned duration.
**Key actions:**
- Submit transactions at configured rates
- Disperse and sample DA blobs
- Trigger chaos events (node restarts, network partitions)
- Run concurrently for the specified duration
- Observe blocks and metrics in real-time
**Duration:** Controlled by `with_run_duration()`
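Rate-driven workloads pace submissions by the inverse of the configured rate, and the run window bounds the total budget. A toy sketch of the arithmetic (the helper names and numbers are illustrative, not framework API):

```rust
use std::time::Duration;

// One submission every 1/rate seconds.
fn interval_for_rate(rate_per_sec: f64) -> Duration {
    Duration::from_secs_f64(1.0 / rate_per_sec)
}

// Rough target count for a run window at a given rate.
fn submission_budget(run: Duration, rate_per_sec: f64) -> u64 {
    (run.as_secs_f64() * rate_per_sec) as u64
}

fn main() {
    let interval = interval_for_rate(10.0);
    // 10 tx/s means one submission roughly every 100 ms.
    assert_eq!(interval.as_millis(), 100);
    // A 60 s run at 10 tx/s targets about 600 submissions.
    assert_eq!(submission_budget(Duration::from_secs(60), 10.0), 600);
    println!("pacing ok");
}
```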
### 5. Evaluate Expectations
Once activity stops (and optional cooldown completes), the runner checks liveness and workload-specific outcomes.
**Key actions:**
- Verify consensus liveness (minimum block production)
- Check transaction inclusion rates
- Validate DA dispersal and sampling
- Assess system recovery after chaos events
- Aggregate pass/fail results
**Output:** Success or detailed failure report
### 6. Cleanup
Tear down resources so successive runs start fresh and do not inherit leaked state.
**Key actions:**
- Stop all node processes/containers/pods
- Remove temporary directories and volumes
- Collect and archive logs (if `NOMOS_TESTS_KEEP_LOGS=1`)
- Release ports and network resources
- Cleanup observability stack (if spawned)
**Guarantee:** Runs even on panic via `CleanupGuard`
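That guarantee maps naturally onto Rust's `Drop`: destructors run during panic unwinding as well as on normal exit. A simplified, self-contained sketch of the pattern follows; the real `CleanupGuard` tears down nodes and containers rather than setting a flag:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static CLEANED: AtomicBool = AtomicBool::new(false);

// Simplified picture of a cleanup guard: teardown lives in `Drop`,
// so it runs on normal exit and during panic unwinding alike.
struct CleanupGuard;

impl Drop for CleanupGuard {
    fn drop(&mut self) {
        // Real teardown would stop nodes, remove containers/volumes,
        // and collect logs; here we just record that cleanup ran.
        CLEANED.store(true, Ordering::SeqCst);
        println!("cleanup ran");
    }
}

fn main() {
    let result = std::panic::catch_unwind(|| {
        let _guard = CleanupGuard;
        panic!("simulated workload failure");
    });
    assert!(result.is_err());
    // The guard's Drop ran during unwinding, before the panic surfaced here.
    assert!(CLEANED.load(Ordering::SeqCst));
    println!("teardown verified after panic");
}
```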