docs(book): update docs

This commit is contained in:
andrussal 2025-12-18 18:36:38 +01:00
parent 027fd598a7
commit 972daa451a
6 changed files with 482 additions and 36 deletions

@@ -13,6 +13,114 @@ flowchart LR
E --> F(Expectations<br/>verify outcomes)
```
## Crate Architecture
```mermaid
flowchart TB
subgraph Examples["Runner Examples"]
LocalBin[local_runner.rs]
ComposeBin[compose_runner.rs]
K8sBin[k8s_runner.rs]
CucumberBin[cucumber_*.rs]
end
subgraph Workflows["Workflows (Batteries Included)"]
DSL[ScenarioBuilderExt<br/>Fluent API]
TxWorkload[Transaction Workload]
DAWorkload[DA Workload]
ChaosWorkload[Chaos Workload]
Expectations[Built-in Expectations]
end
subgraph Core["Core Framework"]
ScenarioModel[Scenario Model]
Traits[Deployer + Runner Traits]
BlockFeed[BlockFeed]
NodeClients[Node Clients]
Topology[Topology Generation]
end
subgraph Deployers["Runner Implementations"]
LocalDeployer[LocalDeployer]
ComposeDeployer[ComposeDeployer]
K8sDeployer[K8sDeployer]
end
subgraph Support["Supporting Crates"]
Configs[Configs & Topology]
Nodes[Node API Clients]
Cucumber[Cucumber Extensions]
end
Examples --> Workflows
Examples --> Deployers
Workflows --> Core
Deployers --> Core
Deployers --> Support
Core --> Support
Workflows --> Support
style Examples fill:#e1f5ff
style Workflows fill:#e1ffe1
style Core fill:#fff4e1
style Deployers fill:#ffe1f5
style Support fill:#f0f0f0
```
### Layer Responsibilities
**Runner Examples (Entry Points)**
- Executable binaries that demonstrate framework usage
- Wire together deployers, scenarios, and execution
- Provide CLI interfaces for different modes
**Workflows (High-Level API)**
- `ScenarioBuilderExt` trait provides fluent DSL
- Built-in workloads (transactions, DA, chaos)
- Common expectations (liveness, inclusion)
- Simplifies scenario authoring
**Core Framework (Foundation)**
- `Scenario` model and lifecycle orchestration
- `Deployer` and `Runner` traits (extension points)
- `BlockFeed` for real-time block observation
- `RunContext` providing node clients and metrics
- Topology generation and validation
**Runner Implementations**
- `LocalDeployer` - spawns processes on host
- `ComposeDeployer` - orchestrates Docker Compose
- `K8sDeployer` - deploys to Kubernetes cluster
- Each implements `Deployer` trait
**Supporting Crates**
- `configs` - Topology configuration and generation
- `nodes` - HTTP/RPC client for node APIs
- `cucumber` - BDD/Gherkin integration
### Extension Points
```mermaid
flowchart LR
Custom[Your Code] -.implements.-> Workload[Workload Trait]
Custom -.implements.-> Expectation[Expectation Trait]
Custom -.implements.-> Deployer[Deployer Trait]
Workload --> Core[Core Framework]
Expectation --> Core
Deployer --> Core
style Custom fill:#ffe1f5
style Core fill:#fff4e1
```
**Extend by implementing:**
- `Workload` - Custom traffic generation patterns
- `Expectation` - Custom success criteria
- `Deployer` - Support for new deployment targets
See [Extending the Framework](extending.md) for details.
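To give a flavor of what implementing these extension points feels like, here is a self-contained sketch using simplified, synchronous stand-ins: the real `Workload` and `Expectation` traits live in the core framework, are async, and have different signatures, so treat every name and method below as illustrative only.

```rust
// Hypothetical, simplified stand-ins for the framework's extension traits.
trait Workload {
    fn name(&self) -> &str;
    /// Drive traffic for `ticks` steps; returns how many actions ran.
    fn drive(&mut self, ticks: u32) -> u32;
}

trait Expectation {
    fn name(&self) -> &str;
    /// Decide pass/fail from what the run produced.
    fn evaluate(&self, observed: u32) -> Result<(), String>;
}

struct ConstantRateWorkload {
    per_tick: u32,
}

impl Workload for ConstantRateWorkload {
    fn name(&self) -> &str {
        "constant-rate"
    }
    fn drive(&mut self, ticks: u32) -> u32 {
        ticks * self.per_tick
    }
}

struct MinActions {
    min: u32,
}

impl Expectation for MinActions {
    fn name(&self) -> &str {
        "min-actions"
    }
    fn evaluate(&self, observed: u32) -> Result<(), String> {
        if observed >= self.min {
            Ok(())
        } else {
            Err(format!("saw {observed} actions, expected at least {}", self.min))
        }
    }
}

fn main() {
    let mut workload = ConstantRateWorkload { per_tick: 10 };
    let produced = workload.drive(6);
    let check = MinActions { min: 50 };
    println!("{} -> {:?}", check.name(), check.evaluate(produced));
}
```

The division of labor is the point: workloads only generate activity, expectations only judge outcomes, and the core framework wires them into the same run.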
### Components
- **Topology** describes the cluster: how many nodes, their roles, and the high-level network and data-availability parameters they should follow.

@@ -3,9 +3,9 @@
Realistic advanced scenarios demonstrating framework capabilities for production testing.
**Adapt from Complete Source:**
- [compose_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/compose_runner.rs) — Compose examples with workloads
- [k8s_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/k8s_runner.rs) — K8s production patterns
- [Chaos testing patterns](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/testing-framework/workflows/src/workloads/chaos.rs) — Node control implementation
## Summary

@@ -4,9 +4,9 @@ Concrete scenario shapes that illustrate how to combine topologies, workloads,
and expectations.
**View Complete Source Code:**
- [local_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/local_runner.rs) — Host processes (local)
- [compose_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/compose_runner.rs) — Docker Compose
- [k8s_runner.rs](https://github.com/logos-blockchain/logos-blockchain-testing/blob/master/examples/src/bin/k8s_runner.rs) — Kubernetes
**Runnable examples:** The repo includes complete binaries in `examples/src/bin/`:
- `local_runner.rs` — Host processes (local)

@@ -1,16 +1,143 @@
# Nomos Testing Framework

**Declarative, multi-node blockchain testing for the Logos network**

The Nomos Testing Framework enables you to test consensus, data availability, and transaction workloads across local processes, Docker Compose, and Kubernetes deployments—all with a unified scenario API.

[**Get Started**](quickstart.md)

---
## How It Works
```mermaid
flowchart LR
Build[Define Scenario] --> Deploy[Deploy Topology]
Deploy --> Execute[Run Workloads]
Execute --> Evaluate[Check Expectations]
style Build fill:#e1f5ff
style Deploy fill:#fff4e1
style Execute fill:#ffe1f5
style Evaluate fill:#e1ffe1
```
1. **Define Scenario** — Describe your test: topology, workloads, and success criteria
2. **Deploy Topology** — Launch validators and executors using host, compose, or k8s runners
3. **Run Workloads** — Drive transactions, DA traffic, and chaos operations
4. **Check Expectations** — Verify consensus liveness, inclusion, and system health
---
## Key Features
**Declarative API**
- Express scenarios as topology + workloads + expectations
- Reuse the same test definition across different deployment targets
- Compose complex tests from modular components
**Multiple Deployment Modes**
- **Host Runner**: Local processes for fast iteration
- **Compose Runner**: Containerized environments with node control
- **Kubernetes Runner**: Production-like cluster testing
**Built-in Workloads**
- Transaction submission with configurable rates
- Data availability (DA) blob dispersal and sampling
- Chaos testing with controlled node restarts
**Comprehensive Observability**
- Real-time block feed for monitoring consensus progress
- Prometheus/Grafana integration for metrics
- Per-node log collection and debugging
---
## Quick Example
```rust
use std::time::Duration;

use testing_framework_core::scenario::ScenarioBuilder;
use testing_framework_runner_local::LocalDeployer;
use testing_framework_workflows::ScenarioBuilderExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Plan: 3 validators + 1 executor in a star network,
    // 5 users submitting 10 tx/s, with a liveness check.
    let mut scenario = ScenarioBuilder::topology_with(|t| {
        t.network_star()
            .validators(3)
            .executors(1)
    })
    .transactions_with(|tx| tx.rate(10.0).users(5))
    .expect_consensus_liveness()
    .with_run_duration(Duration::from_secs(60))
    .build();

    // Deploy locally, run the scenario, and evaluate expectations.
    let deployer = LocalDeployer::default();
    let runner = deployer.deploy(&scenario).await?;
    runner.run(&mut scenario).await?;
    Ok(())
}
```
[View complete examples](examples.md)
---
## Choose Your Path
### New to the Framework?
Start with the **[Quickstart Guide](quickstart.md)** for a hands-on introduction that gets you running tests in minutes.
### Ready to Write Tests?
Explore the **[User Guide](part-ii.md)** to learn about authoring scenarios, workloads, expectations, and deployment strategies.
### Setting Up CI/CD?
Jump to **[Operations & Deployment](part-v.md)** for prerequisites, environment configuration, and continuous integration patterns.
### Extending the Framework?
Check the **[Developer Reference](part-iii.md)** to implement custom workloads, expectations, and runners.
---
## Project Context
**Logos** is a modular blockchain protocol composed of validators, executors, and a data-availability (DA) subsystem:
- **Validators** participate in consensus and produce blocks
- **Executors** are validators with the DA dispersal service enabled. They perform all validator functions plus submit blob data to the DA network
- **Data Availability (DA)** ensures that blob data submitted via channel operations in transactions is published and retrievable by the network
These roles interact tightly, which is why meaningful testing must be performed in multi-node environments that include real networking, timing, and DA interaction.
The Nomos Testing Framework provides the infrastructure to orchestrate these multi-node scenarios reliably across development, CI, and production-like environments.
---
## Documentation Structure
| Section | Description |
|---------|-------------|
| **[Foundations](part-i.md)** | Architecture, philosophy, and design principles |
| **[User Guide](part-ii.md)** | Writing and running scenarios, workloads, and expectations |
| **[Developer Reference](part-iii.md)** | Extending the framework with custom components |
| **[Operations & Deployment](part-v.md)** | Setup, CI integration, and environment configuration |
| **[Appendix](part-vi.md)** | Quick reference, troubleshooting, FAQ, and glossary |
---
## Quick Links
- **[What You Will Learn](what-you-will-learn.md)** — Overview of book contents and learning path
- **[Quickstart](quickstart.md)** — Get up and running in 10 minutes
- **[Examples](examples.md)** — Concrete scenario patterns
- **[Troubleshooting](troubleshooting.md)** — Common issues and solutions
- **[Environment Variables](environment-variables.md)** — Complete configuration reference
---
**Ready to start?** Head to the **[Quickstart](quickstart.md)**

@@ -44,10 +44,106 @@ environment and operational considerations, see [Operations Overview](operations
- Environment flags can relax timeouts or increase tracing when diagnostics are
needed.
## Runner Comparison
```mermaid
flowchart TB
subgraph Host["Host Runner (Local)"]
H1["Speed: Fast"]
H2["Isolation: Shared host"]
H3["Setup: Minimal"]
H4["Chaos: Not supported"]
H5["CI: Quick smoke tests"]
end
subgraph Compose["Compose Runner (Docker)"]
C1["Speed: Medium"]
C2["Isolation: Containerized"]
C3["Setup: Image build required"]
C4["Chaos: Supported"]
C5["CI: Recommended"]
end
subgraph K8s["K8s Runner (Cluster)"]
K1["Speed: Slower"]
K2["Isolation: Pod-level"]
K3["Setup: Cluster + image"]
K4["Chaos: Not yet supported"]
K5["CI: Large-scale tests"]
end
Decision{Choose Based On}
Decision -->|Fast iteration| Host
Decision -->|Reproducibility| Compose
Decision -->|Production-like| K8s
style Host fill:#e1f5ff
style Compose fill:#e1ffe1
style K8s fill:#ffe1f5
```
## Detailed Feature Matrix
| Feature | Host | Compose | K8s |
|---------|------|---------|-----|
| **Speed** | Fastest | Medium | Slowest |
| **Setup Time** | < 1 min | 2-5 min | 5-10 min |
| **Isolation** | Process-level | Container | Pod + namespace |
| **Node Control** | No | Yes | Not yet |
| **Observability** | Basic | External stack | Cluster-wide |
| **CI Integration** | Smoke tests | Recommended | Heavy tests |
| **Resource Usage** | Low | Medium | High |
| **Reproducibility** | Environment-dependent | High | Highest |
| **Network Fidelity** | Localhost only | Virtual network | Real cluster |
| **Parallel Runs** | Port conflicts | Isolated | Namespace isolation |
## Decision Guide
```mermaid
flowchart TD
Start[Need to run tests?] --> Q1{Local development?}
Q1 -->|Yes| Q2{Testing chaos?}
Q1 -->|No| Q5{Have cluster access?}
Q2 -->|Yes| UseCompose[Use Compose]
Q2 -->|No| Q3{Need isolation?}
Q3 -->|Yes| UseCompose
Q3 -->|No| UseHost[Use Host]
Q5 -->|Yes| Q6{Large topology?}
Q5 -->|No| Q7{CI pipeline?}
Q6 -->|Yes| UseK8s[Use K8s]
Q6 -->|No| UseCompose
Q7 -->|Yes| Q8{Docker available?}
Q7 -->|No| UseHost
Q8 -->|Yes| UseCompose
Q8 -->|No| UseHost
style UseHost fill:#e1f5ff
style UseCompose fill:#e1ffe1
style UseK8s fill:#ffe1f5
```
### Quick Recommendations
**Use Host Runner when:**
- Iterating rapidly during development
- Running quick smoke tests
- Testing on a laptop with limited resources
- Don't need chaos testing
**Use Compose Runner when:**
- Need reproducible test environments
- Testing chaos scenarios (node restarts)
- Running in CI pipelines
- Want containerized isolation
**Use K8s Runner when:**
- Testing large-scale topologies (10+ nodes)
- Need production-like environment
- Have cluster access in CI
- Testing cluster-specific behaviors
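The recommendations above can be folded into a small selection helper. This is a sketch under the assumption that the mode string comes from your own CLI flag or environment variable (the framework does not define one), and that wiring each mode to `LocalDeployer`, `ComposeDeployer`, or `K8sDeployer` happens elsewhere:

```rust
// Hypothetical runner-mode picker mirroring the decision guide above.
#[derive(Debug, PartialEq)]
enum RunnerMode {
    Host,
    Compose,
    K8s,
}

fn select_mode(raw: &str, chaos_needed: bool) -> RunnerMode {
    match raw.to_ascii_lowercase().as_str() {
        "compose" => RunnerMode::Compose,
        "k8s" | "kubernetes" => RunnerMode::K8s,
        // Host is fastest but cannot run chaos workloads, so fall
        // back to Compose whenever chaos is requested.
        _ if chaos_needed => RunnerMode::Compose,
        _ => RunnerMode::Host,
    }
}

fn main() {
    assert_eq!(select_mode("host", false), RunnerMode::Host);
    assert_eq!(select_mode("host", true), RunnerMode::Compose);
    assert_eq!(select_mode("K8S", false), RunnerMode::K8s);
    println!("mode selection ok");
}
```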

@@ -1,18 +1,133 @@
# Scenario Lifecycle

A scenario progresses through six distinct phases, each with a specific responsibility:

```mermaid
flowchart TB
subgraph Phase1["1. Build Phase"]
Build[Define Scenario]
BuildDetails["• Declare topology<br/>• Attach workloads<br/>• Add expectations<br/>• Set run duration"]
Build --> BuildDetails
end
subgraph Phase2["2. Deploy Phase"]
Deploy[Provision Environment]
DeployDetails["• Launch nodes<br/>• Wait for readiness<br/>• Establish connectivity<br/>• Return Runner"]
Deploy --> DeployDetails
end
subgraph Phase3["3. Capture Phase"]
Capture[Baseline Metrics]
CaptureDetails["• Snapshot initial state<br/>• Start BlockFeed<br/>• Initialize expectations"]
Capture --> CaptureDetails
end
subgraph Phase4["4. Execution Phase"]
Execute[Drive Workloads]
ExecuteDetails["• Submit transactions<br/>• Disperse DA blobs<br/>• Trigger chaos events<br/>• Run for duration"]
Execute --> ExecuteDetails
end
subgraph Phase5["5. Evaluation Phase"]
Evaluate[Check Expectations]
EvaluateDetails["• Verify liveness<br/>• Check inclusion<br/>• Validate outcomes<br/>• Aggregate results"]
Evaluate --> EvaluateDetails
end
subgraph Phase6["6. Cleanup Phase"]
Cleanup[Teardown]
CleanupDetails["• Stop nodes<br/>• Remove containers<br/>• Collect logs<br/>• Release resources"]
Cleanup --> CleanupDetails
end
Phase1 --> Phase2
Phase2 --> Phase3
Phase3 --> Phase4
Phase4 --> Phase5
Phase5 --> Phase6
style Phase1 fill:#e1f5ff
style Phase2 fill:#fff4e1
style Phase3 fill:#f0ffe1
style Phase4 fill:#ffe1f5
style Phase5 fill:#e1ffe1
style Phase6 fill:#ffe1e1
```
## Phase Details
### 1. Build the Plan
Declare a topology, attach workloads and expectations, and set the run window. The plan is the single source of truth for what will happen.
**Key actions:**
- Define cluster shape (validators, executors, network topology)
- Configure workloads (transaction rate, DA traffic, chaos patterns)
- Attach expectations (liveness, inclusion, custom checks)
- Set timing parameters (run duration, cooldown period)
**Output:** Immutable `Scenario` plan
### 2. Deploy
Hand the plan to a deployer. It provisions the environment on the chosen backend, waits for nodes to signal readiness, and returns a runner.
**Key actions:**
- Provision infrastructure (processes, containers, or pods)
- Launch validator and executor nodes
- Wait for readiness probes (HTTP endpoints respond)
- Establish node connectivity and metrics endpoints
- Spawn BlockFeed for real-time block observation
**Output:** `Runner` + `RunContext` (with node clients, metrics, control handles)
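Readiness waiting is essentially a bounded poll loop; the minimal, self-contained sketch below conveys the shape, with the probe standing in for an HTTP health check and the timeout and interval values purely illustrative, not the framework's defaults:

```rust
use std::time::{Duration, Instant};

// Poll a readiness check until it succeeds or the deadline passes.
// `probe` stands in for a call to a node's health endpoint.
fn wait_ready<F: FnMut() -> bool>(mut probe: F, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if probe() {
            return true;
        }
        std::thread::sleep(interval);
    }
    false
}

fn main() {
    // Simulate a node that becomes ready on the third probe.
    let mut attempts = 0;
    let ready = wait_ready(
        || {
            attempts += 1;
            attempts >= 3
        },
        Duration::from_millis(200),
        Duration::from_millis(10),
    );
    assert!(ready);
    println!("node ready after {attempts} probes");
}
```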
### 3. Capture Baseline
Expectations snapshot initial state before workloads begin.
**Key actions:**
- Record starting block height
- Initialize counters and trackers
- Subscribe to BlockFeed
- Capture baseline metrics
**Output:** Captured state for later comparison
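The baseline idea can be sketched in a few lines: snapshot a starting value before workloads run, then compare against it at evaluation time. The type and method names here are illustrative, not the framework's:

```rust
// Illustrative baseline snapshot for a liveness-style check: record the
// starting block height, then require a minimum number of new blocks.
struct Baseline {
    start_height: u64,
}

impl Baseline {
    fn capture(current_height: u64) -> Self {
        Baseline { start_height: current_height }
    }

    fn liveness_ok(&self, final_height: u64, min_new_blocks: u64) -> bool {
        final_height.saturating_sub(self.start_height) >= min_new_blocks
    }
}

fn main() {
    let baseline = Baseline::capture(100);
    assert!(baseline.liveness_ok(112, 10)); // 12 new blocks, needed 10
    assert!(!baseline.liveness_ok(105, 10)); // only 5 new blocks
    println!("baseline checks ok");
}
```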
### 4. Drive Workloads
The runner starts traffic and behaviors for the planned duration.
**Key actions:**
- Submit transactions at configured rates
- Disperse and sample DA blobs
- Trigger chaos events (node restarts, network partitions)
- Run concurrently for the specified duration
- Observe blocks and metrics in real-time
**Duration:** Controlled by `with_run_duration()`
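Rate-driven workloads pace submissions by the inverse of the configured rate, and the run window bounds the total budget. A toy sketch of the arithmetic (the helper names and numbers are illustrative, not framework API):

```rust
use std::time::Duration;

// One submission every 1/rate seconds.
fn interval_for_rate(rate_per_sec: f64) -> Duration {
    Duration::from_secs_f64(1.0 / rate_per_sec)
}

// Rough target count for a run window at a given rate.
fn submission_budget(run: Duration, rate_per_sec: f64) -> u64 {
    (run.as_secs_f64() * rate_per_sec) as u64
}

fn main() {
    let interval = interval_for_rate(10.0);
    // 10 tx/s means one submission roughly every 100 ms.
    assert_eq!(interval.as_millis(), 100);
    // A 60 s run at 10 tx/s targets about 600 submissions.
    assert_eq!(submission_budget(Duration::from_secs(60), 10.0), 600);
    println!("pacing ok");
}
```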
### 5. Evaluate Expectations
Once activity stops (and optional cooldown completes), the runner checks liveness and workload-specific outcomes.
**Key actions:**
- Verify consensus liveness (minimum block production)
- Check transaction inclusion rates
- Validate DA dispersal and sampling
- Assess system recovery after chaos events
- Aggregate pass/fail results
**Output:** Success or detailed failure report
### 6. Cleanup
Tear down resources so successive runs start fresh and do not inherit leaked state.
**Key actions:**
- Stop all node processes/containers/pods
- Remove temporary directories and volumes
- Collect and archive logs (if `NOMOS_TESTS_KEEP_LOGS=1`)
- Release ports and network resources
- Cleanup observability stack (if spawned)
**Guarantee:** Runs even on panic via `CleanupGuard`
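That guarantee maps naturally onto Rust's `Drop`: destructors run during panic unwinding as well as on normal exit. A simplified, self-contained sketch of the pattern follows; the real `CleanupGuard` tears down nodes and containers rather than setting a flag:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static CLEANED: AtomicBool = AtomicBool::new(false);

// Simplified picture of a cleanup guard: teardown lives in `Drop`,
// so it runs on normal exit and during panic unwinding alike.
struct CleanupGuard;

impl Drop for CleanupGuard {
    fn drop(&mut self) {
        // Real teardown would stop nodes, remove containers/volumes,
        // and collect logs; here we just record that cleanup ran.
        CLEANED.store(true, Ordering::SeqCst);
        println!("cleanup ran");
    }
}

fn main() {
    let result = std::panic::catch_unwind(|| {
        let _guard = CleanupGuard;
        panic!("simulated workload failure");
    });
    assert!(result.is_err());
    // The guard's Drop ran during unwinding, before the panic surfaced here.
    assert!(CLEANED.load(Ordering::SeqCst));
    println!("teardown verified after panic");
}
```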