Add testnet image build flow and runner docs

parent e04af1441a
commit 92e855741a

book/book.toml (new file, 13 lines)
@@ -0,0 +1,13 @@
[book]
authors = ["Nomos Testing"]
language = "en"
multilingual = false
src = "src"
title = "Nomos Testing Book"

[build]
# Keep book output in target/ to avoid polluting the workspace root.
build-dir = "../target/book"

[output.html]
default-theme = "light"

book/combined.md (new file, 549 lines)
@@ -0,0 +1,549 @@
# Nomos Testing Framework — Combined Reference

## Project Context Primer
This book focuses on the Nomos Testing Framework. It assumes familiarity with the Nomos architecture, but for completeness, here is a short primer.

- **Nomos** is a modular blockchain protocol composed of validators, executors, and a data-availability (DA) subsystem.
- **Validators** participate in consensus and produce blocks.
- **Executors** run application logic or off-chain computations referenced by blocks.
- **Data Availability (DA)** ensures that data referenced in blocks is published and retrievable, including blobs or channel data used by workloads.

These roles interact tightly, which is why meaningful testing must be performed in multi-node environments that include real networking, timing, and DA interaction.

## What You Will Learn
This book gives you a clear mental model for Nomos multi-node testing, shows how to author scenarios that pair realistic workloads with explicit expectations, and guides you to run them across local, containerized, and cluster environments without changing the plan.

## Part I — Foundations

### Introduction
The Nomos Testing Framework is a purpose-built toolkit for exercising Nomos in realistic, multi-node environments. It closes the gap between small, isolated tests and full-system validation by letting teams describe a cluster layout, drive meaningful traffic, and assert the outcomes in one coherent plan.

It is for protocol engineers, infrastructure operators, and QA teams who need repeatable confidence that validators, executors, and data-availability components work together under network and timing constraints.

Multi-node integration testing is required because many Nomos behaviors—block progress, data availability, liveness under churn—only emerge when several roles interact over real networking and time. This framework makes those checks declarative, observable, and portable across environments.

### Architecture Overview
The framework follows a clear flow: **Topology → Scenario → Runner → Workloads → Expectations**.

- **Topology** describes the cluster: how many nodes, their roles, and the high-level network and data-availability parameters they should follow.
- **Scenario** combines that topology with the activities to run and the checks to perform, forming a single plan.
- **Deployer/Runner** turns the plan into a live environment on the chosen backend (local processes, Docker Compose, or Kubernetes) and brokers readiness.
- **Workloads** generate traffic and conditions that exercise the system.
- **Expectations** observe the run and judge success or failure once activity completes.

Conceptual diagram:
```
Topology → Scenario → Runner → Workloads → Expectations
 (shape     (plan)    (deploy         (drive      (verify
  cluster)            & orchestrate)   traffic)    outcomes)
```

Mermaid view:
```mermaid
flowchart LR
    A(Topology<br/>shape cluster) --> B(Scenario<br/>plan)
    B --> C(Deployer/Runner<br/>deploy & orchestrate)
    C --> D(Workloads<br/>drive traffic)
    D --> E(Expectations<br/>verify outcomes)
```

Each layer has a narrow responsibility so that cluster shape, deployment choice, traffic generation, and health checks can evolve independently while fitting together predictably.

### Testing Philosophy
- **Declarative over imperative**: describe the desired cluster shape, traffic, and success criteria; let the framework orchestrate the run.
- **Observable health signals**: prefer liveness and inclusion signals that reflect real user impact instead of internal debug state.
- **Determinism first**: default scenarios aim for repeatable outcomes with fixed topologies and traffic rates; variability is opt-in.
- **Targeted non-determinism**: introduce randomness (e.g., restarts) only when probing resilience or operational robustness.
- **Protocol time, not wall time**: reason in blocks and protocol-driven intervals to reduce dependence on host speed or scheduler noise.
- **Minimum run window**: always allow enough block production to make assertions meaningful; very short runs risk false confidence (see the sketch after this list).
- **Use chaos with intent**: chaos workloads are for recovery and fault-tolerance validation, not for baseline functional checks.
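
The run-window rule is easy to make concrete. A minimal sketch in Rust, assuming an illustrative ~6-second block time (an example value, not a Nomos constant):

```rust
use std::time::Duration;

/// Size the run window from the number of blocks you need to observe,
/// not from wall-clock guesses.
fn run_window(blocks_needed: u32, block_time: Duration) -> Duration {
    block_time * blocks_needed
}

fn main() {
    // Assumed example cadence; substitute your network's actual block time.
    let window = run_window(10, Duration::from_secs(6));
    println!("minimum run window: {window:?}"); // 60s to observe ~10 blocks
}
```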

### Scenario Lifecycle (Conceptual)
1. **Build the plan**: Declare a topology, attach workloads and expectations, and set the run window. The plan is the single source of truth for what will happen.
2. **Deploy**: Hand the plan to a runner. It provisions the environment on the chosen backend and waits for nodes to signal readiness.
3. **Drive workloads**: Start traffic and behaviors (transactions, data-availability activity, restarts) for the planned duration.
4. **Observe blocks and signals**: Track block progression and other high-level metrics during or after the run window to ground assertions in protocol time.
5. **Evaluate expectations**: Once activity stops (and optional cooldown completes), check liveness and workload-specific outcomes to decide pass or fail.
6. **Cleanup**: Tear down resources so successive runs start fresh and do not inherit leaked state.

Conceptual lifecycle diagram:
```
Plan → Deploy → Readiness → Drive Workloads → Observe → Evaluate → Cleanup
```

Mermaid view:
```mermaid
flowchart LR
    P[Plan<br/>topology + workloads + expectations] --> D[Deploy<br/>runner provisions]
    D --> R[Readiness<br/>wait for nodes]
    R --> W[Drive Workloads]
    W --> O[Observe<br/>blocks/metrics]
    O --> E[Evaluate Expectations]
    E --> C[Cleanup]
```

### Design Rationale
- **Modular crates** keep configuration, orchestration, workloads, and runners decoupled so each can evolve without breaking the others.
- **Pluggable runners** let the same scenario run on a laptop, a Docker host, or a Kubernetes cluster, making validation portable across environments.
- **Separated workloads and expectations** clarify intent: what traffic to generate versus how to judge success. This simplifies review and reuse.
- **Declarative topology** makes cluster shape explicit and repeatable, reducing surprise when moving between CI and developer machines.
- **Maintainability through predictability**: a clear flow from plan to deployment to verification lowers the cost of extending the framework and interpreting failures.

## Part II — User Guide

### Workspace Layout
The workspace focuses on multi-node integration testing and sits alongside a `nomos-node` checkout. Its crates separate concerns to keep scenarios repeatable and portable:

- **Configs**: prepares high-level node, network, tracing, and wallet settings used across test environments.
- **Core scenario orchestration**: the engine that holds topology descriptions, scenario plans, runtimes, workloads, and expectations.
- **Workflows**: ready-made workloads (transactions, data-availability, chaos) and reusable expectations assembled into a user-facing DSL.
- **Runners**: deployment backends for local processes, Docker Compose, and Kubernetes, all consuming the same scenario plan.
- **Test workflows**: example scenarios and integration checks that show how the pieces fit together.

This split keeps configuration, orchestration, reusable traffic patterns, and deployment adapters loosely coupled while sharing one mental model for tests.

### Annotated Tree
High-level view of the workspace and how pieces relate:
```
nomos-testing/
├─ testing-framework/
│  ├─ configs/     # shared configuration helpers
│  ├─ core/        # scenario model, runtime, topology
│  ├─ workflows/   # workloads, expectations, DSL extensions
│  └─ runners/     # local, compose, k8s deployment backends
├─ tests/          # integration scenarios using the framework
└─ scripts/        # supporting setup utilities (e.g., assets)
```

Each area maps to a responsibility: describe configs, orchestrate scenarios, package common traffic and assertions, adapt to environments, and demonstrate end-to-end usage.

### Authoring Scenarios
Creating a scenario is a declarative exercise:

1. **Shape the topology**: decide how many validators and executors to run, and what high-level network and data-availability characteristics matter for the test.
2. **Attach workloads**: pick traffic generators that align with your goals (transactions, data-availability blobs, or chaos for resilience probes).
3. **Define expectations**: specify the health signals that must hold when the run finishes (e.g., consensus liveness, inclusion of submitted activity; see [Core Content: Workloads & Expectations](workloads.md)).
4. **Set duration**: choose a run window long enough to observe meaningful block progression and the effects of your workloads.
5. **Choose a runner**: target local processes for fast iteration, Docker Compose for reproducible multi-node stacks, or Kubernetes for cluster-grade validation. For environment considerations, see [Operations](operations.md).

Keep scenarios small and explicit: make the intended behavior and the success criteria clear so failures are easy to interpret and act upon.
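
The five steps map directly onto a plan object. The sketch below uses illustrative stand-in types, not the framework's real API, to make the shape concrete; the actual fluent DSL is covered under ScenarioBuilderExt Patterns.

```rust
use std::time::Duration;

// Stand-ins for the real topology, workload, and expectation types.
struct Topology { validators: usize, executors: usize }
enum Workload { Transactions { per_block: u32 } }
enum Expectation { ConsensusLiveness }

// A plan pairs the cluster shape with traffic, checks, and a run window.
struct Scenario {
    topology: Topology,
    workloads: Vec<Workload>,
    expectations: Vec<Expectation>,
    run_window: Duration,
}

fn main() {
    // Steps 1-4: shape the cluster, attach traffic, define checks, set duration.
    let plan = Scenario {
        topology: Topology { validators: 2, executors: 1 },
        workloads: vec![Workload::Transactions { per_block: 5 }],
        expectations: vec![Expectation::ConsensusLiveness],
        run_window: Duration::from_secs(120),
    };
    // Step 5 hands `plan` to a runner (local, compose, or k8s) unchanged.
    println!(
        "{} validators, {} executors, {} workload(s), {} check(s) over {:?}",
        plan.topology.validators,
        plan.topology.executors,
        plan.workloads.len(),
        plan.expectations.len(),
        plan.run_window,
    );
}
```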

### Core Content: Workloads & Expectations
Workloads describe the activity a scenario generates; expectations describe the signals that must hold when that activity completes. Both are pluggable so scenarios stay readable and purpose-driven.

#### Workloads
- **Transaction workload**: submits user-level transactions at a configurable rate and can limit how many distinct actors participate.
- **Data-availability workload**: drives blob and channel activity to exercise data-availability paths.
- **Chaos workload**: triggers controlled node restarts to test resilience and recovery behaviors (requires a runner that can control nodes).

#### Expectations
- **Consensus liveness**: verifies the system continues to produce blocks in line with the planned workload and timing window.
- **Workload-specific checks**: each workload can attach its own success criteria (e.g., inclusion of submitted activity) so scenarios remain concise.

Together, workloads and expectations let you express both the pressure applied to the system and the definition of “healthy” for that run.

Workload pipeline (conceptual):
```
Inputs (topology + wallets + rates)
                │
                ▼
Workload init → Drive traffic → Collect signals
                                       │
                                       ▼
                              Expectations evaluate
```

Mermaid view:
```mermaid
flowchart TD
    I["Inputs<br/>(topology + wallets + rates)"] --> Init[Workload init]
    Init --> Drive[Drive traffic]
    Drive --> Collect[Collect signals]
    Collect --> Eval[Expectations evaluate]
```

### Core Content: ScenarioBuilderExt Patterns
Patterns that keep scenarios readable and reusable:

- **Topology-first**: start by shaping the cluster (counts, layout) so later steps inherit a clear foundation.
- **Bundle defaults**: use the DSL helpers to attach common expectations (like liveness) whenever you add a matching workload, reducing forgotten checks.
- **Intentional rates**: express traffic in per-block terms to align with protocol timing rather than wall-clock assumptions.
- **Opt-in chaos**: enable restart patterns only in scenarios meant to probe resilience; keep functional smoke tests deterministic.
- **Wallet clarity**: seed only the number of actors you need; it keeps transaction scenarios deterministic and interpretable.

These patterns make scenario definitions self-explanatory while staying aligned with the framework’s block-oriented timing model. The sketch below shows how such helpers hang together.
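
A name like `ScenarioBuilderExt` implies Rust's extension-trait idiom: a trait layered over a builder adds "bundle defaults" helpers without touching the builder's crate. The types here are illustrative stand-ins, not the framework's real API.

```rust
// Minimal stand-in builder; the real one lives in the workflows crate.
#[derive(Default)]
struct ScenarioBuilder {
    workloads: Vec<String>,
    expectations: Vec<String>,
}

// The extension trait: one call wires a workload *and* its matching check,
// so the liveness expectation cannot be forgotten.
trait ScenarioBuilderExt {
    fn with_transactions_and_liveness(self, per_block: u32) -> Self;
}

impl ScenarioBuilderExt for ScenarioBuilder {
    fn with_transactions_and_liveness(mut self, per_block: u32) -> Self {
        // Per-block rate keeps traffic aligned with protocol time.
        self.workloads.push(format!("transactions @ {per_block}/block"));
        self.expectations.push("consensus_liveness".to_owned());
        self
    }
}

fn main() {
    let plan = ScenarioBuilder::default().with_transactions_and_liveness(5);
    println!("workloads: {:?}", plan.workloads);
    println!("expectations: {:?}", plan.expectations);
}
```

Because the trait is separate from the builder, downstream crates can add their own bundled helpers in the same style.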

### Best Practices
- **State your intent**: document the goal of each scenario (throughput, DA validation, resilience) so expectation choices are obvious.
- **Keep runs meaningful**: choose durations that allow multiple blocks and make timing-based assertions trustworthy.
- **Separate concerns**: start with deterministic workloads for functional checks; add chaos in dedicated resilience scenarios to avoid noisy failures.
- **Reuse patterns**: standardize on shared topology and workload presets so results are comparable across environments and teams.
- **Observe first, tune second**: rely on liveness and inclusion signals to interpret outcomes before tweaking rates or topology.
- **Environment fit**: pick runners that match the feedback loop you need—local for speed, compose for reproducible stacks, k8s for cluster-grade fidelity.
- **Minimal surprises**: seed only necessary wallets and keep configuration deltas explicit when moving between CI and developer machines.

### Examples
Concrete scenario shapes that illustrate how to combine topologies, workloads, and expectations. Adjust counts, rates, and durations to fit your environment.

#### Simple 2-validator transaction workload
- **Topology**: two validators.
- **Workload**: transaction submissions at a modest per-block rate with a small set of wallet actors.
- **Expectations**: consensus liveness and inclusion of submitted activity.
- **When to use**: smoke tests for consensus and transaction flow on minimal hardware.

#### DA + transaction workload
- **Topology**: validators plus executors if available.
- **Workloads**: data-availability blobs/channels and transactions running together to stress both paths.
- **Expectations**: consensus liveness and workload-level inclusion/availability checks.
- **When to use**: end-to-end coverage of transaction and DA layers in one run.

#### Chaos + liveness check
- **Topology**: validators (optionally executors) with node control enabled.
- **Workloads**: baseline traffic (transactions or DA) plus chaos restarts on selected roles.
- **Expectations**: consensus liveness to confirm the system keeps progressing despite restarts; workload-specific inclusion if traffic is present.
- **When to use**: resilience validation and operational readiness drills.

### Advanced & Artificial Examples
These illustrative scenarios stretch the framework to show how to build new workloads, expectations, deployers, and topology tricks. They are intentionally “synthetic” to teach capabilities rather than prescribe production tests.

#### Synthetic Delay Workload (Network Latency Simulation)
- **Idea**: inject fake latency between node interactions using internal timers, not OS-level tooling.
- **Demonstrates**: sequencing control inside a workload, verifying protocol progression under induced lag, using timers to pace submissions.
- **Shape**: wrap submissions in delays that mimic slow peers; ensure the expectation checks that blocks still progress.

#### Oscillating Load Workload (Traffic Waves)
- **Idea**: the traffic rate changes every block or every N seconds (e.g., blocks 1–3 low, 4–5 high, 6–7 zero, repeat).
- **Demonstrates**: dynamic, stateful workloads that use `RunMetrics` to time phases; modeling real-world burstiness.
- **Shape**: schedule per-phase rates; confirm inclusion/liveness across peaks and troughs, as in the sketch below.
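
A sketch of the wave schedule, assuming the current block height is available to the workload (e.g., via `RunMetrics`); the phase table and rates are illustrative:

```rust
/// Per-block submission rate for a repeating 7-block wave:
/// low for three blocks, high for two, then a two-block pause.
fn rate_for_block(height: u64) -> u32 {
    match height % 7 {
        0..=2 => 2,  // low phase
        3..=4 => 10, // high phase
        _ => 0,      // pause
    }
}

fn main() {
    // Two full waves, to show the oscillation.
    for height in 0..14 {
        println!("block {height}: submit {} tx", rate_for_block(height));
    }
}
```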

#### Byzantine Behavior Mock
- **Idea**: a workload that drops half its planned submissions, sometimes double-submits, and intentionally triggers expectation failures.
- **Demonstrates**: negative testing, resilience checks, and the value of clear expectations when behavior is adversarial by design.
- **Shape**: parameterize drop/double-submit probabilities; pair with an expectation that documents what “bad” looks like.

#### Custom Expectation: Block Finality Drift
- **Idea**: assert that the last few blocks differ and that block time stays within a tolerated drift budget.
- **Demonstrates**: consuming `BlockFeed` or time-series metrics to validate protocol cadence; crafting post-run assertions around block diversity and timing.
- **Shape**: collect recent blocks, confirm no duplicates, and compare observed intervals to a drift threshold, as in the sketch below.
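
The drift check reduces to two passes over the observed blocks. A self-contained sketch with plain inputs; a real expectation would read ids and timestamps from the block feed during `evaluate`:

```rust
use std::time::Duration;

/// Fail if recent block ids repeat or any inter-block interval drifts
/// beyond `budget` from the `target` cadence.
fn check_drift(
    ids: &[u64],
    timestamps: &[Duration],
    target: Duration,
    budget: Duration,
) -> Result<(), String> {
    // Block diversity: consecutive blocks must differ.
    for pair in ids.windows(2) {
        if pair[0] == pair[1] {
            return Err(format!("duplicate block id {}", pair[0]));
        }
    }
    // Cadence: each observed interval must stay within target +/- budget.
    for pair in timestamps.windows(2) {
        let interval = pair[1] - pair[0];
        let drift = if interval > target { interval - target } else { target - interval };
        if drift > budget {
            return Err(format!("interval {interval:?} outside drift budget {budget:?}"));
        }
    }
    Ok(())
}

fn main() {
    let ids = [10_u64, 11, 12, 13];
    let times: Vec<Duration> = (0..4).map(|i| Duration::from_secs(6 * i)).collect();
    assert!(check_drift(&ids, &times, Duration::from_secs(6), Duration::from_secs(2)).is_ok());
}
```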

#### Custom Deployer: Dry-Run Deployer
- **Idea**: a deployer that never starts nodes; it emits configs, simulates readiness, and provides a fake blockfeed and metrics.
- **Demonstrates**: the full power of the deployer interface for CI dry-runs, config verification, and ultra-fast feedback without Nomos binaries.
- **Shape**: produce logs/artifacts, stub readiness, and feed synthetic blocks so expectations can still run (see the sketch below).
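
A sketch of the dry-run shape using a local stand-in trait. The framework's real deployer contract (`testing_framework_core::scenario::Deployer`, per the developer reference) will differ in signature, but the moves are the same: emit configs, fake readiness, synthesize a block feed.

```rust
// Local stand-in for the deployer contract; not the framework's trait.
trait Deployer {
    fn deploy(&self) -> RunHandle;
}

/// What a runner hands back: readiness plus a block feed for expectations.
struct RunHandle {
    ready: bool,
    blocks: Vec<u64>,
}

struct DryRunDeployer;

impl Deployer for DryRunDeployer {
    fn deploy(&self) -> RunHandle {
        // Emit configs as artifacts here instead of starting processes.
        println!("dry-run: node configs written, no binaries launched");
        // Stub readiness and synthesize blocks so expectations can still run.
        RunHandle { ready: true, blocks: (1..=5).collect() }
    }
}

fn main() {
    let run = DryRunDeployer.deploy();
    assert!(run.ready);
    println!("synthetic blocks: {:?}", run.blocks);
}
```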

#### Stochastic Topology Generator
- **Idea**: topology parameters change at runtime (random validator counts, DA settings, network shapes).
- **Demonstrates**: randomized property testing and fuzzing approaches to topology building.
- **Shape**: pick roles and network layouts randomly per run; keep expectations tolerant to variability while still asserting core liveness.

#### Multi-Phase Scenario (“Pipelines”)
- **Idea**: the scenario runs in phases (e.g., phase 1 transactions, phase 2 DA, phase 3 restarts, phase 4 sync check).
- **Demonstrates**: multi-stage tests, modular scenario assembly, and deliberate lifecycle control.
- **Shape**: drive phase-specific workloads/expectations sequentially; enforce clear boundaries and post-phase checks.

### Running Scenarios
Running a scenario follows the same conceptual flow regardless of environment:

1. Select or author a scenario plan that pairs a topology with workloads, expectations, and a suitable run window.
2. Choose a runner aligned with your environment (local, compose, or k8s) and ensure its prerequisites are available.
3. Deploy the plan through the runner; wait for readiness signals before starting workloads.
4. Let workloads drive activity for the planned duration; keep observability signals visible so you can correlate outcomes.
5. Evaluate expectations and capture results as the primary pass/fail signal.

Use the same plan across different runners to compare behavior between local development and CI or cluster settings. For environment prerequisites and flags, see [Operations](operations.md).

### Runners
Runners turn a scenario plan into a live environment while keeping the plan unchanged. Choose based on feedback speed, reproducibility, and fidelity. For environment and operational considerations, see [Operations](operations.md).

#### Local runner
- Launches node processes directly on the host.
- Fastest feedback loop and minimal orchestration overhead.
- Best for development-time iteration and debugging.

#### Docker Compose runner
- Starts nodes in containers to provide a reproducible multi-node stack on a single machine.
- Discovers service ports and wires observability for convenient inspection.
- Good balance between fidelity and ease of setup.

#### Kubernetes runner
- Deploys nodes onto a cluster for higher-fidelity, longer-running scenarios.
- Suits CI or shared environments where cluster behavior and scheduling matter.

#### Common expectations
- All runners require at least one validator and, for transaction scenarios, access to seeded wallets.
- Readiness probes gate workload start so traffic begins only after nodes are reachable.
- Environment flags can relax timeouts or increase tracing when diagnostics are needed.

Runner abstraction:
```
Scenario Plan
      │
      ▼
Runner (local | compose | k8s)
      │ provisions env + readiness
      ▼
Runtime + Observability
      │
      ▼
Workloads / Expectations execute
```

Mermaid view:
```mermaid
flowchart TD
    Plan[Scenario Plan] --> RunSel{"Runner<br/>(local | compose | k8s)"}
    RunSel --> Provision[Provision & readiness]
    Provision --> Runtime[Runtime + observability]
    Runtime --> Exec[Workloads & Expectations execute]
```

### Operations
Operational readiness focuses on prerequisites, environment fit, and clear signals:

- **Prerequisites**: keep a sibling `nomos-node` checkout available; ensure the chosen runner’s platform needs are met (local binaries for host runs, Docker for compose, cluster access for k8s).
- **Artifacts**: some scenarios depend on prover or circuit assets; fetch them ahead of time with the provided helper scripts when needed.
- **Environment flags**: use slow-environment toggles to relax timeouts, enable tracing when debugging, and adjust observability ports to avoid clashes.
- **Readiness checks**: verify runners report node readiness before starting workloads; this avoids false negatives from starting too early.
- **Failure triage**: map failures to missing prerequisites (wallet seeding, node control availability), runner platform issues, or unmet expectations. Start with liveness signals, then dive into workload-specific assertions.

Treat operational hygiene—assets present, prerequisites satisfied, observability reachable—as the first step to reliable scenario outcomes.

Metrics and observability flow:
```
Runner exposes endpoints/ports
              │
              ▼
Runtime collects block/health signals
              │
              ▼
Expectations consume signals to decide pass/fail
              │
              ▼
Operators inspect logs/metrics when failures arise
```

Mermaid view:
```mermaid
flowchart TD
    Expose[Runner exposes endpoints/ports] --> Collect[Runtime collects block/health signals]
    Collect --> Consume[Expectations consume signals<br/>decide pass/fail]
    Consume --> Inspect[Operators inspect logs/metrics<br/>when failures arise]
```

## Part III — Developer Reference

### Scenario Model (Developer Level)
The scenario model defines clear, composable responsibilities:

- **Topology**: a declarative description of the cluster—how many nodes, their roles, and the broad network and data-availability characteristics. It represents the intended shape of the system under test.
- **Scenario**: a plan combining topology, workloads, expectations, and a run window. Building a scenario validates prerequisites (like seeded wallets) and ensures the run lasts long enough to observe meaningful block progression.
- **Workloads**: asynchronous tasks that generate traffic or conditions. They use shared context to interact with the deployed cluster and may bundle default expectations.
- **Expectations**: post-run assertions. They can capture baselines before workloads start and evaluate success once activity stops.
- **Runtime**: coordinates workloads and expectations for the configured duration, enforces cooldowns when control actions occur, and ensures cleanup so runs do not leak resources.

Developers extending the model should keep these boundaries strict: topology describes, scenarios assemble, runners deploy, workloads drive, and expectations judge outcomes. For guidance on adding new capabilities, see [Extending the Framework](extending.md).

### Extending the Framework

#### Adding a workload
1. Implement the workload contract: provide a name, optional bundled expectations, validate prerequisites up front, and drive asynchronous activity against the deployed cluster.
2. Export it through the workflows layer and consider adding DSL helpers for ergonomic wiring.

#### Adding an expectation
1. Implement the expectation contract: capture baselines if needed and evaluate outcomes after workloads finish; report meaningful errors to aid debugging.
2. Expose reusable expectations from the workflows layer so scenarios can attach them declaratively.

#### Adding a runner
1. Implement the deployer contract for the target backend, producing a runtime context with client access, metrics endpoints, and optional node control.
2. Preserve cleanup guarantees so resources are reclaimed even when runs fail; mirror readiness and observation signals used by existing runners for consistency.

#### Adding topology helpers
Extend the topology description with new layouts or presets while keeping defaults safe and predictable; favor declarative inputs over ad hoc logic so scenarios stay reviewable.

### Internal Crate Reference
High-level roles of the crates that make up the framework:

- **Configs**: prepares reusable configuration primitives for nodes, networking, tracing, data availability, and wallets, shared by all scenarios and runners.
- **Core scenario orchestration**: houses the topology and scenario model, runtime coordination, node clients, and readiness/health probes.
- **Workflows**: packages workloads and expectations into reusable building blocks and offers a fluent DSL to assemble them.
- **Runners**: implements deployment backends (local host, Docker Compose, Kubernetes) that all consume the same scenario plan.
- **Test workflows**: example scenarios and integration checks that exercise the framework end to end and serve as living documentation.

Use this map to locate where to add new capabilities: configuration primitives in configs, orchestration changes in core, reusable traffic/assertions in workflows, environment adapters in runners, and demonstrations in tests.

### Example: New Workload & Expectation (Rust)
A minimal, end-to-end illustration of adding a custom workload and matching expectation. This shows the shape of the traits and where to plug into the framework; expand the logic to fit your real test.

#### Workload: simple reachability probe
Key ideas:
- **name**: identifies the workload in logs.
- **expectations**: workloads can bundle defaults so callers don’t forget checks.
- **init**: derive inputs from the generated topology (e.g., pick a target node).
- **start**: drive async activity using the shared `RunContext`.

```rust
use async_trait::async_trait;
use testing_framework_core::scenario::{
    DynError, Expectation, RunContext, RunMetrics, Workload,
};
use testing_framework_core::topology::GeneratedTopology;

pub struct ReachabilityWorkload {
    target_idx: usize,
}

impl ReachabilityWorkload {
    pub fn new(target_idx: usize) -> Self {
        Self { target_idx }
    }
}

#[async_trait]
impl Workload for ReachabilityWorkload {
    fn name(&self) -> &'static str {
        "reachability_workload"
    }

    fn expectations(&self) -> Vec<Box<dyn Expectation>> {
        // Trait objects are not `Clone`, so build the bundled default fresh
        // on each call rather than storing boxed expectations in the struct.
        vec![Box::new(ReachabilityExpectation::new(self.target_idx))]
    }

    fn init(
        &mut self,
        topology: &GeneratedTopology,
        _metrics: &RunMetrics,
    ) -> Result<(), DynError> {
        // Fail fast if the requested validator does not exist.
        if topology.validators().get(self.target_idx).is_none() {
            return Err("no validator at requested index".into());
        }
        Ok(())
    }

    async fn start(&self, ctx: &RunContext) -> Result<(), DynError> {
        let client = ctx
            .clients()
            .validators()
            .get(self.target_idx)
            .ok_or("missing target client")?;

        // Pseudo-action: issue a lightweight RPC to prove reachability.
        client.health_check().await.map_err(|e| e.into())
    }
}
```

#### Expectation: confirm the target stayed reachable
Key ideas:
- **start_capture**: snapshot baseline if needed (not used here).
- **evaluate**: assert the condition after workloads finish.

```rust
use async_trait::async_trait;
use testing_framework_core::scenario::{DynError, Expectation, RunContext};

pub struct ReachabilityExpectation {
    target_idx: usize,
}

impl ReachabilityExpectation {
    pub fn new(target_idx: usize) -> Self {
        Self { target_idx }
    }
}

#[async_trait]
impl Expectation for ReachabilityExpectation {
    fn name(&self) -> &str {
        "target_reachable"
    }

    async fn evaluate(&mut self, ctx: &RunContext) -> Result<(), DynError> {
        let client = ctx
            .clients()
            .validators()
            .get(self.target_idx)
            .ok_or("missing target client")?;

        client.health_check().await.map_err(|e| {
            format!("target became unreachable during run: {e}").into()
        })
    }
}
```

#### How to wire it
- Build your scenario as usual and call `.with_workload(ReachabilityWorkload::new(0))`.
- The bundled expectation is attached automatically; you can add more with `.with_expectation(...)` if needed.
- Keep the logic minimal and fast for smoke tests; grow it into richer probes for deeper scenarios.

## Part IV — Appendix

### DSL Cheat Sheet
The framework offers a fluent builder style to keep scenarios readable. Common knobs:

- **Topology shaping**: set validator and executor counts, pick a network layout style, and adjust high-level data-availability traits.
- **Wallet seeding**: define how many users participate and the total funds available for transaction workloads.
- **Workload tuning**: configure transaction rates, data-availability channel and blob rates, and whether chaos restarts should include validators, executors, or both.
- **Expectations**: attach liveness and workload-specific checks so success is explicit.
- **Run window**: set a minimum duration long enough for multiple blocks to be observed and verified.

Use these knobs to express intent clearly, keeping scenario definitions concise and consistent across teams.
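
The knobs read naturally as one builder chain. A compact sketch with illustrative stand-in method names that mirror the list above (not the framework's exact API):

```rust
use std::time::Duration;

#[derive(Default)]
struct Plan {
    validators: usize,
    executors: usize,
    wallets: usize,
    tx_per_block: u32,
    blobs_per_block: u32,
    chaos_restarts: bool,
    run_window: Duration,
}

impl Plan {
    fn validators(mut self, n: usize) -> Self { self.validators = n; self }
    fn executors(mut self, n: usize) -> Self { self.executors = n; self }
    fn seed_wallets(mut self, n: usize) -> Self { self.wallets = n; self }
    fn tx_rate(mut self, r: u32) -> Self { self.tx_per_block = r; self }
    fn blob_rate(mut self, r: u32) -> Self { self.blobs_per_block = r; self }
    fn with_chaos(mut self) -> Self { self.chaos_restarts = true; self }
    fn run_for(mut self, d: Duration) -> Self { self.run_window = d; self }
}

fn main() {
    let _plan = Plan::default()
        .validators(3)    // topology shaping
        .executors(1)
        .seed_wallets(10) // wallet seeding
        .tx_rate(5)       // workload tuning
        .blob_rate(1)
        .with_chaos()     // opt-in restarts
        .run_for(Duration::from_secs(180)); // run window
}
```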

### Troubleshooting Scenarios
Common symptoms and likely causes:

- **No or slow block progression**: runner started workloads before readiness, insufficient run window, or environment too slow—extend duration or enable slow-environment tuning.
- **Transactions not included**: missing or insufficient wallet seeding, misaligned transaction rate with block cadence, or network instability—reduce rate and verify wallet setup.
- **Chaos stalls the run**: node control not available for the chosen runner or restart cadence too aggressive—enable control capability and widen restart intervals.
- **Observability gaps**: metrics or logs unreachable because ports clash or services are not exposed—adjust observability ports and confirm runner wiring.
- **Flaky behavior across runs**: mixing chaos with functional smoke tests or inconsistent topology between environments—separate deterministic and chaos scenarios and standardize topology presets.

### FAQ
**Why block-oriented timing?**
Using block cadence reduces dependence on host speed and keeps assertions aligned with protocol behavior.

**Can I reuse the same scenario across runners?**
Yes. The plan stays the same; swap runners (local, compose, k8s) to target different environments.

**When should I enable chaos workloads?**
Only when testing resilience or operational recovery; keep functional smoke tests deterministic.

**How long should runs be?**
Long enough for multiple blocks so liveness and inclusion checks are meaningful; very short runs risk false confidence.

**Do I always need seeded wallets?**
Only for transaction scenarios. Data-availability or pure chaos scenarios may not require them, but liveness checks still need validators producing blocks.

**What if expectations fail but workloads “look fine”?**
Trust expectations first—they capture the intended success criteria. Use the observability signals and runner logs to pinpoint why the system missed the target.

### Glossary
- **Validator**: node role responsible for participating in consensus and block production.
- **Executor**: node role that processes transactions or workloads delegated by validators.
- **DA (Data Availability)**: subsystem ensuring blobs or channel data are published and retrievable for validation.
- **Workload**: traffic or behavior generator that exercises the system during a scenario run.
- **Expectation**: post-run assertion that judges whether the system met the intended success criteria.
- **Topology**: declarative description of the cluster shape, roles, and high-level parameters for a scenario.
- **Blockfeed**: stream of block observations used for liveness or inclusion signals during a run.
- **Control capability**: the ability for a runner to start, stop, or restart nodes, used by chaos workloads.

book/nomos_testing_framework_book_v4.md (new file, 1711 lines)
File diff suppressed because it is too large

book/src/SUMMARY.md (new file, 31 lines)
@@ -0,0 +1,31 @@
# Summary
- [Project Context Primer](project-context-primer.md)
- [What You Will Learn](what-you-will-learn.md)
- [Part I — Foundations](part-i.md)
  - [Introduction](introduction.md)
  - [Architecture Overview](architecture-overview.md)
  - [Testing Philosophy](testing-philosophy.md)
  - [Scenario Lifecycle (Conceptual)](scenario-lifecycle.md)
  - [Design Rationale](design-rationale.md)
- [Part II — User Guide](part-ii.md)
  - [Workspace Layout](workspace-layout.md)
  - [Annotated Tree](annotated-tree.md)
  - [Authoring Scenarios](authoring-scenarios.md)
  - [Core Content: Workloads & Expectations](workloads.md)
  - [Core Content: ScenarioBuilderExt Patterns](scenario-builder-ext-patterns.md)
  - [Best Practices](best-practices.md)
  - [Examples](examples.md)
  - [Advanced & Artificial Examples](examples-advanced.md)
  - [Running Scenarios](running-scenarios.md)
  - [Runners](runners.md)
  - [Operations](operations.md)
- [Part III — Developer Reference](part-iii.md)
  - [Scenario Model (Developer Level)](scenario-model.md)
  - [Extending the Framework](extending.md)
  - [Example: New Workload & Expectation (Rust)](custom-workload-example.md)
  - [Internal Crate Reference](internal-crate-reference.md)
- [Part IV — Appendix](part-iv.md)
  - [DSL Cheat Sheet](dsl-cheat-sheet.md)
  - [Troubleshooting Scenarios](troubleshooting.md)
  - [FAQ](faq.md)
  - [Glossary](glossary.md)

book/src/annotated-tree.md (new file, 17 lines)
@@ -0,0 +1,17 @@
# Annotated Tree

High-level view of the workspace and how pieces relate:
```
nomos-testing/
├─ testing-framework/
│  ├─ configs/     # shared configuration helpers
│  ├─ core/        # scenario model, runtime, topology
│  ├─ workflows/   # workloads, expectations, DSL extensions
│  └─ runners/     # local, compose, k8s deployment backends
├─ tests/          # integration scenarios using the framework
└─ scripts/        # supporting setup utilities (e.g., assets)
```

Each area maps to a responsibility: describe configs, orchestrate scenarios, package common traffic and assertions, adapt to environments, and demonstrate end-to-end usage.

book/src/architecture-overview.md (new file, 29 lines)
@@ -0,0 +1,29 @@
# Architecture Overview

The framework follows a clear flow: **Topology → Scenario → Runner → Workloads → Expectations**.

- **Topology** describes the cluster: how many nodes, their roles, and the high-level network and data-availability parameters they should follow.
- **Scenario** combines that topology with the activities to run and the checks to perform, forming a single plan.
- **Deployer/Runner** turns the plan into a live environment on the chosen backend (local processes, Docker Compose, or Kubernetes) and brokers readiness.
- **Workloads** generate traffic and conditions that exercise the system.
- **Expectations** observe the run and judge success or failure once activity completes.

Conceptual diagram:
```
Topology → Scenario → Runner → Workloads → Expectations
 (shape     (plan)    (deploy         (drive      (verify
  cluster)            & orchestrate)   traffic)    outcomes)
```

Mermaid view:
```mermaid
flowchart LR
    A(Topology<br/>shape cluster) --> B(Scenario<br/>plan)
    B --> C(Deployer/Runner<br/>deploy & orchestrate)
    C --> D(Workloads<br/>drive traffic)
    D --> E(Expectations<br/>verify outcomes)
```

Each layer has a narrow responsibility so that cluster shape, deployment choice, traffic generation, and health checks can evolve independently while fitting together predictably.

book/src/authoring-scenarios.md (new file, 20 lines)
@@ -0,0 +1,20 @@
# Authoring Scenarios

Creating a scenario is a declarative exercise:

1. **Shape the topology**: decide how many validators and executors to run, and what high-level network and data-availability characteristics matter for the test.
2. **Attach workloads**: pick traffic generators that align with your goals (transactions, data-availability blobs, or chaos for resilience probes).
3. **Define expectations**: specify the health signals that must hold when the run finishes (e.g., consensus liveness, inclusion of submitted activity; see [Core Content: Workloads & Expectations](workloads.md)).
4. **Set duration**: choose a run window long enough to observe meaningful block progression and the effects of your workloads.
5. **Choose a runner**: target local processes for fast iteration, Docker Compose for reproducible multi-node stacks, or Kubernetes for cluster-grade validation. For environment considerations, see [Operations](operations.md).

Keep scenarios small and explicit: make the intended behavior and the success criteria clear so failures are easy to interpret and act upon.

book/src/best-practices.md (new file, 16 lines)
@@ -0,0 +1,16 @@
# Best Practices

- **State your intent**: document the goal of each scenario (throughput, DA validation, resilience) so expectation choices are obvious.
- **Keep runs meaningful**: choose durations that allow multiple blocks and make timing-based assertions trustworthy.
- **Separate concerns**: start with deterministic workloads for functional checks; add chaos in dedicated resilience scenarios to avoid noisy failures.
- **Reuse patterns**: standardize on shared topology and workload presets so results are comparable across environments and teams.
- **Observe first, tune second**: rely on liveness and inclusion signals to interpret outcomes before tweaking rates or topology.
- **Environment fit**: pick runners that match the feedback loop you need—local for speed, compose for reproducible stacks, k8s for cluster-grade fidelity.
- **Minimal surprises**: seed only necessary wallets and keep configuration deltas explicit when moving between CI and developer machines.

book/src/custom-workload-example.md (new file, 116 lines)
@@ -0,0 +1,116 @@
# Example: New Workload & Expectation (Rust)

A minimal, end-to-end illustration of adding a custom workload and matching expectation. This shows the shape of the traits and where to plug into the framework; expand the logic to fit your real test.

## Workload: simple reachability probe

Key ideas:
- **name**: identifies the workload in logs.
- **expectations**: workloads can bundle defaults so callers don’t forget checks.
- **init**: derive inputs from the generated topology (e.g., pick a target node).
- **start**: drive async activity using the shared `RunContext`.

```rust
use async_trait::async_trait;
use testing_framework_core::scenario::{
    DynError, Expectation, RunContext, RunMetrics, Workload,
};
use testing_framework_core::topology::GeneratedTopology;

pub struct ReachabilityWorkload {
    target_idx: usize,
}

impl ReachabilityWorkload {
    pub fn new(target_idx: usize) -> Self {
        Self { target_idx }
    }
}

#[async_trait]
impl Workload for ReachabilityWorkload {
    fn name(&self) -> &'static str {
        "reachability_workload"
    }

    fn expectations(&self) -> Vec<Box<dyn Expectation>> {
        // Trait objects are not `Clone`, so build the bundled default fresh
        // on each call rather than storing boxed expectations in the struct.
        vec![Box::new(ReachabilityExpectation::new(self.target_idx))]
    }

    fn init(
        &mut self,
        topology: &GeneratedTopology,
        _metrics: &RunMetrics,
    ) -> Result<(), DynError> {
        // Fail fast if the requested validator does not exist.
        if topology.validators().get(self.target_idx).is_none() {
            return Err("no validator at requested index".into());
        }
        Ok(())
    }

    async fn start(&self, ctx: &RunContext) -> Result<(), DynError> {
        let client = ctx
            .clients()
            .validators()
            .get(self.target_idx)
            .ok_or("missing target client")?;

        // Pseudo-action: issue a lightweight RPC to prove reachability.
        client.health_check().await.map_err(|e| e.into())
    }
}
```

## Expectation: confirm the target stayed reachable

Key ideas:
- **start_capture**: snapshot baseline if needed (not used here).
- **evaluate**: assert the condition after workloads finish.

```rust
use async_trait::async_trait;
use testing_framework_core::scenario::{DynError, Expectation, RunContext};

pub struct ReachabilityExpectation {
    target_idx: usize,
}

impl ReachabilityExpectation {
    pub fn new(target_idx: usize) -> Self {
        Self { target_idx }
    }
}

#[async_trait]
impl Expectation for ReachabilityExpectation {
    fn name(&self) -> &str {
        "target_reachable"
    }

    async fn evaluate(&mut self, ctx: &RunContext) -> Result<(), DynError> {
        let client = ctx
            .clients()
            .validators()
            .get(self.target_idx)
            .ok_or("missing target client")?;

        client.health_check().await.map_err(|e| {
            format!("target became unreachable during run: {e}").into()
        })
    }
}
```

## How to wire it
- Build your scenario as usual and call `.with_workload(ReachabilityWorkload::new(0))`.
- The bundled expectation is attached automatically; you can add more with `.with_expectation(...)` if needed.
- Keep the logic minimal and fast for smoke tests; grow it into richer probes for deeper scenarios.

book/src/design-rationale.md (new file, 7 lines)
@@ -0,0 +1,7 @@
# Design Rationale

- **Modular crates** keep configuration, orchestration, workloads, and runners decoupled so each can evolve without breaking the others.
- **Pluggable runners** let the same scenario run on a laptop, a Docker host, or a Kubernetes cluster, making validation portable across environments.
- **Separated workloads and expectations** clarify intent: what traffic to generate versus how to judge success. This simplifies review and reuse.
- **Declarative topology** makes cluster shape explicit and repeatable, reducing surprise when moving between CI and developer machines.
- **Maintainability through predictability**: a clear flow from plan to deployment to verification lowers the cost of extending the framework and interpreting failures.

book/src/dsl-cheat-sheet.md (new file, 19 lines)
@@ -0,0 +1,19 @@
# DSL Cheat Sheet

The framework offers a fluent builder style to keep scenarios readable. Common knobs:

- **Topology shaping**: set validator and executor counts, pick a network layout style, and adjust high-level data-availability traits.
- **Wallet seeding**: define how many users participate and the total funds available for transaction workloads.
- **Workload tuning**: configure transaction rates, data-availability channel and blob rates, and whether chaos restarts should include validators, executors, or both.
- **Expectations**: attach liveness and workload-specific checks so success is explicit.
- **Run window**: set a minimum duration long enough for multiple blocks to be observed and verified.

Use these knobs to express intent clearly, keeping scenario definitions concise and consistent across teams.

book/src/examples-advanced.md (new file, 62 lines)
@@ -0,0 +1,62 @@
# Advanced & Artificial Examples

These illustrative scenarios stretch the framework to show how to build new workloads, expectations, deployers, and topology tricks. They are intentionally “synthetic” to teach capabilities rather than prescribe production tests.

## Synthetic Delay Workload (Network Latency Simulation)
- **Idea**: inject fake latency between node interactions using internal timers, not OS-level tooling.
- **Demonstrates**: sequencing control inside a workload, verifying protocol progression under induced lag, using timers to pace submissions.
- **Shape**: wrap submissions in delays that mimic slow peers; ensure the expectation checks that blocks still progress.

## Oscillating Load Workload (Traffic Waves)
- **Idea**: the traffic rate changes every block or every N seconds (e.g., blocks 1–3 low, 4–5 high, 6–7 zero, repeat).
- **Demonstrates**: dynamic, stateful workloads that use `RunMetrics` to time phases; modeling real-world burstiness.
- **Shape**: schedule per-phase rates; confirm inclusion/liveness across peaks and troughs.

## Byzantine Behavior Mock
- **Idea**: a workload that drops half its planned submissions, sometimes double-submits, and intentionally triggers expectation failures.
- **Demonstrates**: negative testing, resilience checks, and the value of clear expectations when behavior is adversarial by design.
- **Shape**: parameterize drop/double-submit probabilities; pair with an expectation that documents what “bad” looks like.

## Custom Expectation: Block Finality Drift
- **Idea**: assert that the last few blocks differ and that block time stays within a tolerated drift budget.
- **Demonstrates**: consuming `BlockFeed` or time-series metrics to validate protocol cadence; crafting post-run assertions around block diversity and timing.
- **Shape**: collect recent blocks, confirm no duplicates, and compare observed intervals to a drift threshold.

## Custom Deployer: Dry-Run Deployer
- **Idea**: a deployer that never starts nodes; it emits configs, simulates readiness, and provides a fake blockfeed and metrics.
- **Demonstrates**: the full power of the deployer interface for CI dry-runs, config verification, and ultra-fast feedback without Nomos binaries.
- **Shape**: produce logs/artifacts, stub readiness, and feed synthetic blocks so expectations can still run.

## Stochastic Topology Generator
- **Idea**: topology parameters change at runtime (random validator counts, DA settings, network shapes).
- **Demonstrates**: randomized property testing and fuzzing approaches to topology building.
- **Shape**: pick roles and network layouts randomly per run; keep expectations tolerant to variability while still asserting core liveness.

## Multi-Phase Scenario (“Pipelines”)
- **Idea**: the scenario runs in phases (e.g., phase 1 transactions, phase 2 DA, phase 3 restarts, phase 4 sync check).
- **Demonstrates**: multi-stage tests, modular scenario assembly, and deliberate lifecycle control.
- **Shape**: drive phase-specific workloads/expectations sequentially; enforce clear boundaries and post-phase checks.

book/src/examples.md (new file, 28 lines)
@@ -0,0 +1,28 @@
# Examples

Concrete scenario shapes that illustrate how to combine topologies, workloads, and expectations. Adjust counts, rates, and durations to fit your environment.

## Simple 2-validator transaction workload
- **Topology**: two validators.
- **Workload**: transaction submissions at a modest per-block rate with a small set of wallet actors.
- **Expectations**: consensus liveness and inclusion of submitted activity.
- **When to use**: smoke tests for consensus and transaction flow on minimal hardware.

## DA + transaction workload
- **Topology**: validators plus executors if available.
- **Workloads**: data-availability blobs/channels and transactions running together to stress both paths.
- **Expectations**: consensus liveness and workload-level inclusion/availability checks.
- **When to use**: end-to-end coverage of transaction and DA layers in one run.

## Chaos + liveness check
- **Topology**: validators (optionally executors) with node control enabled.
- **Workloads**: baseline traffic (transactions or DA) plus chaos restarts on selected roles.
- **Expectations**: consensus liveness to confirm the system keeps progressing despite restarts; workload-specific inclusion if traffic is present.
- **When to use**: resilience validation and operational readiness drills.

book/src/extending.md (new file, 31 lines)
@@ -0,0 +1,31 @@
# Extending the Framework

## Adding a workload
1. Implement `testing_framework_core::scenario::Workload`:
   - Provide a name and any bundled expectations.
   - In `init`, derive inputs from `GeneratedTopology` and `RunMetrics`; fail fast if prerequisites are missing (e.g., wallet data, node addresses).
   - In `start`, drive async traffic using the `RunContext` clients.
2. Expose the workload from a module under `testing-framework/workflows` and consider adding a DSL helper for ergonomic wiring.

## Adding an expectation
1. Implement `testing_framework_core::scenario::Expectation`:
   - Use `start_capture` to snapshot baseline metrics.
   - Use `evaluate` to assert outcomes after workloads finish; return all errors so the runner can aggregate them.
2. Export it from `testing-framework/workflows` if it is reusable.

## Adding a runner
1. Implement `testing_framework_core::scenario::Deployer` for your backend.
   - Produce a `RunContext` with `NodeClients`, metrics endpoints, and an optional `NodeControlHandle`.
   - Guard cleanup with `CleanupGuard` to reclaim resources even on failures.
2. Mirror the readiness and block-feed probes used by the existing runners so workloads can rely on consistent signals.

## Adding topology helpers
- Extend `testing_framework_core::topology::TopologyBuilder` with new layouts or configuration presets (e.g., specialized DA parameters). Keep defaults safe: ensure at least one participant and clamp dispersal factors as the current helpers do.

book/src/faq.md (new file, 26 lines)
@@ -0,0 +1,26 @@
# FAQ

**Why block-oriented timing?**
Using block cadence reduces dependence on host speed and keeps assertions aligned with protocol behavior.

**Can I reuse the same scenario across runners?**
Yes. The plan stays the same; swap runners (local, compose, k8s) to target different environments.

**When should I enable chaos workloads?**
Only when testing resilience or operational recovery; keep functional smoke tests deterministic.

**How long should runs be?**
Long enough for multiple blocks so liveness and inclusion checks are meaningful; very short runs risk false confidence.

**Do I always need seeded wallets?**
Only for transaction scenarios. Data-availability or pure chaos scenarios may not require them, but liveness checks still need validators producing blocks.

**What if expectations fail but workloads “look fine”?**
Trust expectations first—they capture the intended success criteria. Use the observability signals and runner logs to pinpoint why the system missed the target.

book/src/glossary.md (new file, 18 lines)
@@ -0,0 +1,18 @@
# Glossary

- **Validator**: node role responsible for participating in consensus and block production.
- **Executor**: node role that processes transactions or workloads delegated by validators.
- **DA (Data Availability)**: subsystem ensuring blobs or channel data are published and retrievable for validation.
- **Workload**: traffic or behavior generator that exercises the system during a scenario run.
- **Expectation**: post-run assertion that judges whether the system met the intended success criteria.
- **Topology**: declarative description of the cluster shape, roles, and high-level parameters for a scenario.
- **Blockfeed**: stream of block observations used for liveness or inclusion signals during a run.
- **Control capability**: the ability for a runner to start, stop, or restart nodes, used by chaos workloads.
18
book/src/internal-crate-reference.md
Normal file
@ -0,0 +1,18 @@
# Internal Crate Reference

High-level roles of the crates that make up the framework:

- **Configs**: prepares reusable configuration primitives for nodes, networking,
  tracing, data availability, and wallets, shared by all scenarios and runners.
- **Core scenario orchestration**: houses the topology and scenario model,
  runtime coordination, node clients, and readiness/health probes.
- **Workflows**: packages workloads and expectations into reusable building
  blocks and offers a fluent DSL to assemble them.
- **Runners**: implements deployment backends (local host, Docker Compose,
  Kubernetes) that all consume the same scenario plan.
- **Test workflows**: example scenarios and integration checks that exercise the
  framework end to end and serve as living documentation.

Use this map to locate where to add new capabilities: configuration primitives
in configs, orchestration changes in core, reusable traffic/assertions in
workflows, environment adapters in runners, and demonstrations in tests.
15
book/src/introduction.md
Normal file
@ -0,0 +1,15 @@
# Introduction

The Nomos Testing Framework is a purpose-built toolkit for exercising Nomos in
realistic, multi-node environments. It solves the gap between small, isolated
tests and full-system validation by letting teams describe a cluster layout,
drive meaningful traffic, and assert the outcomes in one coherent plan.

It is for protocol engineers, infrastructure operators, and QA teams who need
repeatable confidence that validators, executors, and data-availability
components work together under network and timing constraints.

Multi-node integration testing is required because many Nomos behaviors—block
progress, data availability, liveness under churn—only emerge when several
roles interact over real networking and time. This framework makes those checks
declarative, observable, and portable across environments.
42
book/src/operations.md
Normal file
@ -0,0 +1,42 @@
# Operations

Operational readiness focuses on prerequisites, environment fit, and clear
signals:

- **Prerequisites**: keep a sibling `nomos-node` checkout available; ensure the
  chosen runner’s platform needs are met (local binaries for host runs, Docker
  for compose, cluster access for k8s).
- **Artifacts**: some scenarios depend on prover or circuit assets; fetch them
  ahead of time with the provided helper scripts when needed.
- **Environment flags**: use slow-environment toggles to relax timeouts, enable
  tracing when debugging, and adjust observability ports to avoid clashes.
- **Readiness checks**: verify runners report node readiness before starting
  workloads; this avoids false negatives from starting too early.
- **Failure triage**: map failures to missing prerequisites (wallet seeding,
  node control availability), runner platform issues, or unmet expectations.
  Start with liveness signals, then dive into workload-specific assertions.

Treat operational hygiene—assets present, prerequisites satisfied, observability
reachable—as the first step to reliable scenario outcomes.

Metrics and observability flow:
```
Runner exposes endpoints/ports
        │
        ▼
Runtime collects block/health signals
        │
        ▼
Expectations consume signals to decide pass/fail
        │
        ▼
Operators inspect logs/metrics when failures arise
```

Mermaid view:
```mermaid
flowchart TD
    Expose[Runner exposes endpoints/ports] --> Collect[Runtime collects block/health signals]
    Collect --> Consume[Expectations consume signals<br/>decide pass/fail]
    Consume --> Inspect[Operators inspect logs/metrics<br/>when failures arise]
```
4
book/src/part-i.md
Normal file
@ -0,0 +1,4 @@
# Part I — Foundations

Conceptual chapters that establish the mental model for the framework and how
it approaches multi-node testing.
4
book/src/part-ii.md
Normal file
@ -0,0 +1,4 @@
# Part II — User Guide

Practical guidance for shaping scenarios, combining workloads and expectations,
and running them across different environments.
4
book/src/part-iii.md
Normal file
@ -0,0 +1,4 @@
# Part III — Developer Reference

Deep dives for contributors who extend the framework, evolve its abstractions,
or maintain the crate set.
4
book/src/part-iv.md
Normal file
@ -0,0 +1,4 @@
# Part IV — Appendix

Quick-reference material and supporting guidance to keep scenarios discoverable,
debuggable, and consistent.
16
book/src/project-context-primer.md
Normal file
@ -0,0 +1,16 @@
# Project Context Primer

This book focuses on the Nomos Testing Framework. It assumes familiarity with
the Nomos architecture, but for completeness, here is a short primer.

- **Nomos** is a modular blockchain protocol composed of validators, executors,
  and a data-availability (DA) subsystem.
- **Validators** participate in consensus and produce blocks.
- **Executors** run application logic or off-chain computations referenced by
  blocks.
- **Data Availability (DA)** ensures that data referenced in blocks is
  published and retrievable, including blobs or channel data used by workloads.

These roles interact tightly, which is why meaningful testing must be performed
in multi-node environments that include real networking, timing, and DA
interaction.
51
book/src/runners.md
Normal file
@ -0,0 +1,51 @@
# Runners

Runners turn a scenario plan into a live environment while keeping the plan
unchanged. Choose based on feedback speed, reproducibility, and fidelity. For
environment and operational considerations, see [Operations](operations.md):

## Local runner
- Launches node processes directly on the host.
- Fastest feedback loop and minimal orchestration overhead.
- Best for development-time iteration and debugging.

## Docker Compose runner
- Starts nodes in containers to provide a reproducible multi-node stack on a
  single machine.
- Discovers service ports and wires observability for convenient inspection.
- Good balance between fidelity and ease of setup.

## Kubernetes runner
- Deploys nodes onto a cluster for higher-fidelity, longer-running scenarios.
- Suits CI or shared environments where cluster behavior and scheduling matter.

### Common expectations
- All runners require at least one validator and, for transaction scenarios,
  access to seeded wallets.
- Readiness probes gate workload start so traffic begins only after nodes are
  reachable.
- Environment flags can relax timeouts or increase tracing when diagnostics are
  needed.

Runner abstraction:
```
Scenario Plan
      │
      ▼
Runner (local | compose | k8s)
      │ provisions env + readiness
      ▼
Runtime + Observability
      │
      ▼
Workloads / Expectations execute
```

Mermaid view:
```mermaid
flowchart TD
    Plan[Scenario Plan] --> RunSel{"Runner<br/>(local | compose | k8s)"}
    RunSel --> Provision[Provision & readiness]
    Provision --> Runtime[Runtime + observability]
    Runtime --> Exec[Workloads & Expectations execute]
```
17
book/src/running-scenarios.md
Normal file
@ -0,0 +1,17 @@
# Running Scenarios

Running a scenario follows the same conceptual flow regardless of environment:

1. Select or author a scenario plan that pairs a topology with workloads,
   expectations, and a suitable run window.
2. Choose a runner aligned with your environment (local, compose, or k8s) and
   ensure its prerequisites are available.
3. Deploy the plan through the runner; wait for readiness signals before
   starting workloads.
4. Let workloads drive activity for the planned duration; keep observability
   signals visible so you can correlate outcomes.
5. Evaluate expectations and capture results as the primary pass/fail signal.

Use the same plan across different runners to compare behavior between local
development and CI or cluster settings. For environment prerequisites and
flags, see [Operations](operations.md).
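
As a sketch of that portability, the same plan value can be handed to any
backend; the types below are placeholders for the framework's real plan and
runner abstractions:

```rust
// Hedged sketch: one plan, swappable runners.
enum Runner {
    Local,
    Compose,
    K8s,
}

struct ScenarioPlan {
    name: &'static str,
    run_blocks: u64,
}

fn run(plan: &ScenarioPlan, runner: &Runner) {
    // The plan never changes; only the deployment backend does.
    let backend = match runner {
        Runner::Local => "local processes",
        Runner::Compose => "docker compose",
        Runner::K8s => "kubernetes",
    };
    println!(
        "running '{}' on {backend} for {} blocks, then evaluating expectations",
        plan.name, plan.run_blocks
    );
}

fn main() {
    let plan = ScenarioPlan { name: "smoke", run_blocks: 10 };
    run(&plan, &Runner::Local); // development iteration
    run(&plan, &Runner::K8s);   // same plan in a cluster setting
}
```
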
17
book/src/scenario-builder-ext-patterns.md
Normal file
@ -0,0 +1,17 @@
# Core Content: ScenarioBuilderExt Patterns

Patterns that keep scenarios readable and reusable:

- **Topology-first**: start by shaping the cluster (counts, layout) so later
  steps inherit a clear foundation.
- **Bundle defaults**: use the DSL helpers to attach common expectations (like
  liveness) whenever you add a matching workload, reducing forgotten checks.
- **Intentional rates**: express traffic in per-block terms to align with
  protocol timing rather than wall-clock assumptions.
- **Opt-in chaos**: enable restart patterns only in scenarios meant to probe
  resilience; keep functional smoke tests deterministic.
- **Wallet clarity**: seed only the number of actors you need; it keeps
  transaction scenarios deterministic and interpretable.

These patterns make scenario definitions self-explanatory while staying aligned
with the framework’s block-oriented timing model.
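
The sketch below shows how those patterns tend to surface in a builder-style
API; the method names are illustrative, not the actual `ScenarioBuilderExt`
surface:

```rust
// Hedged sketch of the patterns above on a toy builder.
#[derive(Default)]
struct ScenarioBuilder {
    validators: usize,
    txs_per_block: u32,
    wallets: usize,
    expectations: Vec<&'static str>,
    chaos: bool,
}

impl ScenarioBuilder {
    // Topology-first: shape the cluster before anything else.
    fn with_validators(mut self, n: usize) -> Self {
        self.validators = n;
        self
    }

    // Intentional rates + wallet clarity: per-block traffic, explicit actors.
    // Bundle defaults: the matching liveness check rides along automatically.
    fn transactions_per_block(mut self, rate: u32, wallets: usize) -> Self {
        self.txs_per_block = rate;
        self.wallets = wallets;
        self.expectations.push("consensus-liveness");
        self
    }

    // Opt-in chaos: only scenarios probing resilience call this.
    fn enable_restarts(mut self) -> Self {
        self.chaos = true;
        self
    }
}

fn main() {
    let scenario = ScenarioBuilder::default()
        .with_validators(3)
        .transactions_per_block(5, 2);
    assert!(!scenario.chaos); // smoke tests stay deterministic
    assert_eq!(scenario.expectations, ["consensus-liveness"]);
}
```
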
24
book/src/scenario-lifecycle.md
Normal file
@ -0,0 +1,24 @@
# Scenario Lifecycle (Conceptual)

1. **Build the plan**: Declare a topology, attach workloads and expectations, and set the run window. The plan is the single source of truth for what will happen.
2. **Deploy**: Hand the plan to a runner. It provisions the environment on the chosen backend and waits for nodes to signal readiness.
3. **Drive workloads**: Start traffic and behaviors (transactions, data-availability activity, restarts) for the planned duration.
4. **Observe blocks and signals**: Track block progression and other high-level metrics during or after the run window to ground assertions in protocol time.
5. **Evaluate expectations**: Once activity stops (and optional cooldown completes), check liveness and workload-specific outcomes to decide pass or fail.
6. **Cleanup**: Tear down resources so successive runs start fresh and do not inherit leaked state.

Conceptual lifecycle diagram:
```
Plan → Deploy → Readiness → Drive Workloads → Observe → Evaluate → Cleanup
```

Mermaid view:
```mermaid
flowchart LR
    P[Plan<br/>topology + workloads + expectations] --> D[Deploy<br/>runner provisions]
    D --> R[Readiness<br/>wait for nodes]
    R --> W[Drive Workloads]
    W --> O[Observe<br/>blocks/metrics]
    O --> E[Evaluate Expectations]
    E --> C[Cleanup]
```
23
book/src/scenario-model.md
Normal file
@ -0,0 +1,23 @@
# Scenario Model (Developer Level)

The scenario model defines clear, composable responsibilities:

- **Topology**: a declarative description of the cluster—how many nodes, their
  roles, and the broad network and data-availability characteristics. It
  represents the intended shape of the system under test.
- **Scenario**: a plan combining topology, workloads, expectations, and a run
  window. Building a scenario validates prerequisites (like seeded wallets) and
  ensures the run lasts long enough to observe meaningful block progression.
- **Workloads**: asynchronous tasks that generate traffic or conditions. They
  use shared context to interact with the deployed cluster and may bundle
  default expectations.
- **Expectations**: post-run assertions. They can capture baselines before
  workloads start and evaluate success once activity stops.
- **Runtime**: coordinates workloads and expectations for the configured
  duration, enforces cooldowns when control actions occur, and ensures cleanup
  so runs do not leak resources.

Developers extending the model should keep these boundaries strict: topology
describes, scenarios assemble, runners deploy, workloads drive, and expectations
judge outcomes. For guidance on adding new capabilities, see
[Extending the Framework](extending.md).
9
book/src/testing-philosophy.md
Normal file
@ -0,0 +1,9 @@
# Testing Philosophy

- **Declarative over imperative**: describe the desired cluster shape, traffic, and success criteria; let the framework orchestrate the run.
- **Observable health signals**: prefer liveness and inclusion signals that reflect real user impact instead of internal debug state.
- **Determinism first**: default scenarios aim for repeatable outcomes with fixed topologies and traffic rates; variability is opt-in.
- **Targeted non-determinism**: introduce randomness (e.g., restarts) only when probing resilience or operational robustness.
- **Protocol time, not wall time**: reason in blocks and protocol-driven intervals to reduce dependence on host speed or scheduler noise.
- **Minimum run window**: always allow enough block production to make assertions meaningful; very short runs risk false confidence.
- **Use chaos with intent**: chaos workloads are for recovery and fault-tolerance validation, not for baseline functional checks.
9
book/src/troubleshooting.md
Normal file
@ -0,0 +1,9 @@
# Troubleshooting Scenarios

Common symptoms and likely causes:

- **No or slow block progression**: runner started workloads before readiness, insufficient run window, or environment too slow—extend duration or enable slow-environment tuning.
- **Transactions not included**: missing or insufficient wallet seeding, misaligned transaction rate with block cadence, or network instability—reduce rate and verify wallet setup.
- **Chaos stalls the run**: node control not available for the chosen runner or restart cadence too aggressive—enable control capability and widen restart intervals.
- **Observability gaps**: metrics or logs unreachable because ports clash or services are not exposed—adjust observability ports and confirm runner wiring.
- **Flaky behavior across runs**: mixing chaos with functional smoke tests or inconsistent topology between environments—separate deterministic and chaos scenarios and standardize topology presets.
7
book/src/usage-patterns.md
Normal file
@ -0,0 +1,7 @@
# Usage Patterns

- **Shape a topology, pick a runner**: choose local for quick iteration, compose for reproducible multi-node stacks with observability, or k8s for cluster-grade validation.
- **Compose workloads deliberately**: pair transactions and data-availability traffic for end-to-end coverage; add chaos only when assessing recovery and resilience.
- **Align expectations with goals**: use liveness-style checks to confirm the system keeps up with planned activity, and add workload-specific assertions for inclusion or availability.
- **Reuse plans across environments**: keep the scenario constant while swapping runners to compare behavior between developer machines and CI clusters.
- **Iterate with clear signals**: treat expectation outcomes as the primary pass/fail indicator, and adjust topology or workloads based on what those signals reveal.
6
book/src/what-you-will-learn.md
Normal file
@ -0,0 +1,6 @@
# What You Will Learn

This book gives you a clear mental model for Nomos multi-node testing, shows how
to author scenarios that pair realistic workloads with explicit expectations,
and guides you to run them across local, containerized, and cluster environments
without changing the plan.
42
book/src/workloads.md
Normal file
@ -0,0 +1,42 @@
# Core Content: Workloads & Expectations

Workloads describe the activity a scenario generates; expectations describe the
signals that must hold when that activity completes. Both are pluggable so
scenarios stay readable and purpose-driven.

## Workloads
- **Transaction workload**: submits user-level transactions at a configurable
  rate and can limit how many distinct actors participate.
- **Data-availability workload**: drives blob and channel activity to exercise
  data-availability paths.
- **Chaos workload**: triggers controlled node restarts to test resilience and
  recovery behaviors (requires a runner that can control nodes).

## Expectations
- **Consensus liveness**: verifies the system continues to produce blocks in
  line with the planned workload and timing window.
- **Workload-specific checks**: each workload can attach its own success
  criteria (e.g., inclusion of submitted activity) so scenarios remain concise.

Together, workloads and expectations let you express both the pressure applied
to the system and the definition of “healthy” for that run.

Workload pipeline (conceptual):
```
Inputs (topology + wallets + rates)
        │
        ▼
Workload init → Drive traffic → Collect signals
        │
        ▼
Expectations evaluate
```

Mermaid view:
```mermaid
flowchart TD
    I["Inputs<br/>(topology + wallets + rates)"] --> Init[Workload init]
    Init --> Drive[Drive traffic]
    Drive --> Collect[Collect signals]
    Collect --> Eval[Expectations evaluate]
```
19
book/src/workspace-layout.md
Normal file
@ -0,0 +1,19 @@
# Workspace Layout

The workspace focuses on multi-node integration testing and sits alongside a
`nomos-node` checkout. Its crates separate concerns to keep scenarios
repeatable and portable:

- **Configs**: prepares high-level node, network, tracing, and wallet settings
  used across test environments.
- **Core scenario orchestration**: the engine that holds topology descriptions,
  scenario plans, runtimes, workloads, and expectations.
- **Workflows**: ready-made workloads (transactions, data-availability, chaos)
  and reusable expectations assembled into a user-facing DSL.
- **Runners**: deployment backends for local processes, Docker Compose, and
  Kubernetes, all consuming the same scenario plan.
- **Test workflows**: example scenarios and integration checks that show how
  the pieces fit together.

This split keeps configuration, orchestration, reusable traffic patterns, and
deployment adapters loosely coupled while sharing one mental model for tests.
@ -21,6 +21,20 @@ if [ ! -d "$CIRCUITS_DIR" ]; then
  exit 1
fi

system_gmp_package() {
  local multiarch
  multiarch="$(gcc -print-multiarch 2>/dev/null || echo aarch64-linux-gnu)"
  local lib_path="/usr/lib/${multiarch}/libgmp.a"
  if [ ! -f "$lib_path" ]; then
    echo "system libgmp.a not found at $lib_path" >&2
    return 1
  fi
  mkdir -p depends/gmp/package_aarch64/lib depends/gmp/package_aarch64/include
  cp "$lib_path" depends/gmp/package_aarch64/lib/
  # Headers are small; copy the public ones the build expects.
  cp /usr/include/gmp*.h depends/gmp/package_aarch64/include/ || true
}

case "$TARGET_ARCH" in
  arm64 | aarch64)
    ;;
@ -41,12 +55,23 @@ git submodule update --init --recursive >&2
if [ "${RAPIDSNARK_BUILD_GMP:-1}" = "1" ]; then
  GMP_TARGET="${RAPIDSNARK_GMP_TARGET:-aarch64}"
  ./build_gmp.sh "$GMP_TARGET" >&2
else
  echo "Using system libgmp to satisfy rapidsnark dependencies" >&2
  system_gmp_package
fi

MAKE_TARGET="${RAPIDSNARK_MAKE_TARGET:-host_arm64}"
PACKAGE_DIR="${RAPIDSNARK_PACKAGE_DIR:-package_arm64}"

make "$MAKE_TARGET" -j"$(nproc)" >&2
rm -rf build_prover_arm64
mkdir build_prover_arm64
cd build_prover_arm64
cmake .. \
  -DTARGET_PLATFORM=aarch64 \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX="../${PACKAGE_DIR}" \
  -DBUILD_SHARED_LIBS=OFF >&2
cmake --build . --target prover verifier -- -j"$(nproc)" >&2

install -m 0755 "${PACKAGE_DIR}/bin/prover" "$CIRCUITS_DIR/prover"
install -m 0755 "src/prover" "$CIRCUITS_DIR/prover"
install -m 0755 "src/verifier" "$CIRCUITS_DIR/verifier"
echo "rapidsnark prover installed to $CIRCUITS_DIR/prover" >&2
@ -121,7 +121,7 @@ download_release() {
    print_error "Please check that version ${VERSION} exists for platform ${platform}"
    print_error "Available releases: https://github.com/${REPO}/releases"
    rm -rf "$temp_dir"
    exit 1
    return 1
  fi

  print_success "Download complete"
@ -132,7 +132,7 @@ download_release() {
  if ! tar -xzf "${temp_dir}/${artifact}" -C "$INSTALL_DIR" --strip-components=1; then
    print_error "Failed to extract archive"
    rm -rf "$temp_dir"
    exit 1
    return 1
  fi

  rm -rf "$temp_dir"
@ -171,8 +171,18 @@ main() {
  # Check existing installation
  check_existing_installation

  # Download and extract
  download_release "$platform"
  # Download and extract (retry with x86_64 bundle on aarch64 if needed)
  if ! download_release "$platform"; then
    if [[ "$platform" == linux-aarch64 ]]; then
      print_warning "Falling back to linux-x86_64 circuits bundle; will rebuild prover for aarch64."
      rm -rf "$INSTALL_DIR"
      if ! download_release "linux-x86_64"; then
        exit 1
      fi
    else
      exit 1
    fi
  fi

  # Handle macOS quarantine if needed
  if [[ "$platform" == macos-* ]]; then
@ -82,7 +82,7 @@ pub fn create_executor_config(config: GeneralConfig) -> ExecutorConfig {
        // non-string keys and keep services alive.
        recovery_file: PathBuf::new(),
        bootstrap: chain_service::BootstrapConfig {
            prolonged_bootstrap_period: Duration::from_secs(3),
            prolonged_bootstrap_period: config.bootstrapping_config.prolonged_bootstrap_period,
            force_bootstrap: false,
            offline_grace_period: chain_service::OfflineGracePeriodConfig {
                grace_period: Duration::from_secs(20 * 60),
@ -204,7 +204,8 @@ fn build_values(topology: &GeneratedTopology) -> HelmValues {
    let validators = topology
        .validators()
        .iter()
        .map(|validator| {
        .enumerate()
        .map(|(index, validator)| {
            let mut env = BTreeMap::new();
            env.insert(
                "CFG_NETWORK_PORT".into(),
@ -225,6 +226,8 @@ fn build_values(topology: &GeneratedTopology) -> HelmValues {
                    .port()
                    .to_string(),
            );
            env.insert("CFG_HOST_KIND".into(), "validator".into());
            env.insert("CFG_HOST_IDENTIFIER".into(), format!("validator-{index}"));

            NodeValues {
                api_port: validator.general.api_config.address.port(),
@ -237,7 +240,8 @@ fn build_values(topology: &GeneratedTopology) -> HelmValues {
    let executors = topology
        .executors()
        .iter()
        .map(|executor| {
        .enumerate()
        .map(|(index, executor)| {
            let mut env = BTreeMap::new();
            env.insert(
                "CFG_NETWORK_PORT".into(),
@ -258,6 +262,8 @@ fn build_values(topology: &GeneratedTopology) -> HelmValues {
                    .port()
                    .to_string(),
            );
            env.insert("CFG_HOST_KIND".into(), "executor".into());
            env.insert("CFG_HOST_IDENTIFIER".into(), format!("executor-{index}"));

            NodeValues {
                api_port: executor.general.api_config.address.port(),
@ -22,7 +22,7 @@ use crate::{
    helm::{HelmError, install_release},
    host::node_host,
    logs::dump_namespace_logs,
    wait::{ClusterPorts, ClusterWaitError, NodeConfigPorts, wait_for_cluster_ready},
    wait::{ClusterPorts, ClusterReady, ClusterWaitError, NodeConfigPorts, wait_for_cluster_ready},
};

pub struct K8sRunner {
@ -66,6 +66,7 @@ struct ClusterEnvironment {
    executor_api_ports: Vec<u16>,
    executor_testing_ports: Vec<u16>,
    prometheus_port: u16,
    port_forwards: Vec<std::process::Child>,
}

impl ClusterEnvironment {
@ -75,6 +76,7 @@ impl ClusterEnvironment {
        release: String,
        cleanup: RunnerCleanup,
        ports: &ClusterPorts,
        port_forwards: Vec<std::process::Child>,
    ) -> Self {
        Self {
            client,
@ -86,6 +88,7 @@ impl ClusterEnvironment {
            executor_api_ports: ports.executors.iter().map(|ports| ports.api).collect(),
            executor_testing_ports: ports.executors.iter().map(|ports| ports.testing).collect(),
            prometheus_port: ports.prometheus,
            port_forwards,
        }
    }

@ -97,15 +100,17 @@ impl ClusterEnvironment {
            "k8s stack failure; collecting diagnostics"
        );
        dump_namespace_logs(&self.client, &self.namespace).await;
        kill_port_forwards(&mut self.port_forwards);
        if let Some(guard) = self.cleanup.take() {
            Box::new(guard).cleanup();
        }
    }

    fn into_cleanup(mut self) -> RunnerCleanup {
        self.cleanup
            .take()
            .expect("cleanup guard should be available")
    fn into_cleanup(self) -> (RunnerCleanup, Vec<std::process::Child>) {
        (
            self.cleanup.expect("cleanup guard should be available"),
            self.port_forwards,
        )
    }
}

@ -264,12 +269,15 @@ impl Deployer for K8sRunner {
                return Err(err);
            }
        };
        let cleanup = cluster
        let (cleanup, port_forwards) = cluster
            .take()
            .expect("cluster should still be available")
            .into_cleanup();
        let cleanup_guard: Box<dyn CleanupGuard> =
            Box::new(K8sCleanupGuard::new(cleanup, block_feed_guard));
        let cleanup_guard: Box<dyn CleanupGuard> = Box::new(K8sCleanupGuard::new(
            cleanup,
            block_feed_guard,
            port_forwards,
        ));
        let context = RunContext::new(
            descriptors,
            None,
@ -301,6 +309,14 @@ fn ensure_supported_topology(descriptors: &GeneratedTopology) -> Result<(), K8sR
    Ok(())
}

fn kill_port_forwards(handles: &mut Vec<std::process::Child>) {
    for handle in handles.iter_mut() {
        let _ = handle.kill();
        let _ = handle.wait();
    }
    handles.clear();
}

fn collect_port_specs(descriptors: &GeneratedTopology) -> PortSpecs {
    let validators = descriptors
        .validators()
@ -386,11 +402,11 @@ async fn setup_cluster(
    let mut cleanup_guard =
        Some(install_stack(client, &assets, &namespace, &release, validators, executors).await?);

    let cluster_ports =
    let cluster_ready =
        wait_for_ports_or_cleanup(client, &namespace, &release, specs, &mut cleanup_guard).await?;

    info!(
        prometheus_port = cluster_ports.prometheus,
        prometheus_port = cluster_ready.ports.prometheus,
        "discovered prometheus endpoint"
    );

@ -401,7 +417,8 @@ async fn setup_cluster(
        cleanup_guard
            .take()
            .expect("cleanup guard must exist after successful cluster startup"),
        &cluster_ports,
        &cluster_ready.ports,
        cluster_ready.port_forwards,
    );

    if readiness_checks {
@ -448,7 +465,7 @@ async fn wait_for_ports_or_cleanup(
    release: &str,
    specs: &PortSpecs,
    cleanup_guard: &mut Option<RunnerCleanup>,
) -> Result<ClusterPorts, K8sRunnerError> {
) -> Result<ClusterReady, K8sRunnerError> {
    match wait_for_cluster_ready(
        client,
        namespace,
@ -498,13 +515,19 @@ async fn ensure_cluster_readiness(
struct K8sCleanupGuard {
    cleanup: RunnerCleanup,
    block_feed: Option<BlockFeedTask>,
    port_forwards: Vec<std::process::Child>,
}

impl K8sCleanupGuard {
    const fn new(cleanup: RunnerCleanup, block_feed: BlockFeedTask) -> Self {
    const fn new(
        cleanup: RunnerCleanup,
        block_feed: BlockFeedTask,
        port_forwards: Vec<std::process::Child>,
    ) -> Self {
        Self {
            cleanup,
            block_feed: Some(block_feed),
            port_forwards,
        }
    }
}
@ -514,6 +537,7 @@ impl CleanupGuard for K8sCleanupGuard {
        if let Some(block_feed) = self.block_feed.take() {
            CleanupGuard::cleanup(Box::new(block_feed));
        }
        kill_port_forwards(&mut self.port_forwards);
        CleanupGuard::cleanup(Box::new(self.cleanup));
    }
}
@ -1,4 +1,9 @@
use std::time::Duration;
use std::{
    net::{Ipv4Addr, TcpListener, TcpStream},
    process::{Command as StdCommand, Stdio},
    thread,
    time::Duration,
};

use k8s_openapi::api::{apps::v1::Deployment, core::v1::Service};
use kube::{Api, Client, Error as KubeError};
@ -9,7 +14,12 @@ use tokio::time::sleep;
use crate::host::node_host;

const DEPLOYMENT_TIMEOUT: Duration = Duration::from_secs(180);
const NODE_HTTP_TIMEOUT: Duration = Duration::from_secs(240);
const NODE_HTTP_PROBE_TIMEOUT: Duration = Duration::from_secs(30);
const HTTP_POLL_INTERVAL: Duration = Duration::from_secs(1);
const PROMETHEUS_HTTP_PORT: u16 = 9090;
const PROMETHEUS_HTTP_TIMEOUT: Duration = Duration::from_secs(240);
const PROMETHEUS_HTTP_PROBE_TIMEOUT: Duration = Duration::from_secs(30);
const PROMETHEUS_SERVICE_NAME: &str = "prometheus";

#[derive(Clone, Copy)]
@ -30,6 +40,11 @@ pub struct ClusterPorts {
    pub prometheus: u16,
}

pub struct ClusterReady {
    pub ports: ClusterPorts,
    pub port_forwards: Vec<std::process::Child>,
}

#[derive(Debug, Error)]
pub enum ClusterWaitError {
    #[error("deployment {name} in namespace {namespace} did not become ready within {timeout:?}")]
@ -62,6 +77,13 @@ pub enum ClusterWaitError {
    },
    #[error("timeout waiting for prometheus readiness on NodePort {port}")]
    PrometheusTimeout { port: u16 },
    #[error("failed to start port-forward for service {service} port {port}: {source}")]
    PortForward {
        service: String,
        port: u16,
        #[source]
        source: anyhow::Error,
    },
}

pub async fn wait_for_deployment_ready(
@ -159,7 +181,7 @@ pub async fn wait_for_cluster_ready(
    release: &str,
    validator_ports: &[NodeConfigPorts],
    executor_ports: &[NodeConfigPorts],
) -> Result<ClusterPorts, ClusterWaitError> {
) -> Result<ClusterReady, ClusterWaitError> {
    if validator_ports.is_empty() {
        return Err(ClusterWaitError::MissingValidator);
    }
@ -177,11 +199,40 @@ pub async fn wait_for_cluster_ready(
        });
    }

    let mut port_forwards = Vec::new();

    let validator_api_ports: Vec<u16> = validator_allocations
        .iter()
        .map(|ports| ports.api)
        .collect();
    wait_for_node_http(&validator_api_ports, NodeRole::Validator).await?;
    if wait_for_node_http_nodeport(
        &validator_api_ports,
        NodeRole::Validator,
        NODE_HTTP_PROBE_TIMEOUT,
    )
    .await
    .is_err()
    {
        // Fall back to port-forwarding when NodePorts are unreachable from the host.
        validator_allocations.clear();
        port_forwards = port_forward_group(
            namespace,
            release,
            "validator",
            validator_ports,
            &mut validator_allocations,
        )?;
        let validator_api_ports: Vec<u16> = validator_allocations
            .iter()
            .map(|ports| ports.api)
            .collect();
        if let Err(err) =
            wait_for_node_http_port_forward(&validator_api_ports, NodeRole::Validator).await
        {
            kill_port_forwards(&mut port_forwards);
            return Err(err);
        }
    }

    let mut executor_allocations = Vec::with_capacity(executor_ports.len());
    for (index, ports) in executor_ports.iter().enumerate() {
@ -195,39 +246,102 @@ pub async fn wait_for_cluster_ready(
        });
    }

    if !executor_allocations.is_empty() {
    let executor_api_ports: Vec<u16> = executor_allocations.iter().map(|ports| ports.api).collect();
    if !executor_allocations.is_empty()
        && wait_for_node_http_nodeport(
            &executor_api_ports,
            NodeRole::Executor,
            NODE_HTTP_PROBE_TIMEOUT,
        )
        .await
        .is_err()
    {
        executor_allocations.clear();
        match port_forward_group(
            namespace,
            release,
            "executor",
            executor_ports,
            &mut executor_allocations,
        ) {
            Ok(forwards) => port_forwards.extend(forwards),
            Err(err) => {
                kill_port_forwards(&mut port_forwards);
                return Err(err);
            }
        }
        let executor_api_ports: Vec<u16> =
            executor_allocations.iter().map(|ports| ports.api).collect();
        wait_for_node_http(&executor_api_ports, NodeRole::Executor).await?;
        if let Err(err) =
            wait_for_node_http_port_forward(&executor_api_ports, NodeRole::Executor).await
        {
            kill_port_forwards(&mut port_forwards);
            return Err(err);
        }
    }

    let prometheus_port = find_node_port(
    let mut prometheus_port = find_node_port(
        client,
        namespace,
        PROMETHEUS_SERVICE_NAME,
        PROMETHEUS_HTTP_PORT,
    )
    .await?;
    wait_for_prometheus_http(prometheus_port).await?;
    if wait_for_prometheus_http_nodeport(prometheus_port, PROMETHEUS_HTTP_PROBE_TIMEOUT)
        .await
        .is_err()
    {
        let (local_port, forward) =
            port_forward_service(namespace, PROMETHEUS_SERVICE_NAME, PROMETHEUS_HTTP_PORT)
                .map_err(|err| {
                    kill_port_forwards(&mut port_forwards);
                    err
                })?;
        prometheus_port = local_port;
        port_forwards.push(forward);
        if let Err(err) =
            wait_for_prometheus_http_port_forward(prometheus_port, PROMETHEUS_HTTP_TIMEOUT).await
        {
            kill_port_forwards(&mut port_forwards);
            return Err(err);
        }
    }

    Ok(ClusterPorts {
        validators: validator_allocations,
        executors: executor_allocations,
        prometheus: prometheus_port,
    Ok(ClusterReady {
        ports: ClusterPorts {
            validators: validator_allocations,
            executors: executor_allocations,
            prometheus: prometheus_port,
        },
        port_forwards,
    })
}

async fn wait_for_node_http(ports: &[u16], role: NodeRole) -> Result<(), ClusterWaitError> {
async fn wait_for_node_http_nodeport(
    ports: &[u16],
    role: NodeRole,
    timeout: Duration,
) -> Result<(), ClusterWaitError> {
    let host = node_host();
    http_probe::wait_for_http_ports_with_host(
        ports,
        role,
        &host,
        Duration::from_secs(240),
        Duration::from_secs(1),
    )
    .await
    .map_err(map_http_error)
    wait_for_node_http_on_host(ports, role, &host, timeout).await
}

async fn wait_for_node_http_port_forward(
    ports: &[u16],
    role: NodeRole,
) -> Result<(), ClusterWaitError> {
    wait_for_node_http_on_host(ports, role, "127.0.0.1", NODE_HTTP_TIMEOUT).await
}

async fn wait_for_node_http_on_host(
    ports: &[u16],
    role: NodeRole,
    host: &str,
    timeout: Duration,
) -> Result<(), ClusterWaitError> {
    http_probe::wait_for_http_ports_with_host(ports, role, host, timeout, HTTP_POLL_INTERVAL)
        .await
        .map_err(map_http_error)
}

const fn map_http_error(error: HttpReadinessError) -> ClusterWaitError {
@ -238,11 +352,30 @@ const fn map_http_error(error: HttpReadinessError) -> ClusterWaitError {
    }
}

pub async fn wait_for_prometheus_http(port: u16) -> Result<(), ClusterWaitError> {
    let client = reqwest::Client::new();
    let url = format!("http://{}:{port}/-/ready", node_host());
pub async fn wait_for_prometheus_http_nodeport(
    port: u16,
    timeout: Duration,
) -> Result<(), ClusterWaitError> {
    let host = node_host();
    wait_for_prometheus_http(&host, port, timeout).await
}

    for _ in 0..240 {
pub async fn wait_for_prometheus_http_port_forward(
    port: u16,
    timeout: Duration,
) -> Result<(), ClusterWaitError> {
    wait_for_prometheus_http("127.0.0.1", port, timeout).await
}

pub async fn wait_for_prometheus_http(
    host: &str,
    port: u16,
    timeout: Duration,
) -> Result<(), ClusterWaitError> {
    let client = reqwest::Client::new();
    let url = format!("http://{host}:{port}/-/ready");

    for _ in 0..timeout.as_secs() {
        if let Ok(resp) = client.get(&url).send().await
            && resp.status().is_success()
        {
@ -253,3 +386,101 @@ pub async fn wait_for_prometheus_http(port: u16) -> Result<(), ClusterWaitError>

    Err(ClusterWaitError::PrometheusTimeout { port })
}

fn port_forward_group(
    namespace: &str,
    release: &str,
    kind: &str,
    ports: &[NodeConfigPorts],
    allocations: &mut Vec<NodePortAllocation>,
) -> Result<Vec<std::process::Child>, ClusterWaitError> {
    let mut forwards = Vec::new();
    for (index, ports) in ports.iter().enumerate() {
        let service = format!("{release}-{kind}-{index}");
        let (api_port, api_forward) = match port_forward_service(namespace, &service, ports.api) {
            Ok(forward) => forward,
            Err(err) => {
                kill_port_forwards(&mut forwards);
                return Err(err);
            }
        };
        let (testing_port, testing_forward) =
            match port_forward_service(namespace, &service, ports.testing) {
                Ok(forward) => forward,
                Err(err) => {
                    kill_port_forwards(&mut forwards);
                    return Err(err);
                }
            };
        allocations.push(NodePortAllocation {
            api: api_port,
            testing: testing_port,
        });
        forwards.push(api_forward);
        forwards.push(testing_forward);
    }
    Ok(forwards)
}

fn port_forward_service(
    namespace: &str,
    service: &str,
    remote_port: u16,
) -> Result<(u16, std::process::Child), ClusterWaitError> {
    let local_port = allocate_local_port().map_err(|source| ClusterWaitError::PortForward {
        service: service.to_owned(),
        port: remote_port,
        source,
    })?;

    let mut child = StdCommand::new("kubectl")
        .arg("port-forward")
        .arg("-n")
        .arg(namespace)
        .arg(format!("svc/{service}"))
        .arg(format!("{local_port}:{remote_port}"))
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()
        .map_err(|source| ClusterWaitError::PortForward {
            service: service.to_owned(),
            port: remote_port,
            source: source.into(),
        })?;

    for _ in 0..20 {
        if let Ok(Some(status)) = child.try_wait() {
            return Err(ClusterWaitError::PortForward {
                service: service.to_owned(),
                port: remote_port,
                source: anyhow::anyhow!("kubectl exited with {status}"),
            });
        }
        if TcpStream::connect((Ipv4Addr::LOCALHOST, local_port)).is_ok() {
            return Ok((local_port, child));
        }
        thread::sleep(Duration::from_millis(250));
    }

    let _ = child.kill();
    Err(ClusterWaitError::PortForward {
        service: service.to_owned(),
        port: remote_port,
        source: anyhow::anyhow!("port-forward did not become ready"),
    })
}

fn allocate_local_port() -> anyhow::Result<u16> {
    let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
    let port = listener.local_addr()?.port();
    drop(listener);
    Ok(port)
}

fn kill_port_forwards(handles: &mut Vec<std::process::Child>) {
    for handle in handles.iter_mut() {
        let _ = handle.kill();
        let _ = handle.wait();
    }
    handles.clear();
}
@ -2,7 +2,8 @@
# check=skip=SecretsUsedInArgOrEnv
# Ignore warnings about sensitive information as this is test data.

ARG VERSION=v0.2.0
ARG VERSION=v0.3.1
ARG CIRCUITS_OVERRIDE

# ===========================
# BUILD IMAGE
@ -11,24 +12,61 @@ ARG VERSION=v0.2.0
FROM rust:1.91.0-slim-bookworm AS builder

ARG VERSION
ARG CIRCUITS_OVERRIDE

LABEL maintainer="augustinas@status.im" \
      source="https://github.com/logos-co/nomos-node" \
      description="Nomos testnet build image"

WORKDIR /nomos
WORKDIR /workspace
COPY . .

# Install dependencies needed for building RocksDB.
RUN apt-get update && apt-get install -yq \
    git gcc g++ clang libssl-dev pkg-config ca-certificates curl
    git gcc g++ clang make cmake m4 xz-utils libgmp-dev libssl-dev pkg-config ca-certificates curl wget

RUN chmod +x scripts/setup-nomos-circuits.sh && \
    scripts/setup-nomos-circuits.sh "$VERSION" "/opt/circuits"
RUN mkdir -p /opt/circuits && \
    select_circuits_source() { \
      # Prefer an explicit override when it exists (file or directory). \
      if [ -n "$CIRCUITS_OVERRIDE" ] && [ -e "/workspace/${CIRCUITS_OVERRIDE}" ]; then \
        echo "/workspace/${CIRCUITS_OVERRIDE}"; \
        return 0; \
      fi; \
      # Fall back to the workspace bundle shipped with the repo. \
      if [ -e "/workspace/tests/kzgrs/kzgrs_test_params" ]; then \
        echo "/workspace/tests/kzgrs/kzgrs_test_params"; \
        return 0; \
      fi; \
      return 1; \
    }; \
    if CIRCUITS_PATH="$(select_circuits_source)"; then \
      echo "Using prebuilt circuits bundle from ${CIRCUITS_PATH#/workspace/}"; \
      if [ -d "$CIRCUITS_PATH" ]; then \
        cp -R "${CIRCUITS_PATH}/." /opt/circuits; \
      else \
        cp "${CIRCUITS_PATH}" /opt/circuits/; \
      fi; \
    fi; \
    if [ ! -f "/opt/circuits/pol/verification_key.json" ]; then \
      echo "Local circuits missing pol artifacts; downloading ${VERSION} bundle and rebuilding"; \
      chmod +x scripts/setup-nomos-circuits.sh && \
      NOMOS_CIRCUITS_REBUILD_RAPIDSNARK=1 \
      RAPIDSNARK_BUILD_GMP=1 \
      scripts/setup-nomos-circuits.sh "$VERSION" "/opt/circuits"; \
    fi

ENV NOMOS_CIRCUITS=/opt/circuits
ENV CARGO_TARGET_DIR=/workspace/target

RUN cargo build --release --all-features
# Fetch the nomos-node sources pinned in Cargo.lock and build the runtime binaries.
RUN git clone https://github.com/logos-co/nomos-node.git /workspace/nomos-node && \
    cd /workspace/nomos-node && \
    git fetch --depth 1 origin 2f60a0372c228968c3526c341ebc7e58bbd178dd && \
    git checkout 2f60a0372c228968c3526c341ebc7e58bbd178dd && \
    cargo build --release --all-features --bins

# Build cfgsync binaries from this workspace.
RUN cargo build --release --locked --manifest-path /workspace/testnet/cfgsync/Cargo.toml --bins

# ===========================
# NODE IMAGE
@ -50,11 +88,11 @@ RUN apt-get update && apt-get install -yq \

COPY --from=builder /opt/circuits /opt/circuits

COPY --from=builder /nomos/target/release/nomos-node /usr/bin/nomos-node
COPY --from=builder /nomos/target/release/nomos-executor /usr/bin/nomos-executor
COPY --from=builder /nomos/target/release/nomos-cli /usr/bin/nomos-cli
COPY --from=builder /nomos/target/release/cfgsync-server /usr/bin/cfgsync-server
COPY --from=builder /nomos/target/release/cfgsync-client /usr/bin/cfgsync-client
COPY --from=builder /workspace/target/release/nomos-node /usr/bin/nomos-node
COPY --from=builder /workspace/target/release/nomos-executor /usr/bin/nomos-executor
COPY --from=builder /workspace/target/release/nomos-cli /usr/bin/nomos-cli
COPY --from=builder /workspace/target/release/cfgsync-server /usr/bin/cfgsync-server
COPY --from=builder /workspace/target/release/cfgsync-client /usr/bin/cfgsync-client

ENV NOMOS_CIRCUITS=/opt/circuits
38
testnet/scripts/build_test_image.sh
Executable file
@ -0,0 +1,38 @@
#!/bin/bash
set -euo pipefail

# Builds the testnet image with circuits. Prefers a local circuits bundle
# (tests/kzgrs/kzgrs_test_params) or a custom override; otherwise downloads
# from logos-co/nomos-circuits.

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
IMAGE_TAG="${IMAGE_TAG:-nomos-testnet:local}"
VERSION="${VERSION:-v0.3.1}"
CIRCUITS_OVERRIDE="${CIRCUITS_OVERRIDE:-tests/kzgrs/kzgrs_test_params}"

echo "Workspace root: ${ROOT_DIR}"
echo "Image tag: ${IMAGE_TAG}"
echo "Circuits override: ${CIRCUITS_OVERRIDE:-<none>}"
echo "Circuits version (fallback download): ${VERSION}"

build_args=(
  -f "${ROOT_DIR}/testnet/Dockerfile"
  -t "${IMAGE_TAG}"
  "${ROOT_DIR}"
)

# Pass override/version args to the Docker build.
if [ -n "${CIRCUITS_OVERRIDE}" ]; then
  build_args+=(--build-arg "CIRCUITS_OVERRIDE=${CIRCUITS_OVERRIDE}")
fi
build_args+=(--build-arg "VERSION=${VERSION}")

echo "Running: docker build ${build_args[*]}"
docker build "${build_args[@]}"

cat <<EOF

Build complete.
- Use this image in k8s/compose by exporting NOMOS_TESTNET_IMAGE=${IMAGE_TAG}
- Circuits source: ${CIRCUITS_OVERRIDE:-download ${VERSION}}
EOF
@ -14,5 +14,9 @@ export CFG_FILE_PATH="/config.yaml" \
# persist state.
mkdir -p /recovery

/usr/bin/cfgsync-client && \
  exec /usr/bin/nomos-executor /config.yaml
/usr/bin/cfgsync-client

# Align bootstrap timing with validators to keep configs consistent.
sed -i "s/prolonged_bootstrap_period: .*/prolonged_bootstrap_period: '3.000000000'/" /config.yaml

exec /usr/bin/nomos-executor /config.yaml
@ -14,5 +14,9 @@ export CFG_FILE_PATH="/config.yaml" \
# persist state.
mkdir -p /recovery

/usr/bin/cfgsync-client && \
  exec /usr/bin/nomos-node /config.yaml
/usr/bin/cfgsync-client

# Align bootstrap timing with executors to keep configs consistent.
sed -i "s/prolonged_bootstrap_period: .*/prolonged_bootstrap_period: '3.000000000'/" /config.yaml

exec /usr/bin/nomos-node /config.yaml
76
testnet/scripts/setup-nomos-circuits.sh
Normal file
@ -0,0 +1,76 @@
#!/bin/bash
#
# Setup script for nomos-circuits
#
# Usage: ./setup-nomos-circuits.sh [VERSION] [INSTALL_DIR]
#   VERSION     - Optional. Version to install (default: v0.3.1)
#   INSTALL_DIR - Optional. Installation directory (default: $HOME/.nomos-circuits)
#
# Examples:
#   ./setup-nomos-circuits.sh                      # Install default version to default location
#   ./setup-nomos-circuits.sh v0.2.0               # Install specific version to default location
#   ./setup-nomos-circuits.sh v0.2.0 /opt/circuits # Install to custom location

set -euo pipefail

VERSION="${1:-v0.3.1}"
DEFAULT_INSTALL_DIR="$HOME/.nomos-circuits"
INSTALL_DIR="${2:-$DEFAULT_INSTALL_DIR}"
REPO="logos-co/nomos-circuits"

detect_platform() {
  local os=""
  local arch=""
  case "$(uname -s)" in
    Linux*) os="linux" ;;
    Darwin*) os="macos" ;;
    MINGW*|MSYS*|CYGWIN*) os="windows" ;;
    *) echo "Unsupported operating system: $(uname -s)" >&2; exit 1 ;;
  esac
  case "$(uname -m)" in
    x86_64) arch="x86_64" ;;
    aarch64|arm64) arch="aarch64" ;;
    *) echo "Unsupported architecture: $(uname -m)" >&2; exit 1 ;;
  esac
  echo "${os}-${arch}"
}

download_release() {
  local platform="$1"
  local artifact="nomos-circuits-${VERSION}-${platform}.tar.gz"
  local url="https://github.com/${REPO}/releases/download/${VERSION}/${artifact}"
  local temp_dir
  temp_dir=$(mktemp -d)

  echo "Downloading nomos-circuits ${VERSION} for ${platform}..."
  if [ -n "${GITHUB_TOKEN:-}" ]; then
    auth_header="Authorization: Bearer ${GITHUB_TOKEN}"
  else
    auth_header=""
  fi

  if ! curl -L ${auth_header:+-H "$auth_header"} -o "${temp_dir}/${artifact}" "${url}"; then
    echo "Failed to download release artifact from ${url}" >&2
    rm -rf "${temp_dir}"
    exit 1
  fi

  echo "Extracting to ${INSTALL_DIR}..."
  rm -rf "${INSTALL_DIR}"
  mkdir -p "${INSTALL_DIR}"
  if ! tar -xzf "${temp_dir}/${artifact}" -C "${INSTALL_DIR}" --strip-components=1; then
    echo "Failed to extract ${artifact}" >&2
    rm -rf "${temp_dir}"
    exit 1
  fi
  rm -rf "${temp_dir}"
}

platform=$(detect_platform)
echo "Setting up nomos-circuits ${VERSION} for ${platform}"
echo "Installing to ${INSTALL_DIR}"

download_release "${platform}"

echo "Installation complete. Circuits installed at: ${INSTALL_DIR}"
echo "If using a custom directory, set NOMOS_CIRCUITS=${INSTALL_DIR}"