# Manual Clusters: Imperative Control

**When should I read this?** You're integrating external test drivers (like Cucumber/BDD frameworks) that need imperative node orchestration. This is an escape hatch for when the test orchestration must live outside the framework—most tests should use the standard scenario approach.

---

## Overview

**Manual clusters** provide imperative, on-demand node control for scenarios that don't fit the declarative `ScenarioBuilder` pattern:

```rust
use testing_framework_core::topology::config::TopologyConfig;
use testing_framework_core::scenario::{PeerSelection, StartNodeOptions};
use testing_framework_runner_local::LocalDeployer;

let config = TopologyConfig::with_node_numbers(3);
let deployer = LocalDeployer::new();
let cluster = deployer.manual_cluster(config)?;

// Start nodes on demand with explicit peer selection
let node_a = cluster.start_node_with(
    "a",
    StartNodeOptions {
        peers: PeerSelection::None, // Start isolated
    }
).await?.api;

let node_b = cluster.start_node_with(
    "b",
    StartNodeOptions {
        peers: PeerSelection::Named(vec!["node-a".to_owned()]), // Connect to A
    }
).await?.api;

// Wait for network readiness
cluster.wait_network_ready().await?;

// Custom validation logic
let info_a = node_a.consensus_info().await?;
let info_b = node_b.consensus_info().await?;
assert!(info_a.height.abs_diff(info_b.height) <= 5);
```

**Key difference from scenarios:**
- **External orchestration:** Your code (or an external driver like Cucumber) controls the execution flow step-by-step
- **Imperative model:** You call `start_node()` and `sleep()`, and poll node APIs directly in your test logic
- **No framework execution:** The scenario runner doesn't drive workloads—you do

Note: Scenarios with node control can also start nodes dynamically, control peer selection, and orchestrate timing—but via **workloads** within the framework's execution model. Use manual clusters only when the orchestration must be external (e.g., Cucumber steps).

---

## When to Use Manual Clusters

**Manual clusters are an escape hatch for when orchestration must live outside the framework.**

Prefer workloads for scenario logic; use manual clusters only when an external system needs to control node lifecycle—for example:

**Cucumber/BDD integration**  
Gherkin steps control when nodes start, which peers they connect to, and when to verify state. The test driver (Cucumber) orchestrates the scenario step-by-step.

**Custom test harnesses**  
External scripts or tools that need programmatic control over node lifecycle as part of a larger testing pipeline.
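
To make the idea of step-by-step external orchestration concrete, here is a minimal sketch of how a driver might translate step phrases into cluster actions. Everything in it is hypothetical: the `ClusterStep` enum, the `parse_step` function, and the step wording are invented for illustration and are not part of the framework or of any BDD library. A real Cucumber integration would match steps through the BDD framework and then call `start_node`, `start_node_with`, or `wait_network_ready` on the cluster.

```rust
/// Hypothetical commands an external driver might issue to a manual cluster.
#[derive(Debug, PartialEq)]
enum ClusterStep {
    /// Would map to `start_node_with(name, PeerSelection::None)`.
    StartIsolated { name: String },
    /// Would map to `start_node_with(name, PeerSelection::Named(peers))`.
    StartWithPeers { name: String, peers: Vec<String> },
    /// Would map to `wait_network_ready()`.
    WaitReady,
}

/// Parses an illustrative step phrase into a command.
/// The phrasing is made up for this sketch; unknown phrases yield `None`.
fn parse_step(step: &str) -> Option<ClusterStep> {
    let words: Vec<&str> = step.split_whitespace().collect();
    match words.as_slice() {
        ["start", "node", name] => Some(ClusterStep::StartIsolated {
            name: (*name).to_owned(),
        }),
        ["start", "node", name, "connected", "to", peers @ ..] if !peers.is_empty() => {
            Some(ClusterStep::StartWithPeers {
                name: (*name).to_owned(),
                // Peer lists use the `node-` prefixed form of each name.
                peers: peers.iter().map(|p| format!("node-{p}")).collect(),
            })
        }
        ["wait", "until", "ready"] => Some(ClusterStep::WaitReady),
        _ => None,
    }
}
```

Keeping the parsed step as plain data separates the external driver's vocabulary from the cluster calls, so the mapping can be tested without starting any nodes.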

---

## Core API

### Starting the Cluster

```rust
use testing_framework_core::topology::config::TopologyConfig;
use testing_framework_runner_local::LocalDeployer;

// Define capacity (preallocates ports/configs for N nodes)
let config = TopologyConfig::with_node_numbers(5);

let deployer = LocalDeployer::new();
let cluster = deployer.manual_cluster(config)?;
// Nodes are stopped automatically when cluster is dropped
```

**Important:** The `TopologyConfig` defines the **maximum capacity**, not the initial state. Nodes are started on demand via API calls.

### Starting Nodes

**Default peers (topology layout):**

```rust
let node = cluster.start_node("seed").await?;
```

**No peers (isolated):**

```rust
use testing_framework_core::scenario::{PeerSelection, StartNodeOptions};

let node = cluster.start_node_with(
    "isolated",
    StartNodeOptions {
        peers: PeerSelection::None,
    }
).await?;
```

**Explicit peers (named):**

```rust
let node = cluster.start_node_with(
    "follower",
    StartNodeOptions {
        peers: PeerSelection::Named(vec![
            "node-seed".to_owned(),
            "node-isolated".to_owned(),
        ]),
    }
).await?;
```

**Note:** Node names are prefixed with `node-` internally. If you start a node with name `"a"`, reference it as `"node-a"` in peer lists.
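
Since the prefix is easy to forget, a small helper can keep peer lists consistent. The sketch below is a hypothetical convenience (`peer_name` and `peer_names` are not framework functions) and assumes the prefix is exactly `node-`, as described in the note above.

```rust
/// Hypothetical helper: maps the name passed to `start_node` to the name
/// expected in peer lists, assuming the `node-` prefix described above.
fn peer_name(start_name: &str) -> String {
    format!("node-{start_name}")
}

/// Builds the `Vec<String>` that could be handed to `PeerSelection::Named`.
fn peer_names(start_names: &[&str]) -> Vec<String> {
    start_names.iter().map(|name| peer_name(name)).collect()
}
```

With these in place, a peer list could be written as `PeerSelection::Named(peer_names(&["seed", "isolated"]))` instead of spelling out each prefixed name by hand.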

### Getting Node Clients

```rust
// From start result
let started = cluster.start_node("my-node").await?;
let client = started.api;

// Or lookup by name
if let Some(client) = cluster.node_client("node-my-node") {
    let info = client.consensus_info().await?;
    println!("Height: {}", info.height);
}
```

### Waiting for Readiness

```rust
// Waits until all started nodes have connected to their expected peers
cluster.wait_network_ready().await?;
```

**Behavior:**
- Single-node clusters are always ready (there are no peers to verify)
- Multi-node clusters wait until peer counts match expectations
- The wait times out after 60 seconds (120 seconds if `SLOW_TEST_ENV=true`) with a diagnostic message
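
The timeout rule in the last bullet can be sketched as a small pure function. This is only an illustration of the documented behavior: `readiness_timeout` is a hypothetical name, and how the framework actually reads `SLOW_TEST_ENV` is an assumption here.

```rust
use std::time::Duration;

/// Hypothetical sketch of the timeout rule described above: 60 seconds by
/// default, 120 seconds when `SLOW_TEST_ENV` is set to `true`.
/// The value is passed in rather than read from the environment so the rule
/// is easy to test; the framework's real lookup may differ.
fn readiness_timeout(slow_test_env: Option<&str>) -> Duration {
    match slow_test_env {
        Some("true") => Duration::from_secs(120),
        _ => Duration::from_secs(60),
    }
}
```

A caller mirroring this rule in its own polling loops could use `readiness_timeout(std::env::var("SLOW_TEST_ENV").ok().as_deref())`.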

---

## Complete Example: External Test Driver Pattern

This shows how an external test driver (like Cucumber) might use manual clusters to control node lifecycle:

```rust
use anyhow::Result;
use testing_framework_core::{
    scenario::{PeerSelection, StartNodeOptions},
    topology::config::TopologyConfig,
};
use testing_framework_runner_local::LocalDeployer;

#[tokio::test]
async fn external_driver_example() -> Result<()> {
    // Step 1: Create cluster with capacity for 3 nodes
    let config = TopologyConfig::with_node_numbers(3);
    let deployer = LocalDeployer::new();
    let cluster = deployer.manual_cluster(config)?;

    // Step 2: External driver decides to start 2 nodes initially
    println!("Starting initial topology...");
    let node_a = cluster.start_node("a").await?.api;
    let node_b = cluster
        .start_node_with(
            "b",
            StartNodeOptions {
                peers: PeerSelection::Named(vec!["node-a".to_owned()]),
            },
        )
        .await?
        .api;

    cluster.wait_network_ready().await?;

    // Step 3: External driver runs some protocol operations
    let info = node_a.consensus_info().await?;
    println!("Initial cluster height: {}", info.height);

    // Step 4: Later, external driver decides to add third node
    println!("External driver adding third node...");
    let node_c = cluster
        .start_node_with(
            "c",
            StartNodeOptions {
                peers: PeerSelection::Named(vec!["node-a".to_owned()]),
            },
        )
        .await?
        .api;

    cluster.wait_network_ready().await?;

    // Step 5: External driver validates final state
    let heights = vec![
        node_a.consensus_info().await?.height,
        node_b.consensus_info().await?.height,
        node_c.consensus_info().await?.height,
    ];
    println!("Final heights: {:?}", heights);

    Ok(())
}
```

**Key pattern:**
The external driver controls **when** nodes start and **which peers** they connect to, allowing test frameworks like Cucumber to orchestrate scenarios step-by-step based on Gherkin steps or other external logic.

---

## Peer Selection Strategies

**`PeerSelection::DefaultLayout`**  
Uses the topology's network layout (star/chain/full). Default behavior.

```rust
let node = cluster.start_node_with(
    "normal",
    StartNodeOptions {
        peers: PeerSelection::DefaultLayout,
    }
).await?;
```

**`PeerSelection::None`**  
Node starts with no initial peers. Use when an external driver needs to build topology incrementally.

```rust
let isolated = cluster.start_node_with(
    "isolated",
    StartNodeOptions {
        peers: PeerSelection::None,
    }
).await?;
```

**`PeerSelection::Named(vec!["node-a".to_owned(), "node-b".to_owned()])`**  
Explicit peer list. Use when an external driver needs to construct specific peer relationships.

```rust
let follower = cluster.start_node_with(
    "follower",
    StartNodeOptions {
        peers: PeerSelection::Named(vec![
            "node-seed".to_owned(),
            "node-isolated".to_owned(),
        ]),
    }
).await?;
```

**Remember:** Node names are automatically prefixed with `node-`. If you call `start_node("a")`, reference it as `"node-a"` in peer lists.

---

## Custom Validation Patterns

Manual clusters don't have built-in expectations—you write validation logic directly:

### Height Convergence

```rust
use tokio::time::{sleep, Duration};

let start = tokio::time::Instant::now();
loop {
    let heights: Vec<u64> = vec![
        node_a.consensus_info().await?.height,
        node_b.consensus_info().await?.height,
        node_c.consensus_info().await?.height,
    ];

    let max_diff = heights.iter().max().unwrap() - heights.iter().min().unwrap();
    if max_diff <= 5 {
        println!("Converged: heights={:?}", heights);
        break;
    }

    if start.elapsed() > Duration::from_secs(60) {
        return Err(anyhow::anyhow!("Convergence timeout: heights={:?}", heights));
    }

    sleep(Duration::from_secs(2)).await;
}
```
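
The convergence check inside the loop can also be factored into pure functions, which avoids the `unwrap()` calls and makes the tolerance testable without running nodes. This is a minimal sketch; `height_spread` and `has_converged` are hypothetical helper names, not framework APIs.

```rust
/// Returns the gap between the highest and lowest height, or `None` for an
/// empty slice (so callers never hit an `unwrap` panic).
fn height_spread(heights: &[u64]) -> Option<u64> {
    let max = heights.iter().max()?;
    let min = heights.iter().min()?;
    Some(max - min)
}

/// True when every node is within `tolerance` blocks of every other node.
/// An empty list is treated as not converged.
fn has_converged(heights: &[u64], tolerance: u64) -> bool {
    height_spread(heights).is_some_and(|spread| spread <= tolerance)
}
```

Inside the polling loop, the `max_diff <= 5` condition would then become `has_converged(&heights, 5)`.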

### Peer Count Verification

```rust
let info = node.network_info().await?;
assert_eq!(
    info.n_peers, 3,
    "Expected 3 peers, found {}",
    info.n_peers
);
```

### Block Production

```rust
use tokio::time::{sleep, Duration};

// Verify node is producing blocks
let initial_height = node_a.consensus_info().await?.height;

sleep(Duration::from_secs(10)).await;

let current_height = node_a.consensus_info().await?.height;
assert!(
    current_height > initial_height,
    "Node should have produced blocks: initial={}, current={}",
    initial_height,
    current_height
);
```

---

## Limitations

**Local deployer only**  
Manual clusters currently only work with `LocalDeployer`. Compose and K8s support is not available.

**No built-in workloads**  
You must manually submit transactions via node API clients. The framework's transaction workloads are scenario-specific.

**No automatic expectations**  
You wire validation yourself. The `.expect_*()` methods from scenarios are not automatically attached—you write custom validation loops.

**No RunContext**  
Manual clusters don't provide `RunContext`, so features like `BlockFeed` and metrics queries require manual setup.

---

## Relationship to Node Control

Manual clusters and [node control](node-control.md) share the same underlying infrastructure (`LocalDynamicNodes`), but serve different purposes:

| Feature | Manual Cluster | Node Control (Scenario) |
|---------|---------------|-------------------------|
| **Orchestration** | External (your code/Cucumber) | Framework (workloads) |
| **Programming model** | Imperative (step-by-step) | Declarative (plan + execute) |
| **Node lifecycle** | Manual `start_node()` calls | Automatic + workload-driven |
| **Traffic generation** | Manual API calls | Built-in workloads (tx, chaos) |
| **Validation** | Manual polling loops | Built-in expectations + custom |
| **Use case** | Cucumber/BDD integration | Standard testing & chaos |

**When to use which:**
- **Scenarios with node control** → Standard testing (built-in workloads drive node control)
- **Manual clusters** → External drivers (Cucumber/BDD where external logic drives node control)

---

## Running Manual Cluster Tests

Manual cluster tests are typically marked with `#[ignore]` to prevent accidental runs:

```rust
#[tokio::test]
#[ignore = "run manually with: cargo test -- --ignored external_driver_example"]
async fn external_driver_example() -> Result<()> {
    // ...
}
```

**To run:**

```bash
# Required: dev mode for fast proofs
cargo test -p runner-examples -- --ignored external_driver_example
```

**Logs:**

```bash
# Preserve logs after test
LOGOS_BLOCKCHAIN_TESTS_KEEP_LOGS=1 \
RUST_LOG=info \
cargo test -p runner-examples -- --ignored external_driver_example
```

---

## See Also

- [Testing Philosophy](testing-philosophy.md) — Why the framework is declarative by default
- [RunContext: BlockFeed & Node Control](node-control.md) — Node control within scenarios
- [Chaos Testing](chaos.md) — Restart-based chaos (scenario approach)
- [Scenario Builder Extensions](scenario-builder-ext-patterns.md) — Extending the declarative model