Node Control & RunContext

The deployer supplies a RunContext that workloads and expectations share; a short usage sketch follows the list. It provides:

  • Topology descriptors (GeneratedTopology)
  • Client handles (NodeClients / ClusterClient) for HTTP/RPC calls
  • Metrics (RunMetrics, Metrics) and block feed
  • Optional NodeControlHandle for managing nodes
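
A minimal usage sketch follows. Apart from node_control(), which appears in the workload example later on this page, the accessor names here (topology(), clients(), metrics()) are assumptions for illustration and may differ from the actual RunContext API.

use testing_framework_core::scenario::RunContext;

// Sketch only: topology(), clients(), and metrics() are assumed accessor names;
// node_control() is the accessor used in the workload example below.
fn inspect(ctx: &RunContext) {
    let _topology = ctx.topology();   // GeneratedTopology descriptor (assumed accessor)
    let _clients = ctx.clients();     // NodeClients / ClusterClient handles (assumed accessor)
    let _metrics = ctx.metrics();     // RunMetrics / Metrics (assumed accessor)
    // Node control is optional; not every runner provides it.
    println!("node control available: {}", ctx.node_control().is_some());
}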

Current Chaos Capabilities and Limitations

The framework currently supports process-level chaos (node restarts) for resilience testing:

Supported:

  • Restart validators (restart_validator)
  • Restart executors (restart_executor)
  • Random restart workload via .chaos().restart()

Not Yet Supported:

  • Network partitions (blocking peers, packet loss)
  • Resource constraints (CPU throttling, memory limits)
  • Byzantine behavior injection (invalid blocks, bad signatures)
  • Selective peer blocking/unblocking

For network partition testing, see Extension Ideas, which describes the proposed block_peer/unblock_peer API (not yet implemented).

Accessing node control in workloads/expectations

Check for control support and use it conditionally:

use testing_framework_core::scenario::{Expectation, RunContext, Workload};

struct RestartWorkload;

impl Workload for RestartWorkload {
    fn name(&self) -> &'static str { "restart_workload" }

    async fn start(&self, ctx: &RunContext) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
        // Node control is optional; degrade gracefully when the runner does not expose it.
        if let Some(control) = ctx.node_control() {
            // Restart the first validator (index 0) if supported.
            control.restart_validator(0).await?;
        }
        Ok(())
    }
}

When a chaos workload needs node control, require it via enable_node_control() in the scenario builder and deploy with a runner that supports it, as sketched below.
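
A minimal builder sketch: enable_node_control() and the .chaos().restart() chain are the calls named on this page, while ScenarioBuilder, new(), build(), and the deploy step are illustrative assumptions.

// Sketch only: the builder type and new()/build() are assumptions;
// enable_node_control() and .chaos().restart() come from this page.
let scenario = ScenarioBuilder::new()
    .enable_node_control()   // require a runner that exposes NodeControlHandle
    .chaos()
    .restart()               // register the random restart workload
    .build();
// Deploy `scenario` with a runner that supports node control.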

Current API surface

The NodeControlHandle trait currently provides:

pub trait NodeControlHandle: Send + Sync {
    /// Restart the validator at the given topology index.
    async fn restart_validator(&self, index: usize) -> Result<(), DynError>;
    /// Restart the executor at the given topology index.
    async fn restart_executor(&self, index: usize) -> Result<(), DynError>;
}
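
A runner that wants to expose control implements this trait. The sketch below is a hypothetical no-op handle, useful only as a shape reference; a real implementation would restart the actual node processes, and DynError is assumed to be the framework's boxed error alias.

// Hypothetical implementation for illustration only; a real handle would
// stop and respawn the node process identified by `index`.
struct NoopControlHandle;

impl NodeControlHandle for NoopControlHandle {
    async fn restart_validator(&self, index: usize) -> Result<(), DynError> {
        println!("would restart validator {index}");
        Ok(())
    }

    async fn restart_executor(&self, index: usize) -> Result<(), DynError> {
        println!("would restart executor {index}");
        Ok(())
    }
}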

Future extensions may include peer blocking/unblocking or other control operations. For now, focus on restart-based chaos patterns as shown in the chaos workload examples.

Considerations

  • Always guard control usage: not all runners expose NodeControlHandle.
  • Treat control as best-effort: if a control action fails, surface it as a test failure, but when control is absent the workload should degrade gracefully.
  • Combine control actions with expectations (e.g., restart, then assert height convergence, as sketched below) to keep scenarios meaningful.
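
As a sketch of that last point, an expectation checking height convergence after a restart could look roughly like this; the method names (name(), evaluate()) and the client calls (clients(), validators(), block_height()) are assumptions, since only the Expectation and RunContext names appear on this page.

use testing_framework_core::scenario::{Expectation, RunContext};

// Hypothetical sketch: evaluate(), validators(), and block_height() are assumed
// names; only Expectation and RunContext are taken from this page.
struct HeightConvergence { min_height: u64 }

impl Expectation for HeightConvergence {
    fn name(&self) -> &'static str { "height_convergence" }

    async fn evaluate(&self, ctx: &RunContext) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
        for client in ctx.clients().validators() {
            // Assumed client call returning the node's current block height.
            let height = client.block_height().await?;
            if height < self.min_height {
                return Err(format!("validator stuck at height {height}").into());
            }
        }
        Ok(())
    }
}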