mirror of
https://github.com/logos-blockchain/logos-blockchain-testing.git
synced 2026-01-06 23:33:07 +00:00
2.2 KiB
2.2 KiB
Chaos Workloads
Chaos in the framework uses node control to introduce failures and validate
recovery. The built-in restart workload lives in
testing_framework_workflows::workloads::chaos::RandomRestartWorkload.
How it works
- Requires
NodeControlCapability(enable_node_control()in the scenario builder) and a runner that provides aNodeControlHandle. - Randomly selects nodes (validators, executors) to restart based on your include/exclude flags.
- Respects min/max delay between restarts and a target cooldown to avoid flapping the same node too frequently.
- Runs alongside other workloads; expectations should account for the added disruption.
- Support varies by runner: node control is not provided by the local runner
and is not yet implemented for the k8s runner. Use a runner that advertises
NodeControlHandlesupport (e.g., compose) for chaos workloads.
Usage
use std::time::Duration;
use testing_framework_core::scenario::ScenarioBuilder;
use testing_framework_workflows::workloads::chaos::RandomRestartWorkload;
let plan = ScenarioBuilder::topology_with(|t| {
t.network_star()
.validators(2)
.executors(1)
})
.enable_node_control()
.with_workload(RandomRestartWorkload::new(
Duration::from_secs(45), // min delay
Duration::from_secs(75), // max delay
Duration::from_secs(120), // target cooldown
true, // include validators
true, // include executors
))
.expect_consensus_liveness()
.with_run_duration(Duration::from_secs(150))
.build();
// deploy with a runner that supports node control and run the scenario
Expectations to pair
- Consensus liveness: ensure blocks keep progressing despite restarts.
- Height convergence: optionally check all nodes converge after the chaos window.
- Any workload-specific inclusion checks if you’re also driving tx/DA traffic.
Best practices
- Keep delays/cooldowns realistic; avoid back-to-back restarts that would never happen in production.
- Limit chaos scope: toggle validators vs executors based on what you want to test.
- Combine with observability: monitor metrics/logs to explain failures.