mirror of
https://github.com/logos-blockchain/logos-blockchain-testing.git
synced 2026-04-12 06:03:09 +00:00
58 lines
2.4 KiB
Markdown
# Chaos Workloads

> **When should I read this?** You don't need chaos testing to be productive with the framework. Focus on basic scenarios first; chaos is for resilience validation and operational readiness drills once your core tests are stable.

Chaos in the framework uses node control to introduce failures and validate
recovery. The built-in restart workload lives in
`testing_framework_workflows::workloads::chaos::RandomRestartWorkload`.

## How it works

- Requires `NodeControlCapability` (`enable_node_control()` in the scenario
  builder) and a runner that provides a `NodeControlHandle`.
- Randomly selects nodes to restart based on your include/exclude flags.
- Respects min/max delay between restarts and a target cooldown to avoid
  flapping the same node too frequently.
- Runs alongside other workloads; expectations should account for the added
  disruption.
- Support varies by runner: node control is not provided by the local runner
  and is not yet implemented for the k8s runner. Use a runner that advertises
  `NodeControlHandle` support (e.g., compose) for chaos workloads.

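
To make the interplay of the delay bounds and the target cooldown concrete, here is a minimal, standalone sketch of the selection logic described above. It is not the framework's implementation: `RestartPlanner`, `XorShift64`, and `simulate` are hypothetical names, nodes are plain indices that are assumed to have already passed the include/exclude filter, and a small xorshift generator stands in for a real RNG so the sketch needs only the standard library.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Small xorshift generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `lo..=hi`; requires `lo <= hi`. Modulo bias is ignored.
    fn range(&mut self, lo: u64, hi: u64) -> u64 {
        lo + self.next_u64() % (hi - lo + 1)
    }
}

/// Hypothetical planner (an assumption for illustration, not the framework's type).
struct RestartPlanner {
    min_delay: Duration,
    max_delay: Duration,
    target_cooldown: Duration,
    last_restart: HashMap<usize, Duration>, // node index -> time of last restart
    rng: XorShift64,
}

impl RestartPlanner {
    fn new(min_delay: Duration, max_delay: Duration, target_cooldown: Duration, seed: u64) -> Self {
        assert!(min_delay > Duration::ZERO && min_delay <= max_delay);
        Self {
            min_delay,
            max_delay,
            target_cooldown,
            last_restart: HashMap::new(),
            rng: XorShift64(seed.max(1)), // xorshift state must be non-zero
        }
    }

    /// Delay before the next restart attempt, within `[min_delay, max_delay]`.
    fn next_delay(&mut self) -> Duration {
        let lo = self.min_delay.as_millis() as u64;
        let hi = self.max_delay.as_millis() as u64;
        Duration::from_millis(self.rng.range(lo, hi))
    }

    /// Picks a random candidate at time `now`, skipping nodes still cooling down.
    fn pick_target(&mut self, now: Duration, candidates: &[usize]) -> Option<usize> {
        let ready: Vec<usize> = candidates
            .iter()
            .copied()
            .filter(|n| {
                self.last_restart
                    .get(n)
                    .map_or(true, |&t| now.saturating_sub(t) >= self.target_cooldown)
            })
            .collect();
        if ready.is_empty() {
            return None; // every candidate is cooling down; skip this attempt
        }
        let node = ready[self.rng.range(0, ready.len() as u64 - 1) as usize];
        self.last_restart.insert(node, now);
        Some(node)
    }
}

/// Simulates a run and returns `(time, node)` for each restart that would fire.
fn simulate(planner: &mut RestartPlanner, nodes: &[usize], run: Duration) -> Vec<(Duration, usize)> {
    let mut events = Vec::new();
    let mut now = Duration::ZERO;
    loop {
        now += planner.next_delay();
        if now >= run {
            break;
        }
        if let Some(node) = planner.pick_target(now, nodes) {
            events.push((now, node));
        }
    }
    events
}
```

Under this model, a cooldown that is long relative to the run duration caps each node at a single restart, which is one way to keep a short scenario from flapping.
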
## Usage

```rust,ignore
use std::time::Duration;

use testing_framework_core::scenario::ScenarioBuilder;
use testing_framework_workflows::{ScenarioBuilderExt, workloads::chaos::RandomRestartWorkload};

pub fn random_restart_plan() -> testing_framework_core::scenario::Scenario<
    testing_framework_core::scenario::NodeControlCapability,
> {
    ScenarioBuilder::topology_with(|t| t.network_star().nodes(2))
        .enable_node_control()
        .with_workload(RandomRestartWorkload::new(
            Duration::from_secs(45),  // min delay
            Duration::from_secs(75),  // max delay
            Duration::from_secs(120), // target cooldown
            true,                     // include nodes
        ))
        .expect_consensus_liveness()
        .with_run_duration(Duration::from_secs(150))
        .build()
}
```

## Expectations to pair

- **Consensus liveness**: ensure blocks keep progressing despite restarts.
- **Height convergence**: optionally check that all nodes converge after the
  chaos window.
- **Inclusion checks**: add any workload-specific inclusion checks if you’re
  also driving transactions.

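
As a rough illustration of what the liveness and convergence checks amount to, the sketch below works on plain block heights sampled from each node. Both helpers and their parameters (`tolerance`, `min_progress`) are hypothetical and are not the framework's expectation implementations; in a scenario you would normally rely on built-ins such as `expect_consensus_liveness()`.

```rust
/// Convergence: every node's height is within `tolerance` blocks of the highest.
/// (Hypothetical helper, assumed for illustration.)
fn heights_converged(heights: &[u64], tolerance: u64) -> bool {
    match (heights.iter().min(), heights.iter().max()) {
        (Some(&lo), Some(&hi)) => hi - lo <= tolerance,
        _ => false, // no samples: treat as not converged
    }
}

/// Liveness: the best height after the chaos window exceeds the best height
/// before it by at least `min_progress` blocks.
/// (Hypothetical helper, assumed for illustration.)
fn made_progress(before: &[u64], after: &[u64], min_progress: u64) -> bool {
    match (before.iter().max(), after.iter().max()) {
        (Some(&b), Some(&a)) => a.saturating_sub(b) >= min_progress,
        _ => false, // missing samples on either side: cannot claim progress
    }
}
```

A non-zero tolerance is usually the practical choice, since a node that has just restarted may legitimately trail the others by a few blocks when the samples are taken.
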
## Best practices

- Keep delays/cooldowns realistic; avoid back-to-back restarts that would never
  happen in production.
- Limit chaos scope: use the include/exclude flags to restrict restarts to the
  nodes you want to test.
- Combine with observability: monitor metrics/logs to explain failures.
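
One way to check whether a set of delays and cooldowns is realistic is to work out what it implies before running anything. The sketch below is an assumption-laden model, not framework code: `ChaosSettings`, `attempt_bounds`, and `review` are hypothetical names, and it assumes the first restart attempt happens one delay after the scenario starts and that only attempts strictly inside the run duration count.

```rust
use std::time::Duration;

/// Hypothetical bundle mirroring the values passed in the usage example.
struct ChaosSettings {
    min_delay: Duration,
    max_delay: Duration,
    target_cooldown: Duration,
    run_duration: Duration,
}

/// (fewest, most) restart attempts that fit strictly inside the run,
/// assuming every delay is `max_delay` or `min_delay` respectively.
fn attempt_bounds(s: &ChaosSettings) -> (u128, u128) {
    let run = s.run_duration.as_millis().saturating_sub(1);
    let fewest = run / s.max_delay.as_millis().max(1);
    let most = run / s.min_delay.as_millis().max(1);
    (fewest, most)
}

/// Flags settings that are likely mistakes under the model described above.
fn review(s: &ChaosSettings) -> Vec<&'static str> {
    let mut notes = Vec::new();
    if s.min_delay > s.max_delay {
        notes.push("min delay exceeds max delay");
    }
    if s.target_cooldown <= s.min_delay {
        notes.push("cooldown never binds: the same node can be hit on consecutive attempts");
    }
    if s.run_duration <= s.min_delay {
        notes.push("run is too short for any restart to fire");
    }
    notes
}
```

With the values from the usage example (45 s, 75 s, 120 s cooldown, 150 s run), this model gives between one and three attempts and raises no notes.
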