mirror of
https://github.com/logos-blockchain/logos-blockchain-testing.git
synced 2026-05-28 12:39:29 +00:00
feat(core): add observation runtime foundation
This commit is contained in:
parent
e4dbb8bb85
commit
d3bb6348e5
@ -6,7 +6,7 @@ exclude-dev = true
|
|||||||
no-default-features = true
|
no-default-features = true
|
||||||
|
|
||||||
[advisories]
|
[advisories]
|
||||||
ignore = []
|
ignore = ["RUSTSEC-2026-0097"]
|
||||||
yanked = "deny"
|
yanked = "deny"
|
||||||
|
|
||||||
[bans]
|
[bans]
|
||||||
|
|||||||
314
docs/observation-runtime-plan.md
Normal file
314
docs/observation-runtime-plan.md
Normal file
@ -0,0 +1,314 @@
|
|||||||
|
# Observation Runtime Plan
|
||||||
|
|
||||||
|
## Why this work exists
|
||||||
|
|
||||||
|
TF is good at deployment plumbing. It is weak at continuous observation.
|
||||||
|
|
||||||
|
Today, the same problems are solved repeatedly with custom loops:
|
||||||
|
- TF block feed logic in Logos
|
||||||
|
- Cucumber manual-cluster polling loops
|
||||||
|
- ad hoc catch-up scans for wallet and chain state
|
||||||
|
- app-local state polling in expectations
|
||||||
|
|
||||||
|
That is the gap this work should close.
|
||||||
|
|
||||||
|
The goal is not a generic "distributed systems DSL".
|
||||||
|
The goal is one reusable observation runtime that:
|
||||||
|
- continuously collects data from dynamic sources
|
||||||
|
- keeps typed materialized state
|
||||||
|
- exposes both current snapshot and delta/history views
|
||||||
|
- fits naturally in TF scenarios and Cucumber manual-cluster code
|
||||||
|
|
||||||
|
## Constraints
|
||||||
|
|
||||||
|
### TF constraints
|
||||||
|
- TF abstractions must stay universal and simple.
|
||||||
|
- TF must not know app semantics like blocks, wallets, leaders, jobs, or topics.
|
||||||
|
- TF must remain useful for simple apps such as `openraft_kv`, not only Logos.
|
||||||
|
|
||||||
|
### App constraints
|
||||||
|
- Apps must be able to build richer abstractions on top of TF.
|
||||||
|
- Logos must be able to support:
|
||||||
|
- current block-feed replacement
|
||||||
|
- fork-aware chain state
|
||||||
|
- public-peer sync targets
|
||||||
|
- multi-wallet UTXO tracking
|
||||||
|
- Apps must be able to adopt this incrementally.
|
||||||
|
|
||||||
|
### Migration constraints
|
||||||
|
- We do not want a flag-day rewrite.
|
||||||
|
- Existing loops can coexist with the new runtime until replacements are proven.
|
||||||
|
|
||||||
|
## Non-goals
|
||||||
|
|
||||||
|
This work should not:
|
||||||
|
- put feed back onto the base `Application` trait
|
||||||
|
- build app-specific semantics into TF core
|
||||||
|
- replace filesystem blockchain snapshots used for startup/restore
|
||||||
|
- force every app to use continuous observation
|
||||||
|
- introduce a large public abstraction stack that nobody can explain
|
||||||
|
|
||||||
|
## Core idea
|
||||||
|
|
||||||
|
Introduce one TF-level observation runtime.
|
||||||
|
|
||||||
|
That runtime owns:
|
||||||
|
- source refresh
|
||||||
|
- scheduling
|
||||||
|
- polling/ingestion
|
||||||
|
- bounded history
|
||||||
|
- latest snapshot caching
|
||||||
|
- delta publication
|
||||||
|
- freshness/error tracking
|
||||||
|
- lifecycle hooks for TF and Cucumber
|
||||||
|
|
||||||
|
Apps own:
|
||||||
|
- source types
|
||||||
|
- raw observation logic
|
||||||
|
- materialized state
|
||||||
|
- snapshot shape
|
||||||
|
- delta/event shape
|
||||||
|
- higher-level projections such as wallet state
|
||||||
|
|
||||||
|
## Public TF surface
|
||||||
|
|
||||||
|
The TF public surface should stay small.
|
||||||
|
|
||||||
|
### `ObservedSource<S>`
|
||||||
|
A named source instance.
|
||||||
|
|
||||||
|
Used for:
|
||||||
|
- local node clients
|
||||||
|
- public peer endpoints
|
||||||
|
- any other app-owned source type
|
||||||
|
|
||||||
|
### `SourceProvider<S>`
|
||||||
|
Returns the current source set.
|
||||||
|
|
||||||
|
This must support dynamic source lists because:
|
||||||
|
- manual cluster nodes come and go
|
||||||
|
- Cucumber worlds may attach public peers
|
||||||
|
- node control may restart or replace sources during a run
|
||||||
|
|
||||||
|
### `Observer`
|
||||||
|
App-owned observation logic.
|
||||||
|
|
||||||
|
It defines:
|
||||||
|
- `Source`
|
||||||
|
- `State`
|
||||||
|
- `Snapshot`
|
||||||
|
- `Event`
|
||||||
|
|
||||||
|
And it implements:
|
||||||
|
- `init(...)`
|
||||||
|
- `poll(...)`
|
||||||
|
- `snapshot(...)`
|
||||||
|
|
||||||
|
The important boundary is:
|
||||||
|
- TF owns the runtime
|
||||||
|
- app code owns materialization
|
||||||
|
|
||||||
|
### `ObservationRuntime`
|
||||||
|
The engine that:
|
||||||
|
- starts the loop
|
||||||
|
- refreshes sources
|
||||||
|
- calls `poll(...)`
|
||||||
|
- stores history
|
||||||
|
- publishes deltas
|
||||||
|
- updates latest snapshot
|
||||||
|
- tracks last error and freshness
|
||||||
|
|
||||||
|
### `ObservationHandle`
|
||||||
|
The read-side interface for workloads, expectations, and Cucumber steps.
|
||||||
|
|
||||||
|
It should expose at least:
|
||||||
|
- latest snapshot
|
||||||
|
- delta subscription
|
||||||
|
- bounded history
|
||||||
|
- last error
|
||||||
|
|
||||||
|
## Intended shape
|
||||||
|
|
||||||
|
```rust
|
||||||
|
pub struct ObservedSource<S> {
|
||||||
|
pub name: String,
|
||||||
|
pub source: S,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
pub trait SourceProvider<S>: Send + Sync + 'static {
|
||||||
|
async fn sources(&self) -> Vec<ObservedSource<S>>;
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
pub trait Observer: Send + Sync + 'static {
|
||||||
|
type Source: Clone + Send + Sync + 'static;
|
||||||
|
type State: Send + Sync + 'static;
|
||||||
|
type Snapshot: Clone + Send + Sync + 'static;
|
||||||
|
type Event: Clone + Send + Sync + 'static;
|
||||||
|
|
||||||
|
async fn init(
|
||||||
|
&self,
|
||||||
|
sources: &[ObservedSource<Self::Source>],
|
||||||
|
) -> Result<Self::State, DynError>;
|
||||||
|
|
||||||
|
async fn poll(
|
||||||
|
&self,
|
||||||
|
sources: &[ObservedSource<Self::Source>],
|
||||||
|
state: &mut Self::State,
|
||||||
|
) -> Result<Vec<Self::Event>, DynError>;
|
||||||
|
|
||||||
|
fn snapshot(&self, state: &Self::State) -> Self::Snapshot;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
This is enough.
|
||||||
|
|
||||||
|
If more helper layers are needed, they should stay internal first.
|
||||||
|
|
||||||
|
## How current use cases fit
|
||||||
|
|
||||||
|
### `openraft_kv`
|
||||||
|
Use one simple observer.
|
||||||
|
|
||||||
|
- sources: node clients
|
||||||
|
- state: latest per-node Raft state
|
||||||
|
- snapshot: sorted node-state view
|
||||||
|
- events: optional deltas, possibly empty at first
|
||||||
|
|
||||||
|
This is the simplest proving case.
|
||||||
|
It validates the runtime without dragging in Logos complexity.
|
||||||
|
|
||||||
|
### Logos block feed replacement
|
||||||
|
Use one shared chain observer.
|
||||||
|
|
||||||
|
- sources: local node clients
|
||||||
|
- state:
|
||||||
|
- node heads
|
||||||
|
- block graph
|
||||||
|
- heights
|
||||||
|
- seen headers
|
||||||
|
- recent history
|
||||||
|
- snapshot:
|
||||||
|
- current head/lib/graph summary
|
||||||
|
- events:
|
||||||
|
- newly discovered blocks
|
||||||
|
|
||||||
|
This covers both existing Logos feed use cases:
|
||||||
|
- current snapshot consumers
|
||||||
|
- delta/subscription consumers
|
||||||
|
|
||||||
|
### Cucumber manual-cluster sync
|
||||||
|
Use the same observer runtime with a different source set.
|
||||||
|
|
||||||
|
- sources:
|
||||||
|
- local manual-cluster node clients
|
||||||
|
- public peer endpoints
|
||||||
|
- state:
|
||||||
|
- local consensus views
|
||||||
|
- public consensus views
|
||||||
|
- derived majority public target
|
||||||
|
- snapshot:
|
||||||
|
- current local and public sync picture
|
||||||
|
|
||||||
|
This removes custom poll/sleep loops from steps.
|
||||||
|
|
||||||
|
### Multi-wallet fork-aware tracking
|
||||||
|
This should not be a TF concept.
|
||||||
|
|
||||||
|
It should be a Logos projection built on top of the shared chain observer.
|
||||||
|
|
||||||
|
- input: chain observer state
|
||||||
|
- output: per-header wallet state cache keyed by block header
|
||||||
|
- property: naturally fork-aware because it follows actual ancestry
|
||||||
|
|
||||||
|
That replaces repeated backward scans from tip with continuous maintained state.
|
||||||
|
|
||||||
|
## Logos layering
|
||||||
|
|
||||||
|
Logos should not put every concern into one giant impl.
|
||||||
|
|
||||||
|
Recommended layering:
|
||||||
|
|
||||||
|
1. **Chain source adapter**
|
||||||
|
- local node reads
|
||||||
|
- public peer reads
|
||||||
|
|
||||||
|
2. **Shared chain observer**
|
||||||
|
- catch-up
|
||||||
|
- continuous ingestion
|
||||||
|
- graph/history materialization
|
||||||
|
|
||||||
|
3. **Logos projections**
|
||||||
|
- head view
|
||||||
|
- public sync target
|
||||||
|
- fork graph queries
|
||||||
|
- wallet state
|
||||||
|
- tx inclusion helpers
|
||||||
|
|
||||||
|
TF provides the runtime.
|
||||||
|
Logos provides the domain model built on top.
|
||||||
|
|
||||||
|
## Adoption plan
|
||||||
|
|
||||||
|
### Phase 1: add TF observation runtime
|
||||||
|
- add `ObservedSource`, `SourceProvider`, `Observer`, `ObservationRuntime`, `ObservationHandle`
|
||||||
|
- keep the public API small
|
||||||
|
- no app migrations yet
|
||||||
|
|
||||||
|
### Phase 2: prove it on `openraft_kv`
|
||||||
|
- add one simple observer over `/state`
|
||||||
|
- migrate one expectation to use the observation handle
|
||||||
|
- validate local, compose, and k8s
|
||||||
|
|
||||||
|
### Phase 3: add Logos shared chain observer
|
||||||
|
- implement it alongside current feed/loops
|
||||||
|
- do not remove existing consumers yet
|
||||||
|
- prove snapshot and delta outputs are useful
|
||||||
|
|
||||||
|
### Phase 4: migrate one Logos consumer at a time
|
||||||
|
Suggested order:
|
||||||
|
1. fork/head snapshot consumer
|
||||||
|
2. tx inclusion consumer
|
||||||
|
3. Cucumber sync-to-public-chain logic
|
||||||
|
4. wallet/UTXO tracking
|
||||||
|
|
||||||
|
### Phase 5: delete old loops and feed paths
|
||||||
|
- only after the new runtime has replaced real consumers cleanly
|
||||||
|
|
||||||
|
## Validation gates
|
||||||
|
|
||||||
|
Each phase should have clear checks.
|
||||||
|
|
||||||
|
### Runtime-level
|
||||||
|
- crate-level `cargo check`
|
||||||
|
- targeted tests for runtime lifecycle and history retention
|
||||||
|
- explicit tests for dynamic source refresh
|
||||||
|
|
||||||
|
### App-level
|
||||||
|
- `openraft_kv`:
|
||||||
|
- local failover
|
||||||
|
- compose failover
|
||||||
|
- k8s failover
|
||||||
|
- Logos:
|
||||||
|
- one snapshot consumer migrated
|
||||||
|
- one delta consumer migrated
|
||||||
|
- Cucumber:
|
||||||
|
- one manual-cluster sync path migrated
|
||||||
|
|
||||||
|
## Open questions
|
||||||
|
|
||||||
|
These should stay open until implementation forces a decision:
|
||||||
|
- whether `ObservationHandle` should expose full history directly or only cursor/subscription access
|
||||||
|
- how much error/freshness metadata belongs in the generic runtime vs app snapshot types
|
||||||
|
- whether multiple observers should share one scheduler/runtime instance or simply run independently first
|
||||||
|
|
||||||
|
## Design guardrails
|
||||||
|
|
||||||
|
When implementing this work:
|
||||||
|
- keep TF public abstractions minimal
|
||||||
|
- keep app semantics out of TF core
|
||||||
|
- do not chase a generic testing DSL
|
||||||
|
- build from reusable blocks, not one-off mega impls
|
||||||
|
- keep migration incremental
|
||||||
|
- prefer simple, explainable runtime behavior over clever abstraction
|
||||||
@ -29,5 +29,5 @@ reqwest = { features = ["json"], workspace = true }
|
|||||||
serde = { workspace = true }
|
serde = { workspace = true }
|
||||||
serde_yaml = { workspace = true }
|
serde_yaml = { workspace = true }
|
||||||
thiserror = { workspace = true }
|
thiserror = { workspace = true }
|
||||||
tokio = { features = ["macros", "process", "rt-multi-thread", "time"], workspace = true }
|
tokio = { features = ["macros", "process", "rt-multi-thread", "sync", "time"], workspace = true }
|
||||||
tracing = { workspace = true }
|
tracing = { workspace = true }
|
||||||
|
|||||||
@ -1,5 +1,6 @@
|
|||||||
pub mod cfgsync;
|
pub mod cfgsync;
|
||||||
pub mod env;
|
pub mod env;
|
||||||
|
pub mod observation;
|
||||||
pub mod runtime;
|
pub mod runtime;
|
||||||
pub mod scenario;
|
pub mod scenario;
|
||||||
pub mod topology;
|
pub mod topology;
|
||||||
|
|||||||
161
testing-framework/core/src/observation/factory.rs
Normal file
161
testing-framework/core/src/observation/factory.rs
Normal file
@ -0,0 +1,161 @@
|
|||||||
|
use std::{marker::PhantomData, sync::Arc};
|
||||||
|
|
||||||
|
use async_trait::async_trait;
|
||||||
|
|
||||||
|
use super::{
|
||||||
|
ObservationConfig, ObservationHandle, ObservationRuntime, ObservedSource, Observer,
|
||||||
|
SourceProvider,
|
||||||
|
};
|
||||||
|
use crate::scenario::{
|
||||||
|
Application, DynError, NodeClients, PreparedRuntimeExtension, RuntimeExtensionFactory,
|
||||||
|
};
|
||||||
|
|
||||||
|
/// Boxed source provider used by observation factories.
|
||||||
|
pub type BoxedSourceProvider<S> = Box<dyn SourceProvider<S>>;
|
||||||
|
|
||||||
|
/// Builds an observation source provider once node clients are available.
|
||||||
|
pub trait SourceProviderFactory<E: Application, S>: Send + Sync + 'static {
|
||||||
|
/// Builds the source provider for one scenario run.
|
||||||
|
fn build_source_provider(
|
||||||
|
&self,
|
||||||
|
deployment: &E::Deployment,
|
||||||
|
node_clients: NodeClients<E>,
|
||||||
|
) -> Result<BoxedSourceProvider<S>, DynError>;
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<E, S, F> SourceProviderFactory<E, S> for F
|
||||||
|
where
|
||||||
|
E: Application,
|
||||||
|
S: Clone + Send + Sync + 'static,
|
||||||
|
F: Fn(&E::Deployment, NodeClients<E>) -> Result<BoxedSourceProvider<S>, DynError>
|
||||||
|
+ Send
|
||||||
|
+ Sync
|
||||||
|
+ 'static,
|
||||||
|
{
|
||||||
|
fn build_source_provider(
|
||||||
|
&self,
|
||||||
|
deployment: &E::Deployment,
|
||||||
|
node_clients: NodeClients<E>,
|
||||||
|
) -> Result<BoxedSourceProvider<S>, DynError> {
|
||||||
|
self(deployment, node_clients)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Fixed source provider for scenario runs with a stable source set.
|
||||||
|
#[derive(Clone, Debug)]
|
||||||
|
pub struct StaticSourceProvider<S> {
|
||||||
|
sources: Vec<ObservedSource<S>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<S> StaticSourceProvider<S> {
|
||||||
|
/// Builds a provider from a fixed source list.
|
||||||
|
#[must_use]
|
||||||
|
pub fn new(sources: Vec<ObservedSource<S>>) -> Self {
|
||||||
|
Self { sources }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl<S> SourceProvider<S> for StaticSourceProvider<S>
|
||||||
|
where
|
||||||
|
S: Clone + Send + Sync + 'static,
|
||||||
|
{
|
||||||
|
async fn sources(&self) -> Result<Vec<ObservedSource<S>>, DynError> {
|
||||||
|
Ok(self.sources.clone())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Runtime extension factory that starts one observer and stores its handle in
|
||||||
|
/// `RunContext`.
|
||||||
|
pub struct ObservationExtensionFactory<E: Application, O: Observer> {
|
||||||
|
observer_builder: Arc<dyn Fn() -> O + Send + Sync>,
|
||||||
|
source_provider_factory: Arc<dyn SourceProviderFactory<E, O::Source>>,
|
||||||
|
config: ObservationConfig,
|
||||||
|
env_marker: PhantomData<E>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<E: Application, O: Observer> ObservationExtensionFactory<E, O> {
|
||||||
|
/// Builds an observation extension factory from builders.
|
||||||
|
#[must_use]
|
||||||
|
pub fn from_parts(
|
||||||
|
observer_builder: impl Fn() -> O + Send + Sync + 'static,
|
||||||
|
source_provider_factory: impl SourceProviderFactory<E, O::Source>,
|
||||||
|
config: ObservationConfig,
|
||||||
|
) -> Self {
|
||||||
|
Self {
|
||||||
|
observer_builder: Arc::new(observer_builder),
|
||||||
|
source_provider_factory: Arc::new(source_provider_factory),
|
||||||
|
config,
|
||||||
|
env_marker: PhantomData,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<E, O> ObservationExtensionFactory<E, O>
|
||||||
|
where
|
||||||
|
E: Application,
|
||||||
|
O: Observer + Clone,
|
||||||
|
{
|
||||||
|
/// Builds an observation extension factory from one clonable observer.
|
||||||
|
#[must_use]
|
||||||
|
pub fn new(
|
||||||
|
observer: O,
|
||||||
|
source_provider_factory: impl SourceProviderFactory<E, O::Source>,
|
||||||
|
config: ObservationConfig,
|
||||||
|
) -> Self {
|
||||||
|
Self::from_parts(move || observer.clone(), source_provider_factory, config)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl<E, O> RuntimeExtensionFactory<E> for ObservationExtensionFactory<E, O>
|
||||||
|
where
|
||||||
|
E: Application,
|
||||||
|
O: Observer,
|
||||||
|
{
|
||||||
|
async fn prepare(
|
||||||
|
&self,
|
||||||
|
deployment: &E::Deployment,
|
||||||
|
node_clients: NodeClients<E>,
|
||||||
|
) -> Result<PreparedRuntimeExtension, DynError> {
|
||||||
|
let source_provider = self
|
||||||
|
.source_provider_factory
|
||||||
|
.build_source_provider(deployment, node_clients)?;
|
||||||
|
|
||||||
|
let observer = (self.observer_builder)();
|
||||||
|
let runtime =
|
||||||
|
ObservationRuntime::start(source_provider, observer, self.config.clone()).await?;
|
||||||
|
|
||||||
|
let (handle, task) = runtime.into_parts();
|
||||||
|
|
||||||
|
Ok(PreparedRuntimeExtension::from_task(handle, task))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl<S, P> SourceProvider<S> for Box<P>
|
||||||
|
where
|
||||||
|
S: Clone + Send + Sync + 'static,
|
||||||
|
P: SourceProvider<S> + ?Sized,
|
||||||
|
{
|
||||||
|
async fn sources(&self) -> Result<Vec<ObservedSource<S>>, DynError> {
|
||||||
|
(**self).sources().await
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl<S, P> SourceProvider<S> for Arc<P>
|
||||||
|
where
|
||||||
|
S: Clone + Send + Sync + 'static,
|
||||||
|
P: SourceProvider<S> + ?Sized,
|
||||||
|
{
|
||||||
|
async fn sources(&self) -> Result<Vec<ObservedSource<S>>, DynError> {
|
||||||
|
(**self).sources().await
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<O: Observer> From<ObservationHandle<O>> for PreparedRuntimeExtension {
|
||||||
|
fn from(handle: ObservationHandle<O>) -> Self {
|
||||||
|
PreparedRuntimeExtension::new(handle)
|
||||||
|
}
|
||||||
|
}
|
||||||
503
testing-framework/core/src/observation/mod.rs
Normal file
503
testing-framework/core/src/observation/mod.rs
Normal file
@ -0,0 +1,503 @@
|
|||||||
|
//! Generic continuous observation runtime.
|
||||||
|
//!
|
||||||
|
//! This module provides the reusable runtime needed by both TF scenarios and
|
||||||
|
//! manual-cluster consumers such as Cucumber worlds. It does not know any app
|
||||||
|
//! semantics. Apps provide source types, observation logic, materialized state,
|
||||||
|
//! snapshots, and delta events.
|
||||||
|
|
||||||
|
mod factory;
|
||||||
|
|
||||||
|
use std::{
|
||||||
|
any::type_name,
|
||||||
|
collections::VecDeque,
|
||||||
|
sync::Arc,
|
||||||
|
time::{Duration, SystemTime},
|
||||||
|
};
|
||||||
|
|
||||||
|
use async_trait::async_trait;
|
||||||
|
pub use factory::{
|
||||||
|
BoxedSourceProvider, ObservationExtensionFactory, SourceProviderFactory, StaticSourceProvider,
|
||||||
|
};
|
||||||
|
use parking_lot::Mutex;
|
||||||
|
use tokio::{
|
||||||
|
sync::broadcast,
|
||||||
|
task::JoinHandle,
|
||||||
|
time::{MissedTickBehavior, interval},
|
||||||
|
};
|
||||||
|
use tracing::{debug, info, warn};
|
||||||
|
|
||||||
|
use crate::scenario::DynError;
|
||||||
|
|
||||||
|
/// Configuration for a background observation runtime.
|
||||||
|
#[derive(Clone, Debug)]
|
||||||
|
pub struct ObservationConfig {
|
||||||
|
/// Time between observation cycles.
|
||||||
|
pub interval: Duration,
|
||||||
|
/// Maximum number of non-empty event batches retained in memory.
|
||||||
|
pub history_limit: usize,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Default for ObservationConfig {
|
||||||
|
fn default() -> Self {
|
||||||
|
Self {
|
||||||
|
interval: Duration::from_secs(1),
|
||||||
|
history_limit: 64,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// One named observation source.
|
||||||
|
#[derive(Clone, Debug)]
|
||||||
|
pub struct ObservedSource<S> {
|
||||||
|
/// Human-readable source name used in logs and app-level reporting.
|
||||||
|
pub name: String,
|
||||||
|
/// App-owned source handle.
|
||||||
|
pub source: S,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<S> ObservedSource<S> {
|
||||||
|
/// Builds one named observation source.
|
||||||
|
#[must_use]
|
||||||
|
pub fn new(name: &str, source: S) -> Self {
|
||||||
|
Self {
|
||||||
|
name: name.to_owned(),
|
||||||
|
source,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Supplies the current observation source set.
|
||||||
|
#[async_trait]
|
||||||
|
pub trait SourceProvider<S>: Send + Sync + 'static {
|
||||||
|
/// Returns the current source set for the next observation cycle.
|
||||||
|
async fn sources(&self) -> Result<Vec<ObservedSource<S>>, DynError>;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// App-owned observation logic.
|
||||||
|
#[async_trait]
|
||||||
|
pub trait Observer: Send + Sync + 'static {
|
||||||
|
/// App-owned source type.
|
||||||
|
type Source: Clone + Send + Sync + 'static;
|
||||||
|
/// App-owned retained materialized state.
|
||||||
|
type State: Send + Sync + 'static;
|
||||||
|
/// App-owned current snapshot view.
|
||||||
|
type Snapshot: Clone + Send + Sync + 'static;
|
||||||
|
/// App-owned delta event type emitted per cycle.
|
||||||
|
type Event: Clone + Send + Sync + 'static;
|
||||||
|
|
||||||
|
/// Builds the initial retained state from the current source set.
|
||||||
|
async fn init(&self, sources: &[ObservedSource<Self::Source>])
|
||||||
|
-> Result<Self::State, DynError>;
|
||||||
|
|
||||||
|
/// Advances retained state by one cycle and returns any new delta events.
|
||||||
|
async fn poll(
|
||||||
|
&self,
|
||||||
|
sources: &[ObservedSource<Self::Source>],
|
||||||
|
state: &mut Self::State,
|
||||||
|
) -> Result<Vec<Self::Event>, DynError>;
|
||||||
|
|
||||||
|
/// Builds the current snapshot view from retained state.
|
||||||
|
fn snapshot(&self, state: &Self::State) -> Self::Snapshot;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// One materialized snapshot emitted by the runtime.
|
||||||
|
#[derive(Clone, Debug)]
|
||||||
|
pub struct ObservationSnapshot<S> {
|
||||||
|
/// Monotonic cycle number.
|
||||||
|
pub cycle: u64,
|
||||||
|
/// Capture timestamp.
|
||||||
|
pub observed_at: SystemTime,
|
||||||
|
/// Number of sources used for this snapshot.
|
||||||
|
pub source_count: usize,
|
||||||
|
/// App-owned snapshot payload.
|
||||||
|
pub value: S,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// One delta batch emitted by a successful observation cycle.
|
||||||
|
#[derive(Clone, Debug)]
|
||||||
|
pub struct ObservationBatch<E> {
|
||||||
|
/// Monotonic cycle number.
|
||||||
|
pub cycle: u64,
|
||||||
|
/// Capture timestamp.
|
||||||
|
pub observed_at: SystemTime,
|
||||||
|
/// Number of sources used for this batch.
|
||||||
|
pub source_count: usize,
|
||||||
|
/// App-owned delta events discovered in this cycle.
|
||||||
|
pub events: Vec<E>,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Observation runtime failure stage.
|
||||||
|
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
|
||||||
|
pub enum ObservationFailureStage {
|
||||||
|
/// Source refresh failed before a poll could run.
|
||||||
|
SourceRefresh,
|
||||||
|
/// Observer poll failed after sources were refreshed.
|
||||||
|
Poll,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Last failed observation cycle.
|
||||||
|
#[derive(Clone, Debug)]
|
||||||
|
pub struct ObservationFailure {
|
||||||
|
/// Monotonic cycle number.
|
||||||
|
pub cycle: u64,
|
||||||
|
/// Failure timestamp.
|
||||||
|
pub observed_at: SystemTime,
|
||||||
|
/// Number of sources involved in the failed cycle.
|
||||||
|
pub source_count: usize,
|
||||||
|
/// Runtime stage that failed.
|
||||||
|
pub stage: ObservationFailureStage,
|
||||||
|
/// Human-readable failure message.
|
||||||
|
pub message: String,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Errors returned while starting an observation runtime.
|
||||||
|
#[derive(Debug, thiserror::Error)]
|
||||||
|
pub enum ObservationRuntimeError {
|
||||||
|
/// The configured interval is invalid.
|
||||||
|
#[error("observation interval must be greater than zero")]
|
||||||
|
InvalidInterval,
|
||||||
|
/// Source discovery failed during runtime startup.
|
||||||
|
#[error("failed to refresh observation sources during startup: {source}")]
|
||||||
|
SourceRefresh {
|
||||||
|
#[source]
|
||||||
|
source: DynError,
|
||||||
|
},
|
||||||
|
/// Observer state initialization failed during runtime startup.
|
||||||
|
#[error("failed to initialize observation state: {source}")]
|
||||||
|
ObserverInit {
|
||||||
|
#[source]
|
||||||
|
source: DynError,
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Read-side handle for one running observer.
|
||||||
|
pub struct ObservationHandle<O: Observer> {
|
||||||
|
shared: Arc<Mutex<SharedObservationState<O>>>,
|
||||||
|
batches: broadcast::Sender<Arc<ObservationBatch<O::Event>>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<O: Observer> Clone for ObservationHandle<O> {
|
||||||
|
fn clone(&self) -> Self {
|
||||||
|
Self {
|
||||||
|
shared: Arc::clone(&self.shared),
|
||||||
|
batches: self.batches.clone(),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<O: Observer> ObservationHandle<O> {
|
||||||
|
/// Returns the latest successful snapshot, if one has been produced.
|
||||||
|
#[must_use]
|
||||||
|
pub fn latest_snapshot(&self) -> Option<ObservationSnapshot<O::Snapshot>> {
|
||||||
|
self.shared.lock().latest_snapshot.clone()
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns retained non-empty event batches.
|
||||||
|
#[must_use]
|
||||||
|
pub fn history(&self) -> Vec<Arc<ObservationBatch<O::Event>>> {
|
||||||
|
self.shared.lock().history.iter().cloned().collect()
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns the most recent cycle failure, if any.
|
||||||
|
#[must_use]
|
||||||
|
pub fn last_error(&self) -> Option<ObservationFailure> {
|
||||||
|
self.shared.lock().last_error.clone()
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Subscribes to future non-empty event batches.
|
||||||
|
#[must_use]
|
||||||
|
pub fn subscribe(&self) -> broadcast::Receiver<Arc<ObservationBatch<O::Event>>> {
|
||||||
|
self.batches.subscribe()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Lifecycle owner for one background observation runtime.
|
||||||
|
pub struct ObservationRuntime<O: Observer> {
|
||||||
|
handle: ObservationHandle<O>,
|
||||||
|
task: Option<JoinHandle<()>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<O: Observer> ObservationRuntime<O> {
|
||||||
|
/// Starts one background observation runtime.
|
||||||
|
pub async fn start<P>(
|
||||||
|
provider: P,
|
||||||
|
observer: O,
|
||||||
|
config: ObservationConfig,
|
||||||
|
) -> Result<Self, ObservationRuntimeError>
|
||||||
|
where
|
||||||
|
P: SourceProvider<O::Source>,
|
||||||
|
{
|
||||||
|
ensure_positive_interval(config.interval)?;
|
||||||
|
|
||||||
|
let sources = provider
|
||||||
|
.sources()
|
||||||
|
.await
|
||||||
|
.map_err(|source| ObservationRuntimeError::SourceRefresh { source })?;
|
||||||
|
|
||||||
|
let source_count = sources.len();
|
||||||
|
let state = observer
|
||||||
|
.init(&sources)
|
||||||
|
.await
|
||||||
|
.map_err(|source| ObservationRuntimeError::ObserverInit { source })?;
|
||||||
|
|
||||||
|
let snapshot = build_snapshot(0, source_count, &observer, &state);
|
||||||
|
let batches = broadcast::channel(config.history_limit.max(1)).0;
|
||||||
|
let shared = Arc::new(Mutex::new(SharedObservationState::new(snapshot)));
|
||||||
|
let handle = ObservationHandle {
|
||||||
|
shared: Arc::clone(&shared),
|
||||||
|
batches,
|
||||||
|
};
|
||||||
|
|
||||||
|
info!(
|
||||||
|
observer = type_name::<O>(),
|
||||||
|
interval_ms = config.interval.as_millis(),
|
||||||
|
history_limit = config.history_limit,
|
||||||
|
source_count,
|
||||||
|
"starting observation runtime"
|
||||||
|
);
|
||||||
|
|
||||||
|
let runtime_handle = handle.clone();
|
||||||
|
let task = tokio::spawn(run_observation_loop(
|
||||||
|
provider,
|
||||||
|
observer,
|
||||||
|
config,
|
||||||
|
shared,
|
||||||
|
runtime_handle.batches.clone(),
|
||||||
|
state,
|
||||||
|
));
|
||||||
|
|
||||||
|
Ok(Self {
|
||||||
|
handle: runtime_handle,
|
||||||
|
task: Some(task),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns a read-side handle for the running observer.
|
||||||
|
#[must_use]
|
||||||
|
pub fn handle(&self) -> ObservationHandle<O> {
|
||||||
|
self.handle.clone()
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Splits the runtime into its handle and background task.
|
||||||
|
#[must_use]
|
||||||
|
pub fn into_parts(mut self) -> (ObservationHandle<O>, JoinHandle<()>) {
|
||||||
|
let task = self
|
||||||
|
.task
|
||||||
|
.take()
|
||||||
|
.expect("observation runtime task is always present before into_parts");
|
||||||
|
|
||||||
|
(self.handle.clone(), task)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Aborts the background task.
|
||||||
|
pub fn abort(&mut self) {
|
||||||
|
if let Some(task) = self.task.take() {
|
||||||
|
task.abort();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<O: Observer> Drop for ObservationRuntime<O> {
|
||||||
|
fn drop(&mut self) {
|
||||||
|
self.abort();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
struct SharedObservationState<O: Observer> {
|
||||||
|
latest_snapshot: Option<ObservationSnapshot<O::Snapshot>>,
|
||||||
|
history: VecDeque<Arc<ObservationBatch<O::Event>>>,
|
||||||
|
last_error: Option<ObservationFailure>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<O: Observer> SharedObservationState<O> {
|
||||||
|
fn new(snapshot: ObservationSnapshot<O::Snapshot>) -> Self {
|
||||||
|
Self {
|
||||||
|
latest_snapshot: Some(snapshot),
|
||||||
|
history: VecDeque::new(),
|
||||||
|
last_error: None,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn run_observation_loop<O, P>(
|
||||||
|
provider: P,
|
||||||
|
observer: O,
|
||||||
|
config: ObservationConfig,
|
||||||
|
shared: Arc<Mutex<SharedObservationState<O>>>,
|
||||||
|
batches: broadcast::Sender<Arc<ObservationBatch<O::Event>>>,
|
||||||
|
mut state: O::State,
|
||||||
|
) where
|
||||||
|
O: Observer,
|
||||||
|
P: SourceProvider<O::Source>,
|
||||||
|
{
|
||||||
|
let mut ticker = build_interval(config.interval);
|
||||||
|
let mut cycle = 1u64;
|
||||||
|
|
||||||
|
ticker.tick().await;
|
||||||
|
|
||||||
|
loop {
|
||||||
|
ticker.tick().await;
|
||||||
|
|
||||||
|
let cycle_outcome = observe_cycle(&provider, &observer, cycle, &mut state).await;
|
||||||
|
|
||||||
|
match cycle_outcome {
|
||||||
|
Ok(success) => record_cycle_success(&shared, &batches, &config, success),
|
||||||
|
Err(failure) => record_cycle_failure(&shared, failure),
|
||||||
|
}
|
||||||
|
|
||||||
|
cycle += 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
struct CycleSuccess<O: Observer> {
|
||||||
|
snapshot: ObservationSnapshot<O::Snapshot>,
|
||||||
|
batch: Option<Arc<ObservationBatch<O::Event>>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn observe_cycle<O, P>(
|
||||||
|
provider: &P,
|
||||||
|
observer: &O,
|
||||||
|
cycle: u64,
|
||||||
|
state: &mut O::State,
|
||||||
|
) -> Result<CycleSuccess<O>, ObservationFailure>
|
||||||
|
where
|
||||||
|
O: Observer,
|
||||||
|
P: SourceProvider<O::Source>,
|
||||||
|
{
|
||||||
|
let sources = provider.sources().await.map_err(|source| {
|
||||||
|
build_failure(cycle, 0, ObservationFailureStage::SourceRefresh, source)
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let source_count = sources.len();
|
||||||
|
let events = observer.poll(&sources, state).await.map_err(|source| {
|
||||||
|
build_failure(cycle, source_count, ObservationFailureStage::Poll, source)
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let snapshot = build_snapshot(cycle, source_count, observer, state);
|
||||||
|
let batch = build_batch(cycle, source_count, events);
|
||||||
|
|
||||||
|
Ok(CycleSuccess { snapshot, batch })
|
||||||
|
}
|
||||||
|
|
||||||
|
fn record_cycle_success<O: Observer>(
|
||||||
|
shared: &Arc<Mutex<SharedObservationState<O>>>,
|
||||||
|
batches: &broadcast::Sender<Arc<ObservationBatch<O::Event>>>,
|
||||||
|
config: &ObservationConfig,
|
||||||
|
success: CycleSuccess<O>,
|
||||||
|
) {
|
||||||
|
debug!(
|
||||||
|
observer = type_name::<O>(),
|
||||||
|
cycle = success.snapshot.cycle,
|
||||||
|
source_count = success.snapshot.source_count,
|
||||||
|
event_count = success.batch.as_ref().map_or(0, |batch| batch.events.len()),
|
||||||
|
"observation cycle completed"
|
||||||
|
);
|
||||||
|
|
||||||
|
let mut state = shared.lock();
|
||||||
|
state.latest_snapshot = Some(success.snapshot);
|
||||||
|
state.last_error = None;
|
||||||
|
|
||||||
|
let Some(batch) = success.batch else {
|
||||||
|
return;
|
||||||
|
};
|
||||||
|
|
||||||
|
push_history(&mut state.history, Arc::clone(&batch), config.history_limit);
|
||||||
|
drop(state);
|
||||||
|
|
||||||
|
let _ = batches.send(batch);
|
||||||
|
}
|
||||||
|
|
||||||
|
fn record_cycle_failure<O: Observer>(
|
||||||
|
shared: &Arc<Mutex<SharedObservationState<O>>>,
|
||||||
|
failure: ObservationFailure,
|
||||||
|
) {
|
||||||
|
warn!(
|
||||||
|
observer = type_name::<O>(),
|
||||||
|
cycle = failure.cycle,
|
||||||
|
source_count = failure.source_count,
|
||||||
|
stage = ?failure.stage,
|
||||||
|
message = %failure.message,
|
||||||
|
"observation cycle failed"
|
||||||
|
);
|
||||||
|
|
||||||
|
shared.lock().last_error = Some(failure);
|
||||||
|
}
|
||||||
|
|
||||||
|
fn ensure_positive_interval(interval: Duration) -> Result<(), ObservationRuntimeError> {
|
||||||
|
if interval.is_zero() {
|
||||||
|
return Err(ObservationRuntimeError::InvalidInterval);
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
fn build_interval(period: Duration) -> tokio::time::Interval {
|
||||||
|
let mut ticker = interval(period);
|
||||||
|
ticker.set_missed_tick_behavior(MissedTickBehavior::Delay);
|
||||||
|
ticker
|
||||||
|
}
|
||||||
|
|
||||||
|
fn build_snapshot<O: Observer>(
|
||||||
|
cycle: u64,
|
||||||
|
source_count: usize,
|
||||||
|
observer: &O,
|
||||||
|
state: &O::State,
|
||||||
|
) -> ObservationSnapshot<O::Snapshot> {
|
||||||
|
ObservationSnapshot {
|
||||||
|
cycle,
|
||||||
|
observed_at: SystemTime::now(),
|
||||||
|
source_count,
|
||||||
|
value: observer.snapshot(state),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn build_batch<E>(
|
||||||
|
cycle: u64,
|
||||||
|
source_count: usize,
|
||||||
|
events: Vec<E>,
|
||||||
|
) -> Option<Arc<ObservationBatch<E>>> {
|
||||||
|
if events.is_empty() {
|
||||||
|
return None;
|
||||||
|
}
|
||||||
|
|
||||||
|
Some(Arc::new(ObservationBatch {
|
||||||
|
cycle,
|
||||||
|
observed_at: SystemTime::now(),
|
||||||
|
source_count,
|
||||||
|
events,
|
||||||
|
}))
|
||||||
|
}
|
||||||
|
|
||||||
|
fn build_failure(
|
||||||
|
cycle: u64,
|
||||||
|
source_count: usize,
|
||||||
|
stage: ObservationFailureStage,
|
||||||
|
source: DynError,
|
||||||
|
) -> ObservationFailure {
|
||||||
|
ObservationFailure {
|
||||||
|
cycle,
|
||||||
|
observed_at: SystemTime::now(),
|
||||||
|
source_count,
|
||||||
|
stage,
|
||||||
|
message: source.to_string(),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn push_history<E>(
|
||||||
|
history: &mut VecDeque<Arc<ObservationBatch<E>>>,
|
||||||
|
batch: Arc<ObservationBatch<E>>,
|
||||||
|
history_limit: usize,
|
||||||
|
) {
|
||||||
|
if history_limit == 0 {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
history.push_back(batch);
|
||||||
|
|
||||||
|
while history.len() > history_limit {
|
||||||
|
history.pop_front();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests;
|
||||||
250
testing-framework/core/src/observation/tests.rs
Normal file
250
testing-framework/core/src/observation/tests.rs
Normal file
@ -0,0 +1,250 @@
|
|||||||
|
use std::{
|
||||||
|
sync::{
|
||||||
|
Arc,
|
||||||
|
atomic::{AtomicUsize, Ordering},
|
||||||
|
},
|
||||||
|
time::Duration,
|
||||||
|
};
|
||||||
|
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use parking_lot::Mutex;
|
||||||
|
use tokio::time::{Instant, sleep};
|
||||||
|
|
||||||
|
use super::{
|
||||||
|
ObservationConfig, ObservationFailureStage, ObservationRuntime, ObservedSource, Observer,
|
||||||
|
SourceProvider,
|
||||||
|
};
|
||||||
|
use crate::scenario::DynError;
|
||||||
|
|
||||||
|
#[derive(Clone)]
|
||||||
|
struct TestSourceProvider {
|
||||||
|
sources: Arc<Mutex<Vec<ObservedSource<u64>>>>,
|
||||||
|
fail_refreshes: Arc<AtomicUsize>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl TestSourceProvider {
|
||||||
|
fn new(sources: Vec<ObservedSource<u64>>) -> Self {
|
||||||
|
Self {
|
||||||
|
sources: Arc::new(Mutex::new(sources)),
|
||||||
|
fail_refreshes: Arc::new(AtomicUsize::new(0)),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn replace_sources(&self, sources: Vec<ObservedSource<u64>>) {
|
||||||
|
*self.sources.lock() = sources;
|
||||||
|
}
|
||||||
|
|
||||||
|
fn fail_next_refresh(&self) {
|
||||||
|
self.fail_refreshes.store(1, Ordering::SeqCst);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl SourceProvider<u64> for TestSourceProvider {
|
||||||
|
async fn sources(&self) -> Result<Vec<ObservedSource<u64>>, DynError> {
|
||||||
|
if self.fail_refreshes.swap(0, Ordering::SeqCst) == 1 {
|
||||||
|
return Err("refresh failed".into());
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(self.sources.lock().clone())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Clone, Debug, Eq, PartialEq)]
|
||||||
|
struct TestSnapshot {
|
||||||
|
total_sources_seen: u64,
|
||||||
|
last_source_count: usize,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Clone, Debug, Eq, PartialEq)]
|
||||||
|
struct TestEvent {
|
||||||
|
total_sources_seen: u64,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Default)]
|
||||||
|
struct TestState {
|
||||||
|
total_sources_seen: u64,
|
||||||
|
last_source_count: usize,
|
||||||
|
}
|
||||||
|
|
||||||
|
struct CountingObserver;
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl Observer for CountingObserver {
|
||||||
|
type Source = u64;
|
||||||
|
type State = TestState;
|
||||||
|
type Snapshot = TestSnapshot;
|
||||||
|
type Event = TestEvent;
|
||||||
|
|
||||||
|
async fn init(
|
||||||
|
&self,
|
||||||
|
sources: &[ObservedSource<Self::Source>],
|
||||||
|
) -> Result<Self::State, DynError> {
|
||||||
|
Ok(TestState {
|
||||||
|
total_sources_seen: sources.iter().map(|source| source.source).sum(),
|
||||||
|
last_source_count: sources.len(),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn poll(
|
||||||
|
&self,
|
||||||
|
sources: &[ObservedSource<Self::Source>],
|
||||||
|
state: &mut Self::State,
|
||||||
|
) -> Result<Vec<Self::Event>, DynError> {
|
||||||
|
state.total_sources_seen += sources.iter().map(|source| source.source).sum::<u64>();
|
||||||
|
state.last_source_count = sources.len();
|
||||||
|
|
||||||
|
Ok(vec![TestEvent {
|
||||||
|
total_sources_seen: state.total_sources_seen,
|
||||||
|
}])
|
||||||
|
}
|
||||||
|
|
||||||
|
fn snapshot(&self, state: &Self::State) -> Self::Snapshot {
|
||||||
|
TestSnapshot {
|
||||||
|
total_sources_seen: state.total_sources_seen,
|
||||||
|
last_source_count: state.last_source_count,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn runtime_updates_snapshot_and_history() {
|
||||||
|
let provider = TestSourceProvider::new(vec![ObservedSource::new("node-0", 2)]);
|
||||||
|
let runtime = ObservationRuntime::start(
|
||||||
|
provider,
|
||||||
|
CountingObserver,
|
||||||
|
ObservationConfig {
|
||||||
|
interval: Duration::from_millis(25),
|
||||||
|
history_limit: 2,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
.await
|
||||||
|
.expect("runtime should start");
|
||||||
|
|
||||||
|
let handle = runtime.handle();
|
||||||
|
wait_for_cycle(&handle, 2).await;
|
||||||
|
|
||||||
|
let snapshot = handle.latest_snapshot().expect("snapshot should exist");
|
||||||
|
assert!(snapshot.cycle >= 2);
|
||||||
|
assert_eq!(snapshot.source_count, 1);
|
||||||
|
assert_eq!(snapshot.value.last_source_count, 1);
|
||||||
|
assert!(snapshot.value.total_sources_seen >= 6);
|
||||||
|
|
||||||
|
let history = handle.history();
|
||||||
|
assert_eq!(history.len(), 2);
|
||||||
|
assert!(history.iter().all(|batch| !batch.events.is_empty()));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn runtime_refreshes_sources_each_cycle() {
|
||||||
|
let provider = TestSourceProvider::new(vec![ObservedSource::new("node-0", 1)]);
|
||||||
|
let runtime = ObservationRuntime::start(
|
||||||
|
provider.clone(),
|
||||||
|
CountingObserver,
|
||||||
|
ObservationConfig {
|
||||||
|
interval: Duration::from_millis(25),
|
||||||
|
history_limit: 4,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
.await
|
||||||
|
.expect("runtime should start");
|
||||||
|
|
||||||
|
let handle = runtime.handle();
|
||||||
|
wait_for_cycle(&handle, 1).await;
|
||||||
|
|
||||||
|
provider.replace_sources(vec![
|
||||||
|
ObservedSource::new("node-0", 1),
|
||||||
|
ObservedSource::new("node-1", 3),
|
||||||
|
]);
|
||||||
|
|
||||||
|
wait_for_snapshot_source_count(&handle, 2).await;
|
||||||
|
|
||||||
|
let snapshot = handle.latest_snapshot().expect("snapshot should exist");
|
||||||
|
assert_eq!(snapshot.source_count, 2);
|
||||||
|
assert_eq!(snapshot.value.last_source_count, 2);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn runtime_records_cycle_failures() {
|
||||||
|
let provider = TestSourceProvider::new(vec![ObservedSource::new("node-0", 1)]);
|
||||||
|
let runtime = ObservationRuntime::start(
|
||||||
|
provider.clone(),
|
||||||
|
CountingObserver,
|
||||||
|
ObservationConfig {
|
||||||
|
interval: Duration::from_millis(25),
|
||||||
|
history_limit: 2,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
.await
|
||||||
|
.expect("runtime should start");
|
||||||
|
|
||||||
|
let handle = runtime.handle();
|
||||||
|
provider.fail_next_refresh();
|
||||||
|
|
||||||
|
wait_for_failure(&handle).await;
|
||||||
|
|
||||||
|
let failure = handle.last_error().expect("failure should exist");
|
||||||
|
assert_eq!(failure.stage, ObservationFailureStage::SourceRefresh);
|
||||||
|
assert_eq!(failure.message, "refresh failed");
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn wait_for_cycle(handle: &super::ObservationHandle<CountingObserver>, cycle: u64) {
|
||||||
|
let deadline = Instant::now() + Duration::from_secs(2);
|
||||||
|
|
||||||
|
loop {
|
||||||
|
let Some(snapshot) = handle.latest_snapshot() else {
|
||||||
|
sleep(Duration::from_millis(10)).await;
|
||||||
|
continue;
|
||||||
|
};
|
||||||
|
|
||||||
|
if snapshot.cycle >= cycle {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
assert!(
|
||||||
|
Instant::now() < deadline,
|
||||||
|
"timed out waiting for cycle {cycle}"
|
||||||
|
);
|
||||||
|
|
||||||
|
sleep(Duration::from_millis(10)).await;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn wait_for_snapshot_source_count(
|
||||||
|
handle: &super::ObservationHandle<CountingObserver>,
|
||||||
|
source_count: usize,
|
||||||
|
) {
|
||||||
|
let deadline = Instant::now() + Duration::from_secs(2);
|
||||||
|
|
||||||
|
loop {
|
||||||
|
let Some(snapshot) = handle.latest_snapshot() else {
|
||||||
|
sleep(Duration::from_millis(10)).await;
|
||||||
|
continue;
|
||||||
|
};
|
||||||
|
|
||||||
|
if snapshot.source_count == source_count {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
assert!(
|
||||||
|
Instant::now() < deadline,
|
||||||
|
"timed out waiting for source_count {source_count}"
|
||||||
|
);
|
||||||
|
|
||||||
|
sleep(Duration::from_millis(10)).await;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn wait_for_failure(handle: &super::ObservationHandle<CountingObserver>) {
|
||||||
|
let deadline = Instant::now() + Duration::from_secs(2);
|
||||||
|
|
||||||
|
loop {
|
||||||
|
if handle.last_error().is_some() {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
assert!(Instant::now() < deadline, "timed out waiting for failure");
|
||||||
|
|
||||||
|
sleep(Duration::from_millis(10)).await;
|
||||||
|
}
|
||||||
|
}
|
||||||
@ -4,7 +4,12 @@ use super::{
|
|||||||
Application, CleanupPolicy, DeploymentPolicy, Expectation, HttpReadinessRequirement,
|
Application, CleanupPolicy, DeploymentPolicy, Expectation, HttpReadinessRequirement,
|
||||||
RetryPolicy, RuntimeExtensionFactory, Workload, internal::CoreBuilderAccess,
|
RetryPolicy, RuntimeExtensionFactory, Workload, internal::CoreBuilderAccess,
|
||||||
};
|
};
|
||||||
use crate::topology::{DeploymentProvider, DeploymentSeed};
|
use crate::{
|
||||||
|
observation::{
|
||||||
|
ObservationConfig, ObservationExtensionFactory, Observer, SourceProviderFactory,
|
||||||
|
},
|
||||||
|
topology::{DeploymentProvider, DeploymentSeed},
|
||||||
|
};
|
||||||
|
|
||||||
type DeploymentProviderHandle<E> = Box<dyn DeploymentProvider<<E as Application>::Deployment>>;
|
type DeploymentProviderHandle<E> = Box<dyn DeploymentProvider<<E as Application>::Deployment>>;
|
||||||
|
|
||||||
@ -60,6 +65,48 @@ pub trait CoreBuilderExt: CoreBuilderAccess + Sized {
|
|||||||
self.map_core_builder(|builder| builder.with_runtime_extension_factory(extension))
|
self.map_core_builder(|builder| builder.with_runtime_extension_factory(extension))
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Registers one clonable observer as a runtime extension.
|
||||||
|
#[must_use]
|
||||||
|
fn with_observer<O>(
|
||||||
|
self,
|
||||||
|
observer: O,
|
||||||
|
source_provider_factory: impl SourceProviderFactory<Self::Env, O::Source>,
|
||||||
|
config: ObservationConfig,
|
||||||
|
) -> Self
|
||||||
|
where
|
||||||
|
O: Observer + Clone,
|
||||||
|
Self::Env: Application,
|
||||||
|
{
|
||||||
|
let extension = ObservationExtensionFactory::<Self::Env, O>::new(
|
||||||
|
observer,
|
||||||
|
source_provider_factory,
|
||||||
|
config,
|
||||||
|
);
|
||||||
|
|
||||||
|
self.with_runtime_extension_factory(Box::new(extension))
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Registers one observer built lazily per run as a runtime extension.
|
||||||
|
#[must_use]
|
||||||
|
fn with_observer_factory<O>(
|
||||||
|
self,
|
||||||
|
observer_builder: impl Fn() -> O + Send + Sync + 'static,
|
||||||
|
source_provider_factory: impl SourceProviderFactory<Self::Env, O::Source>,
|
||||||
|
config: ObservationConfig,
|
||||||
|
) -> Self
|
||||||
|
where
|
||||||
|
O: Observer,
|
||||||
|
Self::Env: Application,
|
||||||
|
{
|
||||||
|
let extension = ObservationExtensionFactory::<Self::Env, O>::from_parts(
|
||||||
|
observer_builder,
|
||||||
|
source_provider_factory,
|
||||||
|
config,
|
||||||
|
);
|
||||||
|
|
||||||
|
self.with_runtime_extension_factory(Box::new(extension))
|
||||||
|
}
|
||||||
|
|
||||||
#[must_use]
|
#[must_use]
|
||||||
fn with_run_duration(self, duration: Duration) -> Self {
|
fn with_run_duration(self, duration: Duration) -> Self {
|
||||||
self.map_core_builder(|builder| builder.with_run_duration(duration))
|
self.map_core_builder(|builder| builder.with_run_duration(duration))
|
||||||
|
|||||||
@ -16,8 +16,8 @@ use crate::scenario::{
|
|||||||
type WorkloadOutcome = Result<(), DynError>;
|
type WorkloadOutcome = Result<(), DynError>;
|
||||||
|
|
||||||
const MIN_NODE_CONTROL_COOLDOWN: Duration = Duration::from_secs(30);
|
const MIN_NODE_CONTROL_COOLDOWN: Duration = Duration::from_secs(30);
|
||||||
const DEFAULT_BLOCK_FEED_SETTLE_WAIT: Duration = Duration::from_secs(1);
|
const DEFAULT_POST_WORKLOAD_SETTLE_WAIT: Duration = Duration::from_secs(1);
|
||||||
const MIN_BLOCK_FEED_SETTLE_WAIT: Duration = Duration::from_secs(2);
|
const MIN_POST_WORKLOAD_SETTLE_WAIT: Duration = Duration::from_secs(2);
|
||||||
const EXPECTATION_CAPTURE_CHECK_INTERVAL: Duration = Duration::from_secs(1);
|
const EXPECTATION_CAPTURE_CHECK_INTERVAL: Duration = Duration::from_secs(1);
|
||||||
const UNKNOWN_PANIC: &str = "<unknown panic>";
|
const UNKNOWN_PANIC: &str = "<unknown panic>";
|
||||||
|
|
||||||
@ -183,7 +183,8 @@ impl<E: Application> Runner<E> {
|
|||||||
}
|
}
|
||||||
|
|
||||||
async fn settle_before_expectations(context: &RunContext<E>) {
|
async fn settle_before_expectations(context: &RunContext<E>) {
|
||||||
// Give the feed a short catch-up window before evaluating expectations.
|
// Give runtime extensions a short catch-up window before evaluating
|
||||||
|
// expectations.
|
||||||
let Some(wait) = Self::settle_wait_duration(context) else {
|
let Some(wait) = Self::settle_wait_duration(context) else {
|
||||||
return;
|
return;
|
||||||
};
|
};
|
||||||
@ -200,12 +201,12 @@ impl<E: Application> Runner<E> {
|
|||||||
}
|
}
|
||||||
|
|
||||||
let wait = if configured_wait.is_zero() {
|
let wait = if configured_wait.is_zero() {
|
||||||
DEFAULT_BLOCK_FEED_SETTLE_WAIT
|
DEFAULT_POST_WORKLOAD_SETTLE_WAIT
|
||||||
} else {
|
} else {
|
||||||
configured_wait
|
configured_wait
|
||||||
};
|
};
|
||||||
|
|
||||||
Some(wait.max(MIN_BLOCK_FEED_SETTLE_WAIT))
|
Some(wait.max(MIN_POST_WORKLOAD_SETTLE_WAIT))
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Evaluates every registered expectation, aggregating failures so callers
|
/// Evaluates every registered expectation, aggregating failures so callers
|
||||||
@ -231,8 +232,8 @@ impl<E: Application> Runner<E> {
|
|||||||
}
|
}
|
||||||
|
|
||||||
fn cooldown_duration(context: &RunContext<E>) -> Option<Duration> {
|
fn cooldown_duration(context: &RunContext<E>) -> Option<Duration> {
|
||||||
// Managed environments need a minimum cooldown so feed and expectations
|
// Managed environments need a minimum cooldown so runtime extensions and
|
||||||
// observe stabilized state.
|
// expectations observe stabilized state.
|
||||||
let needs_stabilization = context.cluster_control_profile().framework_owns_lifecycle();
|
let needs_stabilization = context.cluster_control_profile().framework_owns_lifecycle();
|
||||||
|
|
||||||
let mut wait = context.expectation_cooldown();
|
let mut wait = context.expectation_cooldown();
|
||||||
|
|||||||
@ -31,8 +31,8 @@ pub enum ComposeRunnerError {
|
|||||||
NodeClients(#[from] NodeClientError),
|
NodeClients(#[from] NodeClientError),
|
||||||
#[error(transparent)]
|
#[error(transparent)]
|
||||||
Telemetry(#[from] MetricsError),
|
Telemetry(#[from] MetricsError),
|
||||||
#[error("feed requires at least one node client")]
|
#[error("observation runtime requires at least one node client")]
|
||||||
BlockFeedMissing,
|
ObservationMissing,
|
||||||
#[error("runtime preflight failed: no node clients available")]
|
#[error("runtime preflight failed: no node clients available")]
|
||||||
RuntimePreflight,
|
RuntimePreflight,
|
||||||
#[error("runtime extension setup failed: {source}")]
|
#[error("runtime extension setup failed: {source}")]
|
||||||
@ -45,8 +45,8 @@ pub enum ComposeRunnerError {
|
|||||||
#[source]
|
#[source]
|
||||||
source: DynError,
|
source: DynError,
|
||||||
},
|
},
|
||||||
#[error("failed to start feed: {source}")]
|
#[error("failed to start observation runtime: {source}")]
|
||||||
BlockFeed {
|
ObservationRuntime {
|
||||||
#[source]
|
#[source]
|
||||||
source: DynError,
|
source: DynError,
|
||||||
},
|
},
|
||||||
|
|||||||
Loading…
x
Reference in New Issue
Block a user