Observation Runtime Plan
Why this work exists
TF is good at deployment plumbing. It is weak at continuous observation.
Today, the same problems are solved repeatedly with custom loops:
- TF block feed logic in Logos
- Cucumber manual-cluster polling loops
- ad hoc catch-up scans for wallet and chain state
- app-local state polling in expectations
That is the gap this work should close.
The goal is not a generic "distributed systems DSL". The goal is one reusable observation runtime that:
- continuously collects data from dynamic sources
- keeps typed materialized state
- exposes both current snapshot and delta/history views
- fits naturally in TF scenarios and Cucumber manual-cluster code
Constraints
TF constraints
- TF abstractions must stay universal and simple.
- TF must not know app semantics like blocks, wallets, leaders, jobs, or topics.
- TF must remain useful for simple apps such as openraft_kv, not only Logos.
App constraints
- Apps must be able to build richer abstractions on top of TF.
- Logos must be able to support:
- current block-feed replacement
- fork-aware chain state
- public-peer sync targets
- multi-wallet UTXO tracking
- Apps must be able to adopt this incrementally.
Migration constraints
- We do not want a flag-day rewrite.
- Existing loops can coexist with the new runtime until replacements are proven.
Non-goals
This work should not:
- put feed back onto the base Application trait
- build app-specific semantics into TF core
- replace filesystem blockchain snapshots used for startup/restore
- force every app to use continuous observation
- introduce a large public abstraction stack that nobody can explain
Core idea
Introduce one TF-level observation runtime.
That runtime owns:
- source refresh
- scheduling
- polling/ingestion
- bounded history
- latest snapshot caching
- delta publication
- freshness/error tracking
- lifecycle hooks for TF and Cucumber
Apps own:
- source types
- raw observation logic
- materialized state
- snapshot shape
- delta/event shape
- higher-level projections such as wallet state
Public TF surface
The TF public surface should stay small.
ObservedSource<S>
A named source instance.
Used for:
- local node clients
- public peer endpoints
- any other app-owned source type
SourceProvider<S>
Returns the current source set.
This must support dynamic source lists because:
- manual cluster nodes come and go
- Cucumber worlds may attach public peers
- node control may restart or replace sources during a run
Observer
App-owned observation logic.
It defines:
- Source
- State
- Snapshot
- Event
And it implements:
- init(...)
- poll(...)
- snapshot(...)
The important boundary is:
- TF owns the runtime
- app code owns materialization
ObservationRuntime
The engine that:
- starts the loop
- refreshes sources
- calls poll(...)
- stores history
- publishes deltas
- updates latest snapshot
- tracks last error and freshness
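To make that loop concrete, here is a synchronous sketch of a single runtime iteration. The trait shapes are simplified stand-ins for the async ones below, and the `tick` function and its tuple return type are illustrative only, not a proposed API:

```rust
// Synchronous sketch of one runtime iteration; the real runtime is async
// and long-running. Trait shapes are simplified stand-ins.
pub struct ObservedSource<S> {
    pub name: String,
    pub source: S,
}

pub trait SourceProvider<S> {
    fn sources(&self) -> Vec<ObservedSource<S>>;
}

pub trait Observer {
    type Source;
    type State;
    type Snapshot;
    type Event;
    fn poll(
        &self,
        sources: &[ObservedSource<Self::Source>],
        state: &mut Self::State,
    ) -> Result<Vec<Self::Event>, String>;
    fn snapshot(&self, state: &Self::State) -> Self::Snapshot;
}

// One iteration: refresh sources -> poll -> publish deltas and snapshot,
// tracking the last error instead of aborting the loop.
pub fn tick<P, O>(
    provider: &P,
    observer: &O,
    state: &mut O::State,
) -> (O::Snapshot, Vec<O::Event>, Option<String>)
where
    P: SourceProvider<O::Source>,
    O: Observer,
{
    let sources = provider.sources(); // dynamic: re-fetched every tick
    let (events, err) = match observer.poll(&sources, state) {
        Ok(events) => (events, None),
        Err(e) => (Vec::new(), Some(e)),
    };
    (observer.snapshot(state), events, err)
}
```

Note the error path: a failed poll still yields a snapshot of the previous state, which is what lets the handle expose both "latest known" and "last error" at once.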
ObservationHandle
The read-side interface for workloads, expectations, and Cucumber steps.
It should expose at least:
- latest snapshot
- delta subscription
- bounded history
- last error
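One possible read-side shape, sketched synchronously with a plain mutex; every name here is hypothetical, and the real handle would presumably wrap async channels rather than locks:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

// Hypothetical read-side handle; field and method names are illustrative only.
#[derive(Clone)]
pub struct ObservationHandle<Snapshot, Event> {
    inner: Arc<Mutex<HandleState<Snapshot, Event>>>,
}

struct HandleState<Snapshot, Event> {
    latest: Option<Snapshot>,
    history: VecDeque<Event>, // bounded by `cap`
    cap: usize,
    last_error: Option<String>,
}

impl<Snapshot: Clone, Event: Clone> ObservationHandle<Snapshot, Event> {
    pub fn new(cap: usize) -> Self {
        Self {
            inner: Arc::new(Mutex::new(HandleState {
                latest: None,
                history: VecDeque::new(),
                cap,
                last_error: None,
            })),
        }
    }

    // Runtime side: publish a new snapshot plus the events that produced it.
    pub fn publish(&self, snapshot: Snapshot, events: Vec<Event>) {
        let mut s = self.inner.lock().unwrap();
        s.latest = Some(snapshot);
        for e in events {
            if s.history.len() == s.cap {
                s.history.pop_front(); // drop oldest to keep history bounded
            }
            s.history.push_back(e);
        }
    }

    pub fn latest(&self) -> Option<Snapshot> {
        self.inner.lock().unwrap().latest.clone()
    }

    pub fn history(&self) -> Vec<Event> {
        self.inner.lock().unwrap().history.iter().cloned().collect()
    }

    pub fn last_error(&self) -> Option<String> {
        self.inner.lock().unwrap().last_error.clone()
    }
}
```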
Intended shape
```rust
pub struct ObservedSource<S> {
    pub name: String,
    pub source: S,
}

#[async_trait]
pub trait SourceProvider<S>: Send + Sync + 'static {
    async fn sources(&self) -> Vec<ObservedSource<S>>;
}

#[async_trait]
pub trait Observer: Send + Sync + 'static {
    type Source: Clone + Send + Sync + 'static;
    type State: Send + Sync + 'static;
    type Snapshot: Clone + Send + Sync + 'static;
    type Event: Clone + Send + Sync + 'static;

    async fn init(
        &self,
        sources: &[ObservedSource<Self::Source>],
    ) -> Result<Self::State, DynError>;

    async fn poll(
        &self,
        sources: &[ObservedSource<Self::Source>],
        state: &mut Self::State,
    ) -> Result<Vec<Self::Event>, DynError>;

    fn snapshot(&self, state: &Self::State) -> Self::Snapshot;
}
```
This is enough.
If more helper layers are needed, they should stay internal first.
How current use cases fit
openraft_kv
Use one simple observer.
- sources: node clients
- state: latest per-node Raft state
- snapshot: sorted node-state view
- events: optional deltas, possibly empty at first
This is the simplest proving case. It validates the runtime without dragging in Logos complexity.
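A sketch of what that observer could look like, synchronous and self-contained for illustration; RaftState and the source shape here are made-up stand-ins for the real openraft_kv client types:

```rust
use std::collections::BTreeMap;

// Stand-in types; the real source would be an openraft_kv node client.
#[derive(Clone, Debug, PartialEq)]
pub struct RaftState {
    pub term: u64,
    pub is_leader: bool,
}

pub struct ObservedSource<S> {
    pub name: String,
    pub source: S,
}

// Synchronous, simplified version of the Observer shape from "Intended shape".
pub struct KvObserver;

impl KvObserver {
    // state: latest per-node Raft state, keyed by source name
    pub fn init(&self) -> BTreeMap<String, RaftState> {
        BTreeMap::new()
    }

    pub fn poll(
        &self,
        sources: &[ObservedSource<RaftState>],
        state: &mut BTreeMap<String, RaftState>,
    ) -> Vec<(String, RaftState)> {
        let mut events = Vec::new();
        for src in sources {
            // In the real observer this would be an async client call.
            let observed = src.source.clone();
            if state.get(&src.name) != Some(&observed) {
                state.insert(src.name.clone(), observed.clone());
                events.push((src.name.clone(), observed)); // delta: node state changed
            }
        }
        events
    }

    // snapshot: sorted node-state view (BTreeMap iterates in key order)
    pub fn snapshot(&self, state: &BTreeMap<String, RaftState>) -> Vec<(String, RaftState)> {
        state.iter().map(|(k, v)| (k.clone(), v.clone())).collect()
    }
}
```

Emitting a delta only when a node's state actually changes is what makes "events: optional deltas, possibly empty at first" cheap to support.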
Logos block feed replacement
Use one shared chain observer.
- sources: local node clients
- state:
  - node heads
  - block graph
  - heights
  - seen headers
  - recent history
- snapshot:
  - current head/lib/graph summary
- events:
  - newly discovered blocks
This covers both existing Logos feed use cases:
- current snapshot consumers
- delta/subscription consumers
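The state half of that observer can be sketched as a small block-graph ingest step. The types here are illustrative, not the real Logos ones, but they show why one state structure can serve both consumer kinds: ingestion updates the graph for snapshot readers and emits a NewBlock delta only on first sight for subscribers:

```rust
use std::collections::{HashMap, HashSet};

// Illustrative chain-observer state; real Logos types would replace these.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct HeaderId(pub String);

#[derive(Default)]
pub struct ChainState {
    pub parents: HashMap<HeaderId, HeaderId>, // block graph: child -> parent
    pub heights: HashMap<HeaderId, u64>,
    pub seen: HashSet<HeaderId>,
    pub node_heads: HashMap<String, HeaderId>,
}

#[derive(Clone, Debug, PartialEq)]
pub enum ChainEvent {
    NewBlock { id: HeaderId, height: u64 },
}

impl ChainState {
    // Ingest one reported block; node heads are always updated, but a
    // delta is returned only on first sight of the block.
    pub fn ingest(
        &mut self,
        node: &str,
        id: HeaderId,
        parent: HeaderId,
        height: u64,
    ) -> Option<ChainEvent> {
        self.node_heads.insert(node.to_string(), id.clone());
        if !self.seen.insert(id.clone()) {
            return None; // already known: no event for subscribers
        }
        self.parents.insert(id.clone(), parent);
        self.heights.insert(id.clone(), height);
        Some(ChainEvent::NewBlock { id, height })
    }
}
```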
Cucumber manual-cluster sync
Use the same observer runtime with a different source set.
- sources:
  - local manual-cluster node clients
  - public peer endpoints
- state:
  - local consensus views
  - public consensus views
  - derived majority public target
- snapshot:
  - current local and public sync picture
This removes custom poll/sleep loops from steps.
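The step-local loops collapse into one shared wait-on-snapshot helper. This is a synchronous sketch with a hypothetical SyncSnapshot type; the real version would await the handle's delta subscription instead of sleeping:

```rust
use std::time::{Duration, Instant};

// Hypothetical sync-picture snapshot.
#[derive(Clone)]
pub struct SyncSnapshot {
    pub local_height: u64,
    pub public_target: u64,
}

// Generic replacement for ad hoc poll/sleep loops in Cucumber steps:
// wait until the latest snapshot satisfies a predicate, or time out.
pub fn wait_for<F>(
    latest: impl Fn() -> Option<SyncSnapshot>,
    pred: F,
    timeout: Duration,
) -> Result<SyncSnapshot, String>
where
    F: Fn(&SyncSnapshot) -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(snap) = latest() {
            if pred(&snap) {
                return Ok(snap);
            }
        }
        if Instant::now() >= deadline {
            return Err("timed out waiting for sync condition".into());
        }
        std::thread::sleep(Duration::from_millis(10));
    }
}
```

The point is not the loop itself but where it lives: one helper over the shared snapshot, instead of a bespoke loop per step.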
Multi-wallet fork-aware tracking
This should not be a TF concept.
It should be a Logos projection built on top of the shared chain observer.
- input: chain observer state
- output: per-header wallet state cache keyed by block header
- property: naturally fork-aware because it follows actual ancestry
That replaces repeated backward scans from tip with continuous maintained state.
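A minimal sketch of that projection, with a toy balance standing in for real UTXO state. The per-header cache is what makes it fork-aware: the state at a block is derived from its actual parent, so two forks off the same ancestor get independent entries with no backward scanning:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct HeaderId(pub String);

// Illustrative wallet state; real Logos state would track UTXOs per wallet.
#[derive(Clone, Debug, PartialEq, Default)]
pub struct WalletState {
    pub balance: i64,
}

// Per-header cache: wallet state at block B = state at parent(B) + B's delta.
pub struct WalletProjection {
    cache: HashMap<HeaderId, WalletState>,
}

impl WalletProjection {
    pub fn new(genesis: HeaderId) -> Self {
        let mut cache = HashMap::new();
        cache.insert(genesis, WalletState::default());
        Self { cache }
    }

    // Apply one block's effect on the wallet. The parent state must already
    // be cached, because the chain observer feeds blocks in ancestry order.
    pub fn apply_block(&mut self, id: HeaderId, parent: &HeaderId, delta: i64) {
        let mut state = self.cache[parent].clone();
        state.balance += delta;
        self.cache.insert(id, state);
    }

    pub fn at(&self, id: &HeaderId) -> Option<&WalletState> {
        self.cache.get(id)
    }
}
```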
Logos layering
Logos should not put every concern into one giant impl.
Recommended layering:
- Chain source adapter
  - local node reads
  - public peer reads
- Shared chain observer
  - catch-up
  - continuous ingestion
  - graph/history materialization
- Logos projections
  - head view
  - public sync target
  - fork graph queries
  - wallet state
  - tx inclusion helpers
TF provides the runtime. Logos provides the domain model built on top.
Adoption plan
Phase 1: add TF observation runtime
- add ObservedSource, SourceProvider, Observer, ObservationRuntime, ObservationHandle
- keep the public API small
- no app migrations yet
Phase 2: prove it on openraft_kv
- add one simple observer over /state
- migrate one expectation to use the observation handle
- validate local, compose, and k8s
Phase 3: add Logos shared chain observer
- implement it alongside current feed/loops
- do not remove existing consumers yet
- prove snapshot and delta outputs are useful
Phase 4: migrate one Logos consumer at a time
Suggested order:
- fork/head snapshot consumer
- tx inclusion consumer
- Cucumber sync-to-public-chain logic
- wallet/UTXO tracking
Phase 5: delete old loops and feed paths
- only after the new runtime has replaced real consumers cleanly
Validation gates
Each phase should have clear checks.
Runtime-level
- crate-level cargo check
- targeted tests for runtime lifecycle and history retention
- explicit tests for dynamic source refresh
App-level
- openraft_kv:
  - local failover
  - compose failover
  - k8s failover
- Logos:
  - one snapshot consumer migrated
  - one delta consumer migrated
- Cucumber:
  - one manual-cluster sync path migrated
Open questions
These should stay open until implementation forces a decision:
- whether ObservationHandle should expose full history directly or only cursor/subscription access
- how much error/freshness metadata belongs in the generic runtime vs app snapshot types
- whether multiple observers should share one scheduler/runtime instance or simply run independently first
Design guardrails
When implementing this work:
- keep TF public abstractions minimal
- keep app semantics out of TF core
- do not chase a generic testing DSL
- build from reusable blocks, not one-off mega impls
- keep migration incremental
- prefer simple, explainable runtime behavior over clever abstraction