mirror of
https://github.com/logos-blockchain/logos-blockchain-testing.git
synced 2026-04-13 06:33:35 +00:00

# Observation Runtime Plan

## Why this work exists

TF is good at deployment plumbing. It is weak at continuous observation.

Today, the same problems are solved repeatedly with custom loops:

- TF block feed logic in Logos
- Cucumber manual-cluster polling loops
- ad hoc catch-up scans for wallet and chain state
- app-local state polling in expectations

That is the gap this work should close.

The goal is not a generic "distributed systems DSL".
The goal is one reusable observation runtime that:

- continuously collects data from dynamic sources
- keeps typed materialized state
- exposes both current snapshot and delta/history views
- fits naturally in TF scenarios and Cucumber manual-cluster code

## Constraints

### TF constraints

- TF abstractions must stay universal and simple.
- TF must not know app semantics like blocks, wallets, leaders, jobs, or topics.
- TF must remain useful for simple apps such as `openraft_kv`, not only Logos.

### App constraints

- Apps must be able to build richer abstractions on top of TF.
- Logos must be able to support:
  - current block-feed replacement
  - fork-aware chain state
  - public-peer sync targets
  - multi-wallet UTXO tracking
- Apps must be able to adopt this incrementally.

### Migration constraints

- We do not want a flag-day rewrite.
- Existing loops can coexist with the new runtime until replacements are proven.

## Non-goals

This work should not:

- put feed back onto the base `Application` trait
- build app-specific semantics into TF core
- replace filesystem blockchain snapshots used for startup/restore
- force every app to use continuous observation
- introduce a large public abstraction stack that nobody can explain

## Core idea

Introduce one TF-level observation runtime.

That runtime owns:

- source refresh
- scheduling
- polling/ingestion
- bounded history
- latest snapshot caching
- delta publication
- freshness/error tracking
- lifecycle hooks for TF and Cucumber

Apps own:

- source types
- raw observation logic
- materialized state
- snapshot shape
- delta/event shape
- higher-level projections such as wallet state

## Public TF surface

The TF public surface should stay small.

### `ObservedSource<S>`

A named source instance.

Used for:

- local node clients
- public peer endpoints
- any other app-owned source type

### `SourceProvider<S>`

Returns the current source set.

This must support dynamic source lists because:

- manual cluster nodes come and go
- Cucumber worlds may attach public peers
- node control may restart or replace sources during a run

### `Observer`

App-owned observation logic.

It defines:

- `Source`
- `State`
- `Snapshot`
- `Event`

And it implements:

- `init(...)`
- `poll(...)`
- `snapshot(...)`

The important boundary is:

- TF owns the runtime
- app code owns materialization

### `ObservationRuntime`

The engine that:

- starts the loop
- refreshes sources
- calls `poll(...)`
- stores history
- publishes deltas
- updates latest snapshot
- tracks last error and freshness

### `ObservationHandle`

The read-side interface for workloads, expectations, and Cucumber steps.

It should expose at least:

- latest snapshot
- delta subscription
- bounded history
- last error

## Intended shape

```rust
pub struct ObservedSource<S> {
    pub name: String,
    pub source: S,
}

#[async_trait]
pub trait SourceProvider<S>: Send + Sync + 'static {
    async fn sources(&self) -> Vec<ObservedSource<S>>;
}

#[async_trait]
pub trait Observer: Send + Sync + 'static {
    type Source: Clone + Send + Sync + 'static;
    type State: Send + Sync + 'static;
    type Snapshot: Clone + Send + Sync + 'static;
    type Event: Clone + Send + Sync + 'static;

    async fn init(
        &self,
        sources: &[ObservedSource<Self::Source>],
    ) -> Result<Self::State, DynError>;

    async fn poll(
        &self,
        sources: &[ObservedSource<Self::Source>],
        state: &mut Self::State,
    ) -> Result<Vec<Self::Event>, DynError>;

    fn snapshot(&self, state: &Self::State) -> Self::Snapshot;
}
```

This is enough.

If more helper layers are needed, they should stay internal first.

## How current use cases fit

### `openraft_kv`

Use one simple observer.

- sources: node clients
- state: latest per-node Raft state
- snapshot: sorted node-state view
- events: optional deltas, possibly empty at first

This is the simplest proving case.
It validates the runtime without dragging in Logos complexity.

### Logos block feed replacement

Use one shared chain observer.

- sources: local node clients
- state:
  - node heads
  - block graph
  - heights
  - seen headers
  - recent history
- snapshot:
  - current head/lib/graph summary
- events:
  - newly discovered blocks

This covers both existing Logos feed use cases:

- current snapshot consumers
- delta/subscription consumers

### Cucumber manual-cluster sync

Use the same observer runtime with a different source set.

- sources:
  - local manual-cluster node clients
  - public peer endpoints
- state:
  - local consensus views
  - public consensus views
  - derived majority public target
- snapshot:
  - current local and public sync picture

This removes custom poll/sleep loops from steps.

### Multi-wallet fork-aware tracking

This should not be a TF concept.

It should be a Logos projection built on top of the shared chain observer.

- input: chain observer state
- output: wallet state cache keyed by block header
- property: naturally fork-aware because it follows actual ancestry

That replaces repeated backward scans from tip with continuously maintained state.

## Logos layering

Logos should not put every concern into one giant impl.

Recommended layering:

1. **Chain source adapter**
   - local node reads
   - public peer reads

2. **Shared chain observer**
   - catch-up
   - continuous ingestion
   - graph/history materialization

3. **Logos projections**
   - head view
   - public sync target
   - fork graph queries
   - wallet state
   - tx inclusion helpers

TF provides the runtime.
Logos provides the domain model built on top.

## Adoption plan

### Phase 1: add TF observation runtime

- add `ObservedSource`, `SourceProvider`, `Observer`, `ObservationRuntime`, `ObservationHandle`
- keep the public API small
- no app migrations yet

### Phase 2: prove it on `openraft_kv`

- add one simple observer over `/state`
- migrate one expectation to use the observation handle
- validate local, compose, and k8s

### Phase 3: add Logos shared chain observer

- implement it alongside current feed/loops
- do not remove existing consumers yet
- prove snapshot and delta outputs are useful

### Phase 4: migrate one Logos consumer at a time

Suggested order:

1. fork/head snapshot consumer
2. tx inclusion consumer
3. Cucumber sync-to-public-chain logic
4. wallet/UTXO tracking

### Phase 5: delete old loops and feed paths

- only after the new runtime has replaced real consumers cleanly

## Validation gates

Each phase should have clear checks.

### Runtime-level

- crate-level `cargo check`
- targeted tests for runtime lifecycle and history retention
- explicit tests for dynamic source refresh

### App-level

- `openraft_kv`:
  - local failover
  - compose failover
  - k8s failover
- Logos:
  - one snapshot consumer migrated
  - one delta consumer migrated
- Cucumber:
  - one manual-cluster sync path migrated

## Open questions

These should stay open until implementation forces a decision:

- whether `ObservationHandle` should expose full history directly or only cursor/subscription access
- how much error/freshness metadata belongs in the generic runtime vs app snapshot types
- whether multiple observers should share one scheduler/runtime instance or simply run independently first

## Design guardrails

When implementing this work:

- keep TF public abstractions minimal
- keep app semantics out of TF core
- do not chase a generic testing DSL
- build from reusable blocks, not one-off mega impls
- keep migration incremental
- prefer simple, explainable runtime behavior over clever abstraction