nim-sds/sds.nimble

316 lines
11 KiB
Plaintext
Raw Normal View History

import strutils, os
# Package
refactor(persistence): snapshot-based interface (5 procs, atomic per-op) (#72) * feat: propagate persistence backend errors via Result The Persistence contract previously returned `Future[void]` for writes and `Future[ChannelSnapshot]` for the loader, with `raises: []`. Backends had no way to report a failure, so a failed write or a failed/partial read was silently swallowed — and on the read path a mid-scan failure could bootstrap a *truncated* channel snapshot, corrupting the rebuilt bloom filter and lamport clock across a restart. Make every contract field Result-returning: * mutating ops -> Future[Result[void, string]] * loadAllForChannel -> Future[Result[ChannelSnapshot, string]] The backend-supplied error string is mapped to a new `ReliabilityError.rePersistenceError` (logged once at the boundary via `reliabilityErr`) and threaded up through every persistence-touching proc to the public API, where the caller decides what to do. Request-driven paths (wrap/unwrap/markDependenciesMet/ensureChannel/removeChannel/reset) propagate the error; background maintenance loops (periodicBufferSweep, periodicRepairSweep) log and retry on the next tick, since they have no synchronous caller. Tests: in-memory backend gains a `failingOps` injection hook; new "Persistence: error propagation" suite asserts read/write/drop failures surface as `rePersistenceError`. Full suite passes (90 OK). BREAKING CHANGE: the `Persistence` contract signature changed; custom backends must return `Result` and `ok()` on success. Bumped to 0.3.0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(persistence): add snapshot types and codec (phase 0) Introduce atomic-snapshot persistence types that will replace the current fine-grained 13-proc Persistence interface. This commit is purely additive: no existing call site changes, no behaviour change. New types (sds/types/): - channel_meta.nim — ChannelMeta (atomic per-channel snapshot blob), ChannelData (bootstrap payload), OutgoingRepairKV / IncomingRepairKV (flattened map entries for protobuf wire shape). - history_update.nim — HistoryUpdate (combined append/evict payload for the message log). New codec (sds/snapshot_codec.nim): - Protobuf encode/decode for all new types, reusing the existing SdsMessage and HistoryEntry encoders from sds/protobuf.nim. - Explicit schemaVersion=1 on ChannelMeta; decoder rejects unknown versions loudly rather than silently truncating. - Time encoded as int64 unix milliseconds. Tests (tests/test_snapshot_codec.nim): - 13 round-trip cases covering empty, single-entry, full-buffer, and repair-heavy snapshots; ChannelData ordering; HistoryUpdate variants; schemaVersion rejection. Planning artefacts: - ANALYSIS_SDS_PERSISTENCE.md — problem statement (partial-write divergence, chatty call rate, non-fatal-error policy gap). - ANALYSIS_SNAPSHOT_SAVE_POINTS.md — exact save points per protocol op and projected call rates. - PLAN_SNAPSHOT_PERSISTENCE.md — phased refactor plan; this commit implements phase 0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(persistence): add PersistenceV2 interface alongside legacy (phase 1) Introduce the 5-proc snapshot-based Persistence interface that will replace the legacy 13-proc one. Both coexist on `ReliabilityManager` so phase 2 can migrate protocol ops one at a time without breaking existing callers. New file: - sds/types/persistence_v2.nim — `PersistenceV2` type with saveChannelMeta / updateHistory / loadChannel / dropChannel / setRetrievalHint. `noOpPersistenceV2()` default. Doc-comments capture the atomicity pairing (meta save + history update issued back-to-back under the channel lock) and the non-fatal failure policy from PLAN §8. Modified: - sds/types/reliability_manager.nim — adds `persistenceV2: PersistenceV2` field alongside `persistence`; constructor takes both, both default to no-op. - sds.nim — `newReliabilityManager` plumbs the new optional parameter. - AGENTS.md / CLAUDE.md — GitNexus index re-indexed after phase 0 + phase 1 additions; symbol counts updated by `npx gitnexus analyze`. No call site uses the new interface yet — that's phase 2. All existing tests still pass against the legacy interface. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(persistence): migrate runRepairSweep to PersistenceV2 (phase 2.1) Per-entry removeIncomingRepair / removeOutgoingRepair calls are replaced by a single trySaveMeta per *dirty* channel at the end of that channel's sweep. Failure is logged but does NOT abort the sweep — in-memory state is the source of truth (PLAN_SNAPSHOT_PERSISTENCE.md §8). Helpers added in sds/sds_utils.nim: - snapshotMeta(channel) — capture current ChannelContext as ChannelMeta blob (flattens Table-keyed buffers to seqs for the wire shape). - trySaveMeta(rm, channelId, channel) — best-effort meta snapshot save; logs on failure, never propagates. - tryUpdateHistory(rm, channelId, append, evict) — best-effort history update; skips the call entirely when both lists are empty (HistoryUpdate contract). Call-rate impact for runRepairSweep: - Before: N persistence calls per expired entry per channel. - After: at most 1 saveChannelMeta per dirty channel; 0 on idle channels (matches the dirty-flag floor in ANALYSIS_SNAPSHOT_SAVE_POINTS). All existing tests pass — including the 3 SDS-R Repair Sweep tests that directly exercise this proc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(persistence): migrate checkUnacknowledgedMessages to PersistenceV2 (phase 2.2) Per-entry saveOutgoing / removeOutgoing calls are replaced by one trySaveMeta at the end of the pass, conditional on a dirty flag (resend attempt incremented, or entry expired). Pass succeeds even if the save fails — next tick reissues the snapshot. Call-rate impact: - Before: N persistence calls per affected entry per pass. - After: at most 1 saveChannelMeta per pass; 0 when nothing aged out. All existing tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(persistence): add V2 meta snapshot saves to foreground ops (phase 2A) Wires `trySaveMeta` into the three public protocol ops that mutate per-channel state — wrapOutgoingMessage, unwrapReceivedMessage, and markDependenciesMet — at the operation's end, under the channel lock. Legacy fine-grained persistence calls REMAIN in place; this commit is additive. Both interfaces persist the same state simultaneously, so all existing tests pass and a real backend wired to either interface continues to work. Phase 2B will strip the legacy calls. Save points match the §"Save Points" table in ANALYSIS_SNAPSHOT_SAVE_POINTS.md exactly: - wrapOutgoingMessage: 1 save (always) - unwrapReceivedMessage: 1 save on every path including duplicate (the duplicate path still mutates the repair buffers) - markDependenciesMet: 1 save after the processIncomingBuffer cascade Non-fatal failure policy (PLAN §8): trySaveMeta logs and continues; the protocol op never returns rePersistenceError for snapshot failures. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(persistence): strip legacy interface from protocol path; migrate tests to V2 (phase 2B+2C+2D) End-state of phase 2: the protocol code no longer issues any legacy fine-grained Persistence calls. All state survives via the snapshot-based PersistenceV2 interface — one trySaveMeta per op end, plus tryUpdateHistory batched inside addToHistory. The legacy Persistence field on ReliabilityManager remains for backwards compatibility; phase 3 deletes it. Protocol changes (sds.nim, sds/sds_utils.nim): - reviewAckStatus, processIncomingBuffer, updateLamportTimestamp → pure in-memory; no per-mutation persistence. - addToHistory: replaces appendLogEntry+removeLogEntry with a single tryUpdateHistory call carrying (append, evict) atomically. - getRecentHistoryEntries: setRetrievalHint switched to V2; non-fatal. - wrapOutgoingMessage, unwrapReceivedMessage, markDependenciesMet: all per-row saveOutgoing / removeOutgoing / saveIncoming / removeIncoming / saveOutgoingRepair / removeOutgoingRepair / saveIncomingRepair / removeIncomingRepair calls removed (16 call sites in total). State is captured by the op-end trySaveMeta added in phase 2A. - getOrCreateChannel: bootstraps from persistenceV2.loadChannel. - dropChannelFromPersistence: uses persistenceV2.dropChannel. Failure policy (PLAN_SNAPSHOT_PERSISTENCE.md §8): - Foreground ops (wrap, unwrap, markDeps, sweeps): non-fatal — trySaveMeta / tryUpdateHistory log and continue; the protocol op returns ok regardless of disk failure. In-memory state is the source of truth; the next op re-issues a complete snapshot and disk catches up automatically. - Durability-intent ops (removeChannel, resetReliabilityManager via dropChannelFromPersistence; getOrCreateChannel via loadChannel): still propagate rePersistenceError, because the caller asked us to confirm a disk operation and we cannot silently lie. Test infrastructure: - tests/in_memory_persistence_v2.nim: new V2 adapter mock that decomposes the meta blob into the existing InMemoryStore shape so test assertions on store.outgoing / store.incoming / etc. continue to work without change. - tests/test_persistence.nim: 17 tests, all rewritten against V2. - 13 state-survival tests carry over with identical assertions. - "loadChannel failure surfaces as err on bootstrap" — bootstrap keeps durability-intent semantics. - "saveChannelMeta failure during send does NOT surface" — deliberate inversion of the legacy "write failure surfaces as err" test. Asserts the new non-fatal policy: op returns ok, in-memory state correct, disk re-syncs on the next op. - "updateHistory failure during send does NOT surface" — same policy applied to the history path. - "dropChannel failure during removeChannel surfaces as err" — kept. - All 17 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(persistence): delete legacy interface; rename PersistenceV2 -> Persistence (phase 3) End-state of the snapshot-persistence refactor. The legacy 13-proc Persistence interface and its noOpPersistence are gone; the 5-proc snapshot-based interface (formerly PersistenceV2) takes their place under the canonical name. Source: - sds/types/persistence.nim: replaced 13-proc contract with the 5-proc snapshot interface (saveChannelMeta, updateHistory, loadChannel, dropChannel, setRetrievalHint). noOpPersistence returns ok everywhere and an empty ChannelData on load. - sds/types/persistence_v2.nim: removed. - sds/types/reliability_manager.nim: dropped the second persistenceV2 field; constructor takes a single `persistence: Persistence`. - sds/sds_utils.nim: rm.persistenceV2.X -> rm.persistence.X; doc-comments updated. - sds.nim: dropped the persistenceV2 parameter from newReliabilityManager. Tests: - tests/in_memory_persistence_v2.nim: removed; its content moved to... - tests/in_memory_persistence.nim: replaces the old legacy mock with the snapshot adapter under the canonical filename. Same InMemoryStore shape so test assertions stay unchanged. - tests/test_persistence.nim: ctor param renamed, suite name de-prefixed. FFI smoke (`nimble libsdsDynamicMac`, refc/threads:on): builds clean. All 4 test suites pass: - test_bloom - test_reliability - test_persistence (17 V2 tests) - test_snapshot_codec (13 codec round-trip tests) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Persisting persistence redesign plan for reference * refactor(persistence): R2 pending-write queue + per-op accumulator (PR #72 review fix) Addresses all three substantive review findings on PR #72 in one structural change: fold the per-op accumulator and the R2 retry buffer into a single queue on `ChannelContext`, flushed once at op end. Changes: - sds/types/channel_context.nim: add `pendingHistoryAppends` (`OrderedSet[SdsMessageID]`) and `pendingHistoryEvicts` (`HashSet[SdsMessageID]`) fields. Only ids are stored — the full SdsMessage is looked up from `messageHistory` at flush time. Documented invariant: every id in pendingHistoryAppends is also in messageHistory, upheld by the merge rule. - sds/sds_utils.nim: * `queueHistoryAppend(channel, msgId)` / `queueHistoryEvict(channel, msgId)` — "latest-wins" merge: append cancels any pending evict and vice versa. Symmetric, simple, handles the evict-then-re-add sequence correctly (SDS-R repair re-delivering an evicted message while the backend is unreachable). * `tryUpdateHistory(rm, channelId)` — no more list params; flushes the channel's pending queue. Dual role: per-op accumulator (multiple `addToHistory` calls within one op queue together and flush as one round-trip) AND R2 retry buffer (a failed flush leaves the queue populated for the next op to retry). * `addToHistory` queues via the helpers; does not call persistence. * Pending queue cleared on `cleanup` and `removeChannel`. - sds.nim: * `processIncomingBuffer` returns to its single-arg signature — the queue lives on the channel, no parameter threading needed. * `wrapOutgoingMessage`, `unwrapReceivedMessage` (all three paths), `markDependenciesMet` issue exactly one `trySaveMeta` + `tryUpdateHistory` pair at op end, under the lock, with no intervening `await`-of-other-work. Matches the Persistence atomicity contract documented in `sds/types/persistence.nim`. * Pending queue cleared in `resetReliabilityManager`. - tests/test_persistence.nim: * Direct `addToHistory` callers (state-survival setup) now follow with explicit `tryUpdateHistory(channelId)` to flush. Reflects the production op-end flush pattern. * New: `updateHistory failure is retried via R2 pending-write queue` — verifies that two failed sends leave both messages on the queue, and a third successful send drains the whole queue in one call. * New: `pending queue survives idle ops` — verifies that an op with no history changes of its own still flushes a previously-failed batch at op end. * New: `evict-then-re-add merge rule preserves the re-added message on disk` — regression for the "latest-wins" merge rule. The original "evict-wins" rule would silently drop the re-add and leave the message permanently absent from disk; this test would fail under that rule and passes under the corrected one. Resolves PR #72 review comments: - #1 (delta loss on failed updateHistory) — R2 retry queue. - #2 (cascade chattiness — N updateHistory calls per op) — queue collects cascaded entries, flushed as one batch. - #3 (atomicity contract mismatch) — implementation now matches the documented "saveChannelMeta then updateHistory back-to-back" pairing. Test summary: 50 tests pass (47 prior + 3 new R2/merge-rule tests). FFI dylib (`nimble libsdsDynamicMac`, refc + threads:on): clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 12:24:38 +02:00
version = "0.3.0"
author = "Logos Messaging Team"
description = "E2E Scalable Data Sync API"
license = "MIT"
srcDir = "sds"
# Dependencies
requires "nim >= 2.2.4"
requires "chronos >= 4.0.4"
requires "libp2p >= 1.15.2"
requires "chronicles"
requires "stew"
requires "stint"
requires "metrics"
requires "results"
requires "taskpools >= 0.1.0" ## This should be removed when using nim-ffi dependency
proc buildLibrary(
outLibNameAndExt: string,
name: string,
srcDir = "./",
extra_params = "",
`type` = "static",
) =
if not dirExists "build":
mkDir "build"
if `type` == "static":
exec "nim c" & " --out:build/" & outLibNameAndExt &
" --threads:on --app:staticlib --opt:size --noMain --mm:refc --header --nimMainPrefix:libsds " &
extra_params & " " & srcDir & name & ".nim"
else:
2025-09-11 14:30:08 +02:00
when defined(windows):
exec "nim c" & " --out:build/" & outLibNameAndExt &
" --threads:on --app:lib --opt:size --noMain --mm:refc --header --nimMainPrefix:libsds " &
2025-09-11 14:30:08 +02:00
extra_params & " " & srcDir & name & ".nim"
else:
exec "nim c" & " --out:build/" & outLibNameAndExt &
" --threads:on --app:lib --opt:size --noMain --mm:refc --header --nimMainPrefix:libsds " &
2025-09-11 14:30:08 +02:00
extra_params & " " & srcDir & name & ".nim"
proc getMyCpu(): string =
## Returns a Nim-compatible CPU name (e.g. amd64, arm64) for the host.
## Respects the ARCH environment variable when set.
let envArch = getEnv("ARCH")
if envArch != "":
return envArch
when defined(arm64):
return "arm64"
elif defined(amd64):
return "amd64"
else:
let (archFromUname, _) = gorgeEx("uname -m")
let a = archFromUname.strip()
return
if a == "x86_64":
"amd64"
elif a == "aarch64":
"arm64"
else:
a
# Tasks
task test, "Run the test suite":
feat: make Persistence interface async (#69) * feat: make Persistence interface async The 14 Persistence proc fields now return Future[...] with {.async: (raises: []), gcsafe.}, allowing real I/O backends (SQLite, encrypted file, network) to suspend rather than block the Chronos event loop the manager runs on. Propagates through: - ReliabilityManager.lock: system.Lock -> chronos.AsyncLock. Acquired across awaits cleanly; matches the single-threaded Chronos worker the FFI uses. Multi-OS-thread use is now explicitly the caller's responsibility. - sds_utils + sds.nim public API procs (wrapOutgoingMessage, unwrapReceivedMessage, markDependenciesMet, setCallbacks, resetReliabilityManager, cleanup, ensureChannel, removeChannel, the getter snapshots, etc.) are now async. - FFI request handlers in library/sds_thread/... await the new API. - Tests converted via an asyncTest template that wraps each test body in an async proc; setup/teardown use waitFor for their single async call (ensureChannel / cleanup). Lock scope is preserved exactly: the same call sites that held the kernel Lock today hold AsyncLock now -- no new locking added. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor: drop asyncSpawn, add asyncSetup/asyncTeardown Three asyncSpawn usages removed: - sds.nim startPeriodicTasks: stored the periodic-task futures on ReliabilityManager (new field `periodicTasks: seq[FutureBase]`) so cleanup can cancel them on shutdown instead of leaking the loops against a cleared manager. - library/sds_thread/sds_thread.nim: fireSync moved BEFORE processing, then `await SdsThreadRequest.process(...)` instead of asyncSpawn'ing it. Aligns the worker with the SP-channel + lock assumption that there are no concurrent requests; caller throughput is unchanged because the caller only waits for receipt (fireSync), not processing. - tests TestBus repair callback: replaced asyncSpawn(deliverExcept...) with an explicit pending-delivery queue drained by `bus.drain()`. Integration tests no longer rely on `sleepAsync(10ms)` to let spawned deliveries finish — they await drain instead. Tests also pick up an asyncSetup/asyncTeardown pair (tests/async_unittest.nim) so suite fixtures can `await` directly. All `waitFor` in setup/teardown blocks is gone; only the top-level asyncTest wrapper still uses waitFor (once, to drive the async proc to completion). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Correctly propagate error hidden by new async move * Correctly handle future cancellation exceptions, +some housekeeping * Apply suggestion from @Ivansete-status Co-authored-by: Ivan FB <128452529+Ivansete-status@users.noreply.github.com> * Stylistics, async default implication addressed, nph style run * Remove leaking CancelledFuture from public facing + as a consequence it is tuneled into handling CatchableError everywhere --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Ivan FB <128452529+Ivansete-status@users.noreply.github.com>
2026-05-25 22:30:15 +02:00
exec "nim c -r --outdir:build tests/test_bloom.nim"
exec "nim c -r --outdir:build tests/test_reliability.nim"
exec "nim c -r --outdir:build tests/test_persistence.nim"
refactor(persistence): snapshot-based interface (5 procs, atomic per-op) (#72) * feat: propagate persistence backend errors via Result The Persistence contract previously returned `Future[void]` for writes and `Future[ChannelSnapshot]` for the loader, with `raises: []`. Backends had no way to report a failure, so a failed write or a failed/partial read was silently swallowed — and on the read path a mid-scan failure could bootstrap a *truncated* channel snapshot, corrupting the rebuilt bloom filter and lamport clock across a restart. Make every contract field Result-returning: * mutating ops -> Future[Result[void, string]] * loadAllForChannel -> Future[Result[ChannelSnapshot, string]] The backend-supplied error string is mapped to a new `ReliabilityError.rePersistenceError` (logged once at the boundary via `reliabilityErr`) and threaded up through every persistence-touching proc to the public API, where the caller decides what to do. Request-driven paths (wrap/unwrap/markDependenciesMet/ensureChannel/removeChannel/reset) propagate the error; background maintenance loops (periodicBufferSweep, periodicRepairSweep) log and retry on the next tick, since they have no synchronous caller. Tests: in-memory backend gains a `failingOps` injection hook; new "Persistence: error propagation" suite asserts read/write/drop failures surface as `rePersistenceError`. Full suite passes (90 OK). BREAKING CHANGE: the `Persistence` contract signature changed; custom backends must return `Result` and `ok()` on success. Bumped to 0.3.0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(persistence): add snapshot types and codec (phase 0) Introduce atomic-snapshot persistence types that will replace the current fine-grained 13-proc Persistence interface. This commit is purely additive: no existing call site changes, no behaviour change. New types (sds/types/): - channel_meta.nim — ChannelMeta (atomic per-channel snapshot blob), ChannelData (bootstrap payload), OutgoingRepairKV / IncomingRepairKV (flattened map entries for protobuf wire shape). - history_update.nim — HistoryUpdate (combined append/evict payload for the message log). New codec (sds/snapshot_codec.nim): - Protobuf encode/decode for all new types, reusing the existing SdsMessage and HistoryEntry encoders from sds/protobuf.nim. - Explicit schemaVersion=1 on ChannelMeta; decoder rejects unknown versions loudly rather than silently truncating. - Time encoded as int64 unix milliseconds. Tests (tests/test_snapshot_codec.nim): - 13 round-trip cases covering empty, single-entry, full-buffer, and repair-heavy snapshots; ChannelData ordering; HistoryUpdate variants; schemaVersion rejection. Planning artefacts: - ANALYSIS_SDS_PERSISTENCE.md — problem statement (partial-write divergence, chatty call rate, non-fatal-error policy gap). - ANALYSIS_SNAPSHOT_SAVE_POINTS.md — exact save points per protocol op and projected call rates. - PLAN_SNAPSHOT_PERSISTENCE.md — phased refactor plan; this commit implements phase 0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(persistence): add PersistenceV2 interface alongside legacy (phase 1) Introduce the 5-proc snapshot-based Persistence interface that will replace the legacy 13-proc one. Both coexist on `ReliabilityManager` so phase 2 can migrate protocol ops one at a time without breaking existing callers. New file: - sds/types/persistence_v2.nim — `PersistenceV2` type with saveChannelMeta / updateHistory / loadChannel / dropChannel / setRetrievalHint. `noOpPersistenceV2()` default. Doc-comments capture the atomicity pairing (meta save + history update issued back-to-back under the channel lock) and the non-fatal failure policy from PLAN §8. Modified: - sds/types/reliability_manager.nim — adds `persistenceV2: PersistenceV2` field alongside `persistence`; constructor takes both, both default to no-op. - sds.nim — `newReliabilityManager` plumbs the new optional parameter. - AGENTS.md / CLAUDE.md — GitNexus index re-indexed after phase 0 + phase 1 additions; symbol counts updated by `npx gitnexus analyze`. No call site uses the new interface yet — that's phase 2. All existing tests still pass against the legacy interface. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(persistence): migrate runRepairSweep to PersistenceV2 (phase 2.1) Per-entry removeIncomingRepair / removeOutgoingRepair calls are replaced by a single trySaveMeta per *dirty* channel at the end of that channel's sweep. Failure is logged but does NOT abort the sweep — in-memory state is the source of truth (PLAN_SNAPSHOT_PERSISTENCE.md §8). Helpers added in sds/sds_utils.nim: - snapshotMeta(channel) — capture current ChannelContext as ChannelMeta blob (flattens Table-keyed buffers to seqs for the wire shape). - trySaveMeta(rm, channelId, channel) — best-effort meta snapshot save; logs on failure, never propagates. - tryUpdateHistory(rm, channelId, append, evict) — best-effort history update; skips the call entirely when both lists are empty (HistoryUpdate contract). Call-rate impact for runRepairSweep: - Before: N persistence calls per expired entry per channel. - After: at most 1 saveChannelMeta per dirty channel; 0 on idle channels (matches the dirty-flag floor in ANALYSIS_SNAPSHOT_SAVE_POINTS). All existing tests pass — including the 3 SDS-R Repair Sweep tests that directly exercise this proc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(persistence): migrate checkUnacknowledgedMessages to PersistenceV2 (phase 2.2) Per-entry saveOutgoing / removeOutgoing calls are replaced by one trySaveMeta at the end of the pass, conditional on a dirty flag (resend attempt incremented, or entry expired). Pass succeeds even if the save fails — next tick reissues the snapshot. Call-rate impact: - Before: N persistence calls per affected entry per pass. - After: at most 1 saveChannelMeta per pass; 0 when nothing aged out. All existing tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(persistence): add V2 meta snapshot saves to foreground ops (phase 2A) Wires `trySaveMeta` into the three public protocol ops that mutate per-channel state — wrapOutgoingMessage, unwrapReceivedMessage, and markDependenciesMet — at the operation's end, under the channel lock. Legacy fine-grained persistence calls REMAIN in place; this commit is additive. Both interfaces persist the same state simultaneously, so all existing tests pass and a real backend wired to either interface continues to work. Phase 2B will strip the legacy calls. Save points match the §"Save Points" table in ANALYSIS_SNAPSHOT_SAVE_POINTS.md exactly: - wrapOutgoingMessage: 1 save (always) - unwrapReceivedMessage: 1 save on every path including duplicate (the duplicate path still mutates the repair buffers) - markDependenciesMet: 1 save after the processIncomingBuffer cascade Non-fatal failure policy (PLAN §8): trySaveMeta logs and continues; the protocol op never returns rePersistenceError for snapshot failures. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(persistence): strip legacy interface from protocol path; migrate tests to V2 (phase 2B+2C+2D) End-state of phase 2: the protocol code no longer issues any legacy fine-grained Persistence calls. All state survives via the snapshot-based PersistenceV2 interface — one trySaveMeta per op end, plus tryUpdateHistory batched inside addToHistory. The legacy Persistence field on ReliabilityManager remains for backwards compatibility; phase 3 deletes it. Protocol changes (sds.nim, sds/sds_utils.nim): - reviewAckStatus, processIncomingBuffer, updateLamportTimestamp → pure in-memory; no per-mutation persistence. - addToHistory: replaces appendLogEntry+removeLogEntry with a single tryUpdateHistory call carrying (append, evict) atomically. - getRecentHistoryEntries: setRetrievalHint switched to V2; non-fatal. - wrapOutgoingMessage, unwrapReceivedMessage, markDependenciesMet: all per-row saveOutgoing / removeOutgoing / saveIncoming / removeIncoming / saveOutgoingRepair / removeOutgoingRepair / saveIncomingRepair / removeIncomingRepair calls removed (16 call sites in total). State is captured by the op-end trySaveMeta added in phase 2A. - getOrCreateChannel: bootstraps from persistenceV2.loadChannel. - dropChannelFromPersistence: uses persistenceV2.dropChannel. Failure policy (PLAN_SNAPSHOT_PERSISTENCE.md §8): - Foreground ops (wrap, unwrap, markDeps, sweeps): non-fatal — trySaveMeta / tryUpdateHistory log and continue; the protocol op returns ok regardless of disk failure. In-memory state is the source of truth; the next op re-issues a complete snapshot and disk catches up automatically. - Durability-intent ops (removeChannel, resetReliabilityManager via dropChannelFromPersistence; getOrCreateChannel via loadChannel): still propagate rePersistenceError, because the caller asked us to confirm a disk operation and we cannot silently lie. Test infrastructure: - tests/in_memory_persistence_v2.nim: new V2 adapter mock that decomposes the meta blob into the existing InMemoryStore shape so test assertions on store.outgoing / store.incoming / etc. continue to work without change. - tests/test_persistence.nim: 17 tests, all rewritten against V2. - 13 state-survival tests carry over with identical assertions. - "loadChannel failure surfaces as err on bootstrap" — bootstrap keeps durability-intent semantics. - "saveChannelMeta failure during send does NOT surface" — deliberate inversion of the legacy "write failure surfaces as err" test. Asserts the new non-fatal policy: op returns ok, in-memory state correct, disk re-syncs on the next op. - "updateHistory failure during send does NOT surface" — same policy applied to the history path. - "dropChannel failure during removeChannel surfaces as err" — kept. - All 17 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(persistence): delete legacy interface; rename PersistenceV2 -> Persistence (phase 3) End-state of the snapshot-persistence refactor. The legacy 13-proc Persistence interface and its noOpPersistence are gone; the 5-proc snapshot-based interface (formerly PersistenceV2) takes their place under the canonical name. Source: - sds/types/persistence.nim: replaced 13-proc contract with the 5-proc snapshot interface (saveChannelMeta, updateHistory, loadChannel, dropChannel, setRetrievalHint). noOpPersistence returns ok everywhere and an empty ChannelData on load. - sds/types/persistence_v2.nim: removed. - sds/types/reliability_manager.nim: dropped the second persistenceV2 field; constructor takes a single `persistence: Persistence`. - sds/sds_utils.nim: rm.persistenceV2.X -> rm.persistence.X; doc-comments updated. - sds.nim: dropped the persistenceV2 parameter from newReliabilityManager. Tests: - tests/in_memory_persistence_v2.nim: removed; its content moved to... - tests/in_memory_persistence.nim: replaces the old legacy mock with the snapshot adapter under the canonical filename. Same InMemoryStore shape so test assertions stay unchanged. - tests/test_persistence.nim: ctor param renamed, suite name de-prefixed. FFI smoke (`nimble libsdsDynamicMac`, refc/threads:on): builds clean. All 4 test suites pass: - test_bloom - test_reliability - test_persistence (17 V2 tests) - test_snapshot_codec (13 codec round-trip tests) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Persisting persistence redesign plan for reference * refactor(persistence): R2 pending-write queue + per-op accumulator (PR #72 review fix) Addresses all three substantive review findings on PR #72 in one structural change: fold the per-op accumulator and the R2 retry buffer into a single queue on `ChannelContext`, flushed once at op end. Changes: - sds/types/channel_context.nim: add `pendingHistoryAppends` (`OrderedSet[SdsMessageID]`) and `pendingHistoryEvicts` (`HashSet[SdsMessageID]`) fields. Only ids are stored — the full SdsMessage is looked up from `messageHistory` at flush time. Documented invariant: every id in pendingHistoryAppends is also in messageHistory, upheld by the merge rule. - sds/sds_utils.nim: * `queueHistoryAppend(channel, msgId)` / `queueHistoryEvict(channel, msgId)` — "latest-wins" merge: append cancels any pending evict and vice versa. Symmetric, simple, handles the evict-then-re-add sequence correctly (SDS-R repair re-delivering an evicted message while the backend is unreachable). * `tryUpdateHistory(rm, channelId)` — no more list params; flushes the channel's pending queue. Dual role: per-op accumulator (multiple `addToHistory` calls within one op queue together and flush as one round-trip) AND R2 retry buffer (a failed flush leaves the queue populated for the next op to retry). * `addToHistory` queues via the helpers; does not call persistence. * Pending queue cleared on `cleanup` and `removeChannel`. - sds.nim: * `processIncomingBuffer` returns to its single-arg signature — the queue lives on the channel, no parameter threading needed. * `wrapOutgoingMessage`, `unwrapReceivedMessage` (all three paths), `markDependenciesMet` issue exactly one `trySaveMeta` + `tryUpdateHistory` pair at op end, under the lock, with no intervening `await`-of-other-work. Matches the Persistence atomicity contract documented in `sds/types/persistence.nim`. * Pending queue cleared in `resetReliabilityManager`. - tests/test_persistence.nim: * Direct `addToHistory` callers (state-survival setup) now follow with explicit `tryUpdateHistory(channelId)` to flush. Reflects the production op-end flush pattern. * New: `updateHistory failure is retried via R2 pending-write queue` — verifies that two failed sends leave both messages on the queue, and a third successful send drains the whole queue in one call. * New: `pending queue survives idle ops` — verifies that an op with no history changes of its own still flushes a previously-failed batch at op end. * New: `evict-then-re-add merge rule preserves the re-added message on disk` — regression for the "latest-wins" merge rule. The original "evict-wins" rule would silently drop the re-add and leave the message permanently absent from disk; this test would fail under that rule and passes under the corrected one. Resolves PR #72 review comments: - #1 (delta loss on failed updateHistory) — R2 retry queue. - #2 (cascade chattiness — N updateHistory calls per op) — queue collects cascaded entries, flushed as one batch. - #3 (atomicity contract mismatch) — implementation now matches the documented "saveChannelMeta then updateHistory back-to-back" pairing. Test summary: 50 tests pass (47 prior + 3 new R2/merge-rule tests). FFI dylib (`nimble libsdsDynamicMac`, refc + threads:on): clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 12:24:38 +02:00
exec "nim c -r --outdir:build tests/test_snapshot_codec.nim"
task libsdsDynamicWindows, "Generate bindings":
let outLibNameAndExt = "libsds.dll"
let name = "libsds"
buildLibrary outLibNameAndExt,
name, "library/",
2025-09-11 14:30:08 +02:00
"""-d:chronicles_line_numbers --warning:Deprecated:off --warning:UnusedImport:on -d:chronicles_log_level=TRACE """,
"dynamic"
task libsdsDynamicLinux, "Generate bindings":
let outLibNameAndExt = "libsds.so"
let name = "libsds"
buildLibrary outLibNameAndExt,
name, "library/",
"""-d:chronicles_line_numbers --warning:Deprecated:off --warning:UnusedImport:on -d:chronicles_log_level=TRACE """,
"dynamic"
task libsdsDynamicMac, "Generate bindings":
let outLibNameAndExt = "libsds.dylib"
let name = "libsds"
let cpu = getMyCpu()
let clangArch = if cpu == "amd64": "x86_64" else: cpu
let sdkPath = staticExec("xcrun --show-sdk-path").strip()
let archFlags =
"--cpu:" & cpu & " --passC:\"-arch " & clangArch & "\" --passL:\"-arch " & clangArch &
"\" --passC:\"-isysroot " & sdkPath & "\" --passL:\"-isysroot " & sdkPath & "\""
buildLibrary outLibNameAndExt,
name,
"library/",
archFlags &
" -d:chronicles_line_numbers --warning:Deprecated:off --warning:UnusedImport:on -d:chronicles_log_level=TRACE",
"dynamic"
task libsdsStaticWindows, "Generate bindings":
let outLibNameAndExt = "libsds.lib"
let name = "libsds"
buildLibrary outLibNameAndExt,
name, "library/",
"""-d:chronicles_line_numbers --warning:Deprecated:off --warning:UnusedImport:on -d:chronicles_log_level=TRACE """,
"static"
task libsdsStaticLinux, "Generate bindings":
let outLibNameAndExt = "libsds.a"
let name = "libsds"
buildLibrary outLibNameAndExt,
name, "library/",
"""-d:chronicles_line_numbers --warning:Deprecated:off --warning:UnusedImport:on -d:chronicles_log_level=TRACE """,
"static"
task libsdsStaticMac, "Generate bindings":
let outLibNameAndExt = "libsds.a"
let name = "libsds"
let cpu = getMyCpu()
let clangArch = if cpu == "amd64": "x86_64" else: cpu
let sdkPath = staticExec("xcrun --show-sdk-path").strip()
let archFlags =
"--cpu:" & cpu & " --passC:\"-arch " & clangArch & "\" --passL:\"-arch " & clangArch &
"\" --passC:\"-isysroot " & sdkPath & "\" --passL:\"-isysroot " & sdkPath & "\""
buildLibrary outLibNameAndExt,
name,
"library/",
archFlags &
" -d:chronicles_line_numbers --warning:Deprecated:off --warning:UnusedImport:on -d:chronicles_log_level=TRACE",
"static"
# Build Mobile iOS
proc buildMobileIOS(srcDir = ".", sdkPath = "") =
echo "Building iOS libsds library"
let outDir = "build"
let nimcacheDir = outDir & "/nimcache"
if dirExists nimcacheDir:
rmDir nimcacheDir
if not dirExists outDir:
mkDir outDir
if sdkPath.len == 0:
quit "Error: Xcode/iOS SDK not found"
let aFile = outDir & "/libsds.a"
let aFileTmp = outDir & "/libsds_tmp.a"
let cpu = getMyCpu()
let clangArch = if cpu == "amd64": "x86_64" else: cpu
# 1) Generate C sources from Nim (no linking)
# Use unique symbol prefix to avoid conflicts with other Nim libraries
exec "nim c" & " --nimcache:" & nimcacheDir & " --os:ios --cpu:" & cpu &
" --compileOnly:on" & " --noMain --mm:refc" & " --threads:on --opt:size --header" &
" --nimMainPrefix:libsds" & " --cc:clang" & " -d:useMalloc" & " " & srcDir &
"/libsds.nim"
# 2) Compile all generated C files to object files with hidden visibility
# This prevents symbol conflicts with other Nim libraries (e.g., libnim_status_client)
# Locate nimbase.h: try next to the nim binary first (jiro4989/setup-nim-action
# puts nim at .nim_runtime/bin/nim with lib/ alongside), then fall back to the
# choosenim toolchain directory (~/.choosenim/toolchains/nim-VERSION/lib/).
let (nimBin, _) = gorgeEx("which nim")
let nimLibFromBin = parentDir(parentDir(nimBin.strip())) / "lib"
let nimLibChoosenim = getHomeDir() / ".choosenim/toolchains/nim-" & NimVersion & "/lib"
let nimLibDir =
if fileExists(nimLibFromBin / "nimbase.h"): nimLibFromBin
else: nimLibChoosenim
let clangFlags =
"-arch " & clangArch & " -isysroot " & sdkPath & " -I" & nimLibDir &
" -fembed-bitcode -miphoneos-version-min=16.2 -O2" & " -fvisibility=hidden"
var objectFiles: seq[string] = @[]
for cFile in listFiles(nimcacheDir):
if cFile.endsWith(".c"):
let oFile = cFile.changeFileExt("o")
exec "clang " & clangFlags & " -c " & cFile & " -o " & oFile
objectFiles.add(oFile)
# 3) Create static library from all object files
exec "ar rcs " & aFileTmp & " " & objectFiles.join(" ")
# 4) Use libtool to localize all non-public symbols
# Keep only Sds* functions as global, hide everything else to prevent conflicts
# with nim runtime symbols from libnim_status_client
let keepSymbols =
"_Sds*:_libsdsNimMain:_libsdsDatInit*:_libsdsInit*:_NimMainModule__libsds*"
exec "xcrun libtool -static -o " & aFile & " " & aFileTmp &
" -exported_symbols_list /dev/stdin <<< '" & keepSymbols & "' 2>/dev/null || cp " &
aFileTmp & " " & aFile
echo "✔ iOS library created: " & aFile
task libsdsIOS, "Build the mobile bindings for iOS":
let srcDir = "./library"
var sdkPath = getEnv("IOS_SDK_PATH")
if sdkPath.len == 0:
let (detected, exitCode) = gorgeEx("xcrun --show-sdk-path --sdk iphoneos")
if exitCode == 0:
sdkPath = detected.strip()
buildMobileIOS srcDir, sdkPath
### Mobile Android
proc checkAndroidNdk() =
let ndkRoot = getEnv("ANDROID_NDK_ROOT")
if ndkRoot.len == 0:
quit """Error: ANDROID_NDK_ROOT is not set."""
if not dirExists(ndkRoot):
quit "Error: ANDROID_NDK_ROOT points to a non-existent directory: " & ndkRoot
# source.properties contains Pkg.Revision — present in every NDK since r10.
let propsFile = ndkRoot / "source.properties"
if not fileExists(propsFile):
quit "Error: " & ndkRoot & " does not look like a valid NDK (source.properties not found)."
let (props, _) = gorgeEx("cat " & propsFile)
var revision = ""
for line in props.splitLines():
if line.startsWith("Pkg.Revision"):
let parts = line.split('=')
if parts.len == 2:
revision = parts[1].strip()
if revision.len == 0:
quit "Error: Could not read NDK version from " & propsFile
echo "Android NDK version: " & revision
proc buildMobileAndroid(srcDir = ".", extra_params = "") =
let cpu = getMyCpu()
let ndkRoot = getEnv("ANDROID_NDK_ROOT")
let androidTarget = "30"
# Map Nim CPU name → NDK target triple and include dirname.
let (androidArch, archDirname) =
if cpu == "arm64": ("aarch64-linux-android", "aarch64-linux-android")
elif cpu == "amd64": ("x86_64-linux-android", "x86_64-linux-android")
elif cpu == "i386": ("i686-linux-android", "i686-linux-android")
else: ("armv7a-linux-androideabi","arm-linux-androideabi")
# NDK prebuilt toolchain — location differs by host OS.
let (hostOS, _) = gorgeEx("uname -s")
let ndkHostTag =
if hostOS.strip() == "Darwin": "darwin-x86_64"
else: "linux-x86_64"
let toolchainDir = ndkRoot / "toolchains/llvm/prebuilt" / ndkHostTag
let sysroot = toolchainDir / "sysroot"
let ndkClang = toolchainDir / "bin" / (androidArch & androidTarget & "-clang")
let outDir = "build"
if not dirExists outDir:
mkDir outDir
exec "nim c" &
" --out:" & outDir & "/libsds.so" &
" --threads:on --app:lib --opt:size --noMain --mm:refc --nimMainPrefix:libsds" &
" --cc:clang" &
" --clang.exe:\"" & ndkClang & "\"" &
" --clang.linkerexe:\"" & ndkClang & "\"" &
" --cpu:" & cpu &
" --os:android" &
" -d:androidNDK" &
" -d:chronosEventEngine=epoll" &
" --passC:\"--sysroot=" & sysroot & "\"" &
" --passL:\"--sysroot=" & sysroot & "\"" &
" --passC:\"--target=" & androidArch & androidTarget & "\"" &
" --passL:\"--target=" & androidArch & androidTarget & "\"" &
" --passC:\"-I" & sysroot & "/usr/include\"" &
" --passC:\"-I" & sysroot & "/usr/include/" & archDirname & "\"" &
" --passL:\"-L" & sysroot & "/usr/lib/" & archDirname & "/" & androidTarget & "\"" &
" --passL:-llog" &
" -d:chronicles_sinks=textlines[dynamic]" &
" --header" &
" " & extra_params &
" " & srcDir & "/libsds.nim"
task libsdsAndroid, "Build the mobile bindings for Android (uses ARCH env var)":
checkAndroidNdk()
let srcDir = "./library"
buildMobileAndroid srcDir, "-d:chronicles_log_level=ERROR"
task libsdsAndroidArm64, "Build Android arm64 bindings":
checkAndroidNdk()
putEnv("ARCH", "arm64")
buildMobileAndroid "./library", "-d:chronicles_log_level=ERROR"
task libsdsAndroidAmd64, "Build Android amd64 bindings":
checkAndroidNdk()
putEnv("ARCH", "amd64")
buildMobileAndroid "./library", "-d:chronicles_log_level=ERROR"
task libsdsAndroidX86, "Build Android x86 bindings":
checkAndroidNdk()
putEnv("ARCH", "i386")
buildMobileAndroid "./library", "-d:chronicles_log_level=ERROR"
task libsdsAndroidArm, "Build Android arm bindings":
checkAndroidNdk()
putEnv("ARCH", "arm")
buildMobileAndroid "./library", "-d:chronicles_log_level=ERROR"
task libsds, "Build the shared library for the current platform":
when defined(macosx):
exec "nimble libsdsDynamicMac"
elif defined(windows):
exec "nimble libsdsDynamicWindows"
else:
exec "nimble libsdsDynamicLinux"
task clean, "Remove build artifacts":
if dirExists "build":
rmDir "build"