Per-entry removeIncomingRepair / removeOutgoingRepair calls are replaced
by a single trySaveMeta per *dirty* channel at the end of that channel's
sweep. Failure is logged but does NOT abort the sweep — in-memory state
is the source of truth (PLAN_SNAPSHOT_PERSISTENCE.md §8).
Helpers added in sds/sds_utils.nim:
- snapshotMeta(channel) — capture current ChannelContext as ChannelMeta
blob (flattens Table-keyed buffers to seqs for the wire shape).
- trySaveMeta(rm, channelId, channel) — best-effort meta snapshot save;
logs on failure, never propagates.
- tryUpdateHistory(rm, channelId, append, evict) — best-effort history
update; skips the call entirely when both lists are empty (HistoryUpdate
contract).
Call-rate impact for runRepairSweep:
- Before: N persistence calls per expired entry per channel.
- After: at most 1 saveChannelMeta per dirty channel; 0 on idle channels
(matches the dirty-flag floor in ANALYSIS_SNAPSHOT_SAVE_POINTS).
All existing tests pass — including the 3 SDS-R Repair Sweep tests that
directly exercise this proc.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Introduce the 5-proc snapshot-based Persistence interface that will
replace the legacy 13-proc one. Both coexist on `ReliabilityManager` so
phase 2 can migrate protocol ops one at a time without breaking existing
callers.
New file:
- sds/types/persistence_v2.nim — `PersistenceV2` type with
saveChannelMeta / updateHistory / loadChannel / dropChannel /
setRetrievalHint. `noOpPersistenceV2()` default. Doc-comments capture
the atomicity pairing (meta save + history update issued back-to-back
under the channel lock) and the non-fatal failure policy from PLAN §8.
Modified:
- sds/types/reliability_manager.nim — adds `persistenceV2: PersistenceV2`
field alongside `persistence`; constructor takes both, both default to
no-op.
- sds.nim — `newReliabilityManager` plumbs the new optional parameter.
- AGENTS.md / CLAUDE.md — GitNexus index re-indexed after phase 0 +
phase 1 additions; symbol counts updated by `npx gitnexus analyze`.
No call site uses the new interface yet — that's phase 2. All existing
tests still pass against the legacy interface.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Introduce atomic-snapshot persistence types that will replace the current
fine-grained 13-proc Persistence interface. This commit is purely additive:
no existing call site changes, no behaviour change.
New types (sds/types/):
- channel_meta.nim — ChannelMeta (atomic per-channel snapshot blob),
ChannelData (bootstrap payload), OutgoingRepairKV / IncomingRepairKV
(flattened map entries for protobuf wire shape).
- history_update.nim — HistoryUpdate (combined append/evict payload for
the message log).
New codec (sds/snapshot_codec.nim):
- Protobuf encode/decode for all new types, reusing the existing
SdsMessage and HistoryEntry encoders from sds/protobuf.nim.
- Explicit schemaVersion=1 on ChannelMeta; decoder rejects unknown
versions loudly rather than silently truncating.
- Time encoded as int64 unix milliseconds.
Tests (tests/test_snapshot_codec.nim):
- 13 round-trip cases covering empty, single-entry, full-buffer, and
repair-heavy snapshots; ChannelData ordering; HistoryUpdate variants;
schemaVersion rejection.
Planning artefacts:
- ANALYSIS_SDS_PERSISTENCE.md — problem statement (partial-write
divergence, chatty call rate, non-fatal-error policy gap).
- ANALYSIS_SNAPSHOT_SAVE_POINTS.md — exact save points per protocol op
and projected call rates.
- PLAN_SNAPSHOT_PERSISTENCE.md — phased refactor plan; this commit
implements phase 0.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Persistence contract previously returned `Future[void]` for writes and
`Future[ChannelSnapshot]` for the loader, with `raises: []`. Backends had no
way to report a failure, so a failed write or a failed/partial read was
silently swallowed — and on the read path a mid-scan failure could bootstrap
a *truncated* channel snapshot, corrupting the rebuilt bloom filter and
lamport clock across a restart.
Make every contract field Result-returning:
* mutating ops -> Future[Result[void, string]]
* loadAllForChannel -> Future[Result[ChannelSnapshot, string]]
The backend-supplied error string is mapped to a new
`ReliabilityError.rePersistenceError` (logged once at the boundary via
`reliabilityErr`) and threaded up through every persistence-touching proc to
the public API, where the caller decides what to do. Request-driven paths
(wrap/unwrap/markDependenciesMet/ensureChannel/removeChannel/reset) propagate
the error; background maintenance loops (periodicBufferSweep,
periodicRepairSweep) log and retry on the next tick, since they have no
synchronous caller.
Tests: in-memory backend gains a `failingOps` injection hook; new
"Persistence: error propagation" suite asserts read/write/drop failures
surface as `rePersistenceError`. Full suite passes (90 OK).
BREAKING CHANGE: the `Persistence` contract signature changed; custom
backends must return `Result` and `ok()` on success. Bumped to 0.3.0.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Otherwise we end up with cache collisions like this in CI :
```
> Error: cannot open '/tmp/nim/libsds_d/@z..@f..@f..@f..@f..@f..
@ffgber@f6y2zz1uv2lzi4ln2717py8m0aix64u56-avz-hajenccrq-2.2.4@favz
@fyvo@fflfgrz@frkprcgvbaf.nim.c'
```
* feat: make Persistence interface async
The 14 Persistence proc fields now return Future[...] with
{.async: (raises: []), gcsafe.}, allowing real I/O backends (SQLite,
encrypted file, network) to suspend rather than block the Chronos event
loop the manager runs on.
Propagates through:
- ReliabilityManager.lock: system.Lock -> chronos.AsyncLock. Acquired
across awaits cleanly; matches the single-threaded Chronos worker the
FFI uses. Multi-OS-thread use is now explicitly the caller's
responsibility.
- sds_utils + sds.nim public API procs (wrapOutgoingMessage,
unwrapReceivedMessage, markDependenciesMet, setCallbacks,
resetReliabilityManager, cleanup, ensureChannel, removeChannel, the
getter snapshots, etc.) are now async.
- FFI request handlers in library/sds_thread/... await the new API.
- Tests converted via an asyncTest template that wraps each test body
in an async proc; setup/teardown use waitFor for their single async
call (ensureChannel / cleanup).
Lock scope is preserved exactly: the same call sites that held the
kernel Lock today hold AsyncLock now -- no new locking added.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* refactor: drop asyncSpawn, add asyncSetup/asyncTeardown
Three asyncSpawn usages removed:
- sds.nim startPeriodicTasks: stored the periodic-task futures on
ReliabilityManager (new field `periodicTasks: seq[FutureBase]`) so
cleanup can cancel them on shutdown instead of leaking the loops
against a cleared manager.
- library/sds_thread/sds_thread.nim: fireSync moved BEFORE processing,
then `await SdsThreadRequest.process(...)` instead of asyncSpawn'ing
it. Aligns the worker with the SP-channel + lock assumption that
there are no concurrent requests; caller throughput is unchanged
because the caller only waits for receipt (fireSync), not processing.
- tests TestBus repair callback: replaced asyncSpawn(deliverExcept...)
with an explicit pending-delivery queue drained by `bus.drain()`.
Integration tests no longer rely on `sleepAsync(10ms)` to let
spawned deliveries finish — they await drain instead.
Tests also pick up an asyncSetup/asyncTeardown pair (tests/async_unittest.nim)
so suite fixtures can `await` directly. All `waitFor` in setup/teardown
blocks is gone; only the top-level asyncTest wrapper still uses waitFor
(once, to drive the async proc to completion).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Correctly propagate error hidden by new async move
* Correctly handle future cancellation exceptions, +some housekeeping
* Apply suggestion from @Ivansete-status
Co-authored-by: Ivan FB <128452529+Ivansete-status@users.noreply.github.com>
* Stylistics, async default implication addressed, nph style run
* Remove leaking CancelledFuture from public facing + as a consequence it is tuneled into handling CatchableError everywhere
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Ivan FB <128452529+Ivansete-status@users.noreply.github.com>
Changes include:
- Removing all submodules from vendor folder.
- Updating sds.nimble with required depndencies.
- Generating a nimble.lock file using Nimble.
- Updated Nim code to reference depndencies correctly.
- Added nix/deps.nix fixed output derivation that calls Nimble.
- Updated nixpkgs to use 25.11 commit which provides Nimbe 0.20.1.
- Disabled Nix Android builds on MacOS due to Nimble segfault.
Signed-off-by: Jakub Sokołowski <jakub@status.im>
This aims to avoid the error that appeared in status-go's
test-functional:
ERROR root:statusgo_container.py:146 Container health check failed:
Container is not running. Status: exited. Logs (last 10 lines):
/go/src/github.com/status-im/status-go/vendor/github.com/waku-org/sds-go-bindings/third_party/nim-sds/vendor/nim-chronos/chronos/internal/asyncfutures.nim(624)
pollFor
/go/src/github.com/status-im/status-go/vendor/github.com/waku-org/sds-go-bindings/third_party/nim-sds/vendor/nim-chronos/chronos/internal/asyncengine.nim(150)
poll
/go/src/github.com/status-im/status-go/vendor/github.com/waku-org/sds-go-bindings/third_party/nim-sds/library/sds_thread/sds_thread.nim(45)
runSds
/go/src/github.com/status-im/status-go/vendor/github.com/waku-org/sds-go-bindings/third_party/nim-sds/vendor/nim-chronos/chronos/internal/asyncmacro.nim(71)
process
/go/src/github.com/status-im/status-go/vendor/github.com/waku-org/sds-go-bindings/third_party/nim-sds/library/sds_thread/inter_thread_communication/requests/sds_message_request.nim(65)
process
/go/src/github.com/status-im/status-go/vendor/github.com/waku-org/sds-go-bindings/third_party/nim-sds/src/reliability.nim(190)
unwrapReceivedMessage
/go/src/github.com/status-im/status-go/vendor/github.com/waku-org/sds-go-bindings/third_party/nim-sds/src/protobuf.nim(55)
extractChannelId
/go/src/github.com/status-im/status-go/vendor/github.com/waku-org/sds-go-bindings/third_party/nim-sds/vendor/nim-results/results.nim(445)
value
/go/src/github.com/status-im/status-go/vendor/github.com/waku-org/sds-go-bindings/third_party/nim-sds/vendor/nim-results/results.nim(433)
raiseResultDefect
Error: unhandled exception: Trying to access value with err Result:
BadWireType [ResultDefect]
The assert for Android NDK is obsolete. But we also can avoid fetching
the Android dependencies when not building for Android.
Also, lsb-release is a linux-only tool.
Signed-off-by: Jakub Sokołowski <jakub@status.im>
Otherwise it fails with:
error: aarch64-darwin not supported for Android SDK. Use: NIXPKGS_SYSTEM_OVERRIDE=x86_64-darwin
Signed-off-by: Jakub Sokołowski <jakub@status.im>