logos-delivery/waku/persistency/persistency.nim
NagyZoltanPeter 42e0aa43d1
feat: persistency (#3880)
* persistency: per-job SQLite-backed storage layer (singleton, brokered)

Adds a backend-neutral CRUD library at waku/persistency/, plus the
nim-brokers dependency swap that enables it.

Architecture (ports-and-adapters):
  * Persistency: process-wide singleton, one root directory.
  * Job: one tenant, one DB file, one worker thread, one BrokerContext.
  * Backend: SQLite via waku/common/databases/db_sqlite. Uniform schema
    kv(category BLOB, key BLOB, payload BLOB) PRIMARY KEY (category, key)
    WITHOUT ROWID, WAL mode.
  * Writes are fire-and-forget via EventBroker(mt) PersistEvent.
  * Reads are async via five RequestBroker(mt) shapes (KvGet, KvExists,
    KvScan, KvCount, KvDelete). Reads return Result[T, PersistencyError].
  * One storage thread per job; tenants isolated by BrokerContext.

Public surface (waku/persistency/persistency.nim):
  Persistency.instance(rootDir) / Persistency.instance() / Persistency.reset()
  p.openJob(id) / p.closeJob(id) / p.dropJob(id) / p.close()
  p.job(id) / p[id] / p.hasJob(id)
  Writes (Job form & string-id form, fire-and-forget):
    persist / persistPut / persistDelete / persistEncoded
  Reads (Job form & string-id form, async Result):
    get / exists / scan / scanPrefix / count / deleteAcked

Key & payload encoding (keys.nim, payload.nim):
  * encodePart family + variadic key(...) / payload(...) macros +
    single-value toKey / toPayload.
  * Primitives: string and openArray[byte] are 2-byte BE length + bytes;
    int{8..64} are sign-flipped 8-byte BE; uint{16..64} are 8-byte BE;
    bool/byte/char are 1 byte; enums are int64(ord(v)).
  * Generic encodePart[T: tuple | object] recurses through fields() so
    any composite Nim type is encodable without ceremony.
  * Stable across Nim/C compiler upgrades: no sizeof, no memcpy, no
    cast on pointers, no host-endianness dependency.
  * `rawKey(bytes)` + `persistPut(..., openArray[byte])` let callers
    bypass the built-in encoder with their own format (CBOR, protobuf...).
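
A minimal sketch of the primitive encodings listed above, to show why
the resulting bytes sort in value order. It is not the code in
keys.nim: `enc` and `lessBytes` are hypothetical names, and only
int64, string and tuple/object inputs are covered.

```nim
# Illustrative only; names and coverage are assumptions, see the note above.

proc enc(v: int64): seq[byte] =
  ## Flip the sign bit, then emit 8 bytes most-significant first, so that
  ## byte-wise comparison matches numeric order.
  let u = cast[uint64](v) xor 0x8000_0000_0000_0000'u64
  var buf = newSeq[byte](8)
  for i in 0 ..< 8:
    buf[i] = byte((u shr (8 * (7 - i))) and 0xFF'u64)
  return buf

proc enc(s: string): seq[byte] =
  ## 2-byte big-endian length prefix followed by the raw bytes.
  doAssert s.len <= 0xFFFF, "string too long for a 2-byte length prefix"
  var buf = @[byte((s.len shr 8) and 0xFF), byte(s.len and 0xFF)]
  for c in s:
    buf.add(byte(c))
  return buf

proc enc[T: tuple | object](v: T): seq[byte] =
  ## Composite values: concatenate the encoding of each field in
  ## declaration order (recurses for nested tuples/objects).
  var buf: seq[byte]
  for f in fields(v):
    buf.add(enc(f))
  return buf

proc lessBytes(a, b: seq[byte]): bool =
  ## Lexicographic byte comparison, the order SQLite applies to BLOBs.
  for i in 0 ..< min(a.len, b.len):
    if a[i] != b[i]:
      return a[i] < b[i]
  return a.len < b.len
```

With these definitions `enc(-1'i64)` compares below `enc(0'i64)`, and a
tuple key compares field by field in declaration order. The length
prefix means strings order by length before content.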

Lifecycle:
  * Persistency.new is private; Persistency.instance is the only public
    constructor. Same rootDir is idempotent; conflicting rootDir is
    peInvalidArgument. Persistency.reset for test/restart paths.
  * openJob opens-or-creates the per-job SQLite file; an existing file
    is reused with its data preserved.
  * Teardown integration: Persistency.instance registers a Teardown
    MultiRequestBroker provider that closes all jobs and clears the
    singleton slot when Waku.stop() issues Teardown.request.

Internal layering:
  types.nim          pure value types (Key, KeyRange, KvRow, TxOp,
                     PersistencyError)
  keys.nim           encodePart primitives + key(...) macro
  payload.nim        toPayload + payload(...) macro
  schema.nim         CREATE TABLE + connection pragmas + user_version
  backend_sqlite.nim KvBackend, applyOps (single source of write SQL),
                     getOne/existsOne/deleteOne, scanRange (asc/desc,
                     half-open ranges, open-ended stop), countRange
  backend_comm.nim   EventBroker(mt) PersistEvent + 5 RequestBroker(mt)
                     declarations; encodeErr/decodeErr boundary helpers
  backend_thread.nim startStorageThread / stopStorageThread (shared
                     allocShared0 arg, cstring dbPath, atomic
                     ready/shutdown flags); per-thread provider
                     registration
  persistency.nim    Persistency + Job types, singleton state, public
                     facade
  ../requests/lifecycle_requests.nim
                     Teardown MultiRequestBroker
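
scanRange is described above as taking half-open ranges with an
optional open-ended stop. One common way to turn a key prefix into
such a range is sketched below. This is an assumption about the
general technique, not the prefixRange implementation in keys.nim;
`prefixStop` is a hypothetical name.

```nim
import std/options

proc prefixStop(prefix: seq[byte]): Option[seq[byte]] =
  ## Exclusive upper bound for all keys that start with `prefix`.
  ## Returns none when no finite bound exists (empty prefix, or every
  ## byte is 0xFF); the scan is then open-ended.
  var stop = prefix
  # Trailing 0xFF bytes cannot be incremented, so drop them first.
  while stop.len > 0 and stop[^1] == 0xFF'u8:
    stop.setLen(stop.len - 1)
  if stop.len == 0:
    return none(seq[byte])
  stop[^1] += 1
  return some(stop)
```

A prefix scan then reads every key k with prefix <= k < stop, or
every key k >= prefix when the stop is none.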

Tests (69 cases, all passing):
  test_keys.nim          sort-order invariants (length-prefix strings,
                         sign-flipped ints, composite tuples, prefix
                         range)
  test_backend.nim       round-trip / replace / delete-return-value /
                         batched atomicity / asc-desc-half-open-open-
                         ended scans / category isolation / batch
                         txDelete
  test_lifecycle.nim     open-or-create rootDir / non-dir collision /
                         reopen across sessions / idempotent openJob /
                         two-tenant parallel isolation / closeJob joins
                         worker / dropJob removes file / acked delete
  test_facade.nim        put-then-get / atomic batch / scanPrefix
                         asc/desc / deleteAcked hit-miss /
                         fire-and-forget delete / two-tenant facade
                         isolation
  test_encoding.nim      tuple/named-tuple/object keys, embedded Key,
                         enum encoding, field-major composite sort,
                         payload struct encoding, end-to-end struct
                         round-trip through SQLite
  test_string_lookup.nim peJobNotFound semantics / hasJob / subscript /
                         persistPut+get via id / reads short-circuit /
                         writes drop+warn / persistEncoded via id /
                         scan parity Job-ref vs id
  test_singleton.nim     idempotent same-rootDir / different-rootDir
                         rejection / no-arg instance lifecycle / reset
                         retargets / reset idempotence / Teardown.request
                         end-to-end

Prerequisite delivered in the same series: replace the in-tree broker
implementation with the external nim-brokers package; update all
broker call-sites (waku_filter_v2, waku_relay, waku_rln_relay,
delivery_service, peer_manager, requests/*, factory/*, api tests, etc.)
to the new package API; chat2 made to compile again.

Note: SDS adapter (Phase 5 of the design) is deferred -- nim-sds is
still developed side-by-side and the persistency layer is intentionally
SDS-agnostic.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* persistency: pin nim-brokers by URL+commit (workaround for stale registry)

The bare `brokers >= 2.0.1` form cannot resolve on machines where the
local nimble SAT solver enumerates only the registry-recorded 0.1.0 for
brokers. The nim-lang/packages entry for `brokers` carries no per-tag
metadata (only the URL), so until that registry entry is refreshed the
SAT solver clamps the available-versions list to 0.1.0 and rejects the
>= 2.0.1 constraint -- even though pkgs2 and pkgcache both have v2.0.1
cloned locally.

Pinning by URL+commit bypasses the registry path entirely. Inline
comment in waku.nimble documents the situation and the path back to
the bare form once nim-lang/packages is updated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* persistency: nph format pass

Run `nph` on all 57 Nim files touched by this PR. Pure formatting:
17 files re-styled, no semantic change. Suite still 69/69.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Fix build, add local-storage-path config, lazy init of Persistency from Waku start

* fix: fix nix deps

* fixes for nix build, regenerate deps

* reverting accidental dependency changes

* Fixing deps

* Apply suggestions from code review

Co-authored-by: Ivan FB <128452529+Ivansete-status@users.noreply.github.com>

* persistency tests: migrate to suite / asyncTest / await

Match the in-tree test convention (procSuite -> suite, sync test +
waitFor -> asyncTest + await):

- procSuite "X": -> suite "X":
- For tests doing async work: test -> asyncTest, waitFor -> await.
- Poll helpers (proc waitFor(t: Job, ...) in test_lifecycle.nim,
  proc waitUntilExists(...) in test_facade.nim and
  test_string_lookup.nim) -> Future[bool] {.async.}, internal
  `waitFor X` -> `await X`, internal `sleep(N)` ->
  `await sleepAsync(chronos.milliseconds(N))`.
- Renamed test_lifecycle.nim's helper proc from `waitFor(t: Job, ...)`
  -> `pollExists(t: Job, ...)`; the previous name shadowed
  chronos.waitFor in the chronos macro expansion.
- `chronos.milliseconds(N)` explicitly qualified because `std/times`
  also exports `milliseconds` (returning TimeInterval, not Duration).
- `check await x` -> `let okN = await x; check okN` to dodge chronos's
  "yield in expr not lowered" with await-as-macro-argument.
- `(await x).foo()` -> `let awN = await x; ... awN.foo() ...` for the
  same reason.

waku/persistency/persistency.nim: nph also pulled the proc signatures
across multiple lines; restored explicit `Future[void] {.async.}`
return types after the colon (an intermediate nph pass had elided them).

Suite: 71 / 71 OK against the new async write surface.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* use idiomatic valueOr instead of ifs

* Reworked persistency shutdown, removed the unnecessary teardown mechanism

* Use const for DefaultStoragePath

* format to follow coding guidelines (no use of `result`, explicit returns); no functional change

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Ivan FB <128452529+Ivansete-status@users.noreply.github.com>
2026-05-16 00:09:07 +02:00

434 lines
16 KiB
Nim

## Public facade and main driver types for the persistency library.
##
## ``Persistency`` is the per-root coordinator; one instance owns one
## directory and any number of named jobs. ``Job`` is the per-job handle:
## one tenant, one DB file, one worker thread, one BrokerContext.
##
## ## Two ways to drive a job
##
## **By Job ref** — capture the handle from `openJob` and call methods on
## it. Cheapest, no map lookup per call:
##
## ```nim
## let p = Persistency.instance("/var/lib/wakustore").get()
## let j = p.openJob("alpha").get()
## await j.persistPut("msg", k, payload)
## let v = await j.get("msg", k)
## ```
##
## **By job id string** — useful when the caller doesn't want to thread
## the ``Job`` ref around (config-driven services, RPC dispatchers). The
## Job must still have been opened previously; the string-form procs look
## it up in `Persistency.jobs`:
##
## ```nim
## discard p.openJob("alpha")
## await p.persistPut("alpha", "msg", k, payload) # logs a warning and drops the write if not open
## let v = await p.get("alpha", "msg", k) # Result, peJobNotFound if missing
## ```
##
## ## Drain semantics
##
## Writes return a ``Future[void]`` that resolves once the PersistEvent
## has been pushed onto the worker thread's channel — **not** once the
## SQL has run. The listener is still fire-and-forget on the SQL side, so
## a read issued immediately after an awaited write is still racy by
## design in v1. To bridge the race:
## * use ``deleteAcked`` (it round-trips through the read path), or
## * poll ``exists`` until it returns true, or
## * yield with ``await sleepAsync(...)``.

{.push raises: [].}

import std/[locks, options, os, sequtils, tables]
import chronos, chronicles, results
import brokers/[event_broker, request_broker, broker_context]
import ./[types, keys, payload, backend_comm, backend_thread]

export types, keys, payload

logScope:
  topics = "persistency"

const DefaultStoragePath* = "./data"

# ── Driver types ────────────────────────────────────────────────────────

type
  Job* = ref object
    ## Per-job handle. Owns its BrokerContext and the worker thread that
    ## services it. Created and torn down via `Persistency.openJob` /
    ## `Persistency.closeJob`.
    id*: string
    context*: BrokerContext
    runtime: JobRuntime ## internal; managed by openJob/closeJob
    running*: bool

  Persistency* = ref object
    ## Per-root coordinator. One Persistency instance manages a directory
    ## of per-job SQLite files at ``rootDir/<jobId>.db``.
    rootDir*: string
    jobs*: Table[string, Job]

# ── Singleton state ─────────────────────────────────────────────────────
#
# Persistency is a process-wide singleton: one rootDir at a time. The
# `instance` factory is the only public constructor; `new` below is
# private and skips the singleton bookkeeping (used internally and never
# called twice with conflicting rootDirs).

var
  gPersistency {.global.}: Persistency
  gPersistencyLock {.global.}: Lock

once:
  gPersistencyLock.initLock()

# ── Lifecycle ───────────────────────────────────────────────────────────

proc dbPathFor(p: Persistency, jobId: string): string =
  p.rootDir / (jobId & ".db")

proc new(T: type Persistency, rootDir: string): Result[T, PersistencyError] =
  ## Private. Build a Persistency value without touching the singleton
  ## slot. Validates ``rootDir`` but does **not** create it; directory
  ## materialisation is deferred to the first ``openJob`` call. Semantics:
  ##
  ## * If ``rootDir`` is empty, returns ``peInvalidArgument``.
  ## * If ``rootDir`` exists and is a directory, accept it.
  ## * If ``rootDir`` exists but is not a directory, returns
  ##   ``peInvalidArgument``.
  ## * If ``rootDir`` does not exist, walk up the parent chain: the first
  ##   existing ancestor must be a directory; otherwise returns
  ##   ``peInvalidArgument``. This catches "obviously broken" paths early
  ##   using read-only checks, without creating anything on disk.
  if rootDir.len == 0:
    return err(persistencyErr(peInvalidArgument, "rootDir is empty"))
  if fileExists(rootDir) and not dirExists(rootDir):
    return err(
      persistencyErr(
        peInvalidArgument, "rootDir exists and is not a directory: " & rootDir
      )
    )
  if not dirExists(rootDir):
    var parent = parentDir(rootDir)
    while parent.len > 0 and not dirExists(parent):
      if fileExists(parent):
        return err(
          persistencyErr(
            peInvalidArgument,
            "rootDir ancestor exists and is not a directory: " & parent,
          )
        )
      parent = parentDir(parent)
  return ok(T(rootDir: rootDir, jobs: initTable[string, Job]()))

proc ensureRootDir(p: Persistency): Result[void, PersistencyError] =
  ## Materialise ``rootDir`` on demand. Idempotent; called from
  ## ``openJob`` so an unused Persistency leaves no directory behind.
  if dirExists(p.rootDir):
    return ok()
  try:
    createDir(p.rootDir)
  except OSError, IOError:
    return
      err(persistencyErr(peBackend, "createDir failed: " & getCurrentExceptionMsg()))
  return ok()

proc reset*(T: type Persistency) {.gcsafe.} =
  ## Tear down the singleton: close every open job and free the slot so
  ## a subsequent ``Persistency.instance`` starts fresh. Idempotent.
  ## Tests use this in `defer`.
  {.cast(gcsafe).}:
    acquire(gPersistencyLock)
    defer:
      release(gPersistencyLock)
    if gPersistency != nil:
      let p = gPersistency
      gPersistency = nil
      p.close()

proc instance*(
    T: type Persistency, rootDir: string
): Result[T, PersistencyError] {.gcsafe.} =
  ## Get-or-init the process-wide Persistency singleton.
  ##
  ## * First call: validates ``rootDir`` (without creating it) and stores
  ##   the new instance in the singleton slot. The directory itself is
  ##   created lazily by the first ``openJob`` call, so a Persistency that
  ##   never opens a job leaves no filesystem footprint.
  ## * Later calls with the same ``rootDir``: returns the live instance
  ##   (idempotent).
  ## * Later calls with a different ``rootDir``: returns
  ##   ``peInvalidArgument``; the singleton can only be re-targeted via
  ##   ``Persistency.reset``.
  {.cast(gcsafe).}:
    acquire(gPersistencyLock)
    defer:
      release(gPersistencyLock)
    if gPersistency != nil:
      if gPersistency.rootDir == rootDir:
        return ok(gPersistency)
      return err(
        persistencyErr(
          peInvalidArgument,
          "Persistency already initialised with rootDir " & gPersistency.rootDir &
            "; cannot re-init with " & rootDir,
        )
      )
    let p = ?Persistency.new(rootDir)
    gPersistency = p
    return ok(p)

proc instance*(T: type Persistency): Result[T, PersistencyError] {.gcsafe.} =
  ## No-args form: succeeds only if the singleton is already initialised.
  ## Use this from services that must not be the first to touch
  ## persistency.
  {.cast(gcsafe).}:
    acquire(gPersistencyLock)
    defer:
      release(gPersistencyLock)
    if gPersistency.isNil:
      return err(persistencyErr(peClosed, "Persistency not initialised"))
    return ok(gPersistency)

proc openJob*(p: Persistency, jobId: string): Result[Job, PersistencyError] =
  ## Open-or-create a job under this Persistency.
  ##
  ## * If the job is already open in this process, the existing ``Job``
  ##   ref is returned (idempotent).
  ## * Otherwise ``rootDir`` is materialised on demand (created with
  ##   missing parents on first use; no-op on subsequent calls), a worker
  ##   thread is spawned, and the SQLite file at
  ##   ``<rootDir>/<jobId>.db`` is opened. If the file does not exist it
  ##   is created and the schema initialised; if it already exists it is
  ##   reopened in place and its data is preserved.
  let existing = p.jobs.getOrDefault(jobId, nil)
  if existing != nil:
    return ok(existing)
  ?p.ensureRootDir()
  let ctx = NewBrokerContext()
  let rt = ?startStorageThread(ctx, dbPathFor(p, jobId))
  let job = Job(id: jobId, context: ctx, runtime: rt, running: true)
  p.jobs[jobId] = job
  return ok(job)

proc closeJob*(p: Persistency, jobId: string) =
  ## Stop the worker, join its thread, and forget the job. No-op if the
  ## job isn't open.
  let job = p.jobs.getOrDefault(jobId, nil)
  if job == nil:
    return
  stopStorageThread(job.runtime)
  job.runtime = nil
  job.running = false
  p.jobs.del(jobId)

proc close*(p: Persistency) =
  ## Close every open job. Idempotent.
  var ids: seq[string]
  for id in p.jobs.keys:
    ids.add(id)
  for id in ids:
    p.closeJob(id)

proc dropJob*(p: Persistency, jobId: string) =
  ## Close the job if open, then delete its DB file (plus -wal / -shm
  ## sidecars). Best-effort: a missing file is not an error.
  p.closeJob(jobId)
  let path = dbPathFor(p, jobId)
  for suffix in ["", "-wal", "-shm"]:
    try:
      removeFile(path & suffix)
    except OSError, IOError:
      discard

# ── String lookup ───────────────────────────────────────────────────────

proc job*(p: Persistency, jobId: string): Result[Job, PersistencyError] =
  ## Look up an already-open job. Returns ``peJobNotFound`` if no such
  ## job has been opened (``openJob`` first).
  let j = p.jobs.getOrDefault(jobId, nil)
  if j != nil:
    return ok(j)
  else:
    return err(persistencyErr(peJobNotFound, "no open job with id: " & jobId))

proc `[]`*(p: Persistency, jobId: string): Job {.raises: [KeyError].} =
  ## Subscript sugar for `job`; raises ``KeyError`` if the job isn't
  ## open. Prefer `job(p, id)` when you want a typed error.
  p.jobs[jobId]

proc hasJob*(p: Persistency, jobId: string): bool {.inline.} =
  p.jobs.hasKey(jobId)

# ── Writes (fire-and-forget), Job form ──────────────────────────────────

proc persist*(t: Job, ops: seq[TxOp]): Future[void] {.async.} =
  ## Emit a batched persist event. The handler treats >1 ops as a single
  ## BEGIN IMMEDIATE/COMMIT transaction (see backend_sqlite.applyOps).
  await PersistEvent.emit(t.context, PersistEvent(ops: ops))

proc persist*(t: Job, op: TxOp): Future[void] {.async.} =
  await persist(t, @[op])

proc persistPut*(
    t: Job, category: string, key: Key, payload: seq[byte]
): Future[void] {.async.} =
  await persist(t, TxOp(category: category, key: key, kind: txPut, payload: payload))

proc persistDelete*(t: Job, category: string, key: Key): Future[void] {.async.} =
  await persist(t, TxOp(category: category, key: key, kind: txDelete))

proc persistEncoded*[T](
    t: Job, category: string, key: Key, value: T
): Future[void] {.async.} =
  ## Convenience: encode `value` via `toPayload` and put it. Use the raw
  ## `persistPut(..., seq[byte])` form when you already have bytes
  ## (e.g. an externally-produced CBOR blob).
  await persistPut(t, category, key, toPayload(value))

# ── Writes (fire-and-forget), string-lookup form ────────────────────────
#
# These look up the Job by id and dispatch. If the job isn't open we log
# a warning and drop the write, consistent with the fire-and-forget
# contract; the caller has no return channel to inspect.

proc jobOrWarn(p: Persistency, jobId: string): Job =
  ## Lookup helper for the fire-and-forget write paths. Returns nil and
  ## logs a warning if the job isn't open. Isolated as a non-generic proc
  ## so chronicles' `warn` macro expands cleanly (it doesn't, when called
  ## from inside a generic proc's body).
  let job = p.jobs.getOrDefault(jobId, nil)
  if job.isNil():
    warn "persistency: write dropped, job not open", jobId
  return job

template withJobOrWarn(p: Persistency, jobId: string, j, body: untyped) =
  let `j` = p.jobOrWarn(jobId)
  if not `j`.isNil():
    body

proc persist*(p: Persistency, jobId: string, ops: seq[TxOp]): Future[void] {.async.} =
  let j = p.jobOrWarn(jobId)
  if not j.isNil():
    await j.persist(ops)

proc persist*(p: Persistency, jobId: string, op: TxOp): Future[void] {.async.} =
  await p.persist(jobId, @[op])

proc persistPut*(
    p: Persistency, jobId: string, category: string, key: Key, payload: seq[byte]
): Future[void] {.async.} =
  let j = p.jobOrWarn(jobId)
  if not j.isNil():
    await j.persistPut(category, key, payload)

proc persistDelete*(
    p: Persistency, jobId: string, category: string, key: Key
): Future[void] {.async.} =
  let j = p.jobOrWarn(jobId)
  if not j.isNil():
    await j.persistDelete(category, key)

proc persistEncoded*[T](
    p: Persistency, jobId: string, category: string, key: Key, value: T
): Future[void] {.async.} =
  let j = p.jobOrWarn(jobId)
  if not j.isNil():
    await j.persistEncoded(category, key, value)

# ── Reads (async, typed errors), Job form ───────────────────────────────

template liftErr(s: string): PersistencyError =
  decodeErr(s)

proc get*(
    t: Job, category: string, key: Key
): Future[Result[Option[seq[byte]], PersistencyError]] {.async.} =
  let r = (await KvGet.request(t.context, category, key)).valueOr:
    return err(liftErr(error))
  return ok(r.value)

proc exists*(
    t: Job, category: string, key: Key
): Future[Result[bool, PersistencyError]] {.async.} =
  let r = (await KvExists.request(t.context, category, key)).valueOr:
    return err(liftErr(error))
  return ok(r.value)

proc scan*(
    t: Job, category: string, range: KeyRange, reverse = false
): Future[Result[seq[KvRow], PersistencyError]] {.async.} =
  let r = (await KvScan.request(t.context, category, range, reverse)).valueOr:
    return err(liftErr(error))
  return ok(r.rows)

proc scanPrefix*(
    t: Job, category: string, prefix: Key, reverse = false
): Future[Result[seq[KvRow], PersistencyError]] {.async.} =
  let rng = prefixRange(prefix)
  let r = (await KvScan.request(t.context, category, rng, reverse)).valueOr:
    return err(liftErr(error))
  return ok(r.rows)

proc count*(
    t: Job, category: string, range: KeyRange
): Future[Result[int, PersistencyError]] {.async.} =
  let r = (await KvCount.request(t.context, category, range)).valueOr:
    return err(liftErr(error))
  return ok(r.n)

proc deleteAcked*(
    t: Job, category: string, key: Key
): Future[Result[bool, PersistencyError]] {.async.} =
  ## Goes through the read path so the caller learns whether a row was
  ## actually removed.
  let r = (await KvDelete.request(t.context, category, key)).valueOr:
    return err(liftErr(error))
  return ok(r.existed)

# ── Reads (async, typed errors), string-lookup form ─────────────────────

proc get*(
    p: Persistency, jobId: string, category: string, key: Key
): Future[Result[Option[seq[byte]], PersistencyError]] {.async.} =
  let j = ?p.job(jobId)
  return await j.get(category, key)

proc exists*(
    p: Persistency, jobId: string, category: string, key: Key
): Future[Result[bool, PersistencyError]] {.async.} =
  let j = ?p.job(jobId)
  return await j.exists(category, key)

proc scan*(
    p: Persistency, jobId: string, category: string, range: KeyRange, reverse = false
): Future[Result[seq[KvRow], PersistencyError]] {.async.} =
  let j = ?p.job(jobId)
  return await j.scan(category, range, reverse)

proc scanPrefix*(
    p: Persistency, jobId: string, category: string, prefix: Key, reverse = false
): Future[Result[seq[KvRow], PersistencyError]] {.async.} =
  let j = ?p.job(jobId)
  return await j.scanPrefix(category, prefix, reverse)

proc count*(
    p: Persistency, jobId: string, category: string, range: KeyRange
): Future[Result[int, PersistencyError]] {.async.} =
  let j = ?p.job(jobId)
  return await j.count(category, range)

proc deleteAcked*(
    p: Persistency, jobId: string, category: string, key: Key
): Future[Result[bool, PersistencyError]] {.async.} =
  let j = ?p.job(jobId)
  return await j.deleteAcked(category, key)

{.pop.}