feat: persistency (#3880)

* persistency: per-job SQLite-backed storage layer (singleton, brokered)

Adds a backend-neutral CRUD library at waku/persistency/, plus the
nim-brokers dependency swap that enables it.

Architecture (ports-and-adapters):
  * Persistency: process-wide singleton, one root directory.
  * Job: one tenant, one DB file, one worker thread, one BrokerContext.
  * Backend: SQLite via waku/common/databases/db_sqlite. Uniform schema
    kv(category BLOB, key BLOB, payload BLOB)
    PRIMARY KEY (category, key) WITHOUT ROWID, WAL mode.
  * Writes are fire-and-forget via EventBroker(mt) PersistEvent.
  * Reads are async via five RequestBroker(mt) shapes (KvGet, KvExists,
    KvScan, KvCount, KvDelete). Reads return Result[T, PersistencyError].
  * One storage thread per job; tenants isolated by BrokerContext.

Public surface (waku/persistency/persistency.nim):
  Persistency.instance(rootDir) / Persistency.instance() / Persistency.reset()
  p.openJob(id) / p.closeJob(id) / p.dropJob(id) / p.close()
  p.job(id) / p[id] / p.hasJob(id)
  Writes (Job form & string-id form, fire-and-forget):
    persist / persistPut / persistDelete / persistEncoded
  Reads (Job form & string-id form, async Result):
    get / exists / scan / scanPrefix / count / deleteAcked

Key & payload encoding (keys.nim, payload.nim):
  * encodePart family + variadic key(...) / payload(...) macros +
    single-value toKey / toPayload.
  * Primitives: string and openArray[byte] are 2-byte BE length + bytes;
    int{8..64} are sign-flipped 8-byte BE; uint{16..64} are 8-byte BE;
    bool/byte/char are 1 byte; enums are int64(ord(v)).
  * Generic encodePart[T: tuple | object] recurses through fields() so any
    composite Nim type is encodable without ceremony.
  * Stable across Nim/C compiler upgrades: no sizeof, no memcpy, no cast on
    pointers, no host-endianness dependency.
  * `rawKey(bytes)` + `persistPut(..., openArray[byte])` let callers bypass
    the built-in encoder with their own format (CBOR, protobuf...).

Lifecycle:
  * Persistency.new is private; Persistency.instance is the only public
    constructor. Same rootDir is idempotent; conflicting rootDir is
    peInvalidArgument. Persistency.reset for test/restart paths.
  * openJob opens-or-creates the per-job SQLite file; an existing file is
    reused with its data preserved.
  * Teardown integration: Persistency.instance registers a Teardown
    MultiRequestBroker provider that closes all jobs and clears the
    singleton slot when Waku.stop() issues Teardown.request.

Internal layering:
  types.nim           pure value types (Key, KeyRange, KvRow, TxOp,
                      PersistencyError)
  keys.nim            encodePart primitives + key(...) macro
  payload.nim         toPayload + payload(...) macro
  schema.nim          CREATE TABLE + connection pragmas + user_version
  backend_sqlite.nim  KvBackend, applyOps (single source of write SQL),
                      getOne/existsOne/deleteOne, scanRange (asc/desc,
                      half-open ranges, open-ended stop), countRange
  backend_comm.nim    EventBroker(mt) PersistEvent + 5 RequestBroker(mt)
                      declarations; encodeErr/decodeErr boundary helpers
  backend_thread.nim  startStorageThread / stopStorageThread (shared
                      allocShared0 arg, cstring dbPath, atomic
                      ready/shutdown flags); per-thread provider
                      registration
  persistency.nim     Persistency + Job types, singleton state, public
                      facade
  ../requests/lifecycle_requests.nim
                      Teardown MultiRequestBroker

Tests (69 cases, all passing):
  test_keys.nim           sort-order invariants (length-prefix strings,
                          sign-flipped ints, composite tuples, prefix range)
  test_backend.nim        round-trip / replace / delete-return-value /
                          batched atomicity / asc-desc-half-open-open-ended
                          scans / category isolation / batch txDelete
  test_lifecycle.nim      open-or-create rootDir / non-dir collision /
                          reopen across sessions / idempotent openJob /
                          two-tenant parallel isolation / closeJob joins
                          worker / dropJob removes file / acked delete
  test_facade.nim         put-then-get / atomic batch / scanPrefix asc/desc /
                          deleteAcked hit-miss / fire-and-forget delete /
                          two-tenant facade isolation
  test_encoding.nim       tuple/named-tuple/object keys, embedded Key, enum
                          encoding, field-major composite sort, payload
                          struct encoding, end-to-end struct round-trip
                          through SQLite
  test_string_lookup.nim  peJobNotFound semantics / hasJob / subscript /
                          persistPut+get via id / reads short-circuit /
                          writes drop+warn / persistEncoded via id / scan
                          parity Job-ref vs id
  test_singleton.nim      idempotent same-rootDir / different-rootDir
                          rejection / no-arg instance lifecycle / reset
                          retargets / reset idempotence / Teardown.request
                          end-to-end

Prerequisite delivered in the same series: replace the in-tree broker
implementation with the external nim-brokers package; update all broker
call-sites (waku_filter_v2, waku_relay, waku_rln_relay, delivery_service,
peer_manager, requests/*, factory/*, api tests, etc.) to the new package
API; chat2 made to compile again.

Note: SDS adapter (Phase 5 of the design) is deferred -- nim-sds is still
developed side-by-side and the persistency layer is intentionally
SDS-agnostic.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* persistency: pin nim-brokers by URL+commit (workaround for stale registry)

The bare `brokers >= 2.0.1` form cannot resolve on machines where the local
nimble SAT solver enumerates only the registry-recorded 0.1.0 for brokers.
The nim-lang/packages entry for `brokers` carries no per-tag metadata (only
the URL), so until that registry entry is refreshed the SAT solver clamps
the available-versions list to 0.1.0 and rejects the >= 2.0.1 constraint --
even though pkgs2 and pkgcache both have v2.0.1 cloned locally.

Pinning by URL+commit bypasses the registry path entirely. Inline comment in
waku.nimble documents the situation and the path back to the bare form once
nim-lang/packages is updated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* persistency: nph format pass

Run `nph` on all 57 Nim files touched by this PR. Pure formatting: 17 files
re-styled, no semantic change. Suite still 69/69.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Fix build, add local-storage-path config, lazy init of Persistency from Waku start

* fix: fix nix deps

* fixes for nix build, regenerate deps

* reverting accidental dependency changes

* Fixing deps

* Apply suggestions from code review

Co-authored-by: Ivan FB <128452529+Ivansete-status@users.noreply.github.com>

* persistency tests: migrate to suite / asyncTest / await

Match the in-tree test convention (procSuite -> suite, sync test + waitFor
-> asyncTest + await):

- procSuite "X": -> suite "X":
- For tests doing async work: test -> asyncTest, waitFor -> await.
- Poll helpers (proc waitFor(t: Job, ...) in test_lifecycle.nim,
  proc waitUntilExists(...) in test_facade.nim and test_string_lookup.nim)
  -> Future[bool] {.async.}, internal `waitFor X` -> `await X`, internal
  `sleep(N)` -> `await sleepAsync(chronos.milliseconds(N))`.
- Renamed test_lifecycle.nim's helper proc from `waitFor(t: Job, ...)` ->
  `pollExists(t: Job, ...)`; the previous name shadowed chronos.waitFor in
  the chronos macro expansion.
- `chronos.milliseconds(N)` explicitly qualified because `std/times` also
  exports `milliseconds` (returning TimeInterval, not Duration).
- `check await x` -> `let okN = await x; check okN` to dodge chronos's
  "yield in expr not lowered" with await-as-macro-argument.
- `(await x).foo()` -> `let awN = await x; ... awN.foo() ...` for the same
  reason.

waku/persistency/persistency.nim: nph also pulled the proc signatures across
multiple lines; restored explicit `Future[void] {.async.}` return types
after the colon (an intermediate nph pass had elided them).

Suite: 71 / 71 OK against the new async write surface.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* use idiomatic valueOr instead of ifs

* Reworked persistency shutdown, remove not necessary teardown mechanism

* Use const for DefaultStoragePath

* format to follow coding guidelines
  - no use of result and explicit returns
  - no functional change

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Ivan FB <128452529+Ivansete-status@users.noreply.github.com>
2026-05-16 00:09:07 +02:00
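The key encoding rules listed in the commit description (sign-flipped 8-byte big-endian integers, 2-byte big-endian length prefix for strings) can be illustrated with a small standalone sketch. The proc names `sketchEncodeInt64`, `sketchEncodeString` and `sketchLess` are hypothetical and are not the `encodePart` implementation from keys.nim. The sketch only shows why a byte-wise comparison of such encodings follows numeric order, which is the property the range-scan test in this file relies on when it expects rows 1..5 back in ascending order.

```nim
# Hypothetical sketch only; not the keys.nim implementation.

proc sketchEncodeInt64(v: int64): seq[byte] =
  ## Sign-flipped 8-byte big-endian: flipping the top bit maps
  ## low(int64)..high(int64) onto 0..high(uint64) in the same order.
  let u = cast[uint64](v) xor (1'u64 shl 63)
  var encoded = newSeq[byte](8)
  for i in 0 ..< 8:
    # Most significant byte first.
    encoded[i] = byte((u shr (8 * (7 - i))) and 0xFF'u64)
  return encoded

proc sketchEncodeString(s: string): seq[byte] =
  ## 2-byte big-endian length prefix followed by the raw bytes.
  ## Assumes the length fits in 16 bits.
  doAssert s.len <= 0xFFFF
  var encoded = newSeqOfCap[byte](2 + s.len)
  encoded.add(byte((s.len shr 8) and 0xFF))
  encoded.add(byte(s.len and 0xFF))
  for c in s:
    encoded.add(byte(c))
  return encoded

proc sketchLess(a, b: seq[byte]): bool =
  ## Byte-wise lexicographic comparison, the order SQLite uses for BLOBs.
  for i in 0 ..< min(a.len, b.len):
    if a[i] != b[i]:
      return a[i] < b[i]
  return a.len < b.len

# Numeric order is preserved by the byte-wise order of the encodings.
doAssert sketchLess(sketchEncodeInt64(-1'i64), sketchEncodeInt64(0'i64))
doAssert sketchLess(sketchEncodeInt64(1'i64), sketchEncodeInt64(2'i64))
doAssert sketchLess(sketchEncodeInt64(low(int64)), sketchEncodeInt64(high(int64)))
doAssert sketchEncodeString("ab") == @[0'u8, 2, 97, 98]
```

Without the sign flip, a negative value such as -1 would encode as all 0xFF bytes and sort after every positive value, so range scans over signed keys would come back in the wrong order.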
{.used.}
import std/[options, os, times]
import chronos, results
import testutils/unittests
import brokers/[event_broker, request_broker]
import waku/persistency/persistency
import waku/persistency/backend_comm
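
# Convert a string into its raw bytes for use as a payload.
proc payloadBytes(s: string): seq[byte] =
  result = newSeq[byte](s.len)
  for i, c in s:
    result[i] = byte(c)

# Convert payload bytes back into a string for comparisons in checks.
template str(b: seq[byte]): string =
  var s = newString(b.len)
  for i, x in b:
    s[i] = char(x)
  s

# Build a per-test temp directory path and make sure it does not exist yet.
proc tmpRoot(label: string): string =
  let p = getTempDir() / ("persistency_test_" & label & "_" & $epochTime().int)
  removeDir(p)
  p
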
# Cross-thread persist: emit a PersistEvent then poll until the row shows up
# via KvExists. The PersistEvent listener is fire-and-forget, so reads
# immediately after emit are racy by design (documented in v1).
proc pollExists(
    t: Job, category: string, k: Key, timeoutMs = 1000
): Future[bool] {.async.} =
  let deadline = epochTime() + (timeoutMs.float / 1000.0)
  while epochTime() < deadline:
    let r = await KvExists.request(t.context, category, k)
    if r.isOk and r.get().value:
      return true
    await sleepAsync(chronos.milliseconds(2))
  return false

suite "Persistency lifecycle":
  test "Persistency.instance accepts a pre-existing rootDir":
    let root = tmpRoot("preexisting")
    defer:
      removeDir(root)
    createDir(root) # pretend a previous run left it
    let marker = root / "do-not-touch.txt"
    writeFile(marker, "hi")
    defer:
      removeFile(marker)

    let p = Persistency.instance(root).get()
    defer:
      Persistency.reset()

    # The pre-existing file is untouched.
    check fileExists(marker)
    check readFile(marker) == "hi"

  test "Persistency.instance refuses a non-directory path":
    let root = tmpRoot("collision")
    defer:
      removeFile(root)
    writeFile(root, "im a file not a dir") # collide with rootDir name

    let r = Persistency.instance(root)
    check r.isErr
    check r.error.kind == peInvalidArgument

  test "Persistency.instance defers rootDir creation until first openJob":
    let root = tmpRoot("lazy")
    defer:
      removeDir(root)
    check not dirExists(root)

    let p = Persistency.instance(root).get()
    defer:
      Persistency.reset()
    # instance() must not have touched the filesystem
    check not dirExists(root)

    discard p.openJob("first").get()
    # first openJob materialises the directory
    check dirExists(root)

  test "Persistency.instance refuses a path whose ancestor is not a directory":
    let parent = tmpRoot("bad-parent")
    defer:
      removeFile(parent)
    writeFile(parent, "not a directory")
    let root = parent / "child"

    let r = Persistency.instance(root)
    check r.isErr
    check r.error.kind == peInvalidArgument

  asyncTest "openJob reuses an existing DB file across processes-of-one":
    let root = tmpRoot("reopen")
    defer:
      removeDir(root)

    # First "session": write something then close.
    block firstSession:
      let p = Persistency.instance(root).get()
      let j = p.openJob("persist").get()
      await j.persistPut("msg", key("c", 1'i64), payloadBytes("v1"))
      let ckOk1 = await j.pollExists("msg", key("c", 1'i64))
      check ckOk1
      Persistency.reset()

    check fileExists(root / "persist.db")

    # Second "session": reopen and read the data back.
    block secondSession:
      let p = Persistency.instance(root).get()
      defer:
        Persistency.reset()
      let j = p.openJob("persist").get()
      let aw1 = await KvGet.request(j.context, "msg", key("c", 1'i64))
      let got = aw1.get()
      check got.value.isSome
      check str(got.value.get) == "v1"

  test "openJob is idempotent within a session":
    let root = tmpRoot("idem")
    defer:
      removeDir(root)
    let p = Persistency.instance(root).get()
    defer:
      Persistency.reset()

    let a = p.openJob("same").get()
    let b = p.openJob("same").get()
    check a.id == b.id
    check a.context == b.context

  test "openJob materialises rootDir and launches a worker":
    let root = tmpRoot("basic")
    defer:
      removeDir(root)
    let p = Persistency.instance(root).get()
    defer:
      Persistency.reset()

    let t = p.openJob("alpha").get()
    check t.id == "alpha"
    check t.running
    check fileExists(root / "alpha.db")

  asyncTest "persist then read round-trips via brokers":
    let root = tmpRoot("rw")
    defer:
      removeDir(root)
    let p = Persistency.instance(root).get()
    defer:
      Persistency.reset()
    let t = p.openJob("t1").get()

    let k = key("c", 1'i64)
    let ev = PersistEvent(
      ops: @[TxOp(category: "msg", key: k, kind: txPut, payload: payloadBytes("hello"))]
    )
    await PersistEvent.emit(t.context, ev)

    let ckOk2 = await t.pollExists("msg", k)
    check ckOk2

    let aw2 = await KvGet.request(t.context, "msg", k)
    let got = aw2.get()
    check got.value.isSome
    check str(got.value.get) == "hello"

  asyncTest "two jobs run in parallel with isolated DBs":
    let root = tmpRoot("isolation")
    defer:
      removeDir(root)
    let p = Persistency.instance(root).get()
    defer:
      Persistency.reset()

    let a = p.openJob("alpha").get()
    let b = p.openJob("beta").get()
    check a.context != b.context

    let k = key("shared", 1'i64)
    await PersistEvent.emit(
      a.context,
      PersistEvent(
        ops: @[
          TxOp(
            category: "msg", key: k, kind: txPut, payload: payloadBytes("from-alpha")
          )
        ]
      ),
    )
    await PersistEvent.emit(
      b.context,
      PersistEvent(
        ops: @[
          TxOp(category: "msg", key: k, kind: txPut, payload: payloadBytes("from-beta"))
        ]
      ),
    )

    let ckOk3 = await a.pollExists("msg", k)
    check ckOk3
    let ckOk4 = await b.pollExists("msg", k)
    check ckOk4

    let aw3 = await KvGet.request(a.context, "msg", k)
    let aGot = aw3.get()
    let aw4 = await KvGet.request(b.context, "msg", k)
    let bGot = aw4.get()
    check str(aGot.value.get) == "from-alpha"
    check str(bGot.value.get) == "from-beta"

    # Each job has its own DB file.
    check fileExists(root / "alpha.db")
    check fileExists(root / "beta.db")

  asyncTest "closeJob joins the worker and frees the slot":
    let root = tmpRoot("close")
    defer:
      removeDir(root)
    let p = Persistency.instance(root).get()
    defer:
      Persistency.reset()

    let t = p.openJob("x").get()
    let ctx = t.context
    p.closeJob("x")
    check not t.running

    # After close, requests on the old context have no provider.
    let r = await KvExists.request(ctx, "msg", key("k"))
    check r.isErr

  test "dropJob removes the DB file":
    let root = tmpRoot("drop")
    defer:
      removeDir(root)
    let p = Persistency.instance(root).get()
    defer:
      Persistency.reset()

    discard p.openJob("ephemeral").get()
    check fileExists(root / "ephemeral.db")
    p.dropJob("ephemeral")
    check not fileExists(root / "ephemeral.db")

  asyncTest "scan and count over a range":
    let root = tmpRoot("scan")
    defer:
      removeDir(root)
    let p = Persistency.instance(root).get()
    defer:
      Persistency.reset()
    let t = p.openJob("t").get()

    var ops: seq[TxOp]
    for i in 1'i64 .. 5:
      ops.add(
        TxOp(category: "msg", key: key("c", i), kind: txPut, payload: payloadBytes($i))
      )
    await PersistEvent.emit(t.context, PersistEvent(ops: ops))

    # Wait for the last insert to land.
    let ckOk5 = await t.pollExists("msg", key("c", 5'i64))
    check ckOk5

    let rng = prefixRange(key("c"))
    let aw5 = await KvCount.request(t.context, "msg", rng)
    let cnt = aw5.get()
    check cnt.n == 5

    let aw6 = await KvScan.request(t.context, "msg", rng, false)
    let scn = aw6.get()
    check scn.rows.len == 5
    check str(scn.rows[0].payload) == "1"
    check str(scn.rows[4].payload) == "5"

  asyncTest "acked delete reports whether the row existed":
    let root = tmpRoot("delete")
    defer:
      removeDir(root)
    let p = Persistency.instance(root).get()
    defer:
      Persistency.reset()
    let t = p.openJob("t").get()

    let k = key("d", 1'i64)
    let aw7 = await KvDelete.request(t.context, "msg", k)
    let r1 = aw7.get()
    check r1.existed == false

    await PersistEvent.emit(
      t.context,
      PersistEvent(
        ops: @[TxOp(category: "msg", key: k, kind: txPut, payload: payloadBytes("v"))]
      ),
    )
    let ckOk6 = await t.pollExists("msg", k)
    check ckOk6

    let aw8 = await KvDelete.request(t.context, "msg", k)
    let r2 = aw8.get()
    check r2.existed == true

    let aw9 = await KvExists.request(t.context, "msg", k)
    let r3 = aw9.get()
    check r3.value == false