Mirror of https://github.com/logos-messaging/logos-messaging-nim.git (synced 2026-05-22 10:20:04 +00:00)
* persistency: per-job SQLite-backed storage layer (singleton, brokered)

Adds a backend-neutral CRUD library at waku/persistency/, plus the
nim-brokers dependency swap that enables it.

Architecture (ports-and-adapters):
  * Persistency: process-wide singleton, one root directory.
  * Job: one tenant, one DB file, one worker thread, one BrokerContext.
  * Backend: SQLite via waku/common/databases/db_sqlite. Uniform schema
    kv(category BLOB, key BLOB, payload BLOB) PRIMARY KEY (category, key)
    WITHOUT ROWID, WAL mode.
  * Writes are fire-and-forget via EventBroker(mt) PersistEvent.
  * Reads are async via five RequestBroker(mt) shapes (KvGet, KvExists,
    KvScan, KvCount, KvDelete). Reads return Result[T, PersistencyError].
  * One storage thread per job; tenants isolated by BrokerContext.

Public surface (waku/persistency/persistency.nim):
  Persistency.instance(rootDir) / Persistency.instance() / Persistency.reset()
  p.openJob(id) / p.closeJob(id) / p.dropJob(id) / p.close()
  p.job(id) / p[id] / p.hasJob(id)
  Writes (Job form & string-id form, fire-and-forget):
    persist / persistPut / persistDelete / persistEncoded
  Reads (Job form & string-id form, async Result):
    get / exists / scan / scanPrefix / count / deleteAcked

Key & payload encoding (keys.nim, payload.nim):
  * encodePart family + variadic key(...) / payload(...) macros +
    single-value toKey / toPayload.
  * Primitives: string and openArray[byte] are 2-byte BE length + bytes;
    int{8..64} are sign-flipped 8-byte BE; uint{16..64} are 8-byte BE;
    bool/byte/char are 1 byte; enums are int64(ord(v)).
  * Generic encodePart[T: tuple | object] recurses through fields() so
    any composite Nim type is encodable without ceremony.
  * Stable across Nim/C compiler upgrades: no sizeof, no memcpy, no
    cast on pointers, no host-endianness dependency.
  * `rawKey(bytes)` + `persistPut(..., openArray[byte])` let callers
    bypass the built-in encoder with their own format (CBOR, protobuf...).
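
For illustration only, the primitive encodings listed above can be sketched
as follows. This is a minimal sketch under stated assumptions, not the code
in keys.nim: the proc names (encodeInt64Sketch, encodeStringSketch,
lessBytes) are hypothetical, and only the byte layouts are taken from the
description above.

```nim
# Hypothetical helper names; this is NOT the waku/persistency API.

proc encodeInt64Sketch(v: int64): seq[byte] =
  ## Flip the sign bit, then emit 8 bytes big-endian, so that bytewise
  ## comparison of the output matches numeric comparison of the input.
  let u = cast[uint64](v) xor (1'u64 shl 63)
  var buf = newSeq[byte](8)
  for i in 0 ..< 8:
    buf[i] = byte((u shr (8 * (7 - i))) and 0xFF'u64)
  return buf

proc encodeStringSketch(s: string): seq[byte] =
  ## 2-byte big-endian length prefix followed by the raw bytes.
  doAssert s.len <= 0xFFFF, "length must fit in two bytes"
  var buf = newSeqOfCap[byte](2 + s.len)
  buf.add(byte((s.len shr 8) and 0xFF))
  buf.add(byte(s.len and 0xFF))
  for c in s:
    buf.add(byte(c))
  return buf

proc lessBytes(a, b: seq[byte]): bool =
  ## Lexicographic bytewise comparison, as applied to BLOB keys.
  for i in 0 ..< min(a.len, b.len):
    if a[i] != b[i]:
      return a[i] < b[i]
  return a.len < b.len

# Negative values sort before zero, which sorts before positive values.
doAssert lessBytes(encodeInt64Sketch(-1), encodeInt64Sketch(0))
doAssert lessBytes(encodeInt64Sketch(0), encodeInt64Sketch(1))
```

In this sketch the sign flip is what keeps negative integers ordered before
positive ones under bytewise comparison; without it, two's-complement
negatives would sort after every non-negative value.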

Lifecycle:
  * Persistency.new is private; Persistency.instance is the only public
    constructor. Same rootDir is idempotent; conflicting rootDir is
    peInvalidArgument. Persistency.reset for test/restart paths.
  * openJob opens-or-creates the per-job SQLite file; an existing file
    is reused with its data preserved.
  * Teardown integration: Persistency.instance registers a Teardown
    MultiRequestBroker provider that closes all jobs and clears the
    singleton slot when Waku.stop() issues Teardown.request.

Internal layering:
  types.nim           pure value types (Key, KeyRange, KvRow, TxOp,
                      PersistencyError)
  keys.nim            encodePart primitives + key(...) macro
  payload.nim         toPayload + payload(...) macro
  schema.nim          CREATE TABLE + connection pragmas + user_version
  backend_sqlite.nim  KvBackend, applyOps (single source of write SQL),
                      getOne/existsOne/deleteOne, scanRange (asc/desc,
                      half-open ranges, open-ended stop), countRange
  backend_comm.nim    EventBroker(mt) PersistEvent + 5 RequestBroker(mt)
                      declarations; encodeErr/decodeErr boundary helpers
  backend_thread.nim  startStorageThread / stopStorageThread (shared
                      allocShared0 arg, cstring dbPath, atomic
                      ready/shutdown flags); per-thread provider
                      registration
  persistency.nim     Persistency + Job types, singleton state, public
                      facade
  ../requests/lifecycle_requests.nim
                      Teardown MultiRequestBroker

Tests (69 cases, all passing):
  test_keys.nim           sort-order invariants (length-prefix strings,
                          sign-flipped ints, composite tuples, prefix
                          range)
  test_backend.nim        round-trip / replace / delete-return-value /
                          batched atomicity /
                          asc-desc-half-open-open-ended scans /
                          category isolation / batch txDelete
  test_lifecycle.nim      open-or-create rootDir / non-dir collision /
                          reopen across sessions / idempotent openJob /
                          two-tenant parallel isolation / closeJob joins
                          worker / dropJob removes file / acked delete
  test_facade.nim         put-then-get / atomic batch / scanPrefix
                          asc/desc / deleteAcked hit-miss /
                          fire-and-forget delete / two-tenant facade
                          isolation
  test_encoding.nim       tuple/named-tuple/object keys, embedded Key,
                          enum encoding, field-major composite sort,
                          payload struct encoding, end-to-end struct
                          round-trip through SQLite
  test_string_lookup.nim  peJobNotFound semantics / hasJob / subscript /
                          persistPut+get via id / reads short-circuit /
                          writes drop+warn / persistEncoded via id /
                          scan parity Job-ref vs id
  test_singleton.nim      idempotent same-rootDir / different-rootDir
                          rejection / no-arg instance lifecycle / reset
                          retargets / reset idempotence / Teardown.request
                          end-to-end

Prerequisite delivered in the same series: replace the in-tree broker
implementation with the external nim-brokers package; update all
broker call-sites (waku_filter_v2, waku_relay, waku_rln_relay,
delivery_service, peer_manager, requests/*, factory/*, api tests, etc.)
to the new package API; chat2 made to compile again.
Note: SDS adapter (Phase 5 of the design) is deferred -- nim-sds is
still developed side-by-side and the persistency layer is intentionally
SDS-agnostic.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* persistency: pin nim-brokers by URL+commit (workaround for stale registry)
The bare `brokers >= 2.0.1` form cannot resolve on machines where the
local nimble SAT solver enumerates only the registry-recorded 0.1.0 for
brokers. The nim-lang/packages entry for `brokers` carries no per-tag
metadata (only the URL), so until that registry entry is refreshed the
SAT solver clamps the available-versions list to 0.1.0 and rejects the
>= 2.0.1 constraint -- even though pkgs2 and pkgcache both have v2.0.1
cloned locally.
Pinning by URL+commit bypasses the registry path entirely. Inline
comment in waku.nimble documents the situation and the path back to
the bare form once nim-lang/packages is updated.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* persistency: nph format pass
Run `nph` on all 57 Nim files touched by this PR. Pure formatting:
17 files re-styled, no semantic change. Suite still 69/69.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Fix build, add local-storage-path config, lazy init of Persistency from Waku start
* fix: fix nix deps
* fixes for nix build, regenerate deps
* reverting accidental dependency changes
* Fixing deps
* Apply suggestions from code review
Co-authored-by: Ivan FB <128452529+Ivansete-status@users.noreply.github.com>
* persistency tests: migrate to suite / asyncTest / await

Match the in-tree test convention (procSuite -> suite, sync test +
waitFor -> asyncTest + await):
- procSuite "X": -> suite "X":
- For tests doing async work: test -> asyncTest, waitFor -> await.
- Poll helpers (proc waitFor(t: Job, ...) in test_lifecycle.nim,
  proc waitUntilExists(...) in test_facade.nim and
  test_string_lookup.nim) -> Future[bool] {.async.}, internal
  `waitFor X` -> `await X`, internal `sleep(N)` ->
  `await sleepAsync(chronos.milliseconds(N))`.
- Renamed test_lifecycle.nim's helper proc from `waitFor(t: Job, ...)`
  -> `pollExists(t: Job, ...)`; the previous name shadowed
  chronos.waitFor in the chronos macro expansion.
- `chronos.milliseconds(N)` explicitly qualified because `std/times`
  also exports `milliseconds` (returning TimeInterval, not Duration).
- `check await x` -> `let okN = await x; check okN` to dodge chronos's
  "yield in expr not lowered" with await-as-macro-argument.
- `(await x).foo()` -> `let awN = await x; ... awN.foo() ...` for the
  same reason.

waku/persistency/persistency.nim: nph also pulled the proc signatures
across multiple lines; restored explicit `Future[void] {.async.}`
return types after the colon (an intermediate nph pass had elided them).

Suite: 71 / 71 OK against the new async write surface.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* use idiomatic valueOr instead of ifs
* Rework persistency shutdown, remove unnecessary teardown mechanism
* Use const for DefaultStoragePath
* format to follow coding guidelines - no use of result and explicit returns - no functional change
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Ivan FB <128452529+Ivansete-status@users.noreply.github.com>
274 lines
9.1 KiB
Nim
## This module reinforces the publish operation with regular store-v3 requests.
##

import std/[sequtils, tables, options]
import chronos, chronicles, libp2p/utility
import brokers/broker_context
import
  ./[send_processor, relay_processor, lightpush_processor, delivery_task],
  ../[subscription_manager],
  waku/[
    waku_core,
    node/waku_node,
    node/peer_manager,
    waku_store/client,
    waku_store/common,
    waku_relay/protocol,
    waku_rln_relay/rln_relay,
    waku_lightpush/client,
    waku_lightpush/callbacks,
    events/message_events,
  ]

logScope:
  topics = "send service"

# This utility is missing from sequtils; it extends applyIt with a predicate.
template applyItIf*(varSeq, pred, op: untyped) =
  for i in low(varSeq) .. high(varSeq):
    let it {.inject.} = varSeq[i]
    if pred:
      op
      varSeq[i] = it

template forEach*(varSeq, op: untyped) =
  for i in low(varSeq) .. high(varSeq):
    let it {.inject.} = varSeq[i]
    op

const MaxTimeInCache* = chronos.minutes(1)
  ## Messages older than this are dropped from the cache without being published,
  ## and an error event is emitted when that happens

const ServiceLoopInterval* = chronos.seconds(1)
  ## Interval at which we check that messages have been properly received by a store node

const ArchiveTime = chronos.seconds(3)
  ## Estimate of the time we wait before we start confirming that a message has been
  ## properly received and archived by a store node

type SendService* = ref object of RootObj
  brokerCtx: BrokerContext
  taskCache: seq[DeliveryTask]
    ## Cache that contains the delivery task per message hash.
    ## This is needed to make sure that messages are properly published

  serviceLoopHandle: Future[void] ## handle used to stop the async task
  sendProcessor: BaseSendProcessor

  node: WakuNode
  checkStoreForMessages: bool
  subscriptionManager: SubscriptionManager

proc setupSendProcessorChain(
    peerManager: PeerManager,
    lightpushClient: WakuLightPushClient,
    relay: WakuRelay,
    rlnRelay: WakuRLNRelay,
    brokerCtx: BrokerContext,
): Result[BaseSendProcessor, string] =
  ## Builds the send processor chain: relay first when available, then lightpush.
  let isRelayAvail = not relay.isNil()
  let isLightPushAvail = not lightpushClient.isNil()

  if not isRelayAvail and not isLightPushAvail:
    return err("No valid send processor found for the delivery task")

  var processors = newSeq[BaseSendProcessor]()

  if isRelayAvail:
    let rln: Option[WakuRLNRelay] =
      if rlnRelay.isNil():
        none[WakuRLNRelay]()
      else:
        some(rlnRelay)
    let publishProc = getRelayPushHandler(relay, rln)

    processors.add(RelaySendProcessor.new(isLightPushAvail, publishProc, brokerCtx))
  if isLightPushAvail:
    processors.add(LightpushSendProcessor.new(peerManager, lightpushClient, brokerCtx))

  var currentProcessor: BaseSendProcessor = processors[0]
  for i in 1 ..< processors.len:
    currentProcessor.chain(processors[i])
    currentProcessor = processors[i]
    trace "Send processor chain", index = i, processor = type(processors[i]).name

  return ok(processors[0])

proc new*(
    T: typedesc[SendService],
    preferP2PReliability: bool,
    w: WakuNode,
    s: SubscriptionManager,
): Result[T, string] =
  if w.wakuRelay.isNil() and w.wakuLightpushClient.isNil():
    return err(
      "Could not create SendService. wakuRelay or wakuLightpushClient should be set"
    )

  let checkStoreForMessages = preferP2PReliability and not w.wakuStoreClient.isNil()

  let sendProcessorChain = setupSendProcessorChain(
    w.peerManager, w.wakuLightPushClient, w.wakuRelay, w.wakuRlnRelay, w.brokerCtx
  ).valueOr:
    return err("failed to setup SendProcessorChain: " & $error)

  let sendService = SendService(
    brokerCtx: w.brokerCtx,
    taskCache: newSeq[DeliveryTask](),
    serviceLoopHandle: nil,
    sendProcessor: sendProcessorChain,
    node: w,
    checkStoreForMessages: checkStoreForMessages,
    subscriptionManager: s,
  )

  return ok(sendService)

proc addTask(self: SendService, task: DeliveryTask) =
  self.taskCache.addUnique(task)

proc isStorePeerAvailable*(sendService: SendService): bool =
  return sendService.node.peerManager.selectPeer(WakuStoreCodec).isSome()

proc checkMsgsInStore(self: SendService, tasksToValidate: seq[DeliveryTask]) {.async.} =
  if tasksToValidate.len() == 0:
    return

  if not isStorePeerAvailable(self):
    warn "Skipping store validation",
      messageCount = tasksToValidate.len(), error = "no store peer available"
    return

  var hashesToValidate = tasksToValidate.mapIt(it.msgHash)
  # TODO: confirm hash format for store query!!!

  let storeResp: StoreQueryResponse = (
    await self.node.wakuStoreClient.queryToAny(
      StoreQueryRequest(includeData: false, messageHashes: hashesToValidate)
    )
  ).valueOr:
    error "Failed to get store validation for messages",
      hashes = hashesToValidate.mapIt(shortLog(it)), error = $error
    return

  let storedItems = storeResp.messages.mapIt(it.messageHash)

  # Set success state for messages found in store
  self.taskCache.applyItIf(storedItems.contains(it.msgHash)):
    it.state = DeliveryState.SuccessfullyValidated

  # Set retry state for messages not found in store
  hashesToValidate.keepItIf(not storedItems.contains(it))
  self.taskCache.applyItIf(hashesToValidate.contains(it.msgHash)):
    it.state = DeliveryState.NextRoundRetry

proc checkStoredMessages(self: SendService) {.async.} =
  if not self.checkStoreForMessages:
    return

  let tasksToValidate = self.taskCache.filterIt(
    it.state == DeliveryState.SuccessfullyPropagated and it.deliveryAge() > ArchiveTime and
      not it.isEphemeral()
  )

  await self.checkMsgsInStore(tasksToValidate)

proc reportTaskResult(self: SendService, task: DeliveryTask) =
  case task.state
  of DeliveryState.SuccessfullyPropagated:
    # TODO: if we are unable to check the store for messages, shall we report success instead?
    if not task.propagateEventEmitted:
      info "Message successfully propagated",
        requestId = task.requestId, msgHash = task.msgHash.to0xHex()
      MessagePropagatedEvent.emit(
        self.brokerCtx, task.requestId, task.msgHash.to0xHex()
      )
      task.propagateEventEmitted = true
    return
  of DeliveryState.SuccessfullyValidated:
    info "Message successfully sent",
      requestId = task.requestId, msgHash = task.msgHash.to0xHex()
    MessageSentEvent.emit(self.brokerCtx, task.requestId, task.msgHash.to0xHex())
    return
  of DeliveryState.FailedToDeliver:
    error "Failed to send message",
      requestId = task.requestId,
      msgHash = task.msgHash.to0xHex(),
      error = task.errorDesc
    MessageErrorEvent.emit(
      self.brokerCtx, task.requestId, task.msgHash.to0xHex(), task.errorDesc
    )
    return
  else:
    # The remaining states are intermediate and do not translate to an event
    discard

  if task.messageAge() > MaxTimeInCache:
    error "Failed to send message",
      requestId = task.requestId,
      msgHash = task.msgHash.to0xHex(),
      error = "Message too old",
      age = task.messageAge()
    task.state = DeliveryState.FailedToDeliver
    MessageErrorEvent.emit(
      self.brokerCtx,
      task.requestId,
      task.msgHash.to0xHex(),
      "Unable to send within retry time window",
    )

proc evaluateAndCleanUp(self: SendService) =
  ## Reports the result of every cached task, then drops tasks that reached a final state.
  self.taskCache.forEach(self.reportTaskResult(it))
  self.taskCache.keepItIf(
    it.state != DeliveryState.SuccessfullyValidated and
      it.state != DeliveryState.FailedToDeliver
  )

  # Remove propagated messages when no store confirmation will follow
  self.taskCache.keepItIf(
    not (
      it.state == DeliveryState.SuccessfullyPropagated and
      (it.isEphemeral() or not self.checkStoreForMessages)
    )
  )

proc trySendMessages(self: SendService) {.async.} =
  let tasksToSend = self.taskCache.filterIt(it.state == DeliveryState.NextRoundRetry)

  for task in tasksToSend:
    # TODO: check whether running them concurrently gives any performance gain
    await self.sendProcessor.process(task)

proc serviceLoop(self: SendService) {.async.} =
  ## Continuously monitors that the sent messages have been received by a store node
  while true:
    await self.trySendMessages()
    await self.checkStoredMessages()
    self.evaluateAndCleanUp()
    ## TODO: add circuit breaker to avoid infinite looping in case of persistent failures
    ## Use OnlineStateChange observers to pause/resume the loop
    await sleepAsync(ServiceLoopInterval)

proc startSendService*(self: SendService) =
  self.serviceLoopHandle = self.serviceLoop()

proc stopSendService*(self: SendService) {.async.} =
  if not self.serviceLoopHandle.isNil():
    await self.serviceLoopHandle.cancelAndWait()

proc send*(self: SendService, task: DeliveryTask) {.async.} =
  ## Processes the task immediately and keeps it in the cache unless it already failed.
  assert(not task.isNil(), "task for send must not be nil")

  info "SendService.send: processing delivery task",
    requestId = task.requestId, msgHash = task.msgHash.to0xHex()

  self.subscriptionManager.subscribe(task.msg.contentTopic).isOkOr:
    error "SendService.send: failed to subscribe to content topic",
      contentTopic = task.msg.contentTopic, error = error

  await self.sendProcessor.process(task)
  reportTaskResult(self, task)
  if task.state != DeliveryState.FailedToDeliver:
    self.addTask(task)