logos-messaging-nim/tests/node/test_wakunode_health_monitor.nim
NagyZoltanPeter a7f893555d
Integrate api-shape phase2 (#3989) + api interfaces (#3975) (#3999)
* Reshape per-layer API into api/ folders and thin the FFI over them

Each layer now separates its constructible core from its public surface:

  - core module (waku.nim / messaging_client.nim /
    reliable_channel_manager.nim): the type plus new/start/stop and the
    private construction helpers.
  - api/ folder: one module per differentiated set of operations
    (waku: topics/relay/filter/lightpush/store/peer_manager/discovery/
    debug/health) plus an events surface.

The waku api is reshaped to be the complete operation surface the C
bindings need, so the library no longer reaches into node internals:
relayPublish returns the message hash, relaySubscribe takes an optional
handler, filter/lightpush auto-select the service peer, connectedPeersInfo
returns structured data, pingPeer honours the timeout, plus
relayNumPeersInMesh / relayNumConnectedPeers / isOnline. library/ is now a
thin C-ABI shim: each {.ffi.} proc only marshals cstring/JSON/callbacks and
delegates to ctx.myLib[].waku.<op> (or messagingClient.<op>).
app_callbacks re-exports the modules defining its handler types, which the
included FFI files previously relied on by leakage.

Events move next to the surface that owns them, with each dependency kept
pointing the right way:

  - waku/events/ relocated under waku/api/events/.
  - channel events live in channels/api/events.nim.
  - the four messaging-level message events move to messaging/api/events;
    MessageSeenEvent stays in waku because it is emitted by waku core, so
    moving it would make waku depend on the messaging layer.
  - delivery_events renamed to filter_subscribe_events to match the
    OnFilterSubscribe/Unsubscribe events it actually declares.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add reliable-channel FFI ops + events (nim-ffi v0.1.3)

Expose the reliable-channel layer through the v0.1.3 FFI:
- channel_create / channel_send / channel_close call the
  ReliableChannelManager api (createReliableChannel / send / closeChannel),
  marshalling channel id + base64 payload + ephemeral by hand
- channel message received / sent / errored are surfaced by listening to the
  channel-layer broker events in start_node and forwarding them through
  callEventCallback (received payload base64-encoded), dropped in stop_node

Stays on nim-ffi v0.1.3 (no typed/CBOR rewrite).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Expose reliable-channel ops in the stable C header (#3851)

The library already ships as a single .so with a tiered header surface
(liblogosdelivery.h = stable Messaging/Reliable-Channels, liblogosdelivery_kernel.h
= advanced Kernel). Per that tiering, the reliable-channel ops belong on the
stable surface, so declare channel_create / channel_send / channel_close in
liblogosdelivery.h and document the channel lifecycle events delivered through
the event callback.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Graft PR#3975 interface layer onto decomposed foundation (events deduped)

Add IKernel/IMessagingClient/IReliableChannelManager/ILogosDelivery interface
classes under logos_delivery/api/. The EventBroker types PR#3975 hoisted into
these files already exist in PR#3989's decomposed */api/events/ modules, so the
interface files re-export those modules instead of redefining the types
(avoids 8 duplicate EventBroker definitions). api/types.nim kept at the
foundation version (ChannelId stays in channels/types.nim, which the decomposed
modules import).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Wire impl classes to interfaces (inherit; relocate SendHandler)

- Waku : IKernel, MessagingClient : IMessagingClient,
  ReliableChannelManager : IReliableChannelManager.
- The operation procs already live in PR#3989's decomposed */api/ modules and
  stay as plain procs (nothing dispatches through the interface types, so no
  method-ization is needed).
- SendHandler now lives in reliable_channel_manager_api.nim (its PR#3975 home);
  removed the duplicate from reliable_channel.nim, which re-exports the
  interface module so channels/api/{channel_lifecycle,send} still see it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Wire LogosDelivery to ILogosDelivery orchestrator interface

LogosDelivery : ILogosDelivery; start/stop/isOnline become method overrides.
Peripheral PR#3975 edits (lightpush/store clients, self_req_handlers,
statistics) are import-reorg artifacts of deleting waku/utils/requests.nim,
which the decomposed structure keeps -- so they are intentionally not ported.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Dedup EventConnectionStatusChange (re-export from health_events)

9th duplicate EventBroker type: defined in both logos_delivery_api.nim and the
decomposed waku/api/events/health_events.nim. The interface file now re-exports
it. liblogosdelivery builds clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Move events back into interface-class source files (restore #3975 placement)

Reverses the earlier dedup-by-re-export: event TYPE definitions now live in the
interface classes, and the emptied decomposed event files are removed.

- MessageSeenEvent            -> logos_delivery/api/kernel_api.nim
- Message{Sent,Error,Propagated,Received}Event -> api/messaging_client_api.nim
- ChannelMessage{Received,Sent,Error}Event     -> api/reliable_channel_manager_api.nim
- EventConnectionStatusChange -> api/logos_delivery_api.nim

Deleted (became empty after the move):
- logos_delivery/waku/api/events/message_events.nim
- logos_delivery/messaging/api/events.nim
- logos_delivery/channels/api/events.nim
health_events.nim keeps its two remaining events (content/shard topic health).

Rewiring: each layer re-exports its interface module (waku->kernel_api,
messaging_client->messaging_client_api, reliable_channel->reliable_channel_manager_api,
which also re-exports messaging_client_api). Deep emitters/listeners
(subscription_manager, waku_node, waku_node/relay, node_health_monitor,
recv_service, send_service) import the owning interface module directly.
kernel_api stays below node level (types/topics/message/store-common) so the
node->kernel_api imports are acyclic. liblogosdelivery builds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* nph formatting

---------

Co-authored-by: Ivan FB <ivansete@status.im>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 11:51:22 +02:00

437 lines
16 KiB
Nim

{.used.}
import
std/[json, options, sequtils, strutils, tables], testutils/unittests, chronos, results
import brokers/broker_context
import
logos_delivery/waku/[
waku_core,
common/waku_protocol,
node/waku_node,
node/peer_manager,
node/health_monitor/health_status,
node/health_monitor/connection_status,
node/health_monitor/protocol_health,
node/health_monitor/topic_health,
node/health_monitor/node_health_monitor,
node/waku_node/relay,
node/waku_node/store,
node/waku_node/lightpush,
node/waku_node/filter,
api/events/health_events,
api/events/peer_events,
waku_archive,
]
import ../testlib/[wakunode, wakucore], ../waku_archive/archive_utils
import logos_delivery/waku/node/subscription_manager
import logos_delivery/messaging/messaging_client
const MockDLow = 4 # Mocked GossipSub DLow value
const TestConnectivityTimeLimit = 3.seconds
proc protoHealthMock(kind: WakuProtocol, health: HealthStatus): ProtocolHealth =
var ph = ProtocolHealth.init(kind)
if health == HealthStatus.READY:
return ph.ready()
else:
return ph.notReady("mock")
suite "Health Monitor - health state calculation":
test "Disconnected, zero peers":
let protocols = @[
protoHealthMock(RelayProtocol, HealthStatus.NOT_READY),
protoHealthMock(StoreClientProtocol, HealthStatus.NOT_READY),
protoHealthMock(FilterClientProtocol, HealthStatus.NOT_READY),
protoHealthMock(LightpushClientProtocol, HealthStatus.NOT_READY),
]
let strength = initTable[WakuProtocol, int]()
let state = calculateConnectionState(protocols, strength, some(MockDLow))
check state == ConnectionStatus.Disconnected
test "PartiallyConnected, weak relay":
let weakCount = MockDLow - 1
let protocols = @[protoHealthMock(RelayProtocol, HealthStatus.READY)]
var strength = initTable[WakuProtocol, int]()
strength[RelayProtocol] = weakCount
let state = calculateConnectionState(protocols, strength, some(MockDLow))
# Partially connected since relay connectivity is weak (> 0, but < dLow)
check state == ConnectionStatus.PartiallyConnected
test "Connected, robust relay":
let protocols = @[protoHealthMock(RelayProtocol, HealthStatus.READY)]
var strength = initTable[WakuProtocol, int]()
strength[RelayProtocol] = MockDLow
let state = calculateConnectionState(protocols, strength, some(MockDLow))
# Fully connected since relay connectivity is ideal (>= dLow)
check state == ConnectionStatus.Connected
test "Connected, robust edge":
let protocols = @[
protoHealthMock(RelayProtocol, HealthStatus.NOT_MOUNTED),
protoHealthMock(LightpushClientProtocol, HealthStatus.READY),
protoHealthMock(FilterClientProtocol, HealthStatus.READY),
protoHealthMock(StoreClientProtocol, HealthStatus.READY),
]
var strength = initTable[WakuProtocol, int]()
strength[LightpushClientProtocol] = HealthyThreshold
strength[FilterClientProtocol] = HealthyThreshold
strength[StoreClientProtocol] = HealthyThreshold
let state = calculateConnectionState(protocols, strength, some(MockDLow))
check state == ConnectionStatus.Connected
test "Disconnected, edge missing store":
let protocols = @[
protoHealthMock(LightpushClientProtocol, HealthStatus.READY),
protoHealthMock(FilterClientProtocol, HealthStatus.READY),
protoHealthMock(StoreClientProtocol, HealthStatus.NOT_READY),
]
var strength = initTable[WakuProtocol, int]()
strength[LightpushClientProtocol] = HealthyThreshold
strength[FilterClientProtocol] = HealthyThreshold
strength[StoreClientProtocol] = 0
let state = calculateConnectionState(protocols, strength, some(MockDLow))
check state == ConnectionStatus.Disconnected
test "PartiallyConnected, edge meets minimum failover requirement":
let weakCount = max(1, HealthyThreshold - 1)
let protocols = @[
protoHealthMock(LightpushClientProtocol, HealthStatus.READY),
protoHealthMock(FilterClientProtocol, HealthStatus.READY),
protoHealthMock(StoreClientProtocol, HealthStatus.READY),
]
var strength = initTable[WakuProtocol, int]()
strength[LightpushClientProtocol] = weakCount
strength[FilterClientProtocol] = weakCount
strength[StoreClientProtocol] = weakCount
let state = calculateConnectionState(protocols, strength, some(MockDLow))
check state == ConnectionStatus.PartiallyConnected
test "Connected, robust relay ignores store server":
let protocols = @[
protoHealthMock(RelayProtocol, HealthStatus.READY),
protoHealthMock(StoreProtocol, HealthStatus.READY),
]
var strength = initTable[WakuProtocol, int]()
strength[RelayProtocol] = MockDLow
strength[StoreProtocol] = 0
let state = calculateConnectionState(protocols, strength, some(MockDLow))
check state == ConnectionStatus.Connected
test "Connected, robust relay ignores store client":
let protocols = @[
protoHealthMock(RelayProtocol, HealthStatus.READY),
protoHealthMock(StoreProtocol, HealthStatus.READY),
protoHealthMock(StoreClientProtocol, HealthStatus.NOT_READY),
]
var strength = initTable[WakuProtocol, int]()
strength[RelayProtocol] = MockDLow
strength[StoreProtocol] = 0
strength[StoreClientProtocol] = 0
let state = calculateConnectionState(protocols, strength, some(MockDLow))
check state == ConnectionStatus.Connected
suite "Health Monitor - events":
asyncTest "Core (relay) health update":
var nodeA: WakuNode
lockNewGlobalBrokerContext:
let nodeAKey = generateSecp256k1Key()
nodeA = newTestWakuNode(nodeAKey, parseIpAddress("127.0.0.1"), Port(0))
(await nodeA.mountRelay()).expect("Node A failed to mount Relay")
await nodeA.start()
let monitorA = NodeHealthMonitor.new(nodeA)
var
lastStatus = ConnectionStatus.Disconnected
callbackCount = 0
healthChangeSignal = newAsyncEvent()
monitorA.onConnectionStatusChange = proc(status: ConnectionStatus) {.async.} =
lastStatus = status
callbackCount.inc()
healthChangeSignal.fire()
monitorA.startHealthMonitor().expect("Health monitor failed to start")
var nodeB: WakuNode
lockNewGlobalBrokerContext:
let nodeBKey = generateSecp256k1Key()
nodeB = newTestWakuNode(nodeBKey, parseIpAddress("127.0.0.1"), Port(0))
let driver = newSqliteArchiveDriver()
nodeB.mountArchive(driver).expect("Node B failed to mount archive")
(await nodeB.mountRelay()).expect("Node B failed to mount relay")
await nodeB.mountStore()
await nodeB.start()
await nodeA.connectToNodes(@[nodeB.switch.peerInfo.toRemotePeerInfo()])
proc dummyHandler(topic: PubsubTopic, msg: WakuMessage): Future[void] {.async.} =
discard
nodeA.subscribe((kind: PubsubSub, topic: DefaultPubsubTopic), dummyHandler).expect(
"Node A failed to subscribe"
)
nodeB.subscribe((kind: PubsubSub, topic: DefaultPubsubTopic), dummyHandler).expect(
"Node B failed to subscribe"
)
let connectTimeLimit = Moment.now() + TestConnectivityTimeLimit
var gotConnected = false
while Moment.now() < connectTimeLimit:
if lastStatus == ConnectionStatus.PartiallyConnected:
gotConnected = true
break
if await healthChangeSignal.wait().withTimeout(connectTimeLimit - Moment.now()):
healthChangeSignal.clear()
check:
gotConnected == true
callbackCount >= 1
lastStatus == ConnectionStatus.PartiallyConnected
healthChangeSignal.clear()
await nodeB.stop()
await nodeA.disconnectNode(nodeB.switch.peerInfo.toRemotePeerInfo())
let disconnectTimeLimit = Moment.now() + TestConnectivityTimeLimit
var gotDisconnected = false
while Moment.now() < disconnectTimeLimit:
if lastStatus == ConnectionStatus.Disconnected:
gotDisconnected = true
break
if await healthChangeSignal.wait().withTimeout(disconnectTimeLimit - Moment.now()):
healthChangeSignal.clear()
check:
gotDisconnected == true
await monitorA.stopHealthMonitor()
await nodeA.stop()
asyncTest "Edge (light client) health update":
var nodeA: WakuNode
lockNewGlobalBrokerContext:
let nodeAKey = generateSecp256k1Key()
nodeA = newTestWakuNode(nodeAKey, parseIpAddress("127.0.0.1"), Port(0))
nodeA.mountLightpushClient()
await nodeA.mountFilterClient()
nodeA.mountStoreClient()
require nodeA.mountAutoSharding(1, 8).isOk
nodeA.mountMetadata(1, @[0'u16]).expect("Node A failed to mount metadata")
await nodeA.start()
let ds = MessagingClient
.new(MessagingClientConf(useP2PReliability: false), nodeA)
.expect("Failed to create MessagingClient")
ds.start().expect("Failed to start MessagingClient")
let monitorA = NodeHealthMonitor.new(nodeA)
var
lastStatus = ConnectionStatus.Disconnected
callbackCount = 0
healthChangeSignal = newAsyncEvent()
monitorA.onConnectionStatusChange = proc(status: ConnectionStatus) {.async.} =
lastStatus = status
callbackCount.inc()
healthChangeSignal.fire()
monitorA.startHealthMonitor().expect("Health monitor failed to start")
var nodeB: WakuNode
lockNewGlobalBrokerContext:
let nodeBKey = generateSecp256k1Key()
nodeB = newTestWakuNode(nodeBKey, parseIpAddress("127.0.0.1"), Port(0))
let driver = newSqliteArchiveDriver()
nodeB.mountArchive(driver).expect("Node B failed to mount archive")
(await nodeB.mountRelay()).expect("Node B failed to mount relay")
(await nodeB.mountLightpush()).expect("Node B failed to mount lightpush")
await nodeB.mountFilter()
await nodeB.mountStore()
require nodeB.mountAutoSharding(1, 8).isOk
nodeB.mountMetadata(1, toSeq(0'u16 ..< 8'u16)).expect(
"Node B failed to mount metadata"
)
await nodeB.start()
var metadataFut = newFuture[void]("waitForMetadata")
let metadataLis = WakuPeerEvent
.listen(
nodeA.brokerCtx,
proc(evt: WakuPeerEvent): Future[void] {.async: (raises: []), gcsafe.} =
if not metadataFut.finished and
evt.kind == WakuPeerEventKind.EventMetadataUpdated:
metadataFut.complete()
,
)
.expect("Failed to listen for metadata")
await nodeA.connectToNodes(@[nodeB.switch.peerInfo.toRemotePeerInfo()])
let metadataOk = await metadataFut.withTimeout(TestConnectivityTimeLimit)
await WakuPeerEvent.dropListener(nodeA.brokerCtx, metadataLis)
require metadataOk
let connectTimeLimit = Moment.now() + TestConnectivityTimeLimit
var gotConnected = false
while Moment.now() < connectTimeLimit:
if lastStatus == ConnectionStatus.PartiallyConnected:
gotConnected = true
break
if await healthChangeSignal.wait().withTimeout(connectTimeLimit - Moment.now()):
healthChangeSignal.clear()
check:
gotConnected == true
callbackCount >= 1
lastStatus == ConnectionStatus.PartiallyConnected
healthChangeSignal.clear()
await nodeB.stop()
await nodeA.disconnectNode(nodeB.switch.peerInfo.toRemotePeerInfo())
let disconnectTimeLimit = Moment.now() + TestConnectivityTimeLimit
var gotDisconnected = false
while Moment.now() < disconnectTimeLimit:
if lastStatus == ConnectionStatus.Disconnected:
gotDisconnected = true
break
if await healthChangeSignal.wait().withTimeout(disconnectTimeLimit - Moment.now()):
healthChangeSignal.clear()
check:
gotDisconnected == true
lastStatus == ConnectionStatus.Disconnected
await monitorA.stopHealthMonitor()
await ds.stop()
await nodeA.stop()
asyncTest "Edge health driven by confirmed filter subscriptions":
var nodeA: WakuNode
lockNewGlobalBrokerContext:
let nodeAKey = generateSecp256k1Key()
nodeA = newTestWakuNode(nodeAKey, parseIpAddress("127.0.0.1"), Port(0))
await nodeA.mountFilterClient()
nodeA.mountLightpushClient()
nodeA.mountStoreClient()
require nodeA.mountAutoSharding(1, 8).isOk
nodeA.mountMetadata(1, @[0'u16]).expect("Node A failed to mount metadata")
await nodeA.start()
let ds = MessagingClient
.new(MessagingClientConf(useP2PReliability: false), nodeA)
.expect("Failed to create MessagingClient")
ds.start().expect("Failed to start MessagingClient")
let subMgr = nodeA.subscriptionManager
var nodeB: WakuNode
lockNewGlobalBrokerContext:
let nodeBKey = generateSecp256k1Key()
nodeB = newTestWakuNode(nodeBKey, parseIpAddress("127.0.0.1"), Port(0))
let driver = newSqliteArchiveDriver()
nodeB.mountArchive(driver).expect("Node B failed to mount archive")
(await nodeB.mountRelay()).expect("Node B failed to mount relay")
(await nodeB.mountLightpush()).expect("Node B failed to mount lightpush")
await nodeB.mountFilter()
await nodeB.mountStore()
require nodeB.mountAutoSharding(1, 8).isOk
nodeB.mountMetadata(1, toSeq(0'u16 ..< 8'u16)).expect(
"Node B failed to mount metadata"
)
await nodeB.start()
let monitorA = NodeHealthMonitor.new(nodeA)
var
lastStatus = ConnectionStatus.Disconnected
healthSignal = newAsyncEvent()
monitorA.onConnectionStatusChange = proc(status: ConnectionStatus) {.async.} =
lastStatus = status
healthSignal.fire()
monitorA.startHealthMonitor().expect("Health monitor failed to start")
var metadataFut = newFuture[void]("waitForMetadata")
let metadataLis = WakuPeerEvent
.listen(
nodeA.brokerCtx,
proc(evt: WakuPeerEvent): Future[void] {.async: (raises: []), gcsafe.} =
if not metadataFut.finished and
evt.kind == WakuPeerEventKind.EventMetadataUpdated:
metadataFut.complete()
,
)
.expect("Failed to listen for metadata")
await nodeA.connectToNodes(@[nodeB.switch.peerInfo.toRemotePeerInfo()])
let metadataOk = await metadataFut.withTimeout(TestConnectivityTimeLimit)
await WakuPeerEvent.dropListener(nodeA.brokerCtx, metadataLis)
require metadataOk
var deadline = Moment.now() + TestConnectivityTimeLimit
while Moment.now() < deadline:
if lastStatus == ConnectionStatus.PartiallyConnected:
break
if await healthSignal.wait().withTimeout(deadline - Moment.now()):
healthSignal.clear()
check lastStatus == ConnectionStatus.PartiallyConnected
var shardHealthFut = newFuture[EventShardTopicHealthChange]("waitForShardHealth")
let shardHealthLis = EventShardTopicHealthChange
.listen(
nodeA.brokerCtx,
proc(
evt: EventShardTopicHealthChange
): Future[void] {.async: (raises: []), gcsafe.} =
if not shardHealthFut.finished and (
evt.health == TopicHealth.MINIMALLY_HEALTHY or
evt.health == TopicHealth.SUFFICIENTLY_HEALTHY
):
shardHealthFut.complete(evt)
,
)
.expect("Failed to listen for shard health")
let contentTopic = ContentTopic("/waku/2/default-content/proto")
subMgr.subscribe(contentTopic).expect("Failed to subscribe")
let shardHealthOk = await shardHealthFut.withTimeout(TestConnectivityTimeLimit)
await EventShardTopicHealthChange.dropListener(nodeA.brokerCtx, shardHealthLis)
check shardHealthOk == true
check nodeA.subscriptionManager.edgeFilterSubStates.len > 0
healthSignal.clear()
deadline = Moment.now() + TestConnectivityTimeLimit
while Moment.now() < deadline:
if lastStatus == ConnectionStatus.PartiallyConnected:
break
if await healthSignal.wait().withTimeout(deadline - Moment.now()):
healthSignal.clear()
check lastStatus == ConnectionStatus.PartiallyConnected
await ds.stop()
await monitorA.stopHealthMonitor()
await nodeB.stop()
await nodeA.stop()