From 6ecfa430228d9205199baeedf9af7b1cbeafd799 Mon Sep 17 00:00:00 2001
From: osmaczko <33099791+osmaczko@users.noreply.github.com>
Date: Thu, 14 May 2026 21:25:50 +0200
Subject: [PATCH] chore: add adr for client-event-system

---
 docs/adr/0001-client-event-system.md | 285 +++++++++++++++++++++++++++
 1 file changed, 285 insertions(+)
 create mode 100644 docs/adr/0001-client-event-system.md

diff --git a/docs/adr/0001-client-event-system.md b/docs/adr/0001-client-event-system.md
new file mode 100644
index 0000000..ed8514e
--- /dev/null
+++ b/docs/adr/0001-client-event-system.md
@@ -0,0 +1,285 @@

# Client Event System

| Field | Value |
|---|---|
| Status | Proposed (draft for review) |
| Issue | https://github.com/logos-messaging/libchat/issues/97 |
| Date | 2026-05-14 |

## Context and Problem

Applications currently learn about new conversations from an `is_new_convo: bool` flag on `ContentData` (`core/conversations/src/types.rs:16-20`). Two problems:

1. The flag overloads `ContentData`: protocol metadata is smuggled through a content carrier.
2. The flag assumes every new conversation carries an initial content frame. Protocols such as MLS allow a conversation to begin without one; in that case `handle_payload` returns `None` and the application never observes the new conversation.

Issue #97 calls for a proper event system that can signal new conversations, delivery receipts, and reliability failures — without piggy-backing on content — and that provides a clear path for adding new event types later.

This ADR proposes a layered design and presents the per-layer options.

## Decision Drivers

- **Simplicity of the core.** Fully synchronous and caller-driven: no background work, no callbacks out, no side effects beyond storage I/O.
- **Extensibility.** A new event type is a localised change (one enum variant, one emit site) that does not break existing consumers.
- **FFI compatibility.** Must remain expressible through the existing `safer-ffi` boundary in `crates/client-ffi`.

Two open questions affect the options below: async runtime at the client layer, and consumer model. See [Open Questions](#open-questions).

## Architecture

The library is organised in three layers. Calls flow downward; events flow upward.

```mermaid
flowchart TB
    A["app<br/>UI/UX layer<br/>drives the event loop"]
    B["client<br/>convenience wrapper<br/>may run background threads"]
    C["core<br/>strict sync, caller-driven"]

    A -- "method calls" --> B
    B -- "method calls" --> C
    C -.->|"events (from method returns)"| B
    B -.->|"events (sync + background)"| A
```

Crates: **app** — `bin/chat-cli`, future `logos-chat-module`; **client** — `crates/client`, `crates/client-ffi`; **core** — `core/conversations` and friends in libchat.

## Considered Options

### Core layer

#### Constraints

- Strict sync, single-threaded.
- No background work, timers, or internal queues.
- Every state mutation is the direct, traceable result of a caller-invoked method.

#### Approach

Methods that surface state changes return their events directly:

```rust
impl Context {
    pub fn handle_payload(&mut self, payload: &[u8])
        -> Result<HandlePayloadOutput>;

    pub fn send_content(&mut self, convo: ConversationId, content: &[u8])
        -> Result<SendContentOutput>;

    pub fn create_private_convo(&mut self, intro: &Introduction, content: &[u8])
        -> Result<CreateConvoOutput>;
}

pub struct HandlePayloadOutput {
    pub events: Vec<Event>,               // observations for the app to surface (e.g. MessageReceived)
    pub outbound: Vec<AddressedEnvelope>, // responses for the client to dispatch (e.g. acks)
}
```

Returning both events and outbound envelopes from the same struct keeps every side effect of a core method visible in its return type. The client layer dispatches the envelopes and surfaces the events upward; the core itself performs no I/O beyond caller-initiated storage access.

### Client layer

#### Constraints

- May spawn background threads (e.g. for timer-driven retries).
- Background threads emit events that no caller-invoked method can return — for example `DeliveryFailed { reason: Timeout }`.
- Events from synchronous calls flow through the method's return type, inherited from the core.

#### Common shape (all options)

The client mirrors the core's named-output-struct style.
Outbound envelopes produced by the core are dispatched internally by the client through its `DeliveryService`; only events are surfaced to the application.

```rust
impl<D: DeliveryService> ChatClient<D> {
    pub fn receive(&mut self, payload: &[u8])
        -> Result<Vec<Event>>; // events from this payload
    pub fn send_message(&mut self, convo: &ConversationIdOwned, content: &[u8])
        -> Result<Vec<Event>>; // sync events from this send

    // Background events are delivered via one of the three mechanisms below.
}
```

The three options differ only in how background events reach the application.

#### Option A — internal poll queue

The client owns a `Mutex<VecDeque<Event>>`. Background threads push to it; the application drains via two new methods.

```rust
impl<D: DeliveryService> ChatClient<D> {
    pub fn poll_event(&mut self) -> Option<Event>;
    pub fn drain_events(&mut self) -> Vec<Event>;
}
```

Prior art: mio's `Events` (per-`Poll` instance, drained by the caller); rdkafka's `Consumer::poll` (background thread fills a queue, caller polls — same domain).

**Pros**

- Single primitive (mutex-protected queue) with no new dependencies.
- FFI mapping is direct: `client_poll_event` returns an opaque `Option<Event>`, mirroring the existing `PushInboundResult` shape (`crates/client-ffi/src/api.rs:49-55`).
- Matches the existing chat-cli tick-loop consumer pattern (`bin/chat-cli/src/app.rs:144-180`).

**Cons**

- Requires the application to drain after every operation; events accumulate if it forgets.
- Adds shared mutable state (`Mutex`) inside the client; the queue must be bounded with explicit overflow handling.

#### Option B — channel handed to the caller

The client's constructor returns a `Receiver<Event>` alongside the client handle. Background threads hold a `Sender<Event>` clone; the application reads from the receiver.

```rust
let (client, events): (ChatClient<_>, Receiver<Event>) =
    ChatClient::new(name, delivery);
```

Prior art: most Rust networking libraries; `std::sync::mpsc`, `crossbeam-channel`, `flume`.
**Pros**

- Channels are the canonical multi-producer/single-consumer primitive in the standard library; the shape is idiomatic in pure Rust.
- The application can park in `recv()` from a worker thread, integrate with `select!`, or later swap to `tokio::sync::mpsc` for an async wrapper.
- Mirrors the inbound-bytes channel chat-cli already uses (`bin/chat-cli/src/app.rs:46`).

**Cons**

- `Receiver<Event>` is not `#[repr(C)]` and cannot cross `safer-ffi` cleanly. The FFI layer must expose a drain function regardless, collapsing Option B into Option A at the boundary.
- Forces a channel-crate choice (`std::sync::mpsc`, `crossbeam-channel`, or `flume`).

#### Option C — callback registered at construction

The application registers a closure at construction; background threads invoke it directly when events arise.

```rust
type EventFn = Box<dyn Fn(&Event) + Send + 'static>;

impl<D: DeliveryService> ChatClient<D> {
    pub fn new(name: &str, delivery: D, on_event: EventFn) -> Self;
}
```

Prior art: the existing FFI `DeliverFn` callback at `client_create` (`crates/client-ffi/src/delivery.rs:8-15`); `tracing::Subscriber`; GTK signals.

**Pros**

- The codebase already establishes this pattern for outbound delivery; events would extend a familiar contract.
- FFI mapping is direct: register an `EventFn` function pointer at `client_create`.
- No internal queue or `Mutex` to maintain.

**Cons**

- The callback fires on the background thread. UI-style consumers (ratatui, GUI toolkits) cannot update state from threads other than the main loop thread and will bridge the callback into a thread-local queue — effectively re-implementing Option A in user code.
- The closure must be `Send + 'static`; capturing application state requires `Arc<Mutex<…>>` or a channel back to the application.
- Sync events arrive on the caller's thread; background events arrive on the background thread. The handler must be correct in both threading contexts, or the callback must forward to the main thread (collapsing into Option A).
#### Comparison

| Criterion | A: poll queue | B: channel | C: callback |
|---|---|---|---|
| Background events delivered via | `poll_event` / `drain_events` | `Receiver<Event>` | direct `Fn(&Event)` invocation |
| FFI fit (`safer-ffi`) | Native opaque + accessors | Degrades to Option A at the boundary | Native function pointer (matches `DeliverFn`) |
| New dependencies | None | None (with `std::sync::mpsc`); otherwise `crossbeam-channel` or `flume` | None |
| Internal state required | `Mutex<VecDeque<Event>>` | Channel internals | None |
| Thread on which the application observes the event | Application thread (next drain) | Application thread (next drain) | Background thread |
| Bridges naturally to UI thread | Yes | Yes | No (requires re-bridging) |
| Backpressure if the application is slow | Client-side queue buffers; bounded with overflow handling | Channel buffers; bound configurable | No buffer; slow callbacks block the background thread |
| Future `Stream` adapter | Wrap `poll_event` in a `Stream` | Swap to async channel (native) | Bridge callback into a channel, then `Stream` |

### App layer

The application drives the loop. For all three client options, integration follows the existing chat-cli pattern: one additional drain per tick.

Sketch for Option A:

```rust
pub fn tick(&mut self) -> Result<()> {
    while let Ok(bytes) = self.inbound.try_recv() {
        for event in self.client.receive(&bytes)? {
            self.handle_event(event);
        }
    }
    for event in self.client.drain_events() {
        self.handle_event(event);
    }
    Ok(())
}
```

Option B replaces the second drain with `for event in self.events.try_iter()`. Option C moves the background-event drain out of the tick — into the callback — and the callback typically forwards into an application-side channel that is drained on each tick anyway.

## Event Taxonomy

The same `Event` enum is shared across all three client options.
```rust
#[derive(Debug, Clone)]
#[non_exhaustive]
pub enum Event {
    #[non_exhaustive]
    ConversationStarted {
        conversation_id: ConversationIdOwned,
    },
    #[non_exhaustive]
    MessageReceived {
        conversation_id: ConversationIdOwned,
        data: Vec<u8>,
    },
    #[non_exhaustive]
    DeliveryReceipt {
        conversation_id: ConversationIdOwned,
        envelope_id: EnvelopeId,
    },
    #[non_exhaustive]
    DeliveryFailed {
        conversation_id: ConversationIdOwned,
        envelope_id: EnvelopeId,
        reason: FailureReason,
    },
}

#[derive(Debug, Clone)]
#[non_exhaustive]
pub enum FailureReason {
    Transport,    // synchronous transport error on publish
    PeerRejected, // peer signalled rejection (future protocol work)
    Timeout,      // no receipt within the retry window
}
```

`#[non_exhaustive]` on the enum permits new variants; on each struct variant it permits new fields. Both are additive minor-release changes. Future variants (`ConversationRekeyed`, `ParticipantJoined`, `PresenceChanged`, transport health, key-rotation reminders, …) follow this rule.

Mapping of variants to emit sites:

| Variant | Emitted from |
|---|---|
| `ConversationStarted` (responder side) | `core/conversations/src/inbox/handler.rs:155-162` (replaces `is_new_convo: true`) |
| `MessageReceived` | `core/conversations/src/conversation/privatev1.rs:184-191` (replaces `is_new_convo: false`) |
| `DeliveryReceipt` | `Context::handle_payload` when decoding a `PrivateV1Frame::Receipt` (future protocol work) |
| `DeliveryFailed { Transport }` | `ChatClient::dispatch_all` (`crates/client/src/client.rs:84-92`) on `delivery.publish` error |
| `DeliveryFailed { Timeout }` | client's background retry thread |

The initiator side does not emit `ConversationStarted`: `create_conversation` returns the new `ConversationIdOwned` directly.

## Open Questions

1. **Sync vs async at the client layer.** The core stays sync. The client could adopt an async runtime (e.g. `tokio`) without changing the option set, but each option's natural shape changes: Option A → `Stream` over a notify primitive; Option B → `tokio::sync::mpsc::Receiver` with an `impl Stream` shape; Option C → `async fn` callback.

2. **Consumer pattern assumed.** Different consumer archetypes favour different shapes: a polling UI loop suits Option A; a worker thread that blocks on `recv` suits Option B; a low-latency or push-driven consumer (toast notifications, daemons) suits Option C. Pick one — supporting multiple shapes is a maintenance burden.

3. **Does the client absorb transport polling?** Today the application drives transport polling and feeds bytes into `client.receive` (`bin/chat-cli/src/app.rs:46-87, 144-180`). The client could absorb this as an additional background thread, in which case `DeliveryService` would become bidirectional and the application would consume events instead of bytes. Orthogonal to the event-system shape, but it reshapes the application-layer contract.

## References

### Source references

- `core/conversations/src/types.rs:9-20` — current `ContentData` and `AddressedEnvelope`
- `core/conversations/src/context.rs:138-185` — `Context::handle_payload` (core inbound entry)
- `core/conversations/src/inbox/handler.rs:124-167` — inbox handshake handler (current `is_new_convo` set site)
- `core/conversations/src/conversation/privatev1.rs:184-191, 219-260` — private-conversation handler
- `crates/client/src/client.rs:60-92` — `ChatClient` public surface
- `crates/client/src/delivery.rs` — `DeliveryService` trait
- `crates/client-ffi/src/api.rs:49-55, 220-285` — current FFI inbound result shape
- `crates/client-ffi/src/delivery.rs:8-15` — existing FFI callback pattern (`DeliverFn`)
- `bin/chat-cli/src/app.rs:46, 144-180` — current application consumption pattern