Adds backgroundMode to Sub and SetBackgroundMode(bool) to both Sub and
FilterManager. When background=true, subscriptionLoop skips the 5-second
health-check ticker and drops expired subscription IDs from the closing
channel without resubscribing. When background=false (foreground return),
a resubscription is immediately triggered for any expired filters.

Background context (status-im/status-app#21045):
On Android, each Waku filter subscription has a relay-side TTL (~13.5 min
observed). When it expires, the closing channel fires, checkAndResubscribe
runs, and a new wf.Subscribe() RPC wakes the LTE modem. With a loaded
account this happens every ~13.5 min overnight, producing a 55% radio
duty cycle (~144 mAh/hr) while the screen is locked.

With background mode active, subscription expiry causes no network I/O.
On foreground return, all expired filters are resubscribed in one
burst: the user sees a brief reconnect, followed by full message
delivery.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Sustained subscribe failures saturated the CPU, leaked 600+
subscriptionLoop goroutines, and twice panicked with
`strings: Join output length overflow`. This commit fixes five
independent issues:
- api/filter: the errcnt budget was gated on `possibleRecursiveError`,
which matched only `ErrNoPeersAvailable` / `swarm.ErrDialBackoff`. The
dominant error class matched neither, so it never incremented errcnt
and the 3-errors-per-5-s budget was dead code. Replaced the gate with
`shouldIncrementErrCnt(err)`, which counts every non-nil error.
- protocol/filter: WakuFilterLightNode.Subscribe flattened per-peer
errors with `fmt.Errorf` + `strings.Join`, which discarded the typed
*FilterError and let the message grow without bound. Replaced with a
typed `*SubscribeError` (PeerID, ContentTopics, Err) plus
`HasRateLimitError()`; the length of the `Error()` output is
hard-capped. Concurrent per-peer appends are now mutex-guarded.
- api/filter: added a 60 s rate-limit backoff, triggered when
`*SubscribeError.HasRateLimitError()` is true.
`shouldHonourRateLimitBackoff(rateLimitedUntil, now)` gates both the
ticker push and the closing-channel checkAndResubscribe. The backoff is
cleared on subscribe success.
- api/filter: FilterManager.waitingToSubQueue was a buffered chan
(capacity 100) that was written and drained under the same lock, so
the manager deadlocked once it filled. Replaced with a mutex-guarded
slice.
- api/filter: Sub.cleanup closed DataCh while multiplex forwarders
could still be sending on it. Added a multiplexWG that cleanup waits on
before closing; the forwarder send now sits in a select with
apiSub.ctx.Done(), so it cannot deadlock when subDetails.C is never
closed (node-stop transitions).
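The queue fix replaces a bounded channel with a slice guarded by the lock the code already held. A slice has no capacity limit, so appending under the lock never blocks. Below is a minimal sketch, assuming queued entries can be represented by string filter IDs. `enqueue` and `drain` are hypothetical helper names.

```go
package main

import "sync"

// FilterManager is a hypothetical, trimmed-down stand-in. A full buffered
// chan written under the same lock that guards draining blocks forever; a
// slice append under the lock cannot block.
type FilterManager struct {
	mu                sync.Mutex
	waitingToSubQueue []string // filter IDs awaiting subscription
}

// enqueue (hypothetical) appends a filter ID to the queue.
func (m *FilterManager) enqueue(id string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.waitingToSubQueue = append(m.waitingToSubQueue, id)
}

// drain (hypothetical) takes the whole queue under the lock and returns it,
// so the caller can process the entries without holding the lock.
func (m *FilterManager) drain() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	q := m.waitingToSubQueue
	m.waitingToSubQueue = nil
	return q
}
```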

Tests (all run under -race):
- TestSub_CleanupRaceWithMultiplex (50 iterations)
- TestSub_CleanupDoesNotDeadlockWhenSubChannelStaysOpen
- TestFilterManager_SubscribeFilter_DoesNotDeadlockWhenQueueFull
- TestShouldIncrementErrCnt