logos-messaging-interop-tests/tests/wrappers_tests/test_send_errors_and_concurrency.py
AYAHASSAN287 4ba30a8aff
e2e part3 (#181)
* add test s17

* Add temp changes

* Add s17 positive / negative scenarios

* add S19

* Add S06 relay-only test and fix wrapper helpers (#173)

* - Add S06 relay-only test case for testing message propagation without a store.
- Update `wrapper_helpers` for clearer event type handling and type annotations (`Optional[...]` usage).
- Simplify `get_node_multiaddr` to retrieve addresses via `get_node_info_raw`.
- Refactor `wrappers_manager` to adjust bindings path to `vendor` directory and add `get_node_info_raw` method.
- Update `.gitignore` to exclude `store.sqlite3*`.

* Refactor S06 relay-only test: replace try-finally blocks with context managers for clarity and conciseness.

* Migrate S06 relay-only test to `test_send_e2e.py` and refactor with `StepsCommon` for reusability.

---------

Co-authored-by: Egor Rachkovskii <egorrachkovskii@status.im>

* Modify S19 test

* Adding S21

* Fix review comments

* Adding S22/S23

* Adding S24

* Add S26

* Add S30

* Add S31

* Improve `wait_for_event` loop logic and add `assert_event_invariants` helper (#178)

- Refactored the `wait_for_event` function for clarity and to ensure proper deadline handling within the loop.
- Introduced `assert_event_invariants` to validate per-request event properties, enforcing invariants like correct `requestId`, no duplicate terminal events, and proper timing between `Propagated` and `Sent`.
- Added tests for `assert_event_invariants` enforcement in `S14` and `S15` lightpush scenarios.

Co-authored-by: Egor Rachkovskii <egorrachkovskii@status.im>

* Add S07 and S10 send API tests with event invariants helper  (#176)

* Add `assert_event_invariants` to enforce per-request event constraints and integrate into relevant tests

* Integrate `assert_event_invariants` into edge and store tests

* Remove redundant comments from `test_send_e2e.py`

---------

Co-authored-by: Egor Rachkovskii <egorrachkovskii@status.im>

* Fix some tests

* Add S02/S12 send API tests and PR CI pipeline (#174)

* Add tests for auto-subscribe on first send and isolated sender with no peers

* Add PR CI workflow with tiered test strategy

- pr_tests.yml: build job with cache, wrapper-tests, smoke-tests,
  and label-triggered full-suite
- test_common.yml: add deploy_allure/send_discord inputs so PR runs
  skip reporting side effects
- Add docker_required marker to S19 (needs Docker, excluded from
  wrapper-only CI job)
- Register docker_required marker in pytest.ini

* Document PR CI test workflows in README

* Refine PR CI test strategy:
- Exclude `docker_required` tests from smoke set in `pr_tests.yml`.
- Add `wait_for_connected` helper for connection state checks.
- Update S19 test to dynamically create and clean up the store node setup.
- General simplifications and improved test stability.

* Add `wait_for_connected` assertion to ensure sender connection state before propagation test

* Refine tests and CI workflows:
- Replace `ERROR_TIMEOUT_S` with `ERROR_AFTER_CACHE_EXPIRY_TIMEOUT_S` in `test_send_e2e.py`.
- Adjust timeout assertion for better clarity and accuracy.
- Update `pr_tests.yml` to add retries (`--reruns`) and ignore wrapper tests in smoke tests.
- Change `test_common.yml` default Discord reporting to `false`.

* Normalize `portsshift` to `portsShift` in `test_send_e2e.py` configuration definitions.

---------

Co-authored-by: Egor Rachkovskii <egorrachkovskii@status.im>

* Add relay-to-lightpush fallback integration tests (S08/S09) (#180)

Co-authored-by: Egor Rachkovskii <egorrachkovskii@status.im>

* Ignore S19

* fix s26

* Ignore s20 / s31 for errors

* Change image name

* fix xfail syntax error

* rename test file

* FIx flaky tests

* comment the skipped tests

* Fix review comments

* revert tag in yml in latest

* commenting lightpush

* Modify the PR

* Fix the ports conflict

* Modify S20

* fix portsshift option

* remove the /true from yml to allow errors to exist

* Modify the yml to continue on error

* First set of review comments

* adding xfail mark for failed tests

* address review comments about xfail

* cleanup unused lines

* event collector fix

* Address review comment about delay constant

* fix the timeout review comment

* Add assert_event_invariants

* enhance comment on S26 test

* mark the waku tests as docker_required

* Add S01

* add S01 second scenario

* Add S03

* Add S04

* Adding S11

* modify s11 scenario to pass

* Adding test S05

* Adding the new tests in part3 file

* Fix the yml file error

* Add the new test file to the PR job

* bump logos-delivery-python-bindings to include destroy_keep_ctx

* modify the S01 test

* mark S01 with xfail

* mark the second S01 test as xfail too

* use skip instead of xfail

* comment the skip line to try S01 again

* restore the xfail mark again

* remove the wrapped text code from test file

* Changing  the test files names

* skip S01 again

* removed extra comments

* Update logos-delivery-python-bindings submodule

---------

Co-authored-by: Egor Rachkovskii <32649334+at0m1x19@users.noreply.github.com>
Co-authored-by: Egor Rachkovskii <egorrachkovskii@status.im>
2026-05-14 14:48:14 +02:00

446 lines
19 KiB
Python
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

from concurrent.futures import ThreadPoolExecutor
import pytest
from src.env_vars import NODE_2
from src.steps.common import StepsCommon
from src.steps.store import StepsStore
from src.libs.common import delay, to_base64
from src.libs.custom_logger import get_custom_logger
from src.node.waku_node import WakuNode
from src.node.wrappers_manager import WrapperManager
from src.node.wrapper_helpers import (
EventCollector,
assert_event_invariants,
create_message_bindings,
get_node_multiaddr,
wait_for_propagated,
wait_for_sent,
wait_for_error,
)
logger = get_custom_logger(__name__)
PROPAGATED_TIMEOUT_S = 30.0
RECOVERY_TIMEOUT_S = 45.0
# MaxTimeInCache from send_service.nim.
MAX_TIME_IN_CACHE_S = 60.0
# Extra slack to cover the background retry loop tick after the window expires.
CACHE_EXPIRY_SLACK_S = 10.0
ERROR_AFTER_CACHE_EXPIRY_TIMEOUT_S = MAX_TIME_IN_CACHE_S + CACHE_EXPIRY_SLACK_S
RETRY_WINDOW_EXPIRED_MSG = "Unable to send within retry time window"
# S30: concurrent sends on the same content topic during initial auto-subscribe.
S30_CONCURRENT_SENDS = 5
S30_CONTENT_TOPIC = "/test/1/s30-concurrent/proto"
# S31: concurrent sends across mixed topics during peer churn.
S31_BURST_SIZE = 8
S31_CONTENT_TOPICS = [
"/test/1/s31-topic-a/proto",
"/test/1/s31-topic-b/proto",
"/test/1/s31-topic-c/proto",
"/test/1/s31-topic-d/proto",
"/test/1/s31-topic-e/proto",
"/test/1/s31-topic-f/proto",
"/test/1/s31-topic-g/proto",
"/test/1/s31-topic-h/proto",
]
class TestS12IsolatedSenderNoPeers(StepsCommon):
"""
S12 — Isolated sender, no peers.
Sender has relay enabled but zero relay peers and zero lightpush peers.
Expected: send() returns Ok(RequestId), but eventually a message_error
event arrives (no route to propagate).
"""
def test_s12_send_with_no_peers_produces_error(self, node_config):
sender_collector = EventCollector()
node_config.update(
{
"relay": True,
"store": False,
"lightpush": False,
"filter": False,
"discv5Discovery": False,
"numShardsInNetwork": 1,
}
)
sender_result = WrapperManager.create_and_start(
config=node_config,
event_cb=sender_collector.event_callback,
)
assert sender_result.is_ok(), f"Failed to start sender: {sender_result.err()}"
with sender_result.ok_value as sender:
message = create_message_bindings(
payload=to_base64("S12 isolated sender payload"),
contentTopic="/test/1/s12-isolated/proto",
)
send_result = sender.send_message(message=message)
assert send_result.is_ok(), f"send() must return Ok(RequestId) even with no peers, got: {send_result.err()}"
request_id = send_result.ok_value
assert request_id, "send() returned an empty RequestId"
error = wait_for_error(
collector=sender_collector,
request_id=request_id,
timeout_s=ERROR_AFTER_CACHE_EXPIRY_TIMEOUT_S,
)
assert error is not None, (
f"No message_error event within {ERROR_AFTER_CACHE_EXPIRY_TIMEOUT_S}s "
f"(MaxTimeInCache={MAX_TIME_IN_CACHE_S}s + slack) for isolated sender. "
f"Collected events: {sender_collector.events}"
)
assert error["requestId"] == request_id
propagated = wait_for_propagated(sender_collector, request_id, timeout_s=0)
assert propagated is None, f"Unexpected message_propagated event for isolated sender: {propagated}"
class TestS21ErrorWhenRetryWindowExpires(StepsCommon):
"""
S21: delivery retry window expires before any valid path recovers.
"""
def test_s21_error_when_retry_window_expires(self, node_config):
sender_collector = EventCollector()
node_config.update(
{
"relay": True,
"store": False,
"lightpush": False,
"filter": False,
"discv5Discovery": False,
"numShardsInNetwork": 1,
}
)
sender_result = WrapperManager.create_and_start(
config=node_config,
event_cb=sender_collector.event_callback,
)
assert sender_result.is_ok(), f"Failed to start sender: {sender_result.err()}"
with sender_result.ok_value as sender_node:
message = create_message_bindings()
send_result = sender_node.send_message(message=message)
assert send_result.is_ok(), f"send() must return Ok(RequestId) even with no peers, got: {send_result.err()}"
request_id = send_result.ok_value
assert request_id, "send() returned an empty RequestId"
# No peer
error_event = wait_for_error(
collector=sender_collector,
request_id=request_id,
timeout_s=ERROR_AFTER_CACHE_EXPIRY_TIMEOUT_S,
)
assert error_event is not None, (
f"No MessageErrorEvent received within {ERROR_AFTER_CACHE_EXPIRY_TIMEOUT_S}s "
f"(MaxTimeInCache={MAX_TIME_IN_CACHE_S}s + slack). "
f"Collected events: {sender_collector.events}"
)
logger.info(f"S21 received error event: {error_event}")
assert error_event.get("error") == RETRY_WINDOW_EXPIRED_MSG, (
f"Unexpected error message in message_error event.\n"
f"Expected: {RETRY_WINDOW_EXPIRED_MSG!r}\n"
f"Got: {error_event.get('error')!r}\n"
f"Full event: {error_event}"
)
assert_event_invariants(sender_collector, request_id)
class TestS30ConcurrentSendsDuringAutoSubscribe(StepsCommon):
"""
S30: concurrent sends on the same content topic during initial auto-subscribe.
- Sender starts unsubscribed to the target topic.
- Several send() calls are issued at nearly the same time.
- Each call must return Ok(RequestId) with a unique id.
- Each request id must get its own propagated event,
with no dropped or cross-associated events.
"""
def test_s30_concurrent_sends_during_auto_subscribe(self, node_config):
sender_collector = EventCollector()
node_config.update(
{
"relay": True,
"store": False,
"discv5Discovery": False,
"numShardsInNetwork": 1,
}
)
sender_result = WrapperManager.create_and_start(
config=node_config,
event_cb=sender_collector.event_callback,
)
assert sender_result.is_ok(), f"Failed to start sender: {sender_result.err()}"
with sender_result.ok_value as sender_node:
# Relay peer so the sender has a propagation path.
relay_config = {
**node_config,
"staticnodes": [get_node_multiaddr(sender_node)],
"portsShift": 1,
}
relay_result = WrapperManager.create_and_start(config=relay_config)
assert relay_result.is_ok(), f"Failed to start relay peer: {relay_result.err()}"
with relay_result.ok_value:
# Build one message per send, with distinct payloads so we can
# detect any cross-association between request ids and events.
messages = [
create_message_bindings(
contentTopic=S30_CONTENT_TOPIC,
payload=to_base64(f"s30-concurrent-{i}"),
)
for i in range(S30_CONCURRENT_SENDS)
]
# Fire all sends concurrently. The sender is not yet subscribed
# to S30_CONTENT_TOPIC, so this exercises the auto-subscribe path
# under contention.
with ThreadPoolExecutor(max_workers=S30_CONCURRENT_SENDS) as pool:
send_results = list(pool.map(sender_node.send_message, messages))
# Every send must return Ok(RequestId).
request_ids = []
for i, send_result in enumerate(send_results):
assert send_result.is_ok(), f"Concurrent send #{i} failed: {send_result.err()}"
request_id = send_result.ok_value
assert request_id, f"Concurrent send #{i} returned an empty RequestId"
request_ids.append(request_id)
# Request ids must be unique across concurrent sends.
assert len(set(request_ids)) == len(request_ids), f"Duplicate RequestIds returned by concurrent sends: {request_ids}"
# Each request id must get its own propagated event and no error.
for request_id in request_ids:
propagated_event = wait_for_propagated(
collector=sender_collector,
request_id=request_id,
timeout_s=PROPAGATED_TIMEOUT_S,
)
assert propagated_event is not None, (
f"No MessagePropagatedEvent for request_id={request_id} "
f"within {PROPAGATED_TIMEOUT_S}s. "
f"Collected events: {sender_collector.events}"
)
error_event = wait_for_error(
collector=sender_collector,
request_id=request_id,
timeout_s=0,
)
assert error_event is None, f"Unexpected message_error for request_id={request_id}: {error_event}"
# Cross-association guard: every event with a requestId must
# belong to exactly one of the request ids we issued.
issued = set(request_ids)
for event in sender_collector.snapshot():
event_request_id = event.get("requestId")
if event_request_id is None:
continue
assert event_request_id in issued, (
f"Event carries an unknown requestId={event_request_id!r}, " f"not in issued set {issued}. Event: {event}"
)
# Per-request invariants apply to every concurrent send
# (correct requestId, no duplicate terminal events,
# Sent never before Propagated).
for request_id in request_ids:
assert_event_invariants(sender_collector, request_id)
class TestS31ConcurrentSendsMixedTopicsDuringChurn(StepsStore):
"""
S31: concurrent sends across mixed content topics during peer churn.
"""
@pytest.mark.docker_required
def test_s31_concurrent_sends_mixed_topics_during_churn(self, node_config):
sender_collector = EventCollector()
relay_peer = WakuNode(NODE_2, f"s31_relay_peer_{self.test_id}")
relay_peer.start(relay="true", discv5_discovery="false")
relay_peer.set_relay_subscriptions([self.test_pubsub_topic])
lightpush_peer = WakuNode(NODE_2, f"s31_lightpush_peer_{self.test_id}")
lightpush_peer.start(relay="true", lightpush="true", discv5_discovery="false")
lightpush_peer.set_relay_subscriptions([self.test_pubsub_topic])
store_peer = WakuNode(NODE_2, f"s31_store_peer_{self.test_id}")
store_peer.start(relay="true", store="true", discv5_discovery="false")
store_peer.set_relay_subscriptions([self.test_pubsub_topic])
churn_peers = [relay_peer, lightpush_peer, store_peer]
# Mesh docker peers so a lightpushed message can fan out to the store peer.
peer_multiaddrs = [p.get_multiaddr_with_id() for p in churn_peers]
for peer in churn_peers:
others = [a for a in peer_multiaddrs if a != peer.get_multiaddr_with_id()]
peer.add_peers(others)
node_config.update(
{
"mode": "Edge",
"relay": True,
"lightpush": True,
"store": False,
"discv5Discovery": False,
"numShardsInNetwork": 1,
"lightpushnode": lightpush_peer.get_multiaddr_with_id(),
}
)
sender_result = WrapperManager.create_and_start(
config=node_config,
event_cb=sender_collector.event_callback,
)
assert sender_result.is_ok(), f"Failed to start sender: {sender_result.err()}"
with sender_result.ok_value as sender_node:
sender_multiaddr = get_node_multiaddr(sender_node)
for peer in churn_peers:
peer.add_peers([sender_multiaddr])
delay(3) # let docker peers connect to the sender
all_request_ids: list[str] = []
phase1_ids = self._s31_fire_burst(sender_node, phase_label="phase1")
all_request_ids.extend(phase1_ids)
for peer in churn_peers:
peer.restart()
delay(1) # small window so the restart is actually in-flight
phase2_ids = self._s31_fire_burst(sender_node, phase_label="phase2")
all_request_ids.extend(phase2_ids)
# Wait for all peers to be ready again and re-attach the sender.
for peer in churn_peers:
peer.ensure_ready(timeout_duration=20)
peer.add_peers([sender_multiaddr])
peer_multiaddrs = [p.get_multiaddr_with_id() for p in churn_peers]
for peer in churn_peers:
others = [a for a in peer_multiaddrs if a != peer.get_multiaddr_with_id()]
peer.add_peers(others)
delay(3)
phase3_ids = self._s31_fire_burst(sender_node, phase_label="phase3")
all_request_ids.extend(phase3_ids)
assert len(set(all_request_ids)) == len(all_request_ids), f"Duplicate RequestIds across bursts: {all_request_ids}"
# Phase 1 ran before any churn, so the mesh was stable — standard timeout.
# Phase 3 ran right after restart + re-attach, so the mesh needed to
# re-stabilize — use the recovery timeout to avoid CI flakiness.
phase_timeouts = [
(phase1_ids, PROPAGATED_TIMEOUT_S),
(phase3_ids, RECOVERY_TIMEOUT_S),
]
for request_ids, timeout_s in phase_timeouts:
for request_id in request_ids:
propagated_event = wait_for_propagated(
collector=sender_collector,
request_id=request_id,
timeout_s=timeout_s,
)
assert propagated_event is not None, (
f"No MessagePropagatedEvent for stable-phase "
f"request_id={request_id} within {timeout_s}s. "
f"Collected events: {sender_collector.events}"
)
error_event = wait_for_error(
collector=sender_collector,
request_id=request_id,
timeout_s=0,
)
assert error_event is None, f"Unexpected message_error event for stable-phase " f"request_id={request_id}: {error_event}"
for request_id in phase2_ids:
error_event = wait_for_error(
collector=sender_collector,
request_id=request_id,
timeout_s=0,
)
assert error_event is None, f"Unexpected terminal message_error for phase-2 " f"request_id={request_id} after recovery: {error_event}"
issued = set(all_request_ids)
for event in sender_collector.snapshot():
event_request_id = event.get("requestId")
if event_request_id is None:
continue
assert event_request_id in issued, (
f"Event carries an unknown requestId={event_request_id!r}, " f"not in issued set {issued}. Event: {event}"
)
# Use the hash the wrapper emitted on message_sent so the store
# lookup matches the exact bytes that were actually published.
phase3_hashes = []
for request_id in phase3_ids:
sent_event = wait_for_sent(
collector=sender_collector,
request_id=request_id,
timeout_s=RECOVERY_TIMEOUT_S,
)
assert sent_event is not None, (
f"No message_sent event for phase-3 request_id={request_id} "
f"within {RECOVERY_TIMEOUT_S}s. Collected events: {sender_collector.events}"
)
msg_hash = sent_event.get("messageHash")
assert msg_hash, f"message_sent event missing messageHash: {sent_event}"
phase3_hashes.append(msg_hash)
# 3 phases × S31_BURST_SIZE messages, so the page must fit them all,
# otherwise phase-3 hashes (which sort last in ascending order) get cut off.
self.check_sent_message_is_stored(
expected_hashes=phase3_hashes,
store_node=store_peer,
pubsub_topic=self.test_pubsub_topic,
page_size=S31_BURST_SIZE * 3,
ascending="true",
)
# Per-request invariants apply across all phases, including the
# retry-path bursts (phase 2). If retries ever emit duplicate
# Propagated events or reorder Sent before Propagated, this catches it.
for request_id in all_request_ids:
assert_event_invariants(sender_collector, request_id)
def _s31_fire_burst(self, sender_node, *, phase_label: str) -> list[str]:
"""Fire S31_BURST_SIZE concurrent sends, one per topic in S31_CONTENT_TOPICS.
Returns the list of RequestIds. Asserts every send returned Ok."""
messages = [
self.create_message(
contentTopic=S31_CONTENT_TOPICS[i],
payload=to_base64(f"s31-{phase_label}-{i}"),
)
for i in range(S31_BURST_SIZE)
]
with ThreadPoolExecutor(max_workers=S31_BURST_SIZE) as pool:
send_results = list(pool.map(sender_node.send_message, messages))
request_ids = []
for i, send_result in enumerate(send_results):
assert send_result.is_ok(), f"{phase_label}: concurrent send #{i} failed: {send_result.err()}"
request_id = send_result.ok_value
assert request_id, f"{phase_label}: concurrent send #{i} returned an empty RequestId"
request_ids.append(request_id)
return request_ids