add and modify pm files from Waku

This commit is contained in:
jessiebroke 2023-11-06 15:53:21 -05:00
parent 2818239148
commit aa9d51b706
No known key found for this signature in database
GPG Key ID: D901DC638A938F8C
4 changed files with 457 additions and 0 deletions

.github/labels.yml

@ -0,0 +1,187 @@
# Generic labels
- name: bug
description: Something isn't working
color: d73a4a
- name: documentation
description: Improvements or additions to documentation
color: 0075ca
- name: duplicate
description: This issue or pull request already exists
color: cfd3d7
- name: enhancement
description: New feature or request
color: a2eeef
- name: good first issue
description: Good for newcomers
color: 7057ff
- name: help wanted
description: Extra attention is needed
color: 008672
- name: invalid
description: This doesn't seem right
color: e4e669
- name: wontfix
description: This will not be worked on
color: ffffff
- name: blocked
description: This issue is blocked by some other work
color: e0af74
- name: critical
description: This issue needs critical attention
color: B60205
- name: infra
description: Infra, devops, CI and related tasks
color: 277196
- name: milestone
description: Tracks a subteam milestone
color: 1CC0B0
- name: epic
description: Tracks a yearly team epic (only for codex-storage/codex-pm repo)
color: 07544B
- name: test
description: Issue related to the test suite with no expected consequence to production code
color: 277196
- name: release-notes
description: Issue/PR needs to be evaluated for inclusion in release notes highlights or upgrade instructions
color: fb3c99
- name: dependencies
description: Pull requests that update a dependency file or issues that track it
color: 0366d6
# Everything below this comment requires updating for Codex
# Tracks | need to update to Codex
- name: track:rln
description: RLN Track (Secure Messaging/Applied ZK), e.g. relay and applications
color: C89BC6
- name: track:anonymity
description: Anonymity track (Secure Messaging)
color: 06B6C8
- name: track:operator-outreach
description: Operator outreach track (Secure Messaging/Waku Product)
color: B888AB
- name: track:ft-store
description: FT-Store track (Secure Messaging)
color: F5FD62
- name: track:discovery
description: Discovery track (Secure Messaging/Waku Product)
color: 6BEB61
- name: track:protocol-incentivization
description: Protocol Incentivization track (Secure Messaging), e.g. service credentials
color: 0037E3
- name: track:restricted-run
description: Restricted run track (Secure Messaging/Waku Product), e.g. filter, WebRTC
color: D91C35
- name: track:conversational-security
description: Conversational security track (Secure Messaging)
color: CC6B00
- name: track:nwaku-productionization
description: nwaku productionization track (Waku Product)
color: 9DEA79
- name: track:nwaku-maintenance
description: nwaku maintenance track (Waku Product)
color: 40F9F0
- name: track:network-testing
description: Network testing track (Secure Messaging/Waku Product)
color: bfd4f2
- name: track:platform-outreach
description: Platform outreach track (Waku Product)
color: 06B6C8
- name: track:sdks
description: SDKS track (Waku Product), including bindings
color: 34D557
- name: track:go-waku-productionization
description: go-waku productionization track (Waku Product)
color: 9DEA79
# Epics | need to update to Codex
## Orphan Epics
- name: "E:Define network and community metrics"
description: See https://github.com/waku-org/pm/issues/35 for details
color: 2B4DAF
- name: "E:Comprehensive dev testing"
description: See https://github.com/waku-org/pm/issues/90 for details
color: 088DF8
- name: "E:Basic service incentivization"
description: See https://github.com/waku-org/pm/issues/96 for details
color: 0e8a16
- name: "E:Presentation Readiness"
description: See https://github.com/waku-org/pm/issues/95 for details
color: 95D888
- name: "E:RLN on mainnet"
description: see https://github.com/waku-org/pm/issues/98 for details
color: 54D412
## [Milestone] Waku Network Can Support 10K Users
- name: "E:Static sharding"
description: See https://github.com/waku-org/pm/issues/15 for details
color: b60205
- name: "E:Opt-in message signing"
description: See https://github.com/waku-org/pm/issues/20 for details
color: 10C366
- name: "E:Targeted Status Communities dogfooding"
description: See https://github.com/waku-org/pm/issues/97 for details
color: 10C313
## [Milestone] Support Many Platforms
- name: "E:NodeJS Library"
description: See https://github.com/waku-org/pm/issues/81 for details
color: fef2c0
- name: "E:REST API service node"
description: See https://github.com/waku-org/pm/issues/82 for details
color: bfd4f2
- name: "E:RLN non-native SDKs"
description: See https://github.com/waku-org/pm/issues/88 for details
color: 3CDF39
## [Milestone] Waku Network can Support 1 Million Users
- name: "E:PostgreSQL"
description: See https://github.com/waku-org/pm/issues/84 for details
color: 35DCCC
- name: "E:Cater for professional operators"
description: See https://github.com/waku-org/pm/issues/92 for details
color: 6E7C80
## [Milestone] Waku Network Gen 0
- name: "E:1.1 Network requirements and task breakdown"
description: See https://github.com/waku-org/pm/issues/62 for details
color: 20A609
- name: "E:1.2: Autosharding for autoscaling"
description: See https://github.com/waku-org/pm/issues/65 for details
color: 5319e7
- name: "E:1.3: Node bandwidth management mechanism"
description: See https://github.com/waku-org/pm/issues/66 for details
color: 8A5EE0
- name: "E:1.4: Sharded peer management and discovery"
description: See https://github.com/waku-org/pm/issues/67 for details
color: 244B5B
- name: "E:1.5: Launch and dogfood integrated WN MVP"
description: See https://github.com/waku-org/pm/issues/68 for details
color: 195A49
- name: "E:2.1: Production testing of existing protocols"
description: See https://github.com/waku-org/pm/issues/49 for details
color: e99695
- name: "E:2.2: Sharded cap discovery for light protocols"
description: See https://github.com/waku-org/pm/issues/63 for details
color: 92A880
- name: "E:2.3: Basic distributed Store services"
description: See https://github.com/waku-org/pm/issues/64 for details
color: 37BEA9
- name: "E:3.1: DoS requirements and design"
description: See https://github.com/waku-org/pm/issues/69 for details
color: B342EA
- name: "E:3.2: Basic DoS protection in production"
description: See https://github.com/waku-org/pm/issues/70 for details
color: DF42D5
- name: "E:3.4: Production and memberships on mainnet"
description: See https://github.com/waku-org/pm/issues/87 for details
color: e99695
## [Milestone] Quality Assurance processes are in place
- name: "E:Automated release processes"
description: See https://github.com/waku-org/pm/issues/86 for details
color: 0052cc
- name: "E:End-to-end testing"
description: See https://github.com/waku-org/pm/issues/34 for details
color: bfd4f2


@ -0,0 +1,15 @@
name: Add new issues to Codex project board
on:
issues:
types: [opened]
jobs:
add-to-project:
name: Add issue to project
runs-on: ubuntu-latest
steps:
- uses: actions/add-to-project@v0.3.0
with:
project-url: https://github.com/orgs/codex-storage/projects/2
github-token: ${{ secrets.ADD_TO_PROJECT_PAT }}

.github/workflows/sync-labels.yml

@ -0,0 +1,60 @@
name: Sync labels
on:
push:
branches:
- master
paths:
- .github/labels.yml
- .github/workflows/sync-labels.yml
workflow_dispatch:
permissions:
issues: write
jobs:
sync-labels:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: micnncim/action-label-syncer@v1
with:
manifest: .github/labels.yml
repository: |
codex-storage/cs-codex-dist-tests
codex-storage/nim-codex
codex-storage/codex-contracts-eth
codex-storage/nim-poseidon2
codex-storage/codex-frontend
codex-storage/codex-research
codex-storage/nim-chronos
codex-storage/zk-benchmarks
codex-storage/multicodec
codex-storage/constantine
codex-storage/codex-storage-proofs-circuits
codex-storage/codex-pm
codex-storage/codex.storage
codex-storage/nim-ethers
codex-storage/logtools
codex-storage/nim-libp2p
codex-storage/nim-datastore
codex-storage/das-research
codex-storage/nim-codex-dht
codex-storage/das-dht-emulator
codex-storage/swarmsim
codex-storage/questionable
codex-storage/nim-contract-abi
codex-storage/asynctest
codex-storage/dist-tests-prometheus
codex-storage/dist-tests-geth
codex-storage/codex-storage-proofs
codex-storage/network-testing-codex
codex-storage/rs-poseidon
codex-storage/nim-leopard
codex-storage/nim-nitro
codex-storage/zk-research-artifacts
codex-storage/debugging-scratchpad
codex-storage/infra-codex
codex-storage/infra-docs
codex-storage/codex-incentives
token: ${{ secrets.SYNC_LABELS2 }}
prune: true


@ -0,0 +1,195 @@
# Waku Message UID (v2)
# Context and previous attempts
### Message deduplication in Waku Relay (Gossipsub)
The *Message Cache* and the *Seen Cache* perform message deduplication in Waku Relay. These two deduplication structures rely on a unique ID which, at the moment of writing, is computed as follows:
```rust
message_id: [u8; 32] = sha256(WakuMessageBuffer)
```
These structures provide a limited-time message deduplication capability to keep the memory footprint low. The *Message Cache* provides short-term deduplication (~5 heartbeat periods). The *Seen Cache*, implemented as a bloom filter, provides longer-term deduplication (~2 minutes).
Because `WakuMessage` is serialized using protocol buffers, and this serialization mechanism is not deterministic, the *Seen Cache* can fail to detect a duplicate, letting a duplicated message through, if the serialized fields are simply reordered.
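A toy sketch (not Waku code) illustrates the problem: a hash over the serialized buffer changes when logically equivalent fields are reordered, so a message reserialized in a different field order would evade a buffer-hash-based cache. The tag-length-value encoding below is a simplified stand-in for the protobuf wire format.

```python
import hashlib

# Two serializations of the same logical message, differing only in
# field order. Protobuf tags each field on the wire, so decoders
# accept either order, yet the buffer hashes differ.
fields = {1: b"content-topic", 2: b"payload-bytes"}

def encode(order):
    # toy tag-length-value encoding standing in for protobuf wire format
    out = b""
    for tag in order:
        value = fields[tag]
        out += bytes([tag, len(value)]) + value
    return out

id_a = hashlib.sha256(encode([1, 2])).digest()
id_b = hashlib.sha256(encode([2, 1])).digest()
assert id_a != id_b  # same logical message, two distinct "unique" IDs
```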
### Libp2p's Gossipsub message validation
The Gossipsub v1.0 specification states [the following](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.0.md#message-processing:~:text=Payload%20processing%20will%20validate%20the%20message%20according%20to%20application%2Ddefined%20rules%20and%20check%20the%20seen%20cache%20to%20determine%20if%20the%20message%20has%20been%20processed%20previously) about the message validation:
> Payload processing will validate the message according to application-defined rules and check the [`seen` cache](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.0.md#message-cache) to determine if the message has been processed previously.
>
Essentially, the specification leaves the message validation to the application.
Consequently, the principal Gossipsub implementations (Nim, Go and Rust) do not implement message integrity or ID validation. By extension, the same is true for the Waku Relay protocol, as it is an opinionated version of Gossipsub.
### Message delivery order guarantees in Waku Relay
The Gossipsub protocol, due to its flood-distribution nature, cannot guarantee that all messages follow the same distribution path, so there is no ordering guarantee per se.
However, if two messages are published sufficiently far apart in time, we can assume they will be delivered in a predictable order. The network's maximum latency determines this minimum separation.
### Records deduplication in Waku Archive
At the moment of writing, the Waku Archive SQLite backend implementation considers two messages equal if they share the same `pubsub_topic`, `content_topic`, `timestamp` and `payload`. This criterion is baked into the SQLite table's `PRIMARY KEY`, and messages that match all four fields are treated as duplicates.
When querying via the Waku Store protocol, the results are ordered by the columns `stored_at` (equal to the message timestamp if present, or to the message arrival time otherwise) and `id` (computed as `sha256(content_topic, payload)`).
### Records retrieval via Waku Store
At the moment of writing, the *Waku Store* protocol provides an API for retrieving a filtered list of messages from a remote node. Additionally, as the maximum supported message size is 1 MB, the *Waku Store* query protocol supports a pagination mechanism, capping the data downloaded per query at 100 messages (up to 100 MB).
However, there is a limitation when verifying missing messages in a node's history. Downloading large amounts of data, in the order of hundreds of megabytes or even gigabytes, is not a time and bandwidth-efficient method for this task.
# Goal
Definition of a Waku message uniqueness identifier that can be used to deduplicate Waku messages across the solution.
# Non-goals
- Build a blockchain. Consistency guarantees between archive-capable nodes are out of the scope of the present document.
- Address or mitigate [all vulnerabilities](https://www.researchgate.net/publication/342734066_GossipSub_Attack-Resilient_Message_Propagation_in_the_Filecoin_and_ETH20_Networks) to which Waku Relay (and Gossipsub) are susceptible.
# Use cases
- Message deduplication in the network.
- Message deduplication in the Waku Archive backend (e.g., in a shared backend setup).
- Bandwidth-efficient Waku Archive synchronization.
# Requirements
- Limited in length.
- Low to negligible collision probability.
- Does not leak information (e.g., the sender's key).
- Application-specific (e.g., lexicographically sortable).
- Uniqueness is global to the network.
    - Global to all the nodes publishing in a certain network (pub-sub topic).
    - Global to all the Gossipsub pub-sub topics a given store node is subscribed to.
- Acts as a message integrity check.
    - As Waku is an open network, all nodes should be able to perform message ID validation.
# Prerequisites
## The Waku Message's `meta` attribute
The Waku Message's `meta` attribute is an arbitrary, application-specific, variable-length byte array with a maximum length of 32 bytes (2^256 possibilities).
The message's `meta` field MUST be present, with a length greater than zero, in non-ephemeral messages (those persisted by the Waku Archive durability service).
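As a minimal sketch, this rule could be enforced with a validation step like the following (function and parameter names are illustrative, not from any Waku implementation):

```python
MAX_META_LEN = 32  # bytes

def validate_meta(meta: bytes, ephemeral: bool) -> bool:
    """Hypothetical check: non-ephemeral messages MUST carry a
    non-empty `meta` of at most 32 bytes; ephemeral messages may
    omit it entirely."""
    if len(meta) > MAX_META_LEN:
        return False
    if not ephemeral and len(meta) == 0:
        return False
    return True
```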
# The Waku Message UID (MUID)
The Waku Message UID is a two-part, variable-length identifier that can unequivocally identify and deduplicate messages in a Waku network.
The MUID comprises two parts: a *message checksum* and *application-specific variable-length metadata*.
```rust
muid: [u8; 64] = concat(checksum, metadata)
```
The maximum length for the MUID is 64 bytes.
## The *checksum* part
It is a fixed-length, 32-byte checksum computed from the content of the Waku Message. It is defined as follows:
```rust
checksum: [u8; 32] = sha256(network_topic, WakuMessage.topic, WakuMessage.meta, WakuMessage.payload)
```
The *checksum* part ensures the integrity of the Waku Message contained in the Gossipsub payload. As any node in the network can compute it, any node can verify message integrity.
## The *metadata* part
It is an application-specific part extracted from the Waku Message's `meta` attribute.
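Putting the two parts together, a minimal Python sketch of MUID computation could look like this (the plain concatenation passed to the hash is an assumption of this sketch; a real specification would define an unambiguous field encoding):

```python
import hashlib

def compute_muid(network_topic: bytes, content_topic: bytes,
                 meta: bytes, payload: bytes) -> bytes:
    # checksum part: sha256 over the message fields
    # (plain concatenation is an assumption of this sketch)
    checksum = hashlib.sha256(
        network_topic + content_topic + meta + payload
    ).digest()
    # MUID = checksum (32 bytes) || application metadata (up to 32 bytes)
    return checksum + meta

muid = compute_muid(b"/waku/2/default", b"/app/1/chat/proto",
                    b"\x01\x02\x03\x04", b"hello")
assert 32 < len(muid) <= 64
```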
# Message uniqueness considerations
Two messages are considered equal if they have the same `network topic`, `content topic`, `meta` attribute, and `payload`. As the MUID is derived from the message content, two equal messages share the same MUID.
The application should provide different `meta` attributes to different messages to avoid collisions in both the relay and the archive.
# Example message metadata schemas
Applications should specify the schema for the Waku Message's `meta` attribute. The selected schema affects the privacy, security and message collision probability of the application's messages.
These are some example schemas that could be used:
- **Timestamp (e.g., int64 Unix epoch nanoseconds timestamp):**
    - **PRO:** Simple, performant generation, “backwards compatible”, fine-grained sortability in archive query results (ns precision).
    - **CON:** Prone to collisions/message duplication; traceable/graph learning.
- **ULID:**
    - **PRO:** Medium complexity, performant generation, negligible collision probability, fine-grained sortability in archive query results (ms precision).
    - **CON:** Traceable/graph learning.
- **UUID (e.g., UUID v4):**
    - **PRO:** Medium complexity, performant generation, negligible collision probability, no traceability (random data).
    - **CON:** Coarse sortability at archive query time.
- **Noise Sessions (e.g., encrypted metadata/info):**
    - **PRO:** Negligible collision probability (if the content is well designed), contains metadata, non-traceable (looks like random data).
    - **CON:** High complexity, less performant generation (hashing, encryption), not sortable at archive query time.
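Two of the simpler schemas above (timestamp and UUID v4) can be sketched in a few lines; both fit well within the 32-byte `meta` limit:

```python
import time
import uuid

def meta_timestamp() -> bytes:
    # int64 Unix epoch in nanoseconds, big-endian so that byte-wise
    # (lexicographic) comparison matches chronological order
    return time.time_ns().to_bytes(8, "big")

def meta_uuid4() -> bytes:
    # 16 random bytes: negligible collision probability, not sortable
    return uuid.uuid4().bytes
```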
# Waku Relay: deduplication and integrity
Given the low collision probability of some of the schemas described above, the MUID can be used as the key of the message and seen caches.
A message that reuses the same ID with a different payload within the *Message Cache* window won't be relayed. In the same way, if it is replayed within the *Seen Cache* window, it won't be received by subscribers.
Additionally, as all nodes can compute the *checksum* part of a message ID, a validator can be integrated to guarantee Waku Message integrity.
# Waku Archive and Waku Store: durable streams
<aside>
The terms *event*, *message*, and *record* are synonyms in the *Waku Archive* context and can be used interchangeably, although *record* is preferred when referring to individual items in an event log.
</aside>
From the Waku product documents terms and concepts:
> Waku platform **events** are organized and durably stored in **topics**, also called **content topics**. And a sequence of events published on the same **topic** constitutes an **event stream**.
>
### Event recording
The Waku Archive stream durability service is responsible for recording the events that occurred in a certain event stream and persisting them in a long-term storage system. This persistence follows two rules:
- Messages should be stored following the arrival order (FIFO).
- Messages should be deduplicated based on the MUID (the same criteria used in the **Waku Relay** layer).
Optionally, messages can be tagged with arrival time timestamps for coarse-ordering purposes.
### Queryable event log
In addition to the event recording system, the *Waku Archive* service features a searchable log and provides an interface for retrieving messages from the previously described event recording system. The *Waku Store* historical message query protocol sits on top of this interface, making it easily accessible through a remote procedure call (RPC) interface.
### Waku Store's bandwidth-consumption optimization
Assuming that MUIDs are globally unique across all messages, we can view archive-capable nodes as key-value stores: the MUID is the key and the message is the value.
With that approach, the current *Waku Store* protocol RPCs can be extended to support a message ID query mechanism. The new Waku Store protocol APIs would look like this:
- Query a list of messages based on certain filter criteria (e.g., network, content topic, time range, etc.).
- Query a list of MUIDs based on certain filter criteria (e.g., network, content topic, time range, etc.).
- Query messages by a MUIDs list.
As each MUID is at most 64 bytes, the number of UIDs per query response can be much higher than the number of full messages, saving bandwidth and reducing the “time-to-sync” metric.
Which APIs are used, and when, is at the application's discretion. For example, when bootstrapping a node's history, first fetching the 50 messages that fit the screen of a hypothetical messaging app can make a difference in the perceived UX. Once the node's message history is bootstrapped, the same application could use the MUID-based query mechanism to efficiently retrieve the missing messages and complete the rest of the message history.
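The synchronization flow this enables amounts to a set difference: fetch the lightweight MUID list first, then download only the messages the local store lacks. A sketch, with all names illustrative (the actual Store RPCs are not specified here):

```python
def missing_muids(remote_muids, local_store):
    """Return the MUIDs absent from the local store; only these
    messages need to be downloaded in full."""
    return [m for m in remote_muids if m not in local_store]

# local key-value view of an archive node: MUID -> message
local_store = {b"muid-1": b"msg-1", b"muid-2": b"msg-2"}
remote_muids = [b"muid-1", b"muid-2", b"muid-3"]

to_fetch = missing_muids(remote_muids, local_store)
assert to_fetch == [b"muid-3"]  # download one message instead of three
```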
### Waku Store's message decryption optimization
Additionally, as the MUID must contain application-specific metadata, a Waku Store client can identify its application's messages within the list of MUIDs retrieved from the Waku Archive.
This reduces bandwidth consumption and the CPU load derived from downloading a large list of messages from another node's history and decrypting all of them just to filter out the application's own.
### Waku Store's message integrity check
In the same way that Waku Relay nodes can validate a message by computing its checksum part, a Waku Store client can determine whether any record has been maliciously modified and certify the integrity of each received entry.
# Conclusions and future work
This proposal extends the current model and aims to unify the Waku platform's relay layer (*Waku Relay*) with the platform's stream durability functionality (*Waku Archive*) and stream history query functionality (*Waku Store*).
Eventually, this UID-based history retrieval mechanism could evolve into a fully-fledged history synchronization mechanism. Thanks to the unified approach, it also has the potential to be added as a Gossipsub extension, bringing a “durable stream capability” to the protocol.
An in-depth privacy and security analysis is pending.