From 18a94fa3b84ad106c366e9740eab09fd00492331 Mon Sep 17 00:00:00 2001
From: fryorcraken
Date: Tue, 22 Jul 2025 11:28:55 +1000
Subject: [PATCH 1/6] Express Codex requirements

---
 requirements/codex.md | 106 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 104 insertions(+), 2 deletions(-)

diff --git a/requirements/codex.md b/requirements/codex.md
index 2fdb0de..ed22791 100644
--- a/requirements/codex.md
+++ b/requirements/codex.md
@@ -1,3 +1,105 @@
-# Waku's requirements on Codex
+# Waku's Requirements on Codex
 
-TODO
\ No newline at end of file
+## Publish Large Messages - Uploader is online
+
+To be used for messages archival in Chat SDK, Qaku, opchan, etc.
+
+### Functionality
+
+1. Ability to transfer a message of 1MB or more between two or more nodes.
+2. Message's CID is less than 100kB.
+
+
+### Usability
+
+1. Developer can implement upload feature with 10 lines of code or less.
+2. No configuration is necessary.
+
+### Reliability
+
+1. Download operation can be resumed.
+2. Upload operation can be resumed.
+3. Uploader can be expected to be online when user are downloading.
+
+### Performance
+
+None
+
+### Supportability
+
+1. Library for Browser applications.
+2. Library for Nim desktop applications.
+3. Library for Nim mobile applications.
+
+**+ (Privacy, Anonymity, Censorship-Resistance, Deployments)**
+
+1. The unavailability of a static host (IP, DNS) does not prevent a user to upload or download (censorship-resistance).
+2. TODO (privacy)
+
+## Publish Large Messages - Uploader is offline
+
+To be used for
+
+- large messages transfers (such as images, videos, audio) in Chat SDK, Opchan, etc.
+- Enhancement of message archival (uploader does not need to be online).
+
+Builds on [Publish Large Messages - Uploader is online](#publish-large-messages---uploader-is-online)
+
+### Functionality
+
+1. Message is cached to enable retrieval without sender being online (once uploaded).
+2. Best effort in terms of message retention; expectations on restrictions are documented.
+
+### Usability
+
+1. Receiver can download the large message, even if sender is offline, as long as they get the CID out-of-band.
+
+### Reliability
+
+1. Uploader may be offline when receiver is retrieving the large message.
+
+### Performance
+
+None
+
+### Supportability
+
+1. Library for Browser applications.
+2. Library for Nim applications.
+
+### + (Privacy, Anonymity, Censorship-Resistance, Deployments)**
+
+TODO (privacy)
+
+## Publish Large Messages - Retention is guaranteed
+
+In this scenario, the uploader wants to ensure the data is persisted and is willing to pay for it.
+This may be a Qaku Q&A admin, a opchan cell owner or Status Communities owner.
+
+Builds on previous requirements.
+
+### Functionality
+
+1. Uploader may pay for large message storage to have guaranteed retention.
+
+### Usability
+
+1. Receiver can download the large message, even if sender is offline, as long as they get the CID out-of-band.
+2. Receiver does not need to pay to retrieve the large message.
+
+### Reliability
+
+1. Uploader may be offline when receiver is retrieving the large message.
+2. Uploader is guaranteed a period of retention for a given price.
+
+### Performance
+
+See previous requirements.
+
+### Supportability
+
+See previous requirements.
+
+### + (Privacy, Anonymity, Censorship-Resistance, Deployments)**
+
+See previous requirements.
\ No newline at end of file

From c2ede49d626a0ceab2dbdfa657d3ca22b2d35a54 Mon Sep 17 00:00:00 2001
From: fryorcraken <110212804+fryorcraken@users.noreply.github.com>
Date: Tue, 22 Jul 2025 15:30:15 +1000
Subject: [PATCH 2/6] Update requirements/codex.md

Co-authored-by: Jazz Turner-Baggs <473256+jazzz@users.noreply.github.com>
---
 requirements/codex.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/requirements/codex.md b/requirements/codex.md
index ed22791..c2f9855 100644
--- a/requirements/codex.md
+++ b/requirements/codex.md
@@ -41,7 +41,7 @@ None
 To be used for
 
 - large messages transfers (such as images, videos, audio) in Chat SDK, Opchan, etc.
-- Enhancement of message archival (uploader does not need to be online).
+- Enhancement of message archival (uploader does not need to be online for messages to be retrieved).
 
 Builds on [Publish Large Messages - Uploader is online](#publish-large-messages---uploader-is-online)
 

From 30baa44d0a1def6321e351a0537ba59300b92298 Mon Sep 17 00:00:00 2001
From: fryorcraken
Date: Tue, 22 Jul 2025 15:35:00 +1000
Subject: [PATCH 3/6] more context on archival

---
 requirements/codex.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/requirements/codex.md b/requirements/codex.md
index c2f9855..afbd178 100644
--- a/requirements/codex.md
+++ b/requirements/codex.md
@@ -3,6 +3,13 @@
 ## Publish Large Messages - Uploader is online
 
 To be used for messages archival in Chat SDK, Qaku, opchan, etc.
+It assumes that a special user (admin) regularly bundles messages and pushes them to an external system.
+It then pushes the CID (or any other reference to retrieve the bundle) over Waku.
+
+New users retrieve and listen to new messages using Waku.
+Thanks to SDS, they learn whether they miss messages, and if so, can proceed with retrieval from the latest bundle.
+
+(clearly, spec is needed).
 
 ### Functionality
 

From 82d33b63e154dc5d23718931fa695630ca7c2831 Mon Sep 17 00:00:00 2001
From: fryorcraken <110212804+fryorcraken@users.noreply.github.com>
Date: Tue, 22 Jul 2025 15:38:08 +1000
Subject: [PATCH 4/6] Update requirements/codex.md

Co-authored-by: Eric <5089238+emizzle@users.noreply.github.com>
---
 requirements/codex.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/requirements/codex.md b/requirements/codex.md
index afbd178..31d13c3 100644
--- a/requirements/codex.md
+++ b/requirements/codex.md
@@ -54,7 +54,7 @@ Builds on [Publish Large Messages - Uploader is online](#publish-large-messages-
 
 ### Functionality
 
-1. Message is cached to enable retrieval without sender being online (once uploaded).
+1. After upload, message is retrievable without sender being online.
 2. Best effort in terms of message retention; expectations on restrictions are documented.
 
 ### Usability

From b703f4dbf78b8ea7a5e7bd5bf427975774e6b123 Mon Sep 17 00:00:00 2001
From: fryorcraken
Date: Thu, 24 Jul 2025 12:27:16 +1000
Subject: [PATCH 5/6] Incorporate feedback and focus on two features for now

# Conflicts:
# requirements/codex.md
---
 requirements/codex.md | 115 ++++++++++++++++++++----------------------
 1 file changed, 54 insertions(+), 61 deletions(-)

diff --git a/requirements/codex.md b/requirements/codex.md
index 31d13c3..635f83c 100644
--- a/requirements/codex.md
+++ b/requirements/codex.md
@@ -1,112 +1,105 @@
 # Waku's Requirements on Codex
 
-## Publish Large Messages - Uploader is online
+## Message Archival
 
 To be used for messages archival in Chat SDK, Qaku, opchan, etc.
-It assumes that a special user (admin) regularly bundles messages and pushes them to an external system.
-It then pushes the CID (or any other reference to retrieve the bundle) over Waku.
+It assumes that a special user (admin, referred to as "original uploader") regularly bundles messages and pushes them to an external system.
+It then pushes the reference to the bundle over Waku.
 
-New users retrieve and listen to new messages using Waku.
-Thanks to SDS, they learn whether they miss messages, and if so, can proceed with retrieval from the latest bundle.
+New users retrieve and listen to new messages using Waku upon start up.
+Then, they may retrieve bundles, likely because the know they are missing message via SDS.
 
-(clearly, spec is needed).
+The original uploader is the one to determine durability, aka, relevance of data over time.
+It is application specific (eg until a Q&A is completed), and not related to users having downloaded the data.
+
+Which means it's a scenario where:
+
+- Time from upload to retrieval is **not** critical (latest messages are available on Waku)
+- Several users can seed and download the bundle.
+
+This is very similar to BitTorrent integration in Status. I need to find specs to be more explicit about difference.
+Some notes on BitTorrent integration in Status (AFAIK, asking @osmaczko for help):
+
+1. Known issue is that the bundle is very large, and hence consumes a lot of bandwidth. I don't know if the bundle is "updated" or overridden.
+   On Waku app side, we need to be open to have one large bundle vs a series of manageable bundle. The latter offers more flexibility such as attaching bloom filters,
+   for selective download.
+2. The bundle download is indiscriminate, meaning every user will download it at some point, with SDS, we can do something smarter
+
+Also note (more of a personal opinion), usage of BitTorrent/webtorrent could be an acceptable starting point.
 
 ### Functionality
 
-1. Ability to transfer a message of 1MB or more between two or more nodes.
-2. Message's CID is less than 100kB.
-
+1. Ability to transfer a bundle of 1MB or more between two or more nodes.
+2. Reference to bundle is 50kB or less.
 
 ### Usability
 
-1. Developer can implement upload feature with 10 lines of code or less.
-2. No configuration is necessary.
+1. Developer can implement feature with 10 lines of code or less.
 
 ### Reliability
 
 1. Download operation can be resumed.
 2. Upload operation can be resumed.
-3. Uploader can be expected to be online when user are downloading.
+3. As long as original developer is online, bundle should be retrievable.
+4. As long as N out of M users are online, bundle should be retrievable.
 
 ### Performance
 
-None
+1. Time between bundle uploaded, and retrieved by users can be in the span of minutes and hours (we assumes messages are available in Waku store for several hours).
+2. The burden of re-upload is shared by users.
 
 ### Supportability
 
 1. Library for Browser applications.
 2. Library for Nim desktop applications.
 3. Library for Nim mobile applications.
+4. Most users may be behind NAT routers and other domestic network setup.
 
-**+ (Privacy, Anonymity, Censorship-Resistance, Deployments)**
+### + (Privacy, Anonymity, Censorship-Resistance, Deployments)
 
 1. The unavailability of a static host (IP, DNS) does not prevent a user to upload or download (censorship-resistance).
-2. TODO (privacy)
+2. A participant cannot determine original uploader's PII (anonymity).
 
-## Publish Large Messages - Uploader is offline
+## Large File Transfer
 
-To be used for
+To be used when 2 users or more, are transferring a large payload. This may be a large image or video in a private chat.
+Or it could be a llm prompt that returns a large image or video.
 
-- large messages transfers (such as images, videos, audio) in Chat SDK, Opchan, etc.
-- Enhancement of message archival (uploader does not need to be online for messages to be retrieved).
+Due to the broadcast nature of Waku, it would hog too much bandwidth if every large file sent between users where sent over Waku.
 
-Builds on [Publish Large Messages - Uploader is online](#publish-large-messages---uploader-is-online)
+In terms of durability, it can be assumed that once all participants have downloaded the payload, it does not need to be retrievable anymore.
+It should also be assumed that the users may not be online at the same time (mobile).
+There is more expectation on timeliness of retrievability, as one would want to be able to download seconds after the upload happened.
 
 ### Functionality
 
-1. After upload, message is retrievable without sender being online.
-2. Best effort in terms of message retention; expectations on restrictions are documented.
+1. Ability to transfer a payload of 1MB or more between two or more peers.
+2. Reference to payload is less than 50kB.
 
 ### Usability
 
-1. Receiver can download the large message, even if sender is offline, as long as they get the CID out-of-band.
+1. Developer can implement feature with 10 lines of code or less.
 
 ### Reliability
 
-1. Uploader may be offline when receiver is retrieving the large message.
+1. Download operation can be resumed.
+2. Upload operation can be resumed.
+3. Payload should be retrievable even if original uploader goes offline.
+4. Once all recipients have downloaded the payload, there is no more durability expectations.
 
 ### Performance
 
-None
+1. Payload download should start within seconds of the upload start.
 
 ### Supportability
 
 1. Library for Browser applications.
-2. Library for Nim applications.
+2. Library for Nim desktop applications.
+3. Library for Nim mobile applications.
+4. Most users may be behind NAT routers and other domestic network setup.
 
-### + (Privacy, Anonymity, Censorship-Resistance, Deployments)**
+### + (Privacy, Anonymity, Censorship-Resistance, Deployments)
 
-TODO (privacy)
-
-## Publish Large Messages - Retention is guaranteed
-
-In this scenario, the uploader wants to ensure the data is persisted and is willing to pay for it.
-This may be a Qaku Q&A admin, a opchan cell owner or Status Communities owner.
-
-Builds on previous requirements.
-
-### Functionality
-
-1. Uploader may pay for large message storage to have guaranteed retention.
-
-### Usability
-
-1. Receiver can download the large message, even if sender is offline, as long as they get the CID out-of-band.
-2. Receiver does not need to pay to retrieve the large message.
-
-### Reliability
-
-1. Uploader may be offline when receiver is retrieving the large message.
-2. Uploader is guaranteed a period of retention for a given price.
-
-### Performance
-
-See previous requirements.
-
-### Supportability
-
-See previous requirements.
-
-### + (Privacy, Anonymity, Censorship-Resistance, Deployments)**
-
-See previous requirements.
\ No newline at end of file
+1. The unavailability of a static host (IP, DNS) does not prevent a user to upload or download (censorship-resistance).
+2. An external observer cannot tie the PIIs of the uploader and downloaders of one payload;
+   it is assumed that the reference to the payload (eg, CID) is not leaked outside the participants.

From 7367871d6c375d3fae03fd88e424af30dbc76e77 Mon Sep 17 00:00:00 2001
From: fryorcraken
Date: Mon, 28 Jul 2025 15:55:26 +1000
Subject: [PATCH 6/6] Capturing discussions.

---
 requirements/codex.md | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/requirements/codex.md b/requirements/codex.md
index 635f83c..9f241aa 100644
--- a/requirements/codex.md
+++ b/requirements/codex.md
@@ -9,7 +9,7 @@ It then pushes the reference to the bundle over Waku.
 New users retrieve and listen to new messages using Waku upon start up.
 Then, they may retrieve bundles, likely because the know they are missing message via SDS.
 
-The original uploader is the one to determine durability, aka, relevance of data over time.
+The original uploader is the one to determine how much persistence and guarantee they want for the bundle.
 It is application specific (eg until a Q&A is completed), and not related to users having downloaded the data.
 
 Which means it's a scenario where:
@@ -17,16 +17,33 @@ Which means it's a scenario where:
 - Time from upload to retrieval is **not** critical (latest messages are available on Waku)
 - Several users can seed and download the bundle.
 
-This is very similar to BitTorrent integration in Status. I need to find specs to be more explicit about difference.
-Some notes on BitTorrent integration in Status (AFAIK, asking @osmaczko for help):
+This is very similar to BitTorrent integration in Status ([specs](https://github.com/vacp2p/rfc-index/blob/main/status/61/community-history-service.md))
+Some notes on BitTorrent integration in Status:
 
 1. Known issue is that the bundle is very large, and hence consumes a lot of bandwidth. I don't know if the bundle is "updated" or overridden.
    On Waku app side, we need to be open to have one large bundle vs a series of manageable bundle. The latter offers more flexibility such as attaching bloom filters,
    for selective download.
 2. The bundle download is indiscriminate, meaning every user will download it at some point, with SDS, we can do something smarter
 
+**Technical solutions**
+
+A comment on possible solutions:
+
 Also note (more of a personal opinion), usage of BitTorrent/webtorrent could be an acceptable starting point.
 
+Web is being solved by @vpavlin with vpavlin/qaku-cache which acts as a pinning gateway:
+
+> it is basically a pinning gateway, but instead of using HTTP to upload you use the Codex network itself.
+> It would be cool to add some auth - I was thinking about using Semaphore or RLN to limit who and how much can cache - WDYT?
+
+Or with a Codex web-client that can retrieve and upload from the network:
+
+> I think if we can get a Codex web-client which can upload to the network somehow, we are immediately
+> solving one of the biggest pain points of IPFS and have a great story for anyone "why are you building a new storage
+> and why should I use it?" - I definitely want this to happen:-)
+
+These are inferred in the FURPS below with requirements on web support and censorship-resistance.
+
 ### Functionality
 
 1. Ability to transfer a bundle of 1MB or more between two or more nodes.
@@ -40,7 +57,7 @@ Also note (more of a personal opinion), usage of BitTorrent/webtorrent could be
 
 1. Download operation can be resumed.
 2. Upload operation can be resumed.
-3. As long as original developer is online, bundle should be retrievable.
+3. As long as original uploader is online, bundle should be retrievable.
 4. As long as N out of M users are online, bundle should be retrievable.
 
 ### Performance
@@ -85,7 +102,7 @@ There is more expectation on timeliness of retrievability, as one would want to
 1. Download operation can be resumed.
 2. Upload operation can be resumed.
 3. Payload should be retrievable even if original uploader goes offline.
-4. Once all recipients have downloaded the payload, there is no more durability expectations.
+4. Once all recipients have downloaded the payload, there is no more expectations on being able to retrieve the payload.
 
 ### Performance
 
@@ -102,7 +119,4 @@ There is more expectation on timeliness of retrievability, as one would want to
 
 1. The unavailability of a static host (IP, DNS) does not prevent a user to upload or download (censorship-resistance).
 2. An external observer cannot tie the PIIs of the uploader and downloaders of one payload;
-   it is assumed that the reference to the payload (eg, CID) is not leaked outside the participants.
+   it is assumed that the reference to the payload (eg, CID) is not leaked outside the participants.
\ No newline at end of file
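
A minimal sketch of the Message Archival flow described in the final revision of `requirements/codex.md`: the original uploader bundles messages, stores the bundle, and shares a small reference; a user who knows (for example via SDS) which message ids are missing fetches the bundle and keeps only those. Everything here is an assumption made for illustration: the function names are hypothetical, a plain dict stands in for the external storage system, a SHA-256 hex digest stands in for the bundle reference (it is not a Codex CID), and none of this is the Codex, Waku, or SDS API. Only the 50kB limit on the reference comes from the requirements.

```python
import hashlib
import json

# From the Functionality requirement: "Reference to bundle is 50kB or less".
MAX_REFERENCE_BYTES = 50 * 1000


def make_bundle(messages):
    """Serialize messages (dicts with an 'id' key) into one bundle, ordered by id."""
    return json.dumps(sorted(messages, key=lambda m: m["id"])).encode("utf-8")


def upload_bundle(storage, bundle):
    """Store the bundle in the stand-in storage and return a content-derived reference."""
    reference = hashlib.sha256(bundle).hexdigest()
    if len(reference.encode("utf-8")) > MAX_REFERENCE_BYTES:
        raise ValueError("reference exceeds the 50kB requirement")
    storage[reference] = bundle
    return reference


def recover_missing(storage, reference, missing_ids):
    """Fetch the bundle by reference, verify it, and return only the missing messages."""
    bundle = storage[reference]
    if hashlib.sha256(bundle).hexdigest() != reference:
        raise ValueError("bundle does not match its reference")
    return [m for m in json.loads(bundle) if m["id"] in missing_ids]


# Hypothetical usage: the original uploader archives three messages, and a new
# user who already received "m3" over the live network recovers the other two.
storage = {}
history = [
    {"id": "m2", "text": "world"},
    {"id": "m1", "text": "hello"},
    {"id": "m3", "text": "!"},
]
reference = upload_bundle(storage, make_bundle(history))
recovered = recover_missing(storage, reference, missing_ids={"m1", "m2"})
```

Selecting by id after download mirrors the note in the patch that bundle download is currently indiscriminate; a series of smaller bundles with attached filters, as the notes suggest, would let the selection happen before download instead.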