team-nl-br design proposal

This commit is contained in:
Marcin Czenko 2025-08-22 03:35:09 +02:00
parent c74fc622da
commit c7bead879f
GPG Key ID: F6CB3ED4082ED433
8 changed files with 581 additions and 0 deletions


@ -0,0 +1,188 @@
The `TorrentConfig` type provides the configuration for the BitTorrent-based History Archive management functionality:
```go
type TorrentConfig struct {
	// Enabled set to true enables the Community History Archive protocol
	Enabled bool
	// Port is the port number the BitTorrent client will listen on for connections
	Port int
	// DataDir is the file system folder Status should use for message archive torrent data.
	DataDir string
	// TorrentDir is the file system folder Status should use for storing torrent metadata files.
	TorrentDir string
}
```
The `DataDir` is where the History Archives for the controlled communities are stored, while `TorrentDir` is where the corresponding community torrent files are kept.
In the `DataDir` folder, each community has its own folder (named after the community ID) in which the history archive for that community is stored:
```bash
DataDir/
├── {communityID}/
│ ├── index # Archive index file (metadata)
│ └── data # Archive data file (actual messages)
└──
```
There is a one-to-one relationship between the community folder and the corresponding *torrent* file (BitTorrent metainfo):
```bash
TorrentDir/
├── {communityID}.torrent # Torrent metadata file
└──
```
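For illustration, here is a minimal sketch of how these paths could be derived from `TorrentConfig`. The helper names below are illustrative assumptions (the actual status-go helpers may differ); only the `{communityID}.torrent` naming and the `index`/`data` layout come from the description above:
```go
import "path/filepath"

// Illustrative path helpers only; names and signatures are assumptions.
func communityDataDir(cfg TorrentConfig, communityID string) string {
	return filepath.Join(cfg.DataDir, communityID)
}

func archiveIndexFile(cfg TorrentConfig, communityID string) string {
	return filepath.Join(communityDataDir(cfg, communityID), "index")
}

func archiveDataFile(cfg TorrentConfig, communityID string) string {
	return filepath.Join(communityDataDir(cfg, communityID), "data")
}

func torrentFile(cfg TorrentConfig, communityID string) string {
	return filepath.Join(cfg.TorrentDir, communityID+".torrent")
}
```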
## When Archives Are Created
The function central to Archive creation is [InitHistoryArchiveTasks](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/messenger_communities.go#L3783). This function is called in a number of situations, e.g. in [Messenger.Start](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/messenger.go#L562), [Messenger.EditCommunity](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/messenger_communities.go#L2807), [Messenger.ImportCommunity](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/messenger_communities.go#L2865), [Messenger.EnableCommunityHistoryArchiveProtocol](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/messenger_communities.go#L4136).
In `InitHistoryArchiveTasks`, for each community with the `HistoryArchiveSupportEnabled` option set to `true`, the function (a simplified sketch follows the list):
- if the community torrent file already exists, calls [ArchiveManager.SeedHistoryArchiveTorrent](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/manager_archive.go#L408) - see also [[What is Seeding (AI)?]] and [[When are magnetlink messages sent?]],
- determines whether new archives need to be created based on the last archive end date and, if so, calls [CreateAndSeedHistoryArchive](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/manager_archive.go#L314),
- starts the periodic archive creation task by calling [StartHistoryArchiveTasksInterval](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/manager_archive.go#L323), which will in turn call [CreateAndSeedHistoryArchive](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/manager_archive.go#L314).
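The sketch below mirrors this per-community decision flow. The interfaces are minimal stand-ins introduced only for illustration; the real types and call signatures live in `messenger_communities.go` and `manager_archive.go`:
```go
import "time"

// Minimal stand-in interfaces for illustration; not the actual status-go types.
type archiveCommunity interface {
	ID() []byte
	IDString() string
	HistoryArchiveSupportEnabled() bool
}

type archiveTasks interface {
	TorrentFileExists(communityID string) bool
	SeedHistoryArchiveTorrent(communityID []byte) error
	CreateAndSeedHistoryArchive(communityID []byte, from, to time.Time) error
	StartHistoryArchiveTasksInterval(c archiveCommunity, interval time.Duration)
}

// initHistoryArchiveTasksSketch mirrors the decision flow described above.
func initHistoryArchiveTasksSketch(
	communities []archiveCommunity,
	tasks archiveTasks,
	lastArchiveEndDate func(communityID string) time.Time,
	interval time.Duration, // messageArchiveInterval, i.e. 7 days
) {
	for _, c := range communities {
		if !c.HistoryArchiveSupportEnabled() {
			continue
		}
		if tasks.TorrentFileExists(c.IDString()) {
			// A torrent file already exists: resume seeding the existing archives.
			_ = tasks.SeedHistoryArchiveTorrent(c.ID())
		}
		if lastEnd := lastArchiveEndDate(c.IDString()); time.Since(lastEnd) >= interval {
			// Enough time has passed since the last archive: create and seed a new one now.
			_ = tasks.CreateAndSeedHistoryArchive(c.ID(), lastEnd, time.Now())
		}
		// Keep creating archives periodically from now on.
		go tasks.StartHistoryArchiveTasksInterval(c, interval)
	}
}
```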
From `CreateAndSeedHistoryArchive`, via a chain of calls, we arrive at [createHistoryArchiveTorrent](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/manager_archive_file.go#L55), which is key to understanding how archives are created and is central to our proposal.
## Archive Creation
Archives are about archiving messages, so we first need to find the messages relevant to the given (chat) community. This happens through *filters* and the `topics` connected to them. There is some [[Unclarity about Waku filters and topics]], but for this discussion it is enough to assume that the community messages are retrieved correctly. The `topics` are provided to `createHistoryArchiveTorrent`, which is where the archives are built.
Recall that for each community, status-go uses two files: the `index` metadata file and the `data` file.
The `data` file stores *protobuf*-encoded *archives*. Each archive describes a period of time given by the `From` and `To` attributes (both Unix timestamps cast to `uint64`), which together with `ContentTopic` form the `Metadata` part of an archive:
```go
type WakuMessageArchiveMetadata struct {
	From         uint64
	To           uint64
	ContentTopic [][]byte
}
```
> For clarity, we skip `protobuf`-specific fields and annotations.
In `createHistoryArchiveTorrent`, the messages are retrieved using [GetWakuMessagesByFilterTopic](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/persistence.go#L967). The messages are then bundled into chunks of at most `30MB`, as given by the `maxArchiveSizeInBytes` constant. Messages bigger than `maxArchiveSizeInBytes` will not be archived.
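A minimal sketch of this bundling step is shown below. It is illustrative only: it uses the `WakuMessage` type shown further below and approximates a message's size by its payload and signature lengths, which is an assumption - the real code in `createHistoryArchiveTorrent` measures sizes slightly differently.
```go
// maxArchiveSizeInBytes mirrors the 30MB chunk limit described above
// (the exact constant value is defined in status-go).
const maxArchiveSizeInBytes = 30 * 1024 * 1024

// chunkMessages groups messages into chunks whose cumulative size stays below
// maxArchiveSizeInBytes and skips messages that are individually too large.
func chunkMessages(messages []*WakuMessage) [][]*WakuMessage {
	var chunks [][]*WakuMessage
	var current []*WakuMessage
	currentSize := 0
	for _, msg := range messages {
		msgSize := len(msg.Payload) + len(msg.Sig)
		if msgSize > maxArchiveSizeInBytes {
			// Too big to archive at all.
			continue
		}
		if currentSize+msgSize > maxArchiveSizeInBytes && len(current) > 0 {
			// Current chunk is full: start a new one.
			chunks = append(chunks, current)
			current = nil
			currentSize = 0
		}
		current = append(current, msg)
		currentSize += msgSize
	}
	if len(current) > 0 {
		chunks = append(chunks, current)
	}
	return chunks
}
```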
Now, for each message chunk, an instance of [WakuMessageArchive](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/protobuf/communities.pb.go#L2152) (`wakuMessageArchive`) is created using [createWakuMessageArchive](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/manager_archive_file.go#L343). `WakuMessageArchive` has the following definition:
```go
type WakuMessageArchive struct {
	Metadata *WakuMessageArchiveMetadata
	Messages []*WakuMessage
}
```
> Again, we strip `protobuf` attributes for clarity.
We see that the `Metadata` attribute of the `WakuMessageArchive` type is set to the `WakuMessageArchiveMetadata` defined above. For reference only, `WakuMessage` has the following definition:
```go
type WakuMessage struct {
	Sig          []byte
	Timestamp    uint64
	Topic        []byte
	Payload      []byte
	Padding      []byte
	Hash         []byte
	ThirdPartyId string
}
```
The `WakuMessageArchive` is then encoded and encrypted, resulting in the final `encodedArchive`. Its raw size (`rawSize := len(encodedArchive)`) is padded if necessary so that the archive size is aligned to the BitTorrent piece length (which is set to `100KiB`). The `encodedArchive`, together with the padding information, is then added to `encodedArchives` (`[]*EncodedArchiveData`). Finally, the resulting `size` and `padding`, together with the current `offset` in the existing `data` file and the `Metadata`, are used to create the corresponding archive index entry that will later be serialized to the `index` file:
```go
wakuMessageArchiveIndexMetadata := &protobuf.WakuMessageArchiveIndexMetadata{
	Metadata: wakuMessageArchive.Metadata,
	Offset:   offset,
	Size:     uint64(size),
	Padding:  uint64(padding),
}
```
The archive index entry is encoded, and its hash is used as a key in the archive index map:
```go
wakuMessageArchiveIndexMetadataBytes, err := proto.Marshal(
	wakuMessageArchiveIndexMetadata,
)
archiveID := crypto.Keccak256Hash(wakuMessageArchiveIndexMetadataBytes).String()
wakuMessageArchiveIndex[archiveID] = wakuMessageArchiveIndexMetadata
```
> `wakuMessageArchiveIndex` is initialized earlier to contain the existing archive index entries from the current `index` file. Here we are basically appending new archive metadata to the archive index data structure.
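As an aside, the piece-alignment padding used above can be sketched as follows (a minimal illustration assuming the `100KiB` piece length mentioned earlier; the real logic lives in `createHistoryArchiveTorrent`):
```go
// pieceLength is the BitTorrent piece length used for the data file: 100KiB.
const pieceLength = 100 * 1024

// padToPieceLength returns how many padding bytes must be appended so that an
// encoded archive of rawSize bytes ends exactly on a piece boundary.
func padToPieceLength(rawSize int) (padding int) {
	if remainder := rawSize % pieceLength; remainder != 0 {
		padding = pieceLength - remainder
	}
	return padding
}

// The next archive is then appended at offset += rawSize + padding,
// which is the Offset recorded in the index entry above.
```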
We repeat the whole process for each message chunk in the given time period, adding more periods if needed (recall that each period is 7 days long).
After that we have a list of new archives (in `encodedArchives`) and new archive index entries, ready to be encoded and serialized to the corresponding `data` (by appending) and `index` files.
Finally, the corresponding torrent file is (re)created, the `HistoryArchivesCreatedSignal` is emitted, and the last message archive end date is recorded in persistence.
The diagram below shows the relationships between the datatypes described above:
![[team-nl-br-design-1.svg]]
And then in the following diagram we show how the `index` and `data` files are populated, the corresponding torrent file and the magnet link:
![[team-nl-br-design-2.svg]]
## Archive Distribution and Download
Nodes that want to restore the message history first need to retrieve the `index` file; here, selective download of individual files from the torrent is used. Once a node has the index, it can determine which periods it needs to retrieve. Using the `offset`, `size`, and `padding`, it instructs the BitTorrent library to selectively fetch only the torrent pieces it needs. In our Codex integration proposal, we suggest taking advantage of Codex CIDs to formally decouple the archive index from the archive data.
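To make the role of `offset`, `size`, and `padding` concrete, here is a rough sketch of how an index entry maps onto torrent pieces. It is illustrative only: the actual selection goes through the BitTorrent library, and we assume here that the byte range covered by an entry ends on a piece boundary thanks to the padding.
```go
// pieceLength must match the piece length used when the torrent was created (100KiB).
const pieceLength = 100 * 1024

// pieceRange maps a byte range [offset, offset+length) of the data file to the
// contiguous range of piece indices a node needs to download (length assumed > 0).
func pieceRange(offset, length uint64) (firstPiece, lastPiece uint64) {
	firstPiece = offset / pieceLength
	lastPiece = (offset + length - 1) / pieceLength
	return firstPiece, lastPiece
}
```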
## Proposed Integration with Codex
First, we propose changing the `WakuMessageArchiveIndexMetadata` type in the following direction: instead of `offset`, `size`, and `padding`, we suggest referring to an archive by a Codex CID. Thus, instead of:
```go
type WakuMessageArchiveIndexMetadata struct {
	Metadata *WakuMessageArchiveMetadata
	Offset   uint64
	Size     uint64
	Padding  uint64
}
```
we would have something like:
```go
type WakuMessageArchiveIndexMetadata struct {
	Metadata *WakuMessageArchiveMetadata
	Cid      CodexCid
}
```
We then upload the resulting `index` to Codex under its own `index` CID. Instead of the magnet link, the community owner only publishes this `index` CID.
In order to receive the historical messages for a given period (given by `from` and `to` in the `WakuMessageArchiveMetadata`), the receiving node first acquires the `index` using the `index` CID. For each entry in the `index` that the node is interested in, the node then downloads the corresponding archive directly using the `Cid` from that `index` entry.
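A minimal sketch of this flow is shown below. `CodexClient` with its `Download` method, the `decodeIndex` and `decodeArchive` helpers, and the `index.Archives` map are assumptions introduced for illustration; only the index entry shape follows the proposal above.
```go
// fetchArchivesForPeriod sketches the proposed download flow: fetch the index by
// its CID, pick the entries overlapping [from, to], then fetch each archive by
// its own CID.
func fetchArchivesForPeriod(codex CodexClient, indexCid CodexCid, from, to uint64) ([]*WakuMessageArchive, error) {
	indexBytes, err := codex.Download(indexCid)
	if err != nil {
		return nil, err
	}
	index, err := decodeIndex(indexBytes) // protobuf-decode the archive index
	if err != nil {
		return nil, err
	}

	var archives []*WakuMessageArchive
	for _, entry := range index.Archives {
		// Skip archives that do not overlap the requested period.
		if entry.Metadata.To < from || entry.Metadata.From > to {
			continue
		}
		archiveBytes, err := codex.Download(entry.Cid)
		if err != nil {
			return nil, err
		}
		archive, err := decodeArchive(archiveBytes) // decrypt + protobuf-decode
		if err != nil {
			return nil, err
		}
		archives = append(archives, archive)
	}
	return archives, nil
}
```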
The diagram below shows the new `index`, itself identified by a Codex CID, which uses individual CIDs to refer to each individual archive:
![[team-nl-br-design-3.svg]]
### Advantages
- a clean and elegant solution - easier to maintain the archive(s) and the index,
- no dependency on the internals of the underlying low-level protocol (like `padding` or `pieceLength`) - just clean CIDs,
- reuses the existing Codex protocol, no need to extend it,
- Codex takes care of storage: no more `index` and `data` files, thus more reliable and less error-prone.
### Disadvantages
- Each archive receives its own CID, which has to be announced on the DHT. If this is considered a problem, we can apply bundling, or use block ranges and publish the whole `data` under a single CID. Although less elegant, this still nicely decouples the `index` from the `data`, but in that case we may need to expose an API to retrieve a specific block index under a given `treeCid`.
## Deployment
For the first prototype, we suggest using the Codex API in order to validate the idea and discover potential design flaws early. After a successful PoC, or already in parallel, we suggest building a Codex protocol library (stripped of EC and marketplace), which will then be used to create Go bindings for the status-go integration. The same library should then also be used in the new Codex client.
## Long Term Durability Support
The current proposal builds on Codex and will naturally scale towards stronger durability requirements with the Codex Marketplace (and Erasure Coding). We consider this a long-term path. In the mid-term, we consider increasing the level of durability by applying some of the elements already captured in Ben's [Constellations](https://github.com/benbierens/constellations):
- 1 unchanging ID per community
- Taking care of CID dissemination
- Rough health metrics
- Owner/admin controls
- Useful for more projects than Status


@ -0,0 +1,71 @@
This is the definition of the `ChatFilter` type:
```go
type ChatFilter struct {
	chatID       string
	filterID     string
	identity     string
	pubsubTopic  string
	contentTopic ContentTopic
	discovery    bool
	negotiated   bool
	listen       bool
	ephemeral    bool
	priority     uint64
}
```
For each community, we can have a number of filters. This is how we retrieve them:
```go
func (m *ArchiveManager) GetCommunityChatsFilters(communityID types.HexBytes) (messagingtypes.ChatFilters, error) {
	chatIDs, err := m.persistence.GetCommunityChatIDs(communityID)
	if err != nil {
		return nil, err
	}

	filters := messagingtypes.ChatFilters{}
	for _, cid := range chatIDs {
		filter := m.messaging.ChatFilterByChatID(cid)
		if filter != nil {
			filters = append(filters, filter)
		}
	}
	return filters, nil
}
```
Perhaps simplifying too much, a *filter* is basically a reference to a Waku pubsub channel where messages can be posted and received: all messages for all chats in that community. This seems to be related to [[Universal Topic Optimization in Waku (AI)]].
Associated with each filter is a `contentTopic`, and the filters are used to make sure we later retrieve all relevant community messages. But here we run into some strange inconsistencies. First, if there is supposed to be a single universal topic for the community, why do we still have multiple filters:
```go
filters, err := m.archiveManager.GetCommunityChatsFilters(c.ID())
if err != nil {
	m.logger.Error("failed to get community chats filters for community", zap.Error(err))
	continue
}

if len(filters) == 0 {
	m.logger.Debug("no filters or chats for this community starting interval", zap.String("id", c.IDString()))
	go m.archiveManager.StartHistoryArchiveTasksInterval(c, messageArchiveInterval)
	continue
}
```
And why is the universal filter only appended later, and never mapped to a corresponding topic?
```go
topics := []messagingtypes.ContentTopic{}
for _, filter := range filters {
	topics = append(topics, filter.ContentTopic())
}

filter := m.messaging.ChatFilterByChatID(c.UniversalChatID())
if filter != nil {
	filters = append(filters, filter)
}
```
This is rather crucial, because the topics are what is actually used to retrieve the messages in [GetWakuMessagesByFilterTopic](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/manager_archive_file.go#L125) (see the body of this function [here](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/persistence.go#L967-L1000)).


@ -0,0 +1,86 @@
There is [a comment in the code base](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/community.go#L1584-L1589):
```go
// Using Member updates channelID as chatID to act as a universal content-topic
// for all chats in the community as explained here
// https://forum.vac.dev/t/status-communities-review-and-proposed-usage-of-waku-content-topics/335
```
See: [https://forum.vac.dev/t/status-communities-review-and-proposed-usage-of-waku-content-topics/335](https://forum.vac.dev/t/status-communities-review-and-proposed-usage-of-waku-content-topics/335)
This comment refers to a **universal topic optimization** that Status implemented to improve efficiency in the Waku messaging protocol. Let me explain how this relates to Waku:
### What is a ContentTopic in Waku?
In Waku, a **ContentTopic** is like a messaging channel identifier - it's a 4-byte value that tells Waku which messages are intended for which application features. Think of it as the "address" where specific types of messages should be delivered.
From the code I found, here's how ContentTopic works:
```go
// From messaging/waku/common/topic.go
func (t TopicType) ContentTopic() string {
	enc := hexutil.Encode(t[:])
	return "/waku/1/" + enc + "/rfc26"
}
```
This creates a topic like `/waku/1/0x12345678/rfc26` where `0x12345678` is the 4-byte topic.
### The Universal Topic Optimization
Originally, you might expect each community chat to have its own unique ContentTopic:
- `#general` → ContentTopic `0x12345678`
- `#random` → ContentTopic `0x87654321`
- `#dev` → ContentTopic `0xabcdef12`
But Status implemented a **universal topic optimization** where all chats in a community share the **same ContentTopic**. Here's how:
```go
// From protocol/communities/community.go lines 1584-1588
func (o *Community) UniversalChatID() string {
	// Using Member updates channelID as chatID to act as a universal content-topic
	// for all chats in the community as explained here
	// https://forum.vac.dev/t/status-communities-review-and-proposed-usage-of-waku-content-topics/335
	return o.MemberUpdateChannelID()
}

func (o *Community) MemberUpdateChannelID() string {
	return o.IDString() + "-memberUpdate"
}
```
So instead of separate topics, **all community chats** use the same ContentTopic, derived from the chat ID `{communityID}-memberUpdate`.
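For illustration only, the mapping from a chat ID to a 4-byte topic could look roughly like this (a sketch assuming a Keccak-256-based derivation; the exact mapping lives in the messaging layer and may differ):
```go
import (
	"github.com/ethereum/go-ethereum/common/hexutil"
	"github.com/ethereum/go-ethereum/crypto"
)

// topicForChatID sketches how a chat ID could be reduced to a 4-byte topic and
// rendered as a Waku content-topic string. With the universal-topic optimization,
// every chat in a community feeds the same "{communityID}-memberUpdate" chat ID
// into this mapping, so they all end up sharing one content topic.
func topicForChatID(chatID string) string {
	topic := crypto.Keccak256([]byte(chatID))[:4]
	return "/waku/1/" + hexutil.Encode(topic) + "/rfc26"
}
```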
### How This Works with Filters
When you join a community, Status creates **one Filter** (subscription) that receives messages for **all channels** in that community:
```go
// From messenger_community_chat_test.go
universalChatFilter := s.m.messaging.ChatFilterByChatID(community.UniversalChatID())
```
This single filter subscribes to the universal ContentTopic, and **all messages** from all channels in the community flow through this one subscription.
### Benefits of This Optimization
1. **Fewer Network Subscriptions**: Instead of subscribing to N different ContentTopics (one per channel), you subscribe to just 1
2. **Reduced Network Overhead**: Fewer pubsub subscriptions mean less network management
3. **Simplified Filter Management**: The FiltersManager only needs to track one filter per community instead of one per channel
4. **Better Message Delivery**: All community messages come through the same reliable channel
### How Messages Are Distinguished
Since all messages use the same ContentTopic, how does Status know which channel a message belongs to? The **message payload** contains the actual destination channel information. The universal topic is just for **transport optimization** - the application logic still correctly routes messages to the right channels.
### Relation to Waku Protocol
This optimization is specifically about how Status uses Waku's **pubsub topics and content filtering**:
- **PubsubTopic**: The "physical" network topic (like a network shard)
- **ContentTopic**: The "logical" application channel (like an app feature)
- **Filter**: The subscription that tells Waku "I want messages from this ContentTopic"
By using one ContentTopic for all community channels, Status reduces the load on Waku's **content filtering system** while maintaining the logical separation of channels at the application layer.
This is a clever optimization that maintains user experience (separate channels) while improving network efficiency (shared transport).


@ -0,0 +1,154 @@
Seeding is the BitTorrent process where a node that has complete archive files makes them available for other peers to download. In Status-go, this happens when:
1. **Community control nodes** create new history archives
2. **Any node** successfully downloads archives from other peers
## When Does Seeding Happen?
### 1. **After Creating New Archives (Control Nodes Only)**
In `SeedHistoryArchiveTorrent`, community control nodes seed archives after creating them:
```go
func (m *ArchiveManager) SeedHistoryArchiveTorrent(communityID types.HexBytes) error {
	m.UnseedHistoryArchiveTorrent(communityID)

	id := communityID.String()
	torrentFile := torrentFile(m.torrentConfig.TorrentDir, id)

	metaInfo, err := metainfo.LoadFromFile(torrentFile)
	if err != nil {
		return err
	}

	info, err := metaInfo.UnmarshalInfo()
	if err != nil {
		return err
	}

	hash := metaInfo.HashInfoBytes()
	m.torrentTasks[id] = hash
	if err != nil {
		return err
	}

	torrent, err := m.torrentClient.AddTorrent(metaInfo)
	if err != nil {
		return err
	}

	torrent.DownloadAll()

	m.publisher.publish(&Subscription{
		HistoryArchivesSeedingSignal: &signal.HistoryArchivesSeedingSignal{
			CommunityID: communityID.String(),
		},
	})

	magnetLink := metaInfo.Magnet(nil, &info).String()

	m.logger.Debug("seeding torrent", zap.String("id", id), zap.String("magnetLink", magnetLink))
	return nil
}
```
This happens when:
- **Periodic archiving**: Control nodes regularly create archives of community message history
- **Manual archiving**: When explicitly triggered through `InitHistoryArchiveTasks`
### 2. **After Downloading Archives Successfully**
In [DownloadHistoryArchivesByMagnetlink](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/manager_archive.go#L486) around line 644, seeding starts after successful downloads:
```go
// After downloading all archives
m.publisher.publish(&Subscription{
	HistoryArchivesSeedingSignal: &signal.HistoryArchivesSeedingSignal{
		CommunityID: communityID.String(),
	},
})
```
This happens when:
- **Regular members** download archives via magnetlinks
- **New community members** download historical messages
- **Nodes re-downloading** updated archives
### 3. **Seeding Signal Processing**
When seeding starts, it triggers magnetlink message distribution in `handleCommunitiesHistoryArchivesSubscription`:
```go
if sub.HistoryArchivesSeedingSignal != nil {
	// Signal UI that seeding started
	m.config.messengerSignalsHandler.HistoryArchivesSeeding(
		sub.HistoryArchivesSeedingSignal.CommunityID,
	)

	// Get community info
	c, err := m.communitiesManager.GetByIDString(
		sub.HistoryArchivesSeedingSignal.CommunityID,
	)

	// Only control nodes send magnetlink messages
	if c.IsControlNode() {
		err := m.dispatchMagnetlinkMessage(
			sub.HistoryArchivesSeedingSignal.CommunityID,
		)
		// This broadcasts the magnetlink to community members
	}
}
```
## Seeding Lifecycle
### **Start Seeding**
1. Archives are created or downloaded successfully
2. `HistoryArchivesSeedingSignal` is published
3. BitTorrent client starts serving the files
4. Control nodes broadcast magnetlink messages
### **Continue Seeding**
- Files remain available as long as the node is online
- Other peers can download from multiple seeders simultaneously
- BitTorrent automatically manages upload bandwidth
### **Stop Seeding**
In `UnseedHistoryArchiveTorrent`:
```go
func (m *ArchiveManager) UnseedHistoryArchiveTorrent(communityID types.HexBytes) {
	// Remove torrent from client
	torrent := m.torrentClient.Torrent(infoHash)
	if torrent != nil {
		torrent.Drop()
	}

	// Publish unseeding signal
	m.publisher.publish(&Subscription{
		HistoryArchivesUnseededSignal: &signal.HistoryArchivesUnseededSignal{
			CommunityID: communityID.String(),
		},
	})
}
```
This happens when:
- **New archives replace old ones**: When fresher magnetlinks are received
- **Community leaves**: When a user leaves a community
- **Manual stop**: When archiving is disabled
## Key Points
- **Only control nodes send magnetlinks**, but **any node can seed** after downloading
- **Seeding is automatic** - happens immediately after successful archive creation or download
- **Multiple seeders improve availability** - BitTorrent's distributed nature means more seeders = better download speeds
- **Seeding triggers UI notifications** via `SendHistoryArchivesSeeding`.
This distributed seeding model ensures that community history archives remain available even if the original control node goes offline, making the system resilient and scalable.


@ -0,0 +1,66 @@
Magnetlink messages are sent when a community's control node (the community owner) finishes seeding history archives. Here's the complete flow:
### 1. **Triggering Event: Archive Seeding Completion**
Magnetlink messages are sent when a `HistoryArchivesSeedingSignal` is triggered. This happens in two scenarios:
1. **After Creating New Archives**: In [SeedHistoryArchiveTorrent](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/manager_archive.go#L408) around line 438, when the control node finishes creating and starts seeding new history archives.
2. **After Downloading Archives**: In [DownloadHistoryArchivesByMagnetlink](https://github.com/status-im/status-go/blob/6322f22783585474803cfc8a6f0a914757d763b5/protocol/communities/manager_archive.go#L486) around line 645, when archives are successfully downloaded and seeding begins.
### 2. **Message Dispatch Logic**
The actual sending happens in `handleCommunitiesHistoryArchivesSubscription` at lines 246-259:
```go
if sub.HistoryArchivesSeedingSignal != nil {
	m.config.messengerSignalsHandler.HistoryArchivesSeeding(
		sub.HistoryArchivesSeedingSignal.CommunityID,
	)

	c, err := m.communitiesManager.GetByIDString(
		sub.HistoryArchivesSeedingSignal.CommunityID,
	)
	if err != nil {
		m.logger.Debug(
			"failed to retrieve community by id string",
			zap.Error(err),
		)
	}

	if c.IsControlNode() {
		err := m.dispatchMagnetlinkMessage(
			sub.HistoryArchivesSeedingSignal.CommunityID,
		)
		if err != nil {
			m.logger.Debug(
				"failed to dispatch magnetlink message",
				zap.Error(err),
			)
		}
	}
}
```
### 3. **Key Conditions**
- **Only Control Nodes**: Only the community owner can send magnetlink messages
- **After Seeding**: Messages are only sent after archives are successfully seeded and available for download
### 4. **Message Creation and Sending**
The `dispatchMagnetlinkMessage` function (lines 4093-4138) does the following (see the sketch after this list):
1. **Gets the magnetlink**: Calls `m.archiveManager.GetHistoryArchiveMagnetlink(community.ID())` to generate the magnetlink from the torrent file
2. **Creates the message**: Builds a `CommunityMessageArchiveMagnetlink` protobuf message with current timestamp and magnetlink URI
3. **Sends publicly**: Broadcasts the message to the community's magnetlink channel using `m.messaging.SendPublic()`
4. **Updates clocks**: Updates both the community description and magnetlink message clocks
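A rough sketch of these steps is shown below; the message shape follows the description in the next section, while the clock source and the `sendToMagnetlinkChannel`/`updateMagnetlinkMessageClock` helpers are assumptions made for illustration:
```go
// Sketch of dispatchMagnetlinkMessage based on the steps above; not the actual source.
func (m *Messenger) dispatchMagnetlinkMessageSketch(communityID string) error {
	community, err := m.communitiesManager.GetByIDString(communityID)
	if err != nil {
		return err
	}

	// 1. Generate the magnetlink from the torrent file.
	magnetlink, err := m.archiveManager.GetHistoryArchiveMagnetlink(community.ID())
	if err != nil {
		return err
	}

	// 2. Build the magnetlink protobuf message with the current clock.
	msg := &protobuf.CommunityMessageArchiveMagnetlink{
		Clock:     uint64(time.Now().Unix()), // the real code uses the messenger's clock/timesource
		MagnetUri: magnetlink,
	}

	// 3. Broadcast it publicly on the community's magnetlink channel
	//    (sendToMagnetlinkChannel is a hypothetical helper wrapping m.messaging.SendPublic).
	if err := m.sendToMagnetlinkChannel(community, msg); err != nil {
		return err
	}

	// 4. Update the community description and magnetlink message clocks
	//    (updateMagnetlinkMessageClock is a hypothetical helper).
	return m.updateMagnetlinkMessageClock(community, msg.Clock)
}
```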
### 5. **Message Content**
The magnetlink message contains:
- **Clock**: Current timestamp
- **MagnetUri**: The BitTorrent magnetlink for downloading the archives
- **Message Type**: `COMMUNITY_MESSAGE_ARCHIVE_MAGNETLINK`
### Summary
Magnetlink messages are sent **automatically by community control nodes whenever they finish seeding new history archives**. This ensures that community members are immediately notified when new archive data becomes available for download via BitTorrent, enabling efficient peer-to-peer distribution of community message history.
