The global shard with index 0 and the "all app protocols" range are treated in the same way,
but choosing shards in the global cluster has a higher probability of sharing the shard with other apps.
This offers k-anonymity and better connectivity, but comes at a higher bandwidth cost.
The name of the pubsub topic corresponding to a given static shard is specified as
`/waku/2/rs/<cluster_id>/<shard_number>`,
an example for the 2nd shard in the global shard cluster:
`/waku/2/rs/0/2`.
> *Note*: Because *all* shards distribute payload defined in [14/WAKU2-MESSAGE](spec/14/) via [protocol buffers](https://developers.google.com/protocol-buffers/),
the pubsub topic name does not explicitly add `/proto` to indicate protocol buffer encoding.
We use `rs` to indicate these are *relay shard* clusters; further shard types might follow in the future.
From an app point of view, a subscription to a content topic `waku2/xxx` on a static shard would look like:
`subscribe("/waku2/xxx", 43)`
for global shard 43.
And for shard 43 of the Status app (which has allocated index 16):
`subscribe("/waku2/xxx", 16, 43)`
### Discovery
Waku v2 supports the discovery of peers within static shards,
so app protocols do not have to implement their own discovery method.
The value is comprised of a two-byte shard cluster index in network byte order concatenated with a 128-byte wide bit vector.
The bit vector indicates which shards of the respective shard cluster the node is part of.
The right-most bit in the bit vector represents shard `0`, the left-most bit represents shard `1023`.
The representation in the ENR is inspired by [Ethereum shard ENRs](https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/validator.md#sync-committee-subnet-stability)),
and [this](https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/validator.md#sync-committee-subnet-stability)).
Example:
| key | value |
|--- |--- |
| `rsv` | 16u16 |`0x[...]0000100000003000` |
The `[...]` in the example indicates 120 `0` bytes.
This example node is part of shards `13`, `14`, and `45` in the Status main-net shard cluster (index 16).
(This is just for illustration purposes, a node that is only part of three shards should use the index list method specified above.)
## Automatic Sharding
Autosharding selects shards automatically and is the default behavior for shard choice.
The other choices being static and named sharding as seen in previous sections.
Shards (pubsub topics) SHOULD be computed from content topics with the procedure below.
#### Algorithm
Hash using Sha2-256 the concatenation of
the content topic `application` field (UTF-8 string of N bytes) and
the `version` (UTF-8 string of N bytes).
The shard to use is the modulo of the hash by the number of shards in the network.
#### Example
| Field | Value | Hex
|--- |--- |---
| `application` | "myapp"| 0x6d79617070
| `version` | "1" | 0x31
| `network shards`| 8 | 0x8
- SHA2-256 of `0x6d7961707031` is `0x8e541178adbd8126068c47be6a221d77d64837221893a8e4e53139fb802d4928`
-`0x8e541178adbd8126068c47be6a221d77d64837221893a8e4e53139fb802d4928` MOD `8` equals `0`
- The shard to use has index 0
### Content Topics Format for Autosharding
Content topics MUST follow the format in [23/WAKU2-TOPICS](https://rfc.vac.dev/spec/23/#content-topic-format).
In addition, a generation prefix MAY be added to content topics.
When omitted default values are used.
Generation default value is `0`.
- The full length format is `/{generation}/{application-name}/{version-of-the-application}/{content-topic-name}/{encoding}`
- The short length format is `/{application-name}/{version-of-the-application}/{content-topic-name}/{encoding}`
#### Example
- Full length `/0/myapp/1/mytopic/cbor`
- Short length `/myapp/1/mytopic/cbor`
#### Generation
The generation number monotonously increases and indirectly refers to the total number of shards of the Waku Network.
<!-- Create a new RFC for each generation spec. -->
#### Topic Design
Content topics have 2 purposes: filtering and routing.
Filtering is done by changing the `{content-topic-name}` field.
As this part is not hashed, it will not affect routing (shard selection).
The `{application-name}` and `{version-of-the-application}` fields do affect routing.
Using multiple content topics with different `{application-name}` field has advantages and disadvantages.
It increases the traffic a relay node is subjected to when subscribed to all topics.
It also allows relay and light nodes to subscribe to a subset of all topics.
### Problems
#### Hot Spots
Hot spots occur (similar to DHTs), when a specific mesh network (shard) becomes responsible for (several) large multicast groups (content topics).
The opposite problem occurs when a mesh only carries multicast groups with very few participants: this might cause bad connectivity within the mesh.
The current autosharding method does not solve this problem.
> *Note:* Automatic sharding based on network traffic measurements to avoid hot spots in not part of this specification.
#### Discovery
For the discovery of automatic shards this document specifies two methods (the second method will be detailed in a future version of this document).
The first method uses the discovery introduced above in the context of static shards.
The second discovery method will be a successor to the first method,
but is planned to preserve the index range allocation.
Instead of adding the data to the ENR, it will treat each array index as a capability,
which can be hierarchical, having each shard in the indexed shard cluster as a sub-capability.
When scaling to a very large number of shards, this will avoid blowing up the ENR size, and allows efficient discovery.
We currently use [33/WAKU2-DISCV5](https://rfc.vac.dev/spec/33/) for discovery,
which is based on Ethereum's [discv5](https://github.com/ethereum/devp2p/blob/master/discv5/discv5.md).
While this allows to sample nodes from a distributed set of nodes efficiently and offers good resilience,
it does not allow to efficiently discover nodes with specific capabilities within this node set.
Our [research log post](https://vac.dev/wakuv2-apd) explains this in more detail.
Adding efficient (but still preserving resilience) capability discovery to discv5 is ongoing research.
[A paper on this](https://github.com/harnen/service-discovery-paper) has been completed,
but the [Ethereum discv5 specification](https://github.com/ethereum/devp2p/blob/master/discv5/discv5-theory.md)
has yet to be updated.
When the new capability discovery is available,
this document will be updated with a specification of the second discovery method.
The transition to the second method will be seamless and fully backwards compatible because nodes can still advertise and discover shard memberships in ENRs.