Sphinx packet format
====================

Sphinx is a concrete packet format for mixnets, with the following goals:

- compact (small overhead over the payload)
- hiding the path length and relay position
- unlinkability between the legs
- indistinguishability of forward and reply packets
- provable security

The main trick by which Sphinx achieves compactness is re-using a single public key (via a blinding mechanism) for each leg.

In this note we describe the Sphinx packet format as specified in the 2009 paper "Sphinx: A Compact and Provably Secure Mix Format" by George Danezis and Ian Goldberg. In a separate document we will propose some modifications.

#### Links

- the [Sphinx paper](https://cypherpunks.ca/~iang/pubs/Sphinx_Oakland09.pdf)
- a [nice talk on youtube](https://www.youtube.com/watch?v=34TKXELJa2c) about Sphinx

### Security parameters

We denote the intended security level by $\lambda$ as usual (instead of $\kappa$ as in the paper). By default we target $\lambda=128$ bits of security. Hence:

- private keys are of size $p = 2\lambda$, that is, 32 bytes
- as we use elliptic curve groups, public keys are the same size
- symmetric keys are size $s=\lambda$, that is, 16 bytes
- MACs are also of size $\lambda$, that is, 16 bytes

A further parameter is the maximum number of hops (as all packets need to have the same size, this must be agreed on in advance). This is denoted by $r$, and a recommended default value is $r = 5$.

#### Elliptic curve

For public key cryptography, we use elliptic curves as they are compact (unlike for example RSA), and also conceptually simple.

We denote the elliptic curve group by $\mathbb{G}$, its fixed generator by $\mathbf{g}\in \mathbb{G}$, and the corresponding scalar field by $\mathbb{F}_q \cong \mathbb{Z}_q$. The elliptic curve scalar multiplication is denoted by $*:\mathbb{Z}_q \times \mathbb{G} \to \mathbb{G}$.

The standard curve choice is [Curve25519](https://en.wikipedia.org/wiki/Curve25519).

#### Symmetric primitives

- we will need a MAC; a standard choice is HMAC-SHA256 truncated to $\lambda=128$ bits
- we will need a (set of) key derivation function(s) $\mathsf{KDF}$. A standard choice is SHA256 with a domain separation; eg. $\mathsf{KDF}_{\mathsf{MAC}}(x):=H(\texttt{"MAC"}\|x)$, truncated as necessary
- we need a pseudo-random stream generator, which will be used to encrypt the routing information. Usually this is AES128-CTR (that is, AES in [counter mode](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#CTR)), but it could be also a XOF like SHAKE128. Below we will denote this by $\mathsf{XOF}(k)$ where $k$ is the key.
- we also need a pseudo-random permutation (sometimes called a "large block cipher") to encrypt the payload. In particular, one **SHOULD NOT USE** a standard block cipher in counter mode for this. Note: This is kind of tricky, see more about this below. The default choice for the PRP is Lioness (built on various primitives, for example: the original Lioness paper used SHA1 + SEAL; many implementations use Blake2b + ChaCha20; and of course one could also use SHA256 + AES128).
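
The first three primitives in the list above can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the helper names (`kdf`, `mac`, `xof`, `xor`) are made up for this note, SHAKE128 stands in for AES128-CTR as the stream generator, and the plain `label || secret` concatenation is the simplest possible domain separation.

```python
import hashlib
import hmac

LAMBDA_BYTES = 16  # lambda = 128 bits


def kdf(label: bytes, secret: bytes, size: int = LAMBDA_BYTES) -> bytes:
    """Domain-separated KDF: SHA256(label || secret), truncated to `size` bytes."""
    return hashlib.sha256(label + secret).digest()[:size]


def mac(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 truncated to lambda bits."""
    return hmac.new(key, data, hashlib.sha256).digest()[:LAMBDA_BYTES]


def xof(key: bytes, size: int) -> bytes:
    """Pseudo-random stream of `size` bytes (SHAKE128 as a stand-in for AES128-CTR)."""
    return hashlib.shake_128(key).digest(size)


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


# Encrypting with the stream is an XOR, so applying it twice is the identity.
key = kdf(b"route-enc-key", b"some shared secret")
msg = b"routing information"
assert xor(xor(msg, xof(key, len(msg))), xof(key, len(msg))) == msg
```

The pseudo-random permutation is deliberately left out here, as it cannot be built by simply XOR-ing with a stream.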
 
Remark: While this is what the original Sphinx paper needs, there may be tweaks of the design, using more modern symmetric primitives, eg. AEAD to replace the problematic pseudo-random permutation. See below, near the bottom.

## Packet format

A Sphinx packet consists of four parts, traditionally denoted by $\alpha$, $\beta$, $\gamma$, and $\delta$:

- $\alpha$ is a (blinded) public key (group element; $\alpha\in\mathbb{G}$).
- $\beta$ is the (encrypted) routing information
- $\gamma$ is a MAC (message authentication code)
- $\delta$ is the (encrypted) payload

The first three of these, $\mathcal{H}=(\alpha,\beta,\gamma)$, are together called the header.

Observation: Because we need to allow for the user to create reply paths in advance (SURBs, or single-use reply blocks), where the payload is not yet known, we cannot use traditional "onion" encryption; instead, the header and the payload must be separate. However, the payload will be still encrypted in several layers, and its integrity is protected (if implemented properly).

### Constructing a packet

First, the user selects a random path of mix nodes $n_0,\dots,n_{\ell-1}$ with $\ell\le r$. We will also need a final destination $\Delta_\mathrm{final}$; and in case of replies, a message identifier $J$.

#### Computing the shared secrets

We assume all mix nodes have a long-term private-public keypair $(\mathsf{sk}_i,\mathsf{pk}_i)$.

The user first generates an ephemeral secret key $x$; in practice this is just a random number $x\in\mathbb{Z}_q^\times$. It can then iteratively derive a sequence of keys (one set per hop) consisting of:

- a per-node secret key $x_i\in\mathbb{Z}_q^\times$
- a per-node public key $\alpha_i:=x_i*\mathbf{g}\in \mathbb{G}$
- a per-node shared secret $s_i\in \mathbb{G}$, derived using  Diffie-Hellman: $s_i:=x_i*\mathsf{pk}_i = \mathsf{sk}_i*\alpha_i$
- a blinding factor $b_i\in\mathbb{Z}_q^\times$, computed from $\mathsf{KDF}_{\mathsf{blind}}(\alpha_i,s_i)$ 

Remark: The blinding factor is supposed to be in $\mathbb{Z}_q^\times$ (recall that $q\approx 2^{2\lambda}$; for Curve25519, $q\approx 2^{252}$). We can ensure this either by simply taking it modulo $q$, or more properly by rejection sampling a deterministic sequence. Implementations however usually simply say $b_i:=H(\alpha_i,s_i)$ where $H$ is eg. SHA256. Some care should be taken in an actual implementation to avoid possible corner cases.

Remark \#2: With X25519, not all scalar field elements are valid secret keys. However, we cannot simply apply the standard "masking" (clamping) algorithm to the blinded private key, because the mix nodes doing the processing don't have access to the secret keys. At least the cofactor-8 subgroup is kept invariant by multiplication. I think that the remainder of the mask is there to ensure uniformity of random secret keys, so this is probably still OK in the end.

The key sequence is defined iteratively:

- $x_0:=x$
- $x_{i+1}:=\mathsf{mask}(b_i\cdot x_i)$

All the rest can be computed from $x_i$:

- the public key is $\alpha_i = x_i * \mathbf{g}$
- the shared secret is $s_i=x_i*\mathsf{pk}_i$ 
- and the blinding factor is $b_i=H(\alpha_i,s_i)$

The idea behind this construction is that each hop will have a unique sender public key, from which a shared secret can be derived, and then further symmetric keys for MAC and encryption via a KDF. When composing the forwarded packet for the next hop, the node can then "tweak" this public key using their own blinding factor: $\alpha_{i+1}=b_i * \alpha_i$. 

Thus each hop can only decrypt their own header, and if they try to break the protocol, the message will be ruined.
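
The following sketch checks that the sender and the nodes arrive at the same $(\alpha_i, s_i)$ sequence. To stay self-contained it does not use Curve25519: as a loudly labelled assumption, the group is the multiplicative group modulo the Mersenne prime $2^{127}-1$ (a toy group with no security claim), so $x*\mathbf{g}$ becomes `pow(G, x, P)`. The function names are hypothetical, and the $\mathsf{mask}$ step is omitted.

```python
import hashlib
import secrets

# Toy stand-in group (NOT secure, illustration only): integers modulo the
# Mersenne prime 2^127 - 1, written multiplicatively.
P = 2**127 - 1
G = 3
ORDER = P - 1  # exponents can be reduced modulo P - 1


def blind(alpha, s):
    """Blinding factor b = H(alpha, s), mapped to a non-zero exponent."""
    data = alpha.to_bytes(16, "big") + s.to_bytes(16, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (ORDER - 1) + 1


def sender_keys(x, node_pks):
    """Sender side: derive (alpha_i, s_i) for every hop from the ephemeral x."""
    out = []
    for pk in node_pks:
        alpha = pow(G, x, P)               # alpha_i = x_i * g
        s = pow(pk, x, P)                  # s_i = x_i * pk_i
        out.append((alpha, s))
        x = (blind(alpha, s) * x) % ORDER  # x_{i+1} = b_i * x_i
    return out


def node_step(sk, alpha):
    """Node side: recover s_i from alpha_i, then blind alpha for the next hop."""
    s = pow(alpha, sk, P)                     # s_i = sk_i * alpha_i
    return s, pow(alpha, blind(alpha, s), P)  # alpha_{i+1} = b_i * alpha_i


# Three nodes with long-term key pairs, and one ephemeral sender key.
sks = [secrets.randbelow(ORDER - 1) + 1 for _ in range(3)]
pks = [pow(G, sk, P) for sk in sks]
chain = sender_keys(secrets.randbelow(ORDER - 1) + 1, pks)

alpha = chain[0][0]  # only alpha_0 travels with the packet
for sk, (expected_alpha, expected_s) in zip(sks, chain):
    s, next_alpha = node_step(sk, alpha)
    assert (alpha, s) == (expected_alpha, expected_s)
    alpha = next_alpha
```

Each node only ever sees its own $\alpha_i$, yet derives the same shared secret the sender computed for it.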

#### Size of a mix header

The size of the mix header must be constant (otherwise, mix nodes could guess where they are in the path), and its processing uniform (except for the final hop).

With the usual parameters, we have $|\alpha|=32$ and $|\gamma|=16$. Thus $N_\beta:=|\beta|$ determines the header size. This must be big enough to fit $(r-1)$ mix node addresses $A_i$, and also $(r-1)$ MACs $\gamma_i$ (both for $i>0$), and the destination address and message id pair $(\Delta,J)$.

In the original Sphinx paper, we have

- $|A_i|=|\gamma_i|=|J|=\lambda$
- $|\Delta|=2\lambda$

However, on closer inspection there is no real restriction on these sizes; they could even be non-uniform (except that this would break our goal stated above). So in the paper we have $N_\beta = 2(r-1)\lambda + 3\lambda =(2r+1)\lambda$, but instead of counting all these $\lambda$ factors, it's _much easier_ to think about a fixed $N_\beta$ which is big enough to fit what we want it to encode.

#### Fillers (header padding)

In layered, "onion" routing, each node removes a layer of encryption, and forwards the resulting payload. Done naively, this would cause both the headers and the packets to shrink while travelling through the mix path. This is obviously bad; we need to ensure the headers are of uniform size.

However, we also need to ensure that at all hops, the processing of the header and payload is exactly the same (otherwise nodes could figure out where they are in the path). These requirements result in the following, somewhat convoluted construction.

The filler strings $\phi_i \in \{0,1\}^{2\lambda i}$ are also constructed iteratively:

- $\phi_0$ is the empty string
- to construct $\phi_{i+1}$, take $\phi_i$, append $2\lambda$ zero bits (in our case, 32 zero bytes), and XOR the resulting string with the last $2\lambda (i+1)$ bits of the (fixed size) random stream derived from the shared secret $s_i$:

$$
\begin{align*}
  e_i &:= \mathsf{KDF}(\texttt{"route-enc-key"}\|s_i) \\
  \widetilde\phi_{i+1} &:= [\,\phi_i \;\|\; 0^{2\lambda}\,] \\
  \phi_{i+1} &:= \widetilde\phi_{i+1} \;\oplus\; \mathsf{XOF}(e_i)_{(N_\beta-|\phi_i|\dots N_\beta+2\lambda)} = \mathsf{encdec}_i^\rho (\widetilde\phi_{i+1})
\end{align*}
$$

where the starting offset in the PRG stream is $\rho_{i+1} = N_\beta- |\phi_i|$. Note: We use the convention that a tilde denotes the unencrypted version. Here $2\lambda=|A_{i+1}|+|\gamma_{i+1}|$ comes from the mix node address $A_{i+1}$ and MAC $\gamma_{i+1}$.

One reason for all this complication is that the headers are constructed _backwards_, but the fillers must be constructed _forwards_. More importantly, we also need to replace the truncated bits while processing and forwarding mix packets; that's why we have the $2\lambda$ zero bits at the end (it could be any constant, zero is just the simplest). Note that this replacement must reconstruct the header sequence exactly, otherwise the MACs won't match!

Note that we use the key $e_i$ to encrypt $\phi_{i+1}$! This is a subtle but important point. The reason is that the $i$-th node must reconstruct it before forwarding to the $i+1$-th node, and they only know their own key $e_i$.
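
The filler recursion can be sketched as follows, working in bytes rather than bits (so $2\lambda$ is 32 bytes). This is a minimal illustration under assumptions: SHAKE128 plays the role of $\mathsf{XOF}$, the shared secrets are treated as opaque byte strings, and the names `fillers` and `stream` are hypothetical.

```python
import hashlib

LAM = 16                    # lambda, in bytes
R = 5                       # maximum number of hops
N_BETA = (2 * R + 1) * LAM  # size of beta, in bytes


def kdf(label, secret):
    return hashlib.sha256(label + secret).digest()[:LAM]


def stream(key):
    """Fixed-size stream of N_beta + 2*lambda bytes (SHAKE128 as the XOF)."""
    return hashlib.shake_128(key).digest(N_BETA + 2 * LAM)


def xor(a, b):
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


def fillers(shared_secrets):
    """Return [phi_0, phi_1, ...], one more entry than there are secrets."""
    phi = b""
    out = [phi]
    for s in shared_secrets:
        e = kdf(b"route-enc-key", s)
        padded = phi + bytes(2 * LAM)     # phi_i || 0^{2 lambda}
        start = N_BETA - len(phi)         # offset rho_{i+1} into the stream
        phi = xor(padded, stream(e)[start:])  # XOR with the tail of the stream
        out.append(phi)
    return out


phis = fillers([b"a" * 32, b"b" * 32, b"c" * 32])
assert [len(p) for p in phis] == [0, 32, 64, 96]   # |phi_i| = 2 * lambda * i
```

Note that the last 32 bytes of each $\phi_{i+1}$ are exactly the last 32 bytes of the stream of hop $i$, which is what that hop produces when it decrypts the zero padding it appended.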


#### Constructing the header

We now want to construct the initial message header $\mathcal{H}_0=(\alpha_0,\beta_0,\gamma_0)$. This needs to encode the routing information in encrypted layers. So, we will do it iteratively, backwards from the last hop's header $\mathcal{H}_{\ell-1}=(\alpha_{\ell-1},\beta_{\ell-1},\gamma_{\ell-1})$. 

Sidenote: In the paper these headers are denoted by $M_i$, but that's a bit confusing as the reader could think "message" instead.

Apart from the mix path, we also require a destination address $\Delta$, and message identifier $J\in\{0,1\}^\lambda$. These are only used in replies ($\Delta$ would be our address, and $J$ is to identify the reply among several); they don't seem to play a role in forward messages (apart from detecting being the exit node). The destination $\Delta$ must fit into $2\lambda (r-\ell+1)$ bits; for $r=\ell$ this means $2\lambda$ (32 bytes), and to simplify we will assume that, ie. $\Delta\in\{0,1\}^{2\lambda}$.

Remark: In the current Logos implementation, $|\Delta|=|A_i|$ is actually somewhat bigger (but still bounded), ~~48~~ 96 bytes (?). It's relatively straightforward to generalize this construction to allow for that (though 96 bytes instead of 16 is extremely wasteful, as it gets multiplied by the maximum number of hops $r=5$...).

- $\alpha_i\in\mathbb{G}\subset \{0,1\}^{2\lambda}$ is just the per-hop public key computed above
- $\beta_i \in \{0,1\}^{N_\beta}$ is the layer-by-layer encrypted routing information
- $\gamma_i \in \{0,1\}^\lambda$ is the MAC of $\beta_i$, computed with a mac key derived from the shared secret $s_i$

Question: Why isn't $\alpha_i$ also included in the MAC? I guess it's not really necessary, as if $\alpha_i$ is tampered with, then decryption and everything else will fail?

We will need several symmetric keys, all of size $\lambda$; these are all derived from the shared secret:

- $m_i := \mathsf{KDF}(\texttt{"mac-key"}\;\|\;s_i)$ is the MAC key
- $e_i := \mathsf{KDF}(\texttt{"route-enc-key"}\;\|\;s_i)$ is the key used to encrypt the routing info
- $\mathrm{V}_i := \mathsf{KDF}(\texttt{"route-enc-iv"}\;\|\;s_i)$ is the initialization vector (when using AES or some other block cipher as our PRG stream generator)

Let $\mathsf{encdec}_i(\widetilde X):= \widetilde X \oplus \mathsf{XOF}(e_i)$ be a XOR-based stream cipher with the encryption key $e_i$; and similarly $\mathsf{MAC}_i(X)$ is a message authentication code with key $m_i$.

Then we construct the headers iteratively, starting from the final hop:

- $\widetilde \beta_{\ell-1} := ( \, \Delta \;\|\; J \;\|\; 0^{\textrm{pad}} \, )  \;\big\|\; \mathsf{encdec}^\rho_{\ell-1}(\phi_{\ell-1})$
- $\beta_{\ell-1} := \mathsf{encdec}_{\ell-1}(\widetilde\beta_{\ell-1}) = \mathsf{encdec}_{\ell-1}( \, \Delta \;\|\; J \;\|\; 0^{\textrm{pad}} \, ) \;\big\|\; \phi_{\ell-1}$

And then going backward:

- $\widetilde\beta_{i}:= [\, A_{i+1} \;\|\; \gamma_{i+1} \;\|\; \mathsf{trunc}(\beta_{i+1}) \,]$
- $\beta_i := \mathsf{encdec}_i(\widetilde\beta_i)$
- $\gamma_i := \mathsf{MAC}_{i}(\beta_i)$

where $A_i\in\{0,1\}^\lambda$ (only 16 bytes, unlike $\Delta$!) is the address of the $i$-th node in the path. Note: When computing $\beta_{i-1}$, the $\beta_i$ coming from the next hop is truncated (the last $2\lambda$ bits are discarded), so that's where the address and MAC fit in. This is not a problem because as we remove the layers, less and less routing information is required; the actual useful content of the final header is only $(\Delta,J)$.

It's not that useful to have a picture of the sizes of the various components here (because it becomes much easier when abstracted away from these fixed sizes), in any case:

- $|\Delta| = 2\lambda$
- $|J| = |\gamma_i| = \lambda$
- $|\phi_i| = 2\lambda i$
- $|\beta_{\ell-1}| = 3\lambda + 2(r-1)\lambda = (2r+1)\lambda = N_\beta$
- $|A_i| = \lambda$
- $|\beta_i| = 2\lambda + (2r-1)\lambda = (2r+1)\lambda = |\beta_{\ell-1}|$
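
To check that the backward construction and the forward filler really fit together, here is a sketch that builds $(\beta_0,\gamma_0)$ and then peels the layers hop by hop, verifying every MAC. It is an illustration under assumptions, not a reference implementation: sizes are in bytes, SHAKE128 is the stream generator, the IV $\mathrm{V}_i$ is not used, shared secrets are opaque byte strings, and `build_header` and `peel` are hypothetical names.

```python
import hashlib
import hmac

LAM, R = 16, 5
N_BETA = (2 * R + 1) * LAM


def kdf(label, s):
    return hashlib.sha256(label + s).digest()[:LAM]


def stream(s):
    """Routing stream of hop with shared secret s, N_beta + 2*lambda bytes long."""
    return hashlib.shake_128(kdf(b"route-enc-key", s)).digest(N_BETA + 2 * LAM)


def mac(s, data):
    return hmac.new(kdf(b"mac-key", s), data, hashlib.sha256).digest()[:LAM]


def xor(a, b):
    # zip() stops at the shorter input, so only a prefix of the stream is used
    return bytes(x ^ y for x, y in zip(a, b))


def build_header(shared, addrs, dest, msg_id):
    """Return (beta_0, gamma_0); addrs[i] is the lambda-byte address A_i."""
    phi = b""
    for s in shared[:-1]:                      # filler phi_{l-1}, built forwards
        phi = xor(phi + bytes(2 * LAM), stream(s)[N_BETA - len(phi):])
    plain = (dest + msg_id).ljust(N_BETA - len(phi), b"\0")
    beta = xor(plain, stream(shared[-1])) + phi        # beta_{l-1}
    gamma = mac(shared[-1], beta)
    for i in range(len(shared) - 2, -1, -1):           # headers, built backwards
        plain = addrs[i + 1] + gamma + beta[:N_BETA - 2 * LAM]
        beta = xor(plain, stream(shared[i]))
        gamma = mac(shared[i], beta)
    return beta, gamma


def peel(s, beta, gamma):
    """One hop: check the MAC, decrypt, return (first lambda bytes, gamma', beta')."""
    if not hmac.compare_digest(mac(s, beta), gamma):
        raise ValueError("MAC mismatch")
    b = xor(beta + bytes(2 * LAM), stream(s))
    return b[:LAM], b[LAM:2 * LAM], b[2 * LAM:]


shared = [bytes([i + 1]) * 32 for i in range(3)]
addrs = [bytes([0xA0 + i]) * LAM for i in range(3)]
dest, msg_id = b"D" * 32, b"J" * 16

beta, gamma = build_header(shared, addrs, dest, msg_id)
for i in (0, 1):                       # intermediate hops see the next address
    addr, gamma, beta = peel(shared[i], beta, gamma)
    assert addr == addrs[i + 1] and len(beta) == N_BETA
head, mid, rest = peel(shared[2], beta, gamma)   # last hop sees (Delta, J)
assert head + mid == dest and rest[:LAM] == msg_id
```

If the filler were computed with the wrong key or offset, the reconstructed $\beta'$ would differ in its last bytes and `peel` would raise at the next hop.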

#### Creating a forward message

Let the message to be sent be $\mathtt{msg}\in\{0,1\}^N$, and the final destination (the intended receiver) be $\Delta_{\mathrm{final}}\in\{0,1\}^{2\lambda}$. Note: $N$ must be a constant, so that each packet has the same size.

The header $\mathcal{H}_0$ is computed as above, setting $\Delta=0^{2\lambda}$ and $J=0^{\lambda}$. We also need the per-hop shared secrets $s_i$. From these, we calculate the payload encryption keys:

$$
k_i := \mathsf{KDF}( \texttt{"payload-key"} \; \| \; s_i)
$$

Now we can compute the encrypted payload iteratively:

- $P:=[\,0^\lambda\;\|\;\Delta_{\textrm{final}}\;\|\;\mathtt{msg}\,]$
- $\delta_{\ell-1} := \mathsf{ENC}(\,k_{\ell-1};\; P )$
- $\delta_{i-1} := \mathsf{ENC}(\,k_{i-1};\; \delta_i )$

The final packet is $(\alpha_0,\beta_0,\gamma_0,\delta_0)$. 

This has size $2\lambda+(2r+1)\lambda+\lambda+(3\lambda+N) = N + (2r+7)\lambda$. So for the default parameter $\lambda=128$ and $r=5$, the overhead is $17\times 16 = 272$ bytes (the header being $224$ bytes and the integrity check + destination $16+32 = 48$ bytes).

Note that the payload encryption function $\mathsf{ENC}$ is (and in fact must be) very different from the routing encryption/decryption function $\mathsf{encdec}$!

#### Creating a reply message

First, the original sender creates a SURB (single use reply block). To do this, they first pick a random path $A_0,\dots,A_{\ell-1}$ and compute a message header $\mathcal{H}_0$ as above, with $\Delta$ being their own address (though technically it could be somebody else too), and $J$ a random message identifier.

Pick a random symmetric key $K\in\{0,1\}^\lambda$, and save the following key mapping in some local table:

$$
J \mapsto (K,k_0,\dots,k_{\ell-1})
$$

where the encryption keys $k_i = \mathsf{KDF}( \texttt{"payload-key"} \| s_i)$ are the same as above.

The SURB is the triple

$$ \mathsf{SURB} := (A_0,\mathcal{H}_0,K)$$

This has the size $(2r+6)\lambda$, in our case 256 bytes.

To compose a reply message using a SURB, the replying party (the holder of the SURB) encrypts their payload $\mathtt{reply}\in\{0,1\}^{(N+2\lambda)}$ into $\delta := \mathsf{ENC}(K;\, [0^{\lambda} \,\|\, \mathtt{reply}])$ and sends the packet $(\mathcal{H}_0,\delta)$ to the node with address $A_0$.

## Processing mix packets


Upon receiving a mix packet, a node should do the following:

0. check that the packet size conforms to the expected fixed size, and split it into $(\alpha,\beta,\gamma,\delta)$
1. check if the first 32 bytes correspond to a valid group element $\alpha\in\mathbb{G}$ (apparently this is a no-op for Curve25519 in the "X=x/z" representation we use here)
2. compute the shared secret $s = \mathsf{sk}*\alpha \in \mathbb{G}$, where $\mathsf{sk}\in \mathbb{Z}_q^\times$ is our long-term secret key (not to be confused with the sender's ephemeral key $x$)
3. check if this shared secret was seen before by looking up its hash $H(s)$ in a table we keep. Reject if it was seen before.
4. recompute the MAC of $\beta$ using the derived MAC key $m_i=\mathsf{KDF}(\texttt{"mac-key"}\|s)$, and compare it with $\gamma$. Reject if they differ
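5. decrypt the routing info $B:=[\beta\|0^{2\lambda}]\oplus\mathsf{XOF}(e_i)$ using the derived encryption key $e_i=\mathsf{KDF}(\texttt{"route-enc-key"}\|s)$ (this is the XOR stream cipher $\mathsf{encdec}$ used for the routing information, with the stream extended by $2\lambda$ bits; it is not the payload cipher $\mathsf{DEC}$)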
6. Parse the address in the first $\lambda$ bits of $B$: If it's all zeros, then we are the exit node of a forward message. If it's an address of a mix node, then we have to forward. Otherwise we are the exit node of a reply message, and this address is the recipient.
7. Remove a layer of encryption from the payload $\delta' := \mathsf{DEC}(k;\, \delta)$ using the payload encryption key $k := \mathsf{KDF}( \texttt{"payload-key"} \| s)$

If we are the exit node of a forward message:

- Check if the first $\lambda$ bits of $\delta'$ are zero. Reject if not
- Extract the destination address $\Delta$ from the payload (bits $[\lambda\dots (3\lambda-1)]$ of $\delta'$)
- If it looks like a valid network address, send them the remaining payload

If we are the exit node of a reply message:

- It's basically the same as above, except that we also need to extract $J$ and send it together with the payload
- We also don't have to remove the $\Delta$ piece from the payload (as it's not there)
 
If we have to forward it:

- compute the blinding factor $b = \mathsf{KDF}_{\mathsf{blind}}(\alpha,s)$
- compute $\alpha' = b * \alpha\in\mathbb{G}$
- let $\gamma'$ be the second $\lambda$ bits of $B$
- let $\beta'$ be the remaining bits of $B$ (recall that before decrypting $\beta$, we appended $2\lambda$ zero bits; thus $\beta'$ is the right size!)
- forward $(\alpha',\beta',\gamma',\delta')$ to the next mix node, whose address was in the first $\lambda$ bits of $B$
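
The replay check and the MAC check at the start of the processing (steps 3 and 4) can be sketched as below. The names are hypothetical, and two choices here are assumptions rather than something the paper prescribes: the tag $H(s)$ is only recorded once the MAC has verified, and the table is a plain in-memory set (a real node would need to persist it and expire entries, typically on key rotation).

```python
import hashlib
import hmac

LAM = 16
seen_tags = set()   # hashes H(s) of shared secrets already processed


def kdf(label, s):
    return hashlib.sha256(label + s).digest()[:LAM]


def accept_header(s, beta, gamma):
    """Return True if the packet is neither a replay nor fails the MAC check."""
    tag = hashlib.sha256(s).digest()
    if tag in seen_tags:                       # step 3: seen before, reject
        return False
    expected = hmac.new(kdf(b"mac-key", s), beta, hashlib.sha256).digest()[:LAM]
    if not hmac.compare_digest(expected, gamma):   # step 4, constant-time compare
        return False
    seen_tags.add(tag)
    return True
```

Using `hmac.compare_digest` rather than `==` avoids leaking, through timing, how many leading bytes of a forged MAC were correct.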

#### Decrypting replies

Unlike forward message payloads, replies are encrypted; see above. (Of course the sender can encrypt the forward messages themselves, eg. if they know a public key or shared secret with the intended recipient).

To decrypt, the recipient should look up the message id $J$ in their local table, and find the $\ell+1$ symmetric keys $(K,k_0,\dots,k_{\ell-1})$.

The recipient of the reply message can decrypt the payload $\delta$ with

$$
\mathsf{DEC}(K;\; \mathsf{ENC}(k_0;\; \mathsf{ENC}(k_1;\;
\dots \mathsf{ENC}(k_{\ell-1};\;\delta)\dots)))
$$

If the first $\lambda$ bits after decryption are zero, then the message is considered valid.
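
To see why the recipient has to _encrypt_ with the hop keys, recall that every mix node on the reply path applies $\mathsf{DEC}(k_i;\,\cdot)$ to the payload while processing the packet. Writing $\delta_0 = \mathsf{ENC}(K;\,[0^\lambda\,\|\,\mathtt{reply}])$ for the payload as composed by the replying party and $\delta_\ell$ for the payload as received (this indexing is only introduced here for illustration), we have

$$
\begin{align*}
\delta_\ell &= \mathsf{DEC}(k_{\ell-1};\;\dots\,\mathsf{DEC}(k_1;\;\mathsf{DEC}(k_0;\;\delta_0))\dots) \\
\mathsf{ENC}(k_0;\;\mathsf{ENC}(k_1;\;\dots\,\mathsf{ENC}(k_{\ell-1};\;\delta_\ell)\dots)) &= \delta_0 \\
\mathsf{DEC}(K;\;\delta_0) &= [\,0^\lambda\;\|\;\mathtt{reply}\,]
\end{align*}
$$

The innermost $\mathsf{ENC}(k_{\ell-1};\,\cdot)$ cancels the last hop's decryption first, and so on outwards, so the keys must be applied in exactly this order.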

### Message integrity

A subtle part of this protocol is message integrity. 

This is ensured by prepending $\lambda$ zero bits to any payload, and using a "large block cipher", or pseudo-random permutation (PRP) to encrypt the payload in layers:

$$\pi : \{0,1\}^\lambda \times \{0,1\}^{N+3\lambda} \to \{0,1\}^{N+3\lambda} $$

This means that $\pi(k)$ should be indistinguishable from a random permutation of the set of all $2^{N+3\lambda}$ possible payloads (so not permuting _the bits_ of a payload, but permuting _all possible payloads_!)

Encryption and decryption are then simply

$$
\begin{align*}
\mathsf{ENC}(k;\delta) &:= \pi(k)(\delta) \\
\mathsf{DEC}(k;\delta) &:= \pi^{-1}(k)(\delta) 
\end{align*}
$$

With such a construction, if any single (or more) bit(s) of the payload is flipped, then after applying the permutation, the whole ciphertext should be uncorrelated with the unmodified one. This means that if somebody tries to modify the payload in route, at the end the bits which are supposed to be zeros won't be.

In particular, this most assuredly _doesn't work_ for AES-CTR or any other block cipher in counter mode!

While AES itself is a pseudo-random permutation, it can only permute blocks of fixed size, namely 128 bits. That unfortunately isn't helpful here.
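
The problem with any XOR-based stream cipher is easy to demonstrate. In the sketch below SHAKE128 is used as the keystream generator purely as an assumption for illustration (the same happens with AES-CTR), and the payload contents are made up.

```python
import hashlib

LAM = 16


def stream_enc(key, data):
    """XOR with a keystream; encryption and decryption are the same operation."""
    ks = hashlib.shake_128(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


key = b"k" * LAM
payload = bytes(LAM) + b"pay Alice 100"   # zero prefix || message

ct = bytearray(stream_enc(key, payload))
ct[-1] ^= 0x01                            # a mix node flips one ciphertext bit
pt = stream_enc(key, bytes(ct))

assert pt[:LAM] == bytes(LAM)             # the zero prefix is untouched
assert pt[LAM:] == b"pay Alice 101"       # but the message has changed
```

A flipped ciphertext bit flips exactly the same plaintext bit and nothing else, so the zero prefix never notices the tampering.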

#### Possible choices for a PRP

While an arbitrary-sized PRP is a very powerful building block, there appear to be fewer choices than for block ciphers.

- the paper recommends LIONESS; see the paper ["Two practical and provably secure block ciphers: BEAR and LION"](https://www.cl.cam.ac.uk/archive/rja14/Papers/bear-lion.pdf). These are built using the [Luby-Rackoff construction](https://omereingold.wordpress.com/wp-content/uploads/2014/10/lr.pdf)
- see also [this paper](https://arxiv.org/abs/1105.0259) on the security of these schemes
- [AEZ](https://web.cs.ucdavis.edu/~rogaway/aez/aez.pdf) appears to have an arbitrary-sized block cipher, and should be fast, but I find the paper hard to decipher, and it has probably received less analysis.
- see the paper ["Improving the Sphinx Mix Network"](https://www.bartmennink.nl/pubs/16cans.pdf) about modifying the Sphinx construction to use alternative symmetric constructions (eg. AE) (???)
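
For intuition, here is a sketch of a LIONESS-shaped four-round unbalanced Feistel network, built from SHAKE128 (as the stream cipher) and keyed BLAKE2b (as the hash). This is an unreviewed illustration and should not be used as is: the choice of primitives, the sub-key derivation in `_subkeys` and all function names are assumptions made for this note.

```python
import hashlib

H_LEN = 32   # size of the short (left) part, equal to the hash output size


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _subkeys(key):
    # Assumption: four round keys derived by hashing an index byte with the key.
    return [hashlib.sha256(bytes([i]) + key).digest() for i in range(4)]


def _s(k, left, n):
    """Stream of n bytes keyed by (left XOR k)."""
    return hashlib.shake_128(_xor(left, k)).digest(n)


def _h(k, right):
    """Keyed hash of the long (right) part."""
    return hashlib.blake2b(right, key=k, digest_size=H_LEN).digest()


def prp_enc(key, block):
    k1, k2, k3, k4 = _subkeys(key)
    left, right = block[:H_LEN], block[H_LEN:]
    right = _xor(right, _s(k1, left, len(right)))
    left = _xor(left, _h(k2, right))
    right = _xor(right, _s(k3, left, len(right)))
    left = _xor(left, _h(k4, right))
    return left + right


def prp_dec(key, block):
    k1, k2, k3, k4 = _subkeys(key)
    left, right = block[:H_LEN], block[H_LEN:]
    left = _xor(left, _h(k4, right))               # undo the rounds in reverse
    right = _xor(right, _s(k3, left, len(right)))
    left = _xor(left, _h(k2, right))
    right = _xor(right, _s(k1, left, len(right)))
    return left + right


key = b"k" * 16
block = bytes(16) + b"pay Alice 100".ljust(48, b".")   # zero prefix || message
ct = prp_enc(key, block)
assert prp_dec(key, ct) == block

tampered = bytearray(ct)
tampered[-1] ^= 0x01                                   # flip one ciphertext bit
assert prp_dec(key, bytes(tampered))[:16] != bytes(16) # zero prefix is destroyed
```

Every output byte depends on every input byte after the four rounds, so a single flipped bit scrambles the whole decrypted block, including the zero prefix (except with negligible probability).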
 
### Variations

The above is basically what is described in the Sphinx paper. But for different use cases, some modifications could be useful.

#### Larger address size

Logos Mix is built on top of libp2p, so the natural address format is libp2p addresses. These are apparently not just an IP address, port number and protocol selector (eg. TCP, UDP, Quic), but also include a ["PeerID"](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md) (here limited to 39 bytes, but in general it could be longer). Furthermore, a suggested average delay is also encoded in the address field. 

In any case, because of the PeerID this doesn't fit into 32 bytes, but just fits into 48 bytes.

Let us allow an address size of $t\times \lambda$, for both mix nodes and final destination (with $t\ge 2$). In this case, the pieces change like this:

- $|\Delta|=|A_i|=t\lambda$
- $|\phi_{k}| = (t+1)k\lambda$
- $|\beta_{k}| = (t+1)r\lambda$

The total overhead is the sum of $2\lambda$ (the group element $\alpha$), $(t+1)r\lambda$ (the routing info $\beta$), $\lambda$ (the MAC), and $(t+1)\lambda$ (the final destination address and the integrity check).

In total that's $((t+1)(r+1)+3)\lambda$. For the Logos choice of $t=3$ and $r=5$, this would mean 432 bytes.
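
As a quick sanity check of this arithmetic, the overhead can be recomputed piece by piece. The function name and its parameters are made up for this note; sizes are in bytes.

```python
def overhead_bytes(r, addr, dest, lam=16):
    """Per-packet overhead for r hops, given address and destination sizes."""
    alpha = 2 * lam                             # group element
    beta = (r - 1) * (addr + lam) + dest + lam  # (r-1) x (A_i, gamma_i), then (Delta, J)
    gamma = lam                                 # MAC
    payload_extra = lam + dest                  # zero prefix + final destination
    return alpha + beta + gamma + payload_extra


# Original paper sizes: |A_i| = lambda, |Delta| = 2 lambda
assert overhead_bytes(5, addr=16, dest=32) == 272

# Uniform address size t * lambda with t = 3, r = 5
assert overhead_bytes(5, addr=48, dest=48) == 432

# The closed formula ((t+1)(r+1)+3) * lambda agrees for other values too
for t in range(2, 7):
    for r in range(1, 9):
        assert overhead_bytes(r, 16 * t, 16 * t) == ((t + 1) * (r + 1) + 3) * 16
```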

TODO: double-check this!

#### Mix nodes as recipients

We imagine that sometimes our recipients will be also mix nodes themselves. This probably requires a slight modification of the exit node handling.

TODO

#### Alternative symmetric cryptography

TODO

## Differences between the paper and Logos Mix

There are some deviations from the above in the Logos [Mix specification](https://lip.logos.co/ift-ts/raw/mix.html).

- they use AES128-CTR to encrypt the payload. This completely breaks message integrity.
- they use a larger address size than $2\lambda$ for both the destination and mix addresses $\Delta$ and $A_i$ (96 bytes at the time of writing this, which is $2 \times 48$ bytes to allow for libp2p relay circuits)
- ...