diff --git a/docs/surb.md b/docs/surb.md
new file mode 100644
index 0000000..72f24bc
--- /dev/null
+++ b/docs/surb.md
@@ -0,0 +1,1743 @@
+# SURBs in the libp2p MIX Implementation
+
+This document is a deeper treatment of Single-Use Reply Blocks (SURBs) in the
+vendored libp2p MIX implementation at
+`nimbledeps/pkgs2/libp2p-*/libp2p/protocols/mix`.
+
+It maps the implementation to the MIX spec, with special attention to the newer
+SURB section in pull request 307:
+
+- Published MIX LIP: https://lip.logos.co/ift-ts/raw/mix.html
+- Pull request 307: https://github.com/logos-co/logos-lips/pull/307
+- Pull request snapshot, Section 8.7: https://github.com/logos-co/logos-lips/blob/bfd845f11c5ee4edc1d425c7c4a2b941285fd9a3/docs/ift-ts/raw/mix.md#87-single-use-reply-blocks
+
+The published LIP still contains older wording saying reply support is not
+implemented yet. The pull request snapshot adds Section 8.7, which describes
+SURB creation, use, reply processing, and reply recovery. This document compares
+that newer SURB section with the current Nim implementation.
+
+## High-Level Model
+
+A SURB lets the destination side send a reply without learning the sender's
+identity or return path. The sender precomputes a return-path Sphinx header,
+embeds it in the forward message, and keeps the corresponding decryption
+material locally. The exit node later uses the embedded SURB to package the
+destination response as a MIX reply.
+
+The current implementation uses SURBs for request/response flows such as Ping:
+
+```text
+sender
+  builds forward Sphinx packet
+  embeds N SURBs in the encrypted forward payload
+  stores SURB id -> reply credentials + incoming queue
+    |
+    v
+mix path to exit
+    |
+    v
+exit node
+  extracts SURBs
+  forwards request to destination protocol
+  reads destination response
+  sends the same response through each supplied SURB
+    |
+    v
+return mix path(s)
+    |
+    v
+sender
+  accepts first valid SURB reply
+  deletes credentials for all SURBs from that request
+  writes one recovered response into MixEntryConnection's incoming queue
+```
+
+The important operational point is that `numSurbs > 1` does not mean the
+application receives multiple responses. In this implementation it means the
+same response is sent over multiple independent return paths, and the first valid
+reply wins.
+
+## Relevant Types and Constants
+
+The wire structures live mostly in `serialization.nim`.
+
+```nim
+const
+  k* = 16
+  r* = 5
+  t* = 6
+  AlphaSize* = 32
+  BetaSize* = ((r * (t + 1)) + 1) * k
+  GammaSize* = 16
+  HeaderSize* = AlphaSize + BetaSize + GammaSize
+  DelaySize* = 2
+  AddrSize* = (t * k) - DelaySize
+  PacketSize* = 4608
+  MessageSize* = PacketSize - HeaderSize - k
+  PayloadSize* = MessageSize + k
+  SurbSize* = HeaderSize + AddrSize + k
+  SurbLenSize* = 1
+  SurbIdLen* = k
+```
+
+With the current parameters:
+
+- `HeaderSize = 624`
+- `DelaySize = 2`
+- `AddrSize = 94`
+- `SurbSize = 734`
+- `MessageSize = 3968`
+- `PayloadSize = 3984`
+
+`t * k` is the combined per-hop routing block space for address plus delay:
+`6 * 16 = 96` bytes. The implementation reserves `DelaySize = 2` bytes for the
+encoded per-hop delay, leaving `AddrSize = 94` bytes for the serialized hop
+address.
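+The derived sizes can be rechecked mechanically. The following is a minimal
+standalone Nim sketch. It copies the constants from `serialization.nim` without
+the export markers and without importing the library, then lets the compiler
+verify the values quoted above.
+
+```nim
+# Standalone copy of the MIX size constants, used only to recompute the
+# derived sizes quoted in this document.
+const
+  k = 16
+  r = 5
+  t = 6
+  AlphaSize = 32
+  BetaSize = ((r * (t + 1)) + 1) * k
+  GammaSize = 16
+  HeaderSize = AlphaSize + BetaSize + GammaSize
+  DelaySize = 2
+  AddrSize = (t * k) - DelaySize
+  PacketSize = 4608
+  MessageSize = PacketSize - HeaderSize - k
+  PayloadSize = MessageSize + k
+  SurbSize = HeaderSize + AddrSize + k
+
+# Evaluated at compile time: a wrong value fails the build.
+static:
+  doAssert BetaSize == 576
+  doAssert HeaderSize == 624
+  doAssert AddrSize == 94
+  doAssert SurbSize == 734
+  doAssert MessageSize == 3968
+  doAssert PayloadSize == 3984
+```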
+
+That matches the pull request 307 Section 8.7.1 structure:
+
+```text
+SURB = hop_0 || header || reply_key
+     = 94    || 624    || 16
+     = 734 bytes
+```
+
+The implementation type is:
+
+```nim
+type Hop* = object
+  MultiAddress: seq[byte]
+
+type
+  Secret* = seq[seq[byte]]
+  Key* = seq[byte]
+  SURBIdentifier* = array[SurbIdLen, byte]
+
+  SURB* = object
+    hop*: Hop
+    header*: Header
+    key*: Key
+    secret*: Opt[Secret]
+```
+
+`Hop` is the fixed-size routing address container used both for normal next-hop
+routing and for `hop_0` in a SURB. Its serialized form is always `AddrSize`
+bytes:
+
+```nim
+proc serialize*(hop: Hop): seq[byte] =
+  if hop.MultiAddress.len == 0:
+    return newSeq[byte](AddrSize)
+
+  doAssert len(hop.MultiAddress) == AddrSize
+  return hop.MultiAddress
+```
+
+An empty `Hop()` serializes as 94 zero bytes. SURB construction uses it as the
+destination address of the terminal return-path routing block: an all-zero
+address means "this is not a normal forward destination", and the nonzero SURB
+identifier placed after the address and delay marks the packet as a reply.
+
+Field mapping:
+
+- `hop`: the spec term `hop_0`, the first hop on the return path.
+- `header`: the spec tuple `(alpha_0, beta_0, gamma_0)`, the precomputed
+  Sphinx header.
+- `key`: the spec term `k_tilde`, the reply key.
+- `secret`: local-only per-hop shared secrets `s_0 ... s_{L-1}`. This should
+  not be distributed with the SURB.
+
+The serialized SURB embedded into the forward message deliberately omits
+`secret`:
+
+```nim
+let surbBytes =
+  surbs.mapIt(it.hop.serialize() & it.header.serialize() & it.key).concat()
+```
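+As a sketch of the resulting wire layout for a single SURB, the following uses
+plain byte sequences in place of the real `Hop`, `Header`, and `Key` types.
+`SurbParts`, `serializeSurb`, and `parseSurb` are hypothetical helpers for
+illustration. How the exit actually splits received SURBs is not shown in this
+document.
+
+```nim
+const
+  HeaderSize = 624
+  AddrSize = 94
+  KeySize = 16            # k
+  SurbSize = AddrSize + HeaderSize + KeySize
+
+type SurbParts = object
+  hop, header, key: seq[byte]   # the local-only `secret` is deliberately absent
+
+proc serializeSurb(p: SurbParts): seq[byte] =
+  doAssert p.hop.len == AddrSize
+  doAssert p.header.len == HeaderSize
+  doAssert p.key.len == KeySize
+  # hop_0 || header || reply_key, the same order as surbBytes above
+  result = p.hop & p.header & p.key
+
+proc parseSurb(b: seq[byte]): SurbParts =
+  doAssert b.len == SurbSize
+  result.hop = b[0 ..< AddrSize]
+  result.header = b[AddrSize ..< AddrSize + HeaderSize]
+  result.key = b[AddrSize + HeaderSize ..< SurbSize]
+
+let original = SurbParts(
+  hop: newSeq[byte](AddrSize),
+  header: newSeq[byte](HeaderSize),
+  key: @[byte 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
+let wire = serializeSurb(original)
+```
+
+The local `secret` field has no slot in these 734 bytes, which is exactly what
+omitting it from `surbBytes` achieves.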
+
+## Where the Reply Queue Fits
+
+The reply queue is not part of the Sphinx/SURB spec. It is an implementation
+adapter that lets a `MixEntryConnection` look like a normal libp2p `Connection`
+to client-side protocol code.
+
+`MixEntryConnection` has:
+
+```nim
+type MixEntryConnection* = ref object of Connection
+  incoming: AsyncQueue[seq[byte]]
+  incomingFut: Future[void]
+  replyReceivedFut: Future[void]
+  cached: seq[byte]
+```
+
+When `expectReply = true`, construction creates the queue and starts one
+background future:
+
+```nim
+instance.incoming = newAsyncQueue[seq[byte]]()
+instance.replyReceivedFut = newFuture[void]()
+
+let checkForIncoming = proc(): Future[void] {.async.} =
+  instance.cached = await instance.incoming.get()
+  instance.replyReceivedFut.complete()
+
+instance.incomingFut = checkForIncoming()
+```
+
+When client code later reads from the connection, `readOnce` waits for
+`replyReceivedFut` and then serves bytes from `cached`:
+
+```nim
+if s.cached.len == 0:
+  await s.replyReceivedFut
+
+let toRead = min(nbytes, s.cached.len)
+copyMem(pbytes, addr s.cached[0], toRead)
+s.cached = s.cached[toRead ..^ 1]
+```
+
+The queue is populated by the sender-side `Reply` branch in
+`handleMixMessages`, after the returned SURB packet has been recovered:
+
+```nim
+await connCred.incoming.put(deserialized.message)
+```
+
+So the reply queue is the bridge between:
+
+- MIX/SURB packet processing, which eventually recovers `seq[byte]`; and
+- libp2p protocol code, which expects to read those bytes from a `Connection`.
+
+Current limitation: this is shaped for a single request/response. One
+`replyReceivedFut` is completed once, and the first successful reply fills
+`cached`. It is not a general multi-response stream abstraction.
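+The `cached` buffer logic can be sketched synchronously with the Nim standard
+library. `ReplyBuffer`, `deliver`, and this `readOnce` are hypothetical
+simplifications: there is no async future, and the sketch simply returns 0 when
+nothing is left. That end-of-stream behavior is an assumption of the sketch, not
+a description of the real connection.
+
+```nim
+type ReplyBuffer = object
+  cached: seq[byte]
+  replyReceived: bool   # stands in for replyReceivedFut being completed
+
+proc deliver(b: var ReplyBuffer, reply: seq[byte]) =
+  # Only the first reply is kept, mirroring the single incoming.get() in
+  # checkForIncoming; later replies are never read.
+  if not b.replyReceived:
+    b.cached = reply
+    b.replyReceived = true
+
+proc readOnce(b: var ReplyBuffer, dst: var openArray[byte]): int =
+  if not b.replyReceived or b.cached.len == 0:
+    return 0
+  let toRead = min(dst.len, b.cached.len)
+  for i in 0 ..< toRead:
+    dst[i] = b.cached[i]
+  b.cached = b.cached[toRead .. ^1]   # drop the bytes just served
+  result = toRead
+
+var buf: ReplyBuffer
+buf.deliver(@[byte 1, 2, 3, 4, 5])
+buf.deliver(@[byte 9, 9])        # ignored: not a second response
+var chunk: array[3, byte]
+let n1 = buf.readOnce(chunk)     # 3 bytes: 1, 2, 3
+let n2 = buf.readOnce(chunk)     # 2 bytes: 4, 5
+let n3 = buf.readOnce(chunk)     # 0: nothing left
+```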
+
+## Sender-Side SURB Creation
+
+SURB creation begins during a normal MIX send. `MixEntryConnection.write` calls:
+
+```nim
+let sendRes = await srcMix.anonymizeLocalProtocolSend(
+  instance.incoming, msg, codec, dest, numSurbs
+)
+```
+
+Inside `anonymizeLocalProtocolSend`, after the forward path has selected an exit
+node, the message is augmented with SURBs:
+
+```nim
+let msgWithSurbs = mixProto.prepareMsgWithSurbs(
+  incoming, msg, numSurbs, destination.peerId, exitPeerId
+)
+```
+
+`prepareMsgWithSurbs` calls `buildSurbs`, then serializes them before the
+application message:
+
+```nim
+proc prepareMsgWithSurbs(
+  mixProto: MixProtocol,
+  incoming: AsyncQueue[seq[byte]],
+  msg: seq[byte],
+  numSurbs: uint8 = 0,
+  destPeerId: PeerId,
+  exitPeerId: PeerId,
+): Result[seq[byte], string] =
+  let surbs =
+    mixProto.buildSurbs(incoming, numSurbs, destPeerId, exitPeerId).valueOr:
+      return err(error)
+
+  serializeMessageWithSURBs(msg, surbs)
+```
+
+### `buildSurbs`: Outer Loop and Local State
+
+`buildSurbs` is the outer loop. It creates `numSurbs` independent SURBs for this
+one outgoing request. It is also where `buildSurb` is called:
+
+```nim
+proc buildSurbs(
+  mixProto: MixProtocol,
+  incoming: AsyncQueue[seq[byte]],
+  numSurbs: uint8,
+  destPeerId: PeerId,
+  exitPeerId: PeerId,
+): Result[seq[SURB], string] =
+  var response: seq[SURB]
+  var igroup = SURBIdentifierGroup(members: initHashSet[SURBIdentifier]())
+
+  for _ in 0.uint8 ..< numSurbs:
+    var id: SURBIdentifier
+    hmacDrbgGenerate(mixProto.rng[], id)
+
+    let surb = ?mixProto.buildSurb(id, destPeerId, exitPeerId)
+
+    igroup.members.incl(id)
+    mixProto.connCreds[id] = ConnCreds(
+      igroup: igroup,
+      surbSecret: surb.secret.get(),
+      surbKey: surb.key,
+      incoming: incoming,
+    )
+
+    response.add(surb)
+
+  return ok(response)
+```
+
+In that loop:
+
+- `id` is the random SURB identifier later embedded in the return-path header.
+- `buildSurb(id, destPeerId, exitPeerId)` builds one complete return-path SURB.
+- `connCreds[id]` stores the local-only recovery material for that SURB.
+- `igroup` groups all SURB identifiers created for this request, so the first
+  valid reply can consume the whole group.
+- `response` is the list of distributable SURBs that will be serialized into the
+  forward request payload.
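+The bookkeeping in `buildSurbs` and the first-reply-wins rule can be sketched
+with `std/tables` and `std/sets`. The sketch assumes that
+`SURBIdentifierGroup` has reference semantics (a `ref object`), so every
+`ConnCreds` entry from one request sees the same, growing member set; the type
+definition is not shown in this document, so treat that as an assumption.
+`SurbId`, `IdGroup`, `Creds`, `register`, and `acceptReply` are hypothetical
+names, and the payload recovery with `surbKey`/`surbSecret` is omitted.
+
+```nim
+import std/[tables, sets]
+
+type
+  SurbId = array[16, byte]
+  IdGroup = ref object            # assumed reference semantics, see above
+    members: HashSet[SurbId]
+  Creds = object
+    group: IdGroup
+    requestTag: string            # stands in for surbKey/surbSecret/incoming
+
+var connCreds = initTable[SurbId, Creds]()
+
+proc register(ids: openArray[SurbId], tag: string) =
+  let group = IdGroup(members: initHashSet[SurbId]())
+  for id in ids:
+    group.members.incl(id)        # same incremental pattern as buildSurbs
+    connCreds[id] = Creds(group: group, requestTag: tag)
+
+proc acceptReply(id: SurbId): bool =
+  ## True only for the first reply that arrives for a request.
+  if id notin connCreds:
+    return false                  # unknown id, or group already consumed
+  let group = connCreds[id].group
+  for member in group.members:
+    connCreds.del(member)         # consume every SURB from this request
+  result = true
+
+var a, b, c: SurbId
+a[0] = 1
+b[0] = 2
+c[0] = 3
+register([a, b], "ping-1")      # one request with numSurbs = 2
+register([c], "ping-2")         # another request with numSurbs = 1
+let first = acceptReply(b)      # true: first reply for ping-1
+let dup = acceptReply(a)        # false: ping-1 already consumed
+let other = acceptReply(c)      # true: ping-2 is unaffected
+```
+
+If the group type were a plain value `object` instead, each `ConnCreds` would
+hold a copy taken at insertion time, and the first SURB's entry would not know
+about the identifiers added after it.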
+
+The serialized layout is:
+
+```text
+num_surbs: 1 byte
+SURB[0]: hop || header || key
+SURB[1]: hop || header || key
+...
+application_message
+```
+
+This means that even messages that request no replies (`numSurbs = 0`) still
+carry a one-byte SURB count:
+
+```text
+0x00 || application_message
+```
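+A minimal sketch of this framing, again with plain bytes. `frame` and `unframe`
+are hypothetical helpers, not the implementation's `serializeMessageWithSURBs`
+or its exit-side counterpart; they only demonstrate the layout of one count
+byte, `num_surbs * SurbSize` bytes of SURBs, and then the application message.
+
+```nim
+const SurbSize = 734
+
+proc frame(msg: seq[byte], surbs: seq[seq[byte]]): seq[byte] =
+  doAssert surbs.len <= 255        # the count must fit in SurbLenSize = 1 byte
+  result = @[byte(surbs.len)]
+  for s in surbs:
+    doAssert s.len == SurbSize
+    result.add(s)
+  result.add(msg)
+
+proc unframe(data: seq[byte]): (seq[seq[byte]], seq[byte]) =
+  doAssert data.len >= 1
+  let n = int(data[0])
+  doAssert data.len >= 1 + n * SurbSize
+  var surbs: seq[seq[byte]]
+  for i in 0 ..< n:
+    let start = 1 + i * SurbSize
+    surbs.add(data[start ..< start + SurbSize])
+  result = (surbs, data[1 + n * SurbSize .. ^1])
+
+let noReply = frame(@[byte 0xAA, 0xBB], newSeq[seq[byte]]())
+let withReply = frame(@[byte 0xAA, 0xBB], @[newSeq[byte](SurbSize)])
+```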
+
+### Identifier and Reply Key
+
+For each SURB, `buildSurbs` samples a distinct identifier before calling
+`buildSurb`:
+
+```nim
+var id: SURBIdentifier
+hmacDrbgGenerate(mixProto.rng[], id)
+let surb = ?mixProto.buildSurb(id, destPeerId, exitPeerId)
+```
+
+That `id` is embedded into the terminal routing block of the SURB return path.
+When a reply returns, the original sender extracts this identifier and uses it
+to look up the stored recovery credentials in `connCreds`.
+
+`createSURB`, which is called from inside `buildSurb`, samples the reply key:
+
+```nim
+var key = newSeqUninit[byte](k)
+rng[].generate(key)
+```
+
+This maps to pull request 307 Section 8.7.2 Step 2: sample a unique SURB
+identifier `id` and a reply key `k_tilde`.
+
+There is no explicit collision check against existing `connCreds`; uniqueness
+relies on the 16-byte (128-bit) random identifier making collisions negligible.
+
+### `buildSurb`: Return Path Selection
+
+`buildSurb` constructs one return path per SURB and returns the distributable
+SURB object:
+
+```nim
+method buildSurb*(
+  mixProto: MixProtocol,
+  id: SURBIdentifier,
+  destPeerId: PeerId,
+  exitPeerId: PeerId
+): Result[SURB, string]
+```
+
+Inside `buildSurb`, the implementation accumulates three aligned arrays:
+
+```nim
+var
+  publicKeys: seq[FieldElement] = @[]
+  hops: seq[Hop] = @[]
+  delay: seq[seq[byte]] = @[]
+```
+
+Those arrays are then passed to `createSURB(publicKeys, delay, hops, id)`, which
+constructs the Sphinx return-path header and reply key.
+
+`buildSurb` excludes the forward-path exit and the destination from the
+candidates for the random part of the return path:
+
+```nim
+var pubNodeInfoKeys =
+  mixProto.nodePool.peerIds().filterIt(it != exitPeerId and it != destPeerId)
+```
+
+Then it selects `PathLength` hops. The loop index `i` is the return-path hop
+index. With the current `PathLength = 3`, `i = 0` and `i = 1` are random
+intermediate return hops, while `i = 2` is the final hop. For all but the last
+hop, it picks random public mix nodes from the filtered pool. This branch does
+not run for the final hop:
+
+```nim
+if i < PathLength - 1:
+  let randomIndexPosition = cryptoRandomInt(mixProto.rng, availableIndices.len).valueOr:
+    return err("failed to generate random num: " & error)
+  let selectedIndex = availableIndices[randomIndexPosition]
+  let randPeerId = pubNodeInfoKeys[selectedIndex]
+  availableIndices.del(randomIndexPosition)
+
+  let mixPubInfo = mixProto.nodePool.get(randPeerId).valueOr:
+    return err("could not get mix pub info for peer: " & $randPeerId)
+  (
+    mixPubInfo.peerId,
+    mixPubInfo.multiAddr,
+    mixPubInfo.mixPubKey,
+    mixProto.delayStrategy.generateForEntry(),
+  )
+```
+
+There are two index layers here. `pubNodeInfoKeys` stores the candidate peer IDs.
+`availableIndices` stores the indices in `pubNodeInfoKeys` that have not yet
+been selected for this return path. `cryptoRandomInt` chooses a random position
+inside `availableIndices`; the value at that position is then used as the index
+into `pubNodeInfoKeys`.
+
+For example:
+
+```text
+pubNodeInfoKeys   = [peerA, peerB, peerC, peerD]
+availableIndices = [0,     1,     2,     3]
+
+randomIndexPosition = 2
+selectedIndex       = availableIndices[2] = 2
+randPeerId          = pubNodeInfoKeys[2] = peerC
+```
+
+After selecting `peerC`, the code removes that entry from `availableIndices`:
+
+```nim
+availableIndices.del(randomIndexPosition)
+```
+
+That prevents the same mix node from being selected twice in one SURB return
+path. The tuple returned for a selected hop contains the libp2p peer ID,
+dialable multiaddress, MIX public key used for Sphinx header construction, and
+the encoded delay value for that hop.
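+Nim's `del` on a `seq` is an O(1) removal that moves the last element into the
+freed slot, so `availableIndices` does not stay sorted. That is harmless here
+because positions are drawn at random anyway. The sketch below replays the
+example above with a fixed list of positions in place of `cryptoRandomInt`; the
+positions are hypothetical and chosen only to make the result deterministic.
+
+```nim
+let pubNodeInfoKeys = @["peerA", "peerB", "peerC", "peerD"]
+var availableIndices = @[0, 1, 2, 3]
+var chosen: seq[string]
+
+# Hypothetical "random" positions; the implementation draws each one with
+# cryptoRandomInt(mixProto.rng, availableIndices.len).
+for randomIndexPosition in [2, 0]:
+  let selectedIndex = availableIndices[randomIndexPosition]
+  chosen.add(pubNodeInfoKeys[selectedIndex])
+  availableIndices.del(randomIndexPosition)   # swap with last, then shrink
+```
+
+After the first draw `availableIndices` is `[0, 1, 3]`; after the second it is
+`[3, 1]`. Neither `peerC` nor `peerA` can be drawn again.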
+
+The last hop is the original sender's own mix node:
+
+```nim
+else:
+  (
+    mixProto.mixNodeInfo.peerId,
+    mixProto.mixNodeInfo.multiAddr,
+    mixProto.mixNodeInfo.mixPubKey,
+    0.uint16,
+  )
+```
+
+This `else` branch is reached only when `i == PathLength - 1`. With
+`PathLength = 3`, that means `i == 2`. So the last return-path hop is not random:
+it is forcibly set to `mixProto.mixNodeInfo`, the local mix node that originally
+created the SURB. In this example:
+
+```text
+i = 0  random mix node
+i = 1  random mix node
+i = 2  original sender's own mix node
+```
+
+The phrase "original sender" here means the node that created the SURB and sent
+the original forward request. It is also the final recipient of the SURB reply,
+because only that node has the stored `surbKey` and `surbSecret` needed to
+recover the reply payload.
+
+This maps to pull request 307 Section 8.7.2 Step 1: the initiating node selects
+a return path with itself as the final hop and computes ephemeral secrets for
+that path.
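+Putting the candidate filter and the final-hop rule together, a return path can
+be sketched as below. `pickReturnPath` is hypothetical: it works on peer-ID
+strings instead of `MixPubInfo` records, and it uses `std/random` with a fixed
+seed for reproducibility, whereas the implementation draws positions with
+`cryptoRandomInt` from its HMAC-DRBG.
+
+```nim
+import std/[random, sequtils]
+
+const PathLength = 3
+
+proc pickReturnPath(pool: seq[string], exitPeer, destPeer, selfPeer: string,
+                    rng: var Rand): seq[string] =
+  # Exclude the forward-path exit and the destination, as buildSurb does.
+  let candidates = pool.filterIt(it != exitPeer and it != destPeer)
+  doAssert candidates.len >= PathLength - 1
+  var availableIndices = toSeq(0 ..< candidates.len)
+  for i in 0 ..< PathLength:
+    if i < PathLength - 1:
+      let pos = rng.rand(availableIndices.len - 1)   # uniform in [0, len - 1]
+      result.add(candidates[availableIndices[pos]])
+      availableIndices.del(pos)                      # no repeats in one path
+    else:
+      result.add(selfPeer)   # terminal hop: the SURB creator's own mix node
+
+var rng = initRand(42)
+let pool = @["m1", "m2", "m3", "m4", "exit", "dest"]
+let path = pickReturnPath(pool, "exit", "dest", "me", rng)
+```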
+
+### Header Construction
+
+The cryptographic construction is in `sphinx.nim`. The inputs collected during
+return-path selection are:
+
+- `publicKeys`: the MIX public keys for the selected return-path hops, in path
+  order. These are not libp2p identity keys. They are Curve25519/Sphinx keys
+  used to derive one shared secret per hop.
+- `hops`: the serialized routing addresses for the same return-path hops.
+- `delay`: the encoded delay value for each hop.
+- `id`: the SURB identifier that the sender will later use to find the stored
+  reply recovery keys.
+
+`buildSurb` fills `publicKeys`, `hops`, and `delay` together while iterating
+over the selected return-path nodes. The relevant section is:
+
+```nim
+for i in 0 ..< PathLength:
+  let (peerId, multiAddr, mixPubKey, delayMillisec) =
+    if i < PathLength - 1:
+      let randomIndexPosition = cryptoRandomInt(mixProto.rng, availableIndices.len).valueOr:
+        return err("failed to generate random num: " & error)
+      let selectedIndex = availableIndices[randomIndexPosition]
+      let randPeerId = pubNodeInfoKeys[selectedIndex]
+      availableIndices.del(randomIndexPosition)
+      let mixPubInfo = mixProto.nodePool.get(randPeerId).valueOr:
+        return err("could not get mix pub info for peer: " & $randPeerId)
+      (
+        mixPubInfo.peerId,
+        mixPubInfo.multiAddr,
+        mixPubInfo.mixPubKey,
+        mixProto.delayStrategy.generateForEntry(),
+      )
+    else:
+      (
+        mixProto.mixNodeInfo.peerId,
+        mixProto.mixNodeInfo.multiAddr,
+        mixProto.mixNodeInfo.mixPubKey,
+        0.uint16,
+      )
+
+  publicKeys.add(mixPubKey)
+
+  let multiAddrBytes = multiAddrToBytes(peerId, multiAddr).valueOr:
+    return err("failed to convert multiaddress to bytes: " & error)
+
+  hops.add(Hop.init(multiAddrBytes))
+  delay.add(@(delayMillisec.uint16.toBytesBE()))
+```
+
+Each line feeds a different part of Sphinx construction:
+
+- `publicKeys.add(mixPubKey)` stores the hop's MIX public key. Later,
+  `computeAlpha(publicKeys)` uses these public keys to derive one shared secret
+  per return-path hop.
+- `multiAddrToBytes(peerId, multiAddr)` converts the hop's libp2p identity and
+  dialable address into the fixed 94-byte `Hop` address format. Later,
+  `computeBetaGamma` encrypts these hop addresses into the routing header.
+- `delay.add(@(delayMillisec.uint16.toBytesBE()))` stores the hop delay as two
+  big-endian bytes, matching `DelaySize = 2`. Later, each return-path hop
+  decrypts its own delay value from Beta and applies it before forwarding.
+
+The ordering matters. `publicKeys[i]`, `hops[i]`, and `delay[i]` must describe
+the same return-path hop at index `i`; otherwise the encrypted routing header
+would pair the wrong key, address, or delay with a hop.
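+One way to make that alignment hard to break is to collect one record per hop
+and split it into the three arrays in a single pass. In the sketch below,
+`HopInput` and `toAlignedArrays` are hypothetical, strings stand in for keys and
+addresses, and `encodeDelay` is a shift-based stand-in for
+`@(delayMillisec.uint16.toBytesBE())` so that no extra package is needed.
+
+```nim
+type HopInput = tuple[pubKey: string, address: string, delayMs: uint16]
+
+proc encodeDelay(d: uint16): seq[byte] =
+  # Big-endian: high byte first, matching DelaySize = 2.
+  @[byte(d shr 8), byte(d and 0xFF)]
+
+proc toAlignedArrays(path: seq[HopInput]): (seq[string], seq[string], seq[seq[byte]]) =
+  # Index i of every output array describes the same hop.
+  for h in path:
+    result[0].add(h.pubKey)
+    result[1].add(h.address)
+    result[2].add(encodeDelay(h.delayMs))
+
+let (publicKeys, hops, delay) = toAlignedArrays(@[
+  (pubKey: "pk0", address: "addr0", delayMs: 300'u16),
+  (pubKey: "pk1", address: "addr1", delayMs: 50'u16),
+  (pubKey: "self", address: "selfAddr", delayMs: 0'u16)])
+```
+
+A delay of 300 ms encodes as `0x01 0x2C`, and the terminal hop's zero delay as
+`0x00 0x00`.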
+
+At the end of `buildSurb`, those aligned arrays are handed to `createSURB`:
+
+```nim
+return createSURB(publicKeys, delay, hops, id)
+```
+
+`createSURB` computes the return-path `alpha_0`, per-hop secrets, `beta_0`, and
+`gamma_0`, samples the reply key, and returns the final SURB object. The relevant
+part is:
+
+```nim
+proc createSURB*(
+  publicKeys: openArray[FieldElement],
+  delay: openArray[seq[byte]],
+  hops: openArray[Hop],
+  id: SURBIdentifier,
+  rng: ref HmacDrbgContext = newRng(),
+): Result[SURB, string] =
+  if id == default(SURBIdentifier):
+    return err("id should be initialized")
+
+  let (alpha_0, s) = computeAlpha(publicKeys).valueOr:
+    return err("Error in alpha generation: " & error)
+
+  let (beta_0, gamma_0) = computeBetaGamma(s, hops, delay, Hop(), id).valueOr:
+    return err("Error in beta and gamma generation: " & error)
+
+  var key = newSeqUninit[byte](k)
+  rng[].generate(key)
+
+  return ok(
+    SURB(
+      hop: hops[0],
+      header: Header.init(alpha_0, beta_0, gamma_0),
+      secret: Opt.some(s),
+      key: key,
+    )
+  )
+```
+
+This is where the distributable SURB fields are assembled:
+
+- `hop: hops[0]` is `hop_0`, the first return-path hop the exit should send to.
+- `header: Header.init(alpha_0, beta_0, gamma_0)` is the precomputed Sphinx
+  header for the return path.
+- `key` is the reply key `k_tilde`, distributed with the SURB so the exit can
+  encrypt the reply payload.
+- `secret: Opt.some(s)` is local recovery material. It is stored in
+  `connCreds` by `buildSurbs`, but is not serialized into the SURB sent to the
+  exit.
+
+#### Alpha and Per-Hop Secrets
+
+`computeAlpha(publicKeys)` creates the initial Sphinx ephemeral public value and
+the shared secret for each return-path hop:
+
+```nim
+var
+  s: seq[seq[byte]] = newSeq[seq[byte]](publicKeys.len)
+  alpha_0: seq[byte]
+  alpha: FieldElement
+  secret: FieldElement
+  blinders: seq[FieldElement] = @[]
+
+let x = generateRandomFieldElement()
+blinders.add(x)
+
+for i in 0 ..< publicKeys.len:
+  if i == 0:
+    alpha = multiplyBasePointWithScalars([blinders[i]])
+    alpha_0 = fieldElementToBytes(alpha)
+  else:
+    alpha = multiplyPointWithScalars(alpha, [blinders[i]])
+
+  secret = multiplyPointWithScalars(publicKeys[i], blinders)
+
+  let blinder = bytesToFieldElement(
+    sha256_hash(fieldElementToBytes(alpha) & fieldElementToBytes(secret))
+  )
+
+  blinders.add(blinder)
+  s[i] = fieldElementToBytes(secret)
+```
+
+Conceptually:
+
+- The sender samples an ephemeral scalar `x`.
+- `blinders` is the blinding chain. It starts with `x`; after each hop, the code
+  derives a new blinder from that hop's current `alpha` and shared secret.
+- `alpha_0 = x * G` is the initial public value placed in the SURB header.
+  `multiplyBasePointWithScalars([x])` computes this by multiplying the
+  Curve25519 base point `G` by the scalar `x`. In the helper implementation,
+  this is done by calling `public(x)`, because a Curve25519 public key is the
+  base point multiplied by the private scalar.
+- For every return-path hop, the sender derives a shared secret from that hop's
+  MIX public key and the current blinding chain.
+- The next `alpha` value is produced by multiplying the previous `alpha` by the
+  next blinder. This lets every hop see a fresh-looking `alpha` while still
+  allowing the sender to precompute all per-hop secrets.
+- Each hop will later derive the same secret from its private key and the `alpha`
+  value it receives while processing the Sphinx packet.
+- The secrets `s[0] ... s[L-1]` are kept locally in `surb.secret` so the sender
+  can recover the reply when the return packet reaches it.
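+The point of the blinding chain is that the sender and each hop reach the same
+`s[i]` from different inputs. The sketch below checks that property with a toy
+multiplicative group modulo the prime 2^31 - 1 in place of Curve25519
+(exponentiation plays the role of scalar multiplication) and a hypothetical
+`toyHash` in place of `sha256_hash`. It is illustrative only and has no
+security value.
+
+```nim
+# Toy group: integers modulo the prime P = 2^31 - 1 with base G.
+# powMod(G, x) plays the role of x * G on Curve25519. NOT secure.
+const
+  P = 2147483647'i64
+  G = 7'i64
+
+proc powMod(base, exp: int64): int64 =
+  var b = base mod P
+  var e = exp
+  result = 1
+  while e > 0:
+    if (e and 1) == 1:
+      result = (result * b) mod P   # operands < 2^31, product fits in int64
+    b = (b * b) mod P
+    e = e shr 1
+
+proc toyHash(alpha, secret: int64): int64 =
+  # Hypothetical stand-in for bytesToFieldElement(sha256_hash(alpha & secret)).
+  ((alpha * 31 + secret) mod (P - 2)) + 1
+
+proc senderSide(pubKeys: seq[int64], x: int64): (int64, seq[int64]) =
+  ## Mirrors computeAlpha: returns alpha_0 and s[0 .. L-1].
+  var blinders = @[x]
+  var alpha, alpha0: int64
+  var s: seq[int64]
+  for i in 0 ..< pubKeys.len:
+    if i == 0:
+      alpha = powMod(G, blinders[0])
+      alpha0 = alpha
+    else:
+      alpha = powMod(alpha, blinders[i])
+    var secret = pubKeys[i]
+    for b in blinders:              # apply the whole blinding chain
+      secret = powMod(secret, b)
+    blinders.add(toyHash(alpha, secret))
+    s.add(secret)
+  result = (alpha0, s)
+
+proc hopSide(privKeys: seq[int64], alpha0: int64): seq[int64] =
+  ## What each hop derives, in travel order, from its private key and alpha.
+  var alpha = alpha0
+  for sk in privKeys:
+    let secret = powMod(alpha, sk)
+    result.add(secret)
+    alpha = powMod(alpha, toyHash(alpha, secret))   # re-blind for next hop
+
+let privKeys = @[123457'i64, 987651, 55555]
+var pubKeys: seq[int64]
+for sk in privKeys:
+  pubKeys.add(powMod(G, sk))
+let (alpha0, senderSecrets) = senderSide(pubKeys, 424242)
+let hopSecrets = hopSide(privKeys, alpha0)
+```
+
+The sender never learns any private key, and each hop sees only its own
+`alpha`, yet both sides compute the same list of secrets.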
+
+#### Beta and Gamma
+
+The SURB section of the spec indexes the return path in the direction the reply
+will travel:
+
+```text
+i = 0           first return-path hop, the node the exit sends to
+i = sLen - 1    terminal return-path hop, the original sender
+```
+
+This matches pull request 307 Section 8.7.2, where `hop_0` is the first hop on
+the return path and the initiating node is the final hop.
+
+The implementation uses the same index convention in its arrays:
+
+```text
+hops[0],        publicKeys[0],        delay[0],        s[0]        -> hop_0
+hops[1],        publicKeys[1],        delay[1],        s[1]        -> hop_1
+...
+hops[sLen - 1], publicKeys[sLen - 1], delay[sLen - 1], s[sLen - 1] -> hop_{L-1}
+```
+
+So `sLen - 1` does not mean `hop_0`; it means `hop_{L-1}`. In the SURB return
+path constructed above, `hop_{L-1}` is the original sender's own mix node,
+because `buildSurb` explicitly put `mixProto.mixNodeInfo` in the final hop slot.
+
+Equivalently, for `PathLength = 3`:
+
+```text
+s[0] -> hop_0, first return hop
+s[1] -> hop_1, second return hop
+s[2] -> hop_2, terminal return hop/original sender
+```
+
+The countdown loop below changes construction order only. It does not change
+which hop each `s[i]` belongs to.
+
+There is only one `computeBetaGamma` routine. Both the forward path and the SURB
+return path call it; only the values passed for `hops`, `destHop`, and `id`
+differ. It iterates over the path indices from high to low:
+
+```nim
+proc computeBetaGamma(
+  s: seq[seq[byte]],
+  hops: openArray[Hop],
+  delay: openArray[seq[byte]],
+  destHop: Hop,
+  id: SURBIdentifier,
+): Result[tuple[beta: seq[byte], gamma: seq[byte]], string] =
+  let sLen = s.len
+  var
+    beta: seq[byte]
+    gamma: seq[byte]
+
+  let filler = computeFillerStrings(s).valueOr:
+    return err("Error in filler generation: " & error)
+
+  for i in countdown(sLen - 1, 0):
+    let
+      beta_aes_key = deriveKeyMaterial("aes_key", s[i]).kdf()
+      mac_key = deriveKeyMaterial("mac_key", s[i]).kdf()
+      beta_iv = deriveKeyMaterial("iv", s[i]).kdf()
+
+    if i == sLen - 1:
+      let destBytes = destHop.serialize()
+      let destPadding = destBytes & delay[i] & @id & newSeq[byte](PaddingLength)
+      let aes = aes_ctr(beta_aes_key, beta_iv, destPadding)
+      beta = aes & filler
+    else:
+      let betaPrefix =
+        beta[0 .. (((r * (t + 1)) - t) * k) - 1]
+
+      let routingInfo = RoutingInfo.init(
+        hops[i + 1],
+        delay[i],
+        gamma,
+        betaPrefix,
+      )
+
+      let serializedRoutingInfo = routingInfo.serialize()
+      beta = aes_ctr(beta_aes_key, beta_iv, serializedRoutingInfo)
+
+    gamma = hmac(mac_key, beta).toSeq()
+
+  return ok((beta: beta, gamma: gamma))
+```
+
+The forward and SURB call sites differ only in the arguments they pass.
+
+For a normal forward message, `wrapInSphinxPacket` calls:
+
+```nim
+let (beta_0, gamma_0) = computeBetaGamma(
+  s,
+  hop,
+  delay,
+  destHop,
+  default(SURBIdentifier),
+)
+```
+
+For a SURB, `createSURB` calls:
+
+```nim
+let (beta_0, gamma_0) = computeBetaGamma(
+  s,
+  hops,
+  delay,
+  Hop(),
+  id,
+)
+```
+
+The argument differences are:
+
+| Argument | Forward message | SURB return path |
+| --- | --- | --- |
+| `hops` | Forward mix path: first mix hop, intermediate hops, final exit hop. | Return mix path: first return hop, intermediate return hops, original sender as final hop. |
+| `destHop` | Real destination address, encoded as a `Hop`. The forward-path exit uses it to dial the destination protocol. | Empty `Hop()`, which serializes as zero address bytes. This marks that the terminal return hop is not forwarding to another destination. |
+| `id` | `default(SURBIdentifier)`, all zero bytes. | Random nonzero SURB identifier generated by `buildSurbs`. The original sender uses it to find `connCreds`. |
+
+For `PathLength = 3`, the final routing block produced by the same
+`if i == sLen - 1` branch therefore differs like this:
+
+```text
+Forward message final block:
+  destHop = destination address
+  delay   = 0
+  id      = 0^16
+  meaning = normal exit; forward plaintext to destination
+
+SURB return final block:
+  destHop = zero Hop()
+  delay   = 0
+  id      = random SURB id
+  meaning = reply arrived at original sender; recover using connCreds[id]
+```
+
+So there is no separate "forward Beta" and "SURB Beta" algorithm. There is one
+Beta/Gamma algorithm with two terminal-block encodings.
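+Because the two encodings differ only in the address and identifier fields, the
+terminal node can tell them apart by inspecting those bytes. The classifier
+below is a hypothetical sketch of that decision only. The real exit and
+recipient branches are not shown in this document; the sketch takes the
+already decrypted first 112 bytes (`AddrSize + DelaySize + SurbIdLen`) as
+input, and treating any other combination as invalid is the sketch's own
+choice.
+
+```nim
+const
+  AddrSize = 94
+  DelaySize = 2
+  SurbIdLen = 16
+
+type TerminalKind = enum
+  tkForwardExit   # destination address present, id all zero
+  tkSurbReply     # zero address, nonzero SURB id
+  tkInvalid       # neither encoding
+
+proc allZero(b: openArray[byte]): bool =
+  result = true
+  for x in b:
+    if x != 0:
+      return false
+
+proc classify(blk: seq[byte]): TerminalKind =
+  doAssert blk.len >= AddrSize + DelaySize + SurbIdLen
+  let zeroAddr = allZero(blk[0 ..< AddrSize])
+  let idStart = AddrSize + DelaySize
+  let zeroId = allZero(blk[idStart ..< idStart + SurbIdLen])
+  if not zeroAddr and zeroId:
+    result = tkForwardExit
+  elif zeroAddr and not zeroId:
+    result = tkSurbReply
+  else:
+    result = tkInvalid
+
+var forwardBlk = newSeq[byte](AddrSize + DelaySize + SurbIdLen)
+forwardBlk[0] = 0x04                    # any nonzero byte of an encoded address
+var replyBlk = newSeq[byte](AddrSize + DelaySize + SurbIdLen)
+replyBlk[AddrSize + DelaySize] = 0x7F   # a nonzero SURB id byte
+```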
+
+`beta` and `gamma` are mutable byte sequences. During the countdown loop, they
+always hold the header state for the hop that was just constructed. For a
+3-hop path, after the first iteration they hold `beta_2/gamma_2`; after the
+second iteration they hold `beta_1/gamma_1`; after the final iteration they hold
+`beta_0/gamma_0`, which is what goes into the SURB header.
+
+#### Filler
+
+`filler` is precomputed before the Beta/Gamma countdown loop:
+
+```nim
+proc computeFillerStrings(s: seq[seq[byte]]): Result[seq[byte], string] =
+  var filler: seq[byte] = @[]
+
+  for i in 1 ..< s.len:
+    let
+      aes_key = deriveKeyMaterial("aes_key", s[i - 1]).kdf()
+      iv = deriveKeyMaterial("iv", s[i - 1]).kdf()
+
+    let
+      fillerLength = (t + 1) * k
+      zeroPadding = newSeq[byte](fillerLength)
+
+    filler = aes_ctr_start_index(
+      aes_key,
+      iv,
+      filler & zeroPadding,
+      (((t + 1) * (r - i)) + t + 2) * k,
+    )
+
+  return ok(filler)
+```
+
+The subtle part is why filler exists. Each hop decrypts and shifts the routing
+header state as the packet moves forward, appending zero padding that its own
+keystream turns into pseudorandom bytes. The sender computes every `gamma_i`
+over `beta_i` as constructed, so the tail of each constructed `beta_i` must
+equal the bytes that the earlier hops' processing will actually produce there;
+otherwise the MAC check at later hops would fail. Filler precomputes exactly
+those tail bytes. It also keeps Beta at its fixed size with no distinguishable
+padding, so a hop cannot infer its position in the path or the remaining path
+length.
+
+The call to `aes_ctr_start_index` is what makes the filler line up with the tail
+of the conceptual full Beta stream:
+
+```nim
+filler = aes_ctr_start_index(
+  aes_key,
+  iv,
+  filler & zeroPadding,
+  (((t + 1) * (r - i)) + t + 2) * k,
+)
+```
+
+`filler & zeroPadding` extends the previous filler by one routing block:
+
+```text
+(t + 1) * k = (6 + 1) * 16 = 112 bytes
+```
+
+`aes_ctr_start_index` encrypts that data as if it started at byte offset
+`startIndex` in a larger AES-CTR stream. This matters because filler bytes live
+at the tail of Beta, not at byte offset 0. AES-CTR keystream bytes depend on
+position, so filler must be encrypted with the keystream positions where those
+bytes will sit inside the full Beta field.
+
+For the current constants, the filler loop runs twice for `PathLength = 3`:
+
+```text
+i = 1:
+  startIndex = (((7) * (5 - 1)) + 8) * 16
+             = 576
+  input length = 112 bytes
+
+i = 2:
+  startIndex = (((7) * (5 - 2)) + 8) * 16
+             = 464
+  input length = 224 bytes
+```
+
+So the final filler is 224 bytes. Conceptually, it is pre-encrypted tail data
+that will remain well-formed after earlier hops append zero padding and decrypt
+their Beta layers.
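+The offsets and lengths in the worked example follow from the constants alone,
+so they can be recomputed. This sketch reproduces only the arithmetic of
+`computeFillerStrings` (no AES); `fillerPlan` is a hypothetical name. The last
+line derives the padding that the terminal block needs for `aes & filler` to be
+exactly `BetaSize`. That value is inferred from the sizes here, not read from
+the implementation's `PaddingLength` definition, which this document does not
+show.
+
+```nim
+const
+  k = 16
+  r = 5
+  t = 6
+  BetaSize = ((r * (t + 1)) + 1) * k   # 576
+  AddrSize = (t * k) - 2               # 94
+  DelaySize = 2
+  SurbIdLen = k
+  TerminalPrefix = AddrSize + DelaySize + SurbIdLen   # 112
+
+proc fillerPlan(pathLen: int): seq[tuple[i, startIndex, inputLen: int]] =
+  var fillerLen = 0
+  for i in 1 ..< pathLen:
+    fillerLen += (t + 1) * k           # filler & zeroPadding grows by one block
+    result.add((i: i,
+                startIndex: (((t + 1) * (r - i)) + t + 2) * k,
+                inputLen: fillerLen))
+
+let plan = fillerPlan(3)
+let finalFiller = plan[^1].inputLen
+# Padding implied by: TerminalPrefix + padding + filler == BetaSize
+let impliedPadding = BetaSize - TerminalPrefix - finalFiller
+```
+
+Every step ends at byte 688, which is `BetaSize` plus the 112 bytes of zero
+padding a hop appends before decrypting. That is why the start index moves back
+by one routing block each time the filler grows by one.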
+
+In this implementation, filler is only appended when constructing the terminal
+routing block:
+
+```nim
+if i == sLen - 1:
+  ...
+  beta = aes & filler
+```
+
+That may look surprising, but remember the countdown loop builds nested header
+state. The terminal block is built first and becomes the innermost state carried
+forward into `beta_{L-2}`, `beta_{L-3}`, and eventually `beta_0`. Appending
+filler there prepares the tail bytes that will be needed after later hops peel
+their layers.
+
+The loop in `computeFillerStrings` starts at `i = 1` because no filler is needed
+before the first hop has processed anything. The expression `s[i - 1]` therefore
+starts with `s[0]`, which is the secret for `hop_0`, the first return hop. It
+does not start with the original sender. The filler code uses each earlier
+travel-order hop's `aes_key`/`iv` to predict how zero padding at the tail will
+transform as those earlier layers are processed.
+
+That is the end of the filler-specific part. The next paragraphs return to the
+general Beta/Gamma construction flow.
+
+The countdown in `computeBetaGamma` is construction order, not path-travel
+order. It is needed because each earlier hop's routing block contains encrypted
+information for the next hop. For a 3-hop return path:
+
+```text
+hop 0 -> hop 1 -> hop 2/original sender
+```
+
+the header has to be prepared in this dependency order:
+
+```text
+1. build beta_2/gamma_2 for the original sender
+2. build beta_1/gamma_1, embedding gamma_2 and part of beta_2
+3. build beta_0/gamma_0, embedding gamma_1 and part of beta_1
+```
+
+The exit receives only `beta_0/gamma_0` in the SURB header. Hop 0 decrypts its
+routing block and learns how to forward to hop 1, including the `beta_1/gamma_1`
+state that hop 1 will verify next. Hop 1 does the same for hop 2. That is the
+reason the construction loop counts down from the original sender back to the
+first return hop.
+
+#### Digression: Why Build `beta_2` Before `beta_0`?
+
+For both normal forward messages and SURB return messages, the path is indexed
+in travel order:
+
+```text
+hop_0 -> hop_1 -> hop_2
+```
+
+For a normal forward message:
+
+```text
+sender -> hop_0 -> hop_1 -> hop_2/exit -> destination
+```
+
+For a SURB reply:
+
+```text
+exit -> hop_0 -> hop_1 -> hop_2/original sender
+```
+
+In both cases, the packet is processed in travel order:
+
+```text
+hop_0 processes beta_0/gamma_0
+hop_1 processes beta_1/gamma_1
+hop_2 processes beta_2/gamma_2
+```
+
+But the header must be constructed in the opposite order because each earlier
+hop's routing block embeds the next hop's header state:
+
+```text
+beta_0 contains next-hop info for hop_1 plus beta_1/gamma_1
+beta_1 contains next-hop info for hop_2 plus beta_2/gamma_2
+beta_2 contains the final instruction
+```
+
+So `beta_0` cannot be built until `beta_1/gamma_1` exists, and `beta_1` cannot
+be built until `beta_2/gamma_2` exists. The construction dependency is:
+
+```text
+1. build beta_2/gamma_2
+2. use those to build beta_1/gamma_1
+3. use those to build beta_0/gamma_0
+```
+
+The packet/SURB then carries `beta_0/gamma_0` as the starting header state. At
+runtime, each hop decrypts its own layer and reveals the next header state:
+
+```text
+construction: beta_2 -> beta_1 -> beta_0
+processing:   beta_0 -> beta_1 -> beta_2
+```
+
+For SURBs, the final instruction in `beta_2` is special: it contains zero
+address/delay plus the SURB identifier. For normal forward messages, the final
+instruction contains the destination address.
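+The construction-versus-processing order can be shown with a deliberately
+simplified onion. In this sketch each layer is just `nextHop | innerLayer`,
+"encrypted" by XOR with a one-byte per-hop key. There is no MAC, no fixed-size
+Beta, and no filler, so the layers grow as they nest. `xorWith`, `buildOnion`,
+and `peel` are hypothetical names used only to show why construction must count
+down.
+
+```nim
+import std/strutils
+
+proc xorWith(data: string, key: char): string =
+  # Toy stand-in for aes_ctr: XOR with the same key is its own inverse.
+  result = newString(data.len)
+  for i, c in data:
+    result[i] = char(ord(c) xor ord(key))
+
+proc buildOnion(hops: seq[string], keys: seq[char], finalInstr: string): string =
+  ## hops[i] is hop_i in travel order; keys[i] is hop_i's key.
+  var beta = ""
+  for i in countdown(hops.len - 1, 0):
+    let plain =
+      if i == hops.len - 1: finalInstr        # terminal instruction
+      else: hops[i + 1] & "|" & beta          # next hop + inner state
+    beta = xorWith(plain, keys[i])
+  result = beta                               # beta_0
+
+proc peel(beta0: string, keys: seq[char]): seq[string] =
+  ## Travel order: hop_i removes only its own layer.
+  var beta = beta0
+  for i in 0 ..< keys.len:
+    let plain = xorWith(beta, keys[i])
+    if i == keys.len - 1:
+      result.add(plain)                       # final instruction
+    else:
+      let sep = plain.find('|')               # hop names contain no '|'
+      result.add(plain[0 ..< sep])            # next hop learned by hop_i
+      beta = plain[sep + 1 .. ^1]             # inner state for hop_{i+1}
+
+let keys = @['a', 'b', 'c']
+let onion = buildOnion(@["hop0", "hop1", "self"], keys, "SURB-REPLY:id42")
+```
+
+`hop0` never appears inside the onion: the exit learns it from the SURB's
+separate `hop` field, and each layer names only the next hop. The innermost
+layer therefore has to exist before any outer layer can be built.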
+
+Returning to the SURB header construction code, each iteration of
+`computeBetaGamma` derives the keys needed to encrypt and authenticate one hop's
+routing block.
+
+For each hop secret `s[i]`, the code derives:
+
+- `beta_aes_key`: encrypts the routing block for that hop.
+- `beta_iv`: IV for that routing-block encryption.
+- `mac_key`: authenticates the resulting `beta`.
+
+For hops where `i < sLen - 1`, meaning every return-path hop before the original
+sender, the routing block contains the next hop's address, this hop's delay, the
+next hop's `gamma`, and the next encrypted `beta` prefix:
+
+```nim
+let betaPrefix =
+  beta[0 .. (((r * (t + 1)) - t) * k) - 1]
+
+let routingInfo = RoutingInfo.init(
+  hops[i + 1],
+  delay[i],
+  gamma,
+  betaPrefix,
+)
+
+let serializedRoutingInfo = routingInfo.serialize()
+beta = aes_ctr(beta_aes_key, beta_iv, serializedRoutingInfo)
+```
+
+That is the onion-routing part of the header: when hop `i` processes the packet,
+it can decrypt only its own routing block and learn only the next hop, its own
+forwarding delay, the next hop's MAC, and the next encrypted header state.
+
+The `betaPrefix` slice is the part of the already-built next-hop `beta` that
+fits into a `RoutingInfo` block. The formula is:
+
+```text
+(((r * (t + 1)) - t) * k)
+```
+
+With the current constants:
+
+```text
+r = 5
+t = 6
+k = 16
+
+((5 * (6 + 1)) - 6) * 16
+= ((5 * 7) - 6) * 16
+= (35 - 6) * 16
+= 29 * 16
+= 464 bytes
+```
+
+That matches the `RoutingInfo.serialize()` layout:
+
+```text
+Addr  = 94 bytes
+Delay = 2 bytes
+Gamma = 16 bytes
+Beta  = 464 bytes
+Total = 576 bytes
+```
+
+`RoutingInfo.serialize()` must produce exactly 576 bytes because the Sphinx
+header format has a fixed-size Beta field. Each non-terminal hop encrypts one
+serialized `RoutingInfo` block to produce the next `beta` value. Since address,
+delay, and gamma consume 112 bytes, only 464 bytes are available for carrying
+forward the next encrypted Beta state. The slice:
+
+```nim
+beta[0 .. 463]
+```
+
+keeps exactly that prefix.
+
+There is an important distinction here:
+
+```text
+BetaSize = ((r * (t + 1)) + 1) * k = 576 bytes
+betaPrefix size = ((r * (t + 1)) - t) * k = 464 bytes
+```
+
+So `(((r * (t + 1)) - t) * k)` is not the full Beta size in this
+implementation. The full `beta` value is 576 bytes. The 464-byte slice is the
+amount of the already-built next-hop Beta that can be embedded inside the
+previous hop's `RoutingInfo`, because `RoutingInfo` must also include 112 bytes
+of fresh routing data:
+
+```text
+next hop address   94 bytes
+delay               2 bytes
+next gamma         16 bytes
+---------------------------
+fresh routing     112 bytes
+
+576-byte RoutingInfo - 112 fresh bytes = 464 bytes for betaPrefix
+```
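+
+The size relations in this subsection can be checked mechanically. The Nim
+sketch below recomputes them at compile time from `r`, `t`, and `k`. `AddrSize`
+and `DelaySize` are taken from the serialized layout listed above, and the
+constant names are chosen for illustration rather than copied from the library:
+
+```nim
+# Illustrative constants; r, t, k match the values used in this document.
+const
+  r = 5
+  t = 6
+  k = 16
+  AddrSize = 94
+  DelaySize = 2
+  GammaSize = k
+  BetaSize = ((r * (t + 1)) + 1) * k          # full Beta field
+  BetaPrefixSize = ((r * (t + 1)) - t) * k    # slice embedded in RoutingInfo
+  FreshRoutingSize = (t + 1) * k              # address + delay + gamma
+
+static:
+  doAssert BetaSize == 576
+  doAssert BetaPrefixSize == 464
+  doAssert FreshRoutingSize == 112
+  doAssert AddrSize + DelaySize + GammaSize == FreshRoutingSize
+  # A serialized RoutingInfo fills the Beta field exactly.
+  doAssert FreshRoutingSize + BetaPrefixSize == BetaSize
+  # Processing decrypts beta || (t + 1) * k zero bytes.
+  doAssert BetaSize + FreshRoutingSize == 688
+
+echo BetaSize, " ", BetaPrefixSize, " ", FreshRoutingSize   # 576 464 112
+```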
+
+During packet processing, the receiving hop appends `(t + 1) * k = 112` zero
+bytes before decrypting Beta:
+
+```nim
+let zeroPadding = newSeq[byte]((t + 1) * k)
+let B = aes_ctr(beta_aes_key, beta_iv, beta & zeroPadding)
+```
+
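+That is how the hop recovers a full 688-byte block containing 112 bytes of fresh
+routing data plus a restored 576-byte next-hop Beta state. The construction side
+stores only a 464-byte prefix because the processing side later supplies the
+extra 112 zero bytes before decryption. Those zero bytes decrypt to keystream,
+so the last 112 bytes of the restored Beta are correct only because the filler
+was precomputed to match them (see "Appendix: Filler as Suffix Consistency").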
+
+For `i == sLen - 1`, meaning the terminal return-path hop (the original sender),
+the routing block is special. When building a SURB header, the caller passes
+`Hop()` as `destHop` and a non-default `id` to `computeBetaGamma`, which builds
+the final routing block as:
+
+```nim
+let destBytes = destHop.serialize()
+let destPadding = destBytes & delay[i] & @id & newSeq[byte](PaddingLength)
+let aes = aes_ctr(beta_aes_key, beta_iv, destPadding)
+beta = aes & filler
+```
+
+For a SURB, `destHop.serialize()` is all zeros and the final delay is zero.
+That produces the pull request 307 Section 8.7.2 Step 3 shape:
+
+```text
+zero address/delay || SURB id || zero padding
+```
+
+This is how the original sender, when it becomes the final return-path hop, can
+distinguish a SURB reply from a normal forward exit message.
+
+After each final or non-final `beta` calculation, `gamma` is computed over that
+`beta`:
+
+```nim
+gamma = hmac(mac_key, beta).toSeq()
+```
+
+The last values produced by the backwards loop, `beta_0` and `gamma_0`, are the
+values placed in the distributed SURB header. Each later return-path hop recovers
+its own `gamma` from the previous hop's decrypted routing block and verifies it
+against the `beta` it receives.
+
+#### Hop-by-Hop Beta Example
+
+For a concrete 3-hop SURB return path, the countdown loop behaves like this:
+
+```text
+return travel path: hop_0 -> hop_1 -> hop_2/original sender
+construction loop: i = 2, then i = 1, then i = 0
+```
+
+At `i = 2`, the code builds the terminal block for the original sender:
+
+```text
+destPadding =
+  zero Hop() address  94 bytes
+  zero delay           2 bytes
+  SURB id             16 bytes
+  zero padding       240 bytes  # PaddingLength
+
+AES-CTR(key_2, iv_2, destPadding) = 352 bytes
+filler                            = 224 bytes
+beta_2                            = 576 bytes
+gamma_2 = HMAC(mac_key_2, beta_2)
+```
+
+Here `PaddingLength` is:
+
+```text
+(((t + 1) * (r - PathLength)) + 1) * k
+= (((6 + 1) * (5 - 3)) + 1) * 16
+= 240 bytes
+```
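+
+The terminal block and the filler always add up to one full Beta. The Nim sketch
+below checks this for every path length from 1 to `r`, assuming (as in the
+filler appendix, where 3 hops give 224 bytes) that the filler grows by
+`(t + 1) * k` bytes per non-terminal hop. The helper names are hypothetical:
+
+```nim
+const
+  r = 5
+  t = 6
+  k = 16
+  SurbIdLen = 16
+  BetaSize = ((r * (t + 1)) + 1) * k   # 576
+
+proc paddingLength(pathLength: int): int =
+  (((t + 1) * (r - pathLength)) + 1) * k
+
+proc terminalBlockSizes(pathLength: int): tuple[destPadding, filler: int] =
+  # zero address/delay (t * k) || SURB id || zero padding
+  let destPadding = t * k + SurbIdLen + paddingLength(pathLength)
+  # assumption: one (t + 1) * k filler chunk per non-terminal hop
+  let filler = (pathLength - 1) * (t + 1) * k
+  (destPadding, filler)
+
+for L in 1 .. r:
+  let (d, f) = terminalBlockSizes(L)
+  doAssert d + f == BetaSize
+  echo "PathLength=", L, " PaddingLength=", paddingLength(L),
+    " destPadding=", d, " filler=", f
+```
+
+For `PathLength = 3` this reproduces the 352 + 224 split shown above. The sum
+stays at 576 because every hop removed from `PaddingLength` comes back as one
+112-byte filler chunk.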
+
+At this point, the mutable `beta`/`gamma` variables hold `beta_2/gamma_2`.
+
+At `i = 1`, the code builds routing instructions for `hop_1`. When `hop_1`
+later processes the packet, it needs to learn how to forward to `hop_2` and what
+header state `hop_2` should verify. So `RoutingInfo.init` is effectively:
+
+```nim
+let routingInfo = RoutingInfo.init(
+  hops[2],          # address of hop_2/original sender
+  delay[1],         # delay hop_1 should apply before forwarding
+  gamma_2,          # MAC that hop_2 should verify
+  beta_2[0 .. 463],
+)
+```
+
+`RoutingInfo.init` is just a structured container:
+
+```nim
+RoutingInfo(
+  Addr: hops[2],
+  Delay: delay[1],
+  Gamma: gamma_2,
+  Beta: beta_2[0 .. 463],
+)
+```
+
+Then serialization lays those fields out in a fixed 576-byte block:
+
+```text
+hops[2]          94 bytes
+delay[1]          2 bytes
+gamma_2          16 bytes
+beta_2 prefix   464 bytes
+-------------------------
+serialized      576 bytes
+```
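+
+A minimal sketch of this fixed-offset layout, using a hypothetical
+`ToyRoutingInfo` object rather than the library's `RoutingInfo`. It performs no
+encryption; it only shows where each field sits and why parsing a 688-byte
+decrypted block yields a full 576-byte next-hop Beta:
+
+```nim
+const
+  AddrSize = 94
+  DelaySize = 2
+  GammaSize = 16
+  BetaPrefixSize = 464
+  FreshSize = AddrSize + DelaySize + GammaSize   # 112
+  RoutingInfoSize = FreshSize + BetaPrefixSize   # 576
+
+type
+  # Hypothetical stand-in; field order follows the serialized layout.
+  ToyRoutingInfo = object
+    address: seq[byte]   # serialized next Hop
+    delay: seq[byte]     # this hop's delay
+    gamma: seq[byte]     # MAC the next hop verifies
+    beta: seq[byte]      # next-hop Beta: prefix when built, full when parsed
+
+proc serialize(ri: ToyRoutingInfo): seq[byte] =
+  doAssert ri.address.len == AddrSize
+  doAssert ri.delay.len == DelaySize
+  doAssert ri.gamma.len == GammaSize
+  doAssert ri.beta.len == BetaPrefixSize, "construction embeds only the prefix"
+  result = ri.address & ri.delay & ri.gamma & ri.beta
+  doAssert result.len == RoutingInfoSize
+
+proc parse(b: seq[byte]): ToyRoutingInfo =
+  # On the processing side `b` is the 688-byte decrypted block, so everything
+  # after the fresh 112 bytes is the full 576-byte next-hop Beta.
+  doAssert b.len >= FreshSize
+  ToyRoutingInfo(
+    address: b[0 ..< AddrSize],
+    delay: b[AddrSize ..< AddrSize + DelaySize],
+    gamma: b[AddrSize + DelaySize ..< FreshSize],
+    beta: b[FreshSize ..< b.len])
+
+let ri = ToyRoutingInfo(
+  address: newSeq[byte](AddrSize),
+  delay: @[0'u8, 5'u8],
+  gamma: newSeq[byte](GammaSize),
+  beta: newSeq[byte](BetaPrefixSize))
+let serialized = ri.serialize()
+# Simulate the processing side appending 112 bytes (no decryption here).
+let restored = parse(serialized & newSeq[byte](FreshSize))
+echo serialized.len, " ", restored.beta.len   # 576 576
+```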
+
+That block is encrypted to become `beta_1`, and then `gamma_1` is computed:
+
+```text
+beta_1 = AES-CTR(key_1, iv_1, serialized RoutingInfo for hop_1)
+gamma_1 = HMAC(mac_key_1, beta_1)
+```
+
+At `i = 0`, the same pattern repeats for `hop_0`:
+
+```nim
+let routingInfo = RoutingInfo.init(
+  hops[1],          # address of hop_1
+  delay[0],         # delay hop_0 should apply before forwarding
+  gamma_1,          # MAC that hop_1 should verify
+  beta_1[0 .. 463],
+)
+```
+
+After encryption:
+
+```text
+beta_0 = AES-CTR(key_0, iv_0, serialized RoutingInfo for hop_0)
+gamma_0 = HMAC(mac_key_0, beta_0)
+```
+
+The SURB distributed to the exit contains `hop_0`, the header
+`alpha_0/beta_0/gamma_0`, and the reply key `k_tilde`. It does not contain
+`beta_1` or `beta_2` as separate fields. Those later header states are nested
+inside `beta_0` through the encrypted `RoutingInfo` blocks.
+
+#### Hop-by-Hop Processing Example
+
+The processing side reverses the construction dependency. A return packet starts
+at `hop_0` with:
+
+```text
+Header(alpha_0, beta_0, gamma_0)
+Payload(delta_0)
+```
+
+At every non-terminal hop, `processSphinxPacket` verifies the current Gamma,
+decrypts Beta with 112 bytes of appended zero padding, transforms Delta, and
+creates the header for the next hop:
+
+```nim
+if hmac(mac_key, beta).toSeq() != gamma:
+  return InvalidMAC
+
+let delta_prime = aes_ctr(delta_aes_key, delta_iv, payload)
+
+let zeroPadding = newSeq[byte]((t + 1) * k)
+let B = aes_ctr(beta_aes_key, beta_iv, beta & zeroPadding)
+
+let routingInfo = RoutingInfo.deserialize(B)
+let (address, delay, gamma_prime, beta_prime) =
+  routingInfo.getRoutingInfo()
+
+let alpha_prime = multiplyPointWithScalars(alphaFE, [blinder])
+
+let sphinxPkt = SphinxPacket.init(
+  Header.init(fieldElementToBytes(alpha_prime), beta_prime, gamma_prime),
+  delta_prime,
+)
+```
+
+For the same 3-hop return path, `hop_0` receives `beta_0/gamma_0`:
+
+```text
+input at hop_0:
+  beta_0   576 bytes
+  gamma_0   16 bytes
+
+processing:
+  B_0 = AES-CTR(key_0, iv_0, beta_0 || 112 zero bytes)
+```
+
+`B_0` decrypts to the routing block that was built for `hop_0`:
+
+```text
+B_0 =
+  hops[1]          94 bytes  # next hop address
+  delay[0]          2 bytes  # delay for hop_0
+  gamma_1          16 bytes  # MAC for hop_1 to verify
+  beta_1          576 bytes  # restored next-hop Beta
+```
+
+`RoutingInfo.deserialize(B_0)` returns:
+
+```text
+address      = hops[1]
+delay        = delay[0]
+gamma_prime  = gamma_1
+beta_prime   = beta_1
+```
+
+Then `hop_0` forwards:
+
+```text
+Header(alpha_1, beta_1, gamma_1)
+Payload(delta_1)
+```
+
+`hop_1` repeats the same process:
+
+```text
+B_1 = AES-CTR(key_1, iv_1, beta_1 || 112 zero bytes)
+
+B_1 =
+  hops[2]          94 bytes
+  delay[1]          2 bytes
+  gamma_2          16 bytes
+  beta_2          576 bytes
+```
+
+Then `hop_1` forwards:
+
+```text
+Header(alpha_2, beta_2, gamma_2)
+Payload(delta_2)
+```
+
+At `hop_2`, the original sender, the decrypted routing block is not parsed as a
+normal `RoutingInfo`. Instead, the zero address/delay plus nonzero SURB
+identifier tells the code this is a reply packet:
+
+```text
+B_2 =
+  zero address/delay  96 bytes
+  SURB id             16 bytes
+  zero padding       ...
+```
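+
+The checks that separate a SURB reply from a forward exit can be exercised on
+plain byte buffers. This Nim sketch reuses the `isZeros` ranges and the `t * k`
+identifier offset from the `processSphinxPacket` excerpt shown later, assuming
+`PathLength = 3` (so `PaddingLength = 240`). The `TerminalKind` enum, and the
+`NotTerminal`/`Unrecognized` outcomes in particular, are hypothetical labels for
+branches the excerpt does not show:
+
+```nim
+const
+  t = 6
+  k = 16
+  AddrSize = 94
+  SurbIdLen = 16
+  PaddingLength = 240   # assumes PathLength = 3
+  BlockSize = 688       # decrypted beta || 112 zero bytes
+
+type TerminalKind = enum
+  NotTerminal    # padding region nonzero: not a terminal block in this sketch
+  ForwardExit    # zero delay and zero id: deliver to the destination
+  SurbReply      # zero address/delay: one of our own SURBs came back
+  Unrecognized
+
+proc isZeros(b: openArray[byte], first, last: int): bool =
+  for i in first .. last:
+    if b[i] != 0: return false
+  true
+
+proc classify(B: seq[byte]): (TerminalKind, seq[byte]) =
+  if not B.isZeros((t + 1) * k, ((t + 1) * k) + PaddingLength - 1):
+    return (NotTerminal, newSeq[byte]())
+  if B.isZeros(AddrSize, ((t + 1) * k) - 1):
+    return (ForwardExit, B[0 ..< AddrSize])                  # destination Hop bytes
+  if B.isZeros(0, (t * k) - 1):
+    return (SurbReply, B[t * k ..< t * k + SurbIdLen])       # SURB identifier
+  (Unrecognized, newSeq[byte]())
+
+var reply = newSeq[byte](BlockSize)
+for i in 0 ..< SurbIdLen:
+  reply[t * k + i] = byte(i + 1)   # nonzero SURB id at offset 96
+let (kind, id) = classify(reply)
+echo kind, " ", id.len   # SurbReply 16
+```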
+
+The sender extracts the SURB identifier and hands `delta_prime` to reply
+recovery. That final payload is not application plaintext yet; it still needs
+`processReply(surbKey, surbSecret, delta_prime)`.
+
+### Local Credential Storage
+
+`buildSurbs` stores local recovery state in `mixProto.connCreds`:
+
+```nim
+mixProto.connCreds[id] = ConnCreds(
+  igroup: igroup,
+  surbSecret: surb.secret.get(),
+  surbKey: surb.key,
+  incoming: incoming,
+)
+```
+
+`ConnCreds` contains:
+
+```nim
+type
+  SURBIdentifierGroup = ref object
+    members: HashSet[SURBIdentifier]
+
+  ConnCreds = object
+    igroup: SURBIdentifierGroup
+    incoming: AsyncQueue[seq[byte]]
+    surbSecret: serialization.Secret
+    surbKey: serialization.Key
+```
+
+Mapping to pull request 307 Section 8.7.2 Step 4:
+
+- `surbKey` is `k_tilde`.
+- `surbSecret` is `s_0 ... s_{L-1}`.
+- `connCreds[id]` is the local table indexed by SURB identifier.
+- `incoming` is implementation-specific glue for waking the `Connection` reader.
+
+The `SURBIdentifierGroup` is not described in the pull request spec. It is an
+implementation policy for multiple SURBs attached to one request. All SURB IDs
+created for that request share the same group object.
+
+## Distribution to the Exit
+
+Once SURBs are serialized into the forward payload, the whole message is encoded
+as a normal `MixMessage`, padded, and wrapped in the forward Sphinx packet:
+
+```nim
+let message = buildMessage(msgWithSurbs, codec, mixProto.mixNodeInfo.peerId)
+let sphinxPacket = wrapInSphinxPacket(message, publicKeys, delay, hop, destHop)
+```
+
+The exit node eventually decrypts the forward packet and extracts SURBs:
+
+```nim
+let deserialized = MixMessage.deserialize(unpaddedMsg)
+let (surbs, message) = extractSURBs(deserialized.message)
+
+await mixProto.exitLayer.onMessage(
+  deserialized.codec, message, processedSP.destination, surbs
+)
+```
+
+`extractSURBs` reconstructs only the distributed part:
+
+```nim
+surbs[i].hop = ?Hop.deserialize(hopBytes)
+surbs[i].header = ?Header.deserialize(headerBytes)
+surbs[i].key = ?readBytes(offset, Opt.some(k))
+```
+
+The exit receives no `secret` field. That is correct: only the original sender
+should know the per-hop return-path secrets needed for final reply recovery.
+
+## Exit-Side Use of SURBs
+
+The exit layer first forwards the request to the real destination protocol:
+
+```nim
+let destConn = await self.switch.dial(destPeerId, @[destAddr], codec)
+await destConn.write(message)
+```
+
+If SURBs were attached, it reads a destination response using the registered
+`DestReadBehavior`:
+
+```nim
+let rawResponse = await behavior.callback(destConn)
+```
+
+Then it calls:
+
+```nim
+await self.reply(surbs, response)
+```
+
+The current implementation sends the same response over every supplied SURB:
+
+```nim
+let respFuts = surbs.mapIt(self.onReplyDialer(it, msg))
+await allFutures(respFuts)
+```
+
+This is an important behavioral choice. Pull request 307 Section 8.7.3 describes
+how to use a SURB; it does not require using every SURB for the same response.
+The Nim implementation currently treats multiple SURBs as redundant return paths
+for one response.
+
+Each individual SURB is used in `MixProtocol.reply`:
+
+```nim
+let (peerId, multiAddr) = surb.hop.get().bytesToMultiAddr()
+let message = buildMessage(msg, "", peerId)
+let sphinxPacket = useSURB(surb, message)
+await mixProto.sendPacket(peerId, multiAddr, sphinxPacket, SendPacketLogConfig(logType: Reply))
+```
+
+`buildMessage(msg, "", peerId)` pads the reply as a normal `MessageChunk`
+wrapped in a `MixMessage` with an empty codec. The empty codec is acceptable
+because the reply is matched by SURB identifier rather than by application
+protocol negotiation.
+
+`useSURB` encrypts the reply payload once with the SURB reply key:
+
+```nim
+let delta_aes_key = deriveKeyMaterial("delta_aes_key", surb.key).kdf()
+let delta_iv = deriveKeyMaterial("delta_iv", surb.key).kdf()
+let serializedMsg = msg.serialize()
+let delta = aes_ctr(delta_aes_key, delta_iv, serializedMsg)
+
+SphinxPacket.init(surb.header, delta)
+```
+
+This maps to pull request 307 Section 8.7.3:
+
+- prepare/pad the reply message;
+- encrypt the payload with `k_tilde`;
+- assemble a Sphinx packet with the SURB header;
+- transmit it to `hop_0` over `/mix/1.0.0`.
+
+## Return Path Processing
+
+Return packets use the same `/mix/1.0.0` handler as forward packets.
+Intermediate return-path nodes do not know they are forwarding a reply. They
+perform ordinary Sphinx processing:
+
+```nim
+let processedSP = processSphinxPacket(...)
+
+case processedSP.status
+of Intermediate:
+  await mixProto.writeLp(nextPeerId, @[nextAddr], @[MixProtocolID], outgoingPacket)
+```
+
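+Every return-path hop applies its own AES-CTR payload transformation, the same
+operation that strips a layer from a forward packet, and forwards the result.
+From the reply's point of view this adds a layer rather than removing one. This
+matches pull request 307 Section 8.7.3: the reply is initially encrypted with
+`k_tilde`, and each of the `L` return-path hops, including the original
+sender's own `processSphinxPacket` step, applies one Sphinx payload
+transformation. The sender must later remove `L + 1` layers: one for `k_tilde`
+and one for each return-path secret.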
+
+## Sender-Side Reply Detection
+
+The original sender is the final hop of the return path. In
+`processSphinxPacket`, after decrypting the final routing block `B`, the code
+distinguishes normal forward exit from SURB reply:
+
+```nim
+if B.isZeros((t + 1) * k, ((t + 1) * k) + PaddingLength - 1):
+  let hop = Hop.deserialize(B[0 .. AddrSize - 1])
+
+  if B.isZeros(AddrSize, ((t + 1) * k) - 1):
+    # normal forward exit
+  elif B.isZeros(0, (t * k) - 1):
+    return ok(
+      ProcessedSphinxPacket(
+        status: Reply,
+        id: B.extractSurbId(),
+        delta_prime: delta_prime,
+      )
+    )
+```
+
+The SURB identifier is extracted at byte offset `t * k` (96), immediately after
+the zero address/delay, matching pull request 307 Section 8.7.4:
+
+```nim
+template extractSurbId(data: seq[byte]): SURBIdentifier =
+  const startIndex = t * k
+  const endIndex = startIndex + SurbIdLen - 1
+  var id: SURBIdentifier
+  copyMem(addr id[0], addr data[startIndex], SurbIdLen)
+  id
+```
+
+## Reply Recovery
+
+The `Reply` branch in `handleMixMessages` performs implementation-level reply
+recovery.
+
+First it looks up the SURB identifier:
+
+```nim
+if not mixProto.connCreds.hasKey(processedSP.id):
+  mix_messages_error.inc(labelValues = ["Sender/Reply", "NO_CONN_FOUND"])
+  return
+
+connCred = mixProto.connCreds[processedSP.id]
+```
+
+This maps to pull request 307 Section 8.7.5 Step 1: retrieve `k_tilde` and
+`s_0 ... s_{L-1}` by `id`; if not found, discard.
+
+Then it removes all payload encryption layers:
+
+```nim
+let reply = processReply(
+  connCred.surbKey, connCred.surbSecret, processedSP.delta_prime
+)
+```
+
+`processReply` starts with `k_tilde`, then iterates over each stored return-path
+secret:
+
+```nim
+var delta = delta_prime[0 ..^ 1]
+var key_prime = key
+
+for i in 0 .. s.len:
+  if i != 0:
+    key_prime = s[i - 1]
+
+  let delta_aes_key = deriveKeyMaterial("delta_aes_key", key_prime).kdf()
+  let delta_iv = deriveKeyMaterial("delta_iv", key_prime).kdf()
+
+  delta = aes_ctr(delta_aes_key, delta_iv, delta)
+
+let deserializeMsg = Message.deserialize(delta)
+```
+
+This maps directly to pull request 307 Section 8.7.5 Step 2: decrypt with `k_tilde`, then
+with `s_0 ... s_{L-1}`, then check/strip the leading `k` zero bytes through
+`Message.deserialize`.
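+
+Why `L + 1` applications of the same transformation recover the plaintext can
+be shown with a toy cipher. The sketch below replaces AES-CTR with a
+hypothetical position-indexed XOR keystream (`toyCtr`) that shares the two
+properties the argument needs: applying it twice with the same key cancels, and
+layers under different keys commute. All keys and the message are illustrative:
+
+```nim
+# TOY keystream, NOT AES-CTR: an integer mixer indexed by (key, position).
+proc toyKeystreamByte(key, pos: int): byte =
+  var x = uint32(key * 1_000_003 + pos)
+  x = (x xor (x shr 16)) * 0x45d9f3b'u32
+  x = (x xor (x shr 16)) * 0x45d9f3b'u32
+  byte((x xor (x shr 16)) and 0xff'u32)
+
+proc toyCtr(key: int, data: seq[byte]): seq[byte] =
+  result = newSeq[byte](data.len)
+  for i, b in data:
+    result[i] = b xor toyKeystreamByte(key, i)
+
+# Mirrors the loop shape of processReply: k_tilde first, then s_0 .. s_{L-1}.
+proc toyProcessReply(key: int, s: seq[int], deltaPrime: seq[byte]): seq[byte] =
+  result = deltaPrime
+  for i in 0 .. s.len:              # inclusive range: L + 1 layers
+    let keyPrime = if i == 0: key else: s[i - 1]
+    result = toyCtr(keyPrime, result)
+
+const k = 16
+let kTilde = 7                      # hypothetical reply key
+let secrets = @[11, 22, 33]         # hypothetical s_0, s_1, s_2 (s_2: sender)
+let message = newSeq[byte](k) & @[byte('p'), byte('o'), byte('n'), byte('g')]
+
+# Exit (useSURB): encrypt once with k_tilde.
+var delta = toyCtr(kTilde, message)
+# Each return-path hop, including the sender's own processing step, adds a layer.
+for s in secrets:
+  delta = toyCtr(s, delta)
+
+let recovered = toyProcessReply(kTilde, secrets, delta)
+doAssert recovered[0 ..< k] == newSeq[byte](k)   # leading k zero bytes
+var text = ""
+for b in recovered[k .. ^1]:
+  text.add char(b)
+echo text   # pong
+```
+
+Because every layer is an XOR with a keystream, the order in which
+`processReply` removes the layers does not have to mirror the order in which the
+exit and the return hops added them.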
+
+After that, the implementation deletes all credentials in the identifier group:
+
+```nim
+for id in connCred.igroup.members:
+  mixProto.connCreds.del(id)
+```
+
+This is the "first valid reply wins" policy. If a request had `numSurbs > 1`,
+all those SURB IDs are in the same group. The first reply that can be recovered
+causes all sibling SURB credentials to be removed.
+
+Late replies for the same request are dropped because their identifiers no
+longer exist:
+
+```nim
+if not mixProto.connCreds.hasKey(processedSP.id):
+  mix_messages_error.inc(labelValues = ["Sender/Reply", "NO_CONN_FOUND"])
+  return
+```
+
+Finally, the recovered reply is decoded and pushed to the queue used by
+`MixEntryConnection.readOnce`:
+
+```nim
+let msgChunk = MessageChunk.deserialize(reply)
+let unpaddedMsg = msgChunk.removePadding()
+let deserialized = MixMessage.deserialize(unpaddedMsg)
+
+await connCred.incoming.put(deserialized.message)
+```
+
+Only this first successfully recovered reply becomes visible to the application.
+
+## `numSurbs > 1`
+
+In the current implementation, multiple SURBs attached to one request behave as
+redundant return paths:
+
+1. The sender creates `numSurbs` distinct SURB identifiers.
+2. Each SURB has its own return path, header, reply key, and stored recovery
+   credentials.
+3. All identifiers are inserted into one `SURBIdentifierGroup`.
+4. The exit sends the same destination response through all supplied SURBs.
+5. The sender accepts the first valid reply to arrive.
+6. The sender deletes credentials for every SURB in the group.
+7. Late duplicate replies are discarded with `NO_CONN_FOUND`.
+8. One response byte sequence is written to the reply queue.
+
+This gives redundancy against return-path loss or delay, but it does not provide
+multi-response semantics. If a future rewrite wants multiple distinct responses,
+the grouping and `MixEntryConnection` read model need to change.
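+
+The first-reply-wins policy is independent of the cryptography, so it can be
+sketched with plain tables. In this Nim sketch, integer IDs stand in for 16-byte
+`SURBIdentifier` values and a `requestTag` string stands in for the stored keys
+and queue. `ToyGroup`, `ToyCreds`, `registerRequest`, and `onReply` are
+hypothetical names, not library API:
+
+```nim
+import std/[tables, sets]
+
+type
+  ToyGroup = ref object
+    members: HashSet[int]
+  ToyCreds = object
+    igroup: ToyGroup
+    requestTag: string
+  ReplyOutcome = enum
+    Accepted, NoConnFound
+
+proc registerRequest(creds: var Table[int, ToyCreds], ids: openArray[int],
+                     tag: string) =
+  # All SURBs attached to one request share a single group object.
+  let group = ToyGroup(members: initHashSet[int]())
+  for id in ids:
+    group.members.incl id
+    creds[id] = ToyCreds(igroup: group, requestTag: tag)
+
+proc onReply(creds: var Table[int, ToyCreds], id: int): ReplyOutcome =
+  if not creds.hasKey(id):
+    return NoConnFound            # late duplicate or unknown identifier
+  let cred = creds[id]
+  # First valid reply wins: drop credentials for every sibling SURB.
+  for member in cred.igroup.members:
+    creds.del(member)
+  Accepted
+
+var creds = initTable[int, ToyCreds]()
+creds.registerRequest([101, 102, 103], "ping-1")
+echo creds.onReply(102), " ", creds.onReply(101), " ", creds.len
+# Accepted NoConnFound 0
+```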
+
+## Spec Mapping Summary
+
+| Pull Request 307 Section | Spec Concept | Nim Implementation |
+| --- | --- | --- |
+| 8.7.1 | SURB is `hop_0`, header, reply key | `SURB(hop, header, key)` in `serialization.nim`; serialized as `hop || header || key` |
+| 8.7.2 Step 1 | Select return path ending at sender | `MixProtocol.buildSurb`; last hop is `mixProto.mixNodeInfo` |
+| 8.7.2 Step 2 | Sample `id` and `k_tilde` | `buildSurbs` samples `id`; `createSURB` samples `key` |
+| 8.7.2 Step 3 | Header has zero address/delay plus `id` | `computeBetaGamma(..., Hop(), id)` |
+| 8.7.2 Step 4 | Store recovery tuple by `id` | `mixProto.connCreds[id] = ConnCreds(...)` |
+| 8.7.3 | Recipient uses SURB to reply | `ExitLayer.reply` -> `MixProtocol.reply` -> `useSURB` |
+| 8.7.4 | Final return hop extracts `id` | `processSphinxPacket` returns `ProcessedSphinxPacket(status: Reply, id, delta_prime)` |
+| 8.7.5 | Recover reply with stored keys | `processReply(surbKey, surbSecret, delta_prime)` |
+
+## Rewrite-Relevant Observations
+
+- The current model is single-request/single-visible-reply, even when multiple
+  SURBs are attached.
+- Multiple SURBs are grouped and treated as redundant alternatives. First valid
+  reply consumes the whole group.
+- The exit currently sends the same response through every SURB. That is a
+  policy choice in `ExitLayer.reply`, not an inherent requirement of the SURB
+  format.
+- The reply queue is local connection glue, not part of the cryptographic spec.
+- `MixEntryConnection` creates one future waiting for one incoming queue item.
+  It is not currently a robust stream abstraction for repeated request/reply
+  cycles over the same connection object.
+- `connCreds` has a TODO in `MixProtocol`: credentials may need cleanup when a
+  response never arrives or the connection is closed.
+- The spec says a SURB must be used at most once. The implementation enforces
+  sender-side single acceptance by deleting credentials, but the exit can still
+  attempt to use all SURBs it was given. The sender drops late duplicates.
+- The implementation does not explicitly check SURB identifier collisions before
+  inserting into `connCreds`; it relies on 16 bytes of randomness.
+- The implementation's application payload budget is smaller than raw
+  `MessageSize` because `MessageChunk` reserves two bytes for padding length and
+  four bytes for sequence number. Use `getMaxMessageSizeForCodec(codec,
+  numberOfSurbs)` when calculating usable application bytes.
+
+## Appendix: Filler as Suffix Consistency
+
+Another useful way to understand filler is as a suffix consistency mechanism.
+This view focuses on what happens to the tail of Beta when a processing node
+appends zero bytes before decrypting.
+
+Let:
+
+```text
+q = (t + 1) * k = 112 bytes
+BetaSize = 576 bytes
+betaPrefix size = 464 bytes
+```
+
+When hop `i` processes a packet, it decrypts:
+
+```text
+beta_i || 0^112
+```
+
+The appended 112 zero bytes are not transmitted, but after AES-CTR they become
+position-specific keystream bytes. For hop 0:
+
+```text
+tail_0 = KS_0[576..687]
+```
+
+For hop 0 to reconstruct a full `beta_1`, the missing suffix of `beta_1` must
+equal that tail:
+
+```text
+beta_1[464..575] = tail_0
+```
+
+The first filler iteration computes exactly this:
+
+```text
+F1 = AES-CTR_0(start = 576, 0^112)
+   = KS_0[576..687]
+```
+
+For hop 1, the same problem appears one layer deeper. We need:
+
+```text
+beta_2[464..575] = KS_1[576..687]
+```
+
+But the constructed `beta_1[464..575]` is produced by encrypting
+`beta_2[352..463]` under hop 1's Beta key at keystream positions `464..575`.
+For the constructed `beta_1` to match what hop 0 reconstructs, that is, for
+`beta_1[464..575]` to equal `F1`, construction must choose:
+
+```text
+beta_2[352..463] XOR KS_1[464..575] = F1
+```
+
+So:
+
+```text
+beta_2[352..463] = F1 XOR KS_1[464..575]
+```
+
+The second filler iteration computes both required pieces at once:
+
+```text
+F2 = AES-CTR_1(start = 464, F1 || 0^112)
+
+F2 =
+  (F1 XOR KS_1[464..575])
+  ||
+  (0^112 XOR KS_1[576..687])
+```
+
+That final `F2` is appended to the terminal Beta block:
+
+```text
+beta_2 = AES-CTR_2(destPadding) || F2
+```
+
+So filler is not just "padding." It is precomputed tail material that makes the
+suffix bytes created from appended zeros at one hop match the suffix bytes the
+next Beta state needs.
+
+Compactly:
+
+```text
+Construction:
+  precompute encrypted tail bytes as filler
+
+Processing at each hop:
+  decrypt beta || 112 zero bytes
+  consume the first 112 bytes as routing info
+  carry the remaining 576 bytes forward as next beta
+```
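+
+The construction and processing described in this appendix can be checked end
+to end for a 3-hop return path. The Nim sketch below uses the document's sizes
+(`BetaSize = 576`, 112-byte chunks, `PaddingLength = 240`) but replaces AES-CTR
+with a toy position-indexed XOR keystream and HMAC with a toy checksum, so it
+demonstrates only the byte layout and the filler argument, not security. All
+key values, address tags, and helper names are hypothetical:
+
+```nim
+# TOY primitives: ks/ctr are NOT AES-CTR and toyMac is NOT HMAC.
+const
+  r = 5
+  t = 6
+  k = 16
+  q = (t + 1) * k                                          # 112
+  BetaSize = ((r * (t + 1)) + 1) * k                       # 576
+  PrefixSize = BetaSize - q                                # 464
+  PathLength = 3
+  PaddingLength = (((t + 1) * (r - PathLength)) + 1) * k   # 240
+
+proc ks(key, pos: int): byte =
+  var x = uint32(key * 1_000_003 + pos)
+  x = (x xor (x shr 16)) * 0x45d9f3b'u32
+  x = (x xor (x shr 16)) * 0x45d9f3b'u32
+  byte((x xor (x shr 16)) and 0xff'u32)
+
+proc ctr(key: int, data: seq[byte], start = 0): seq[byte] =
+  # XOR with the keystream starting at absolute position `start`.
+  result = newSeq[byte](data.len)
+  for i, b in data:
+    result[i] = b xor ks(key, start + i)
+
+proc toyMac(key: int, data: seq[byte]): seq[byte] =
+  result = newSeq[byte](k)
+  for i, b in data:
+    result[i mod k] = result[i mod k] xor b xor ks(key + 1_000, i)
+
+proc filler(keys: seq[int]): seq[byte] =
+  # Iteration i encrypts (previous filler || 0^q) under hop i's key starting at
+  # BetaSize - i * q, so the last filler byte always lands at position 687.
+  for i in 0 ..< keys.len - 1:
+    result = ctr(keys[i], result & newSeq[byte](q), BetaSize - i * q)
+
+proc build(keys: seq[int], nextAddr, surbId: seq[byte]): (seq[byte], seq[byte]) =
+  var beta, gamma: seq[byte]
+  for i in countdown(keys.len - 1, 0):
+    if i == keys.len - 1:
+      # zero address/delay || SURB id || zero padding, then filler
+      let destPadding = newSeq[byte](t * k) & surbId & newSeq[byte](PaddingLength)
+      beta = ctr(keys[i], destPadding) & filler(keys)
+    else:
+      var address = newSeq[byte](94)
+      address[0] = nextAddr[i + 1]                 # toy one-byte address tag
+      let routing = address & @[0'u8, 0'u8] & gamma & beta[0 ..< PrefixSize]
+      beta = ctr(keys[i], routing)
+    doAssert beta.len == BetaSize
+    gamma = toyMac(keys[i], beta)
+  (beta, gamma)
+
+let keys = @[11, 22, 33]                   # toy secrets for hop_0, hop_1, hop_2
+let nextAddr = @[0'u8, 0xA1'u8, 0xA2'u8]   # tags for hop_1 and hop_2
+var surbId = newSeq[byte](16)
+for i in 0 ..< 16: surbId[i] = byte(i + 1)
+
+var (beta, gamma) = build(keys, nextAddr, surbId)
+for i in 0 ..< keys.len:
+  doAssert toyMac(keys[i], beta) == gamma  # MAC over the reconstructed beta
+  let B = ctr(keys[i], beta & newSeq[byte](q))
+  if i < keys.len - 1:
+    doAssert B[0] == nextAddr[i + 1]
+    gamma = B[t * k ..< q]
+    beta = B[q ..< q + BetaSize]
+  else:
+    doAssert B[0 ..< t * k] == newSeq[byte](t * k)
+    doAssert B[t * k ..< q] == surbId
+    doAssert B[q ..< q + PaddingLength] == newSeq[byte](PaddingLength)
+echo "all hops verified"
+```
+
+Replacing `filler(keys)` with 224 zero bytes should make the beta that `hop_0`
+reconstructs differ from the one built during construction, so the MAC check at
+`hop_1` fails. That is the suffix-consistency problem the filler solves.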