update the commentary and regenerate the PDF

Balazs Komuves 2025-02-08 19:46:51 +01:00
parent 49b5a0035e
commit 8bf7b291b8
3 changed files with 112 additions and 32 deletions


@@ -3,18 +3,18 @@ FRI protocol
Plonky2 uses a "wide" FRI commitment (committing to whole rows), and then a batched opening proof for all 4 commitments (namely: constants, witness, running product and quotient polynomial).
### Initial Merkle commitment(s)
To commit to a matrix of size $2^n\times M$, the columns, interpreted as values of polynomials on a multiplicative subgroup, are "low-degree extended", that is, evaluated (via an IFFT-FFT pair) on a (coset of a) larger multiplicative subgroup of size $2^n/\mathsf{rate}$. In the standard configuration we have $\mathsf{rate}=1/8$, so we get 8x larger columns, that is, size $2^{n+3}$. The coset Plonky2 uses is the one shifted by the multiplicative generator of the field
$$ g := \mathtt{0xc65c18b67785d900} = 14293326489335486720\in\mathbb{F} $$
Note: the LDE values are also _reordered_, by reversing the bits of their indices; see the warning below and the folding details section.
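To make the IFFT-FFT pair concrete, below is a naive, self-contained sketch of low-degree extending a single column (a toy $O(N^2)$ version under the parameters above; the real code uses radix-2 FFTs over `GoldilocksField` and processes whole matrices):

```rust
// Naive LDE sketch: interpolate a column on the small subgroup, then
// re-evaluate it on the shifted coset g*H' of the 8x larger subgroup.
const P: u128 = 0xffff_ffff_0000_0001; // Goldilocks: 2^64 - 2^32 + 1
const G: u128 = 14293326489335486720; // the multiplicative generator quoted above

fn mul(a: u128, b: u128) -> u128 { a * b % P }
fn pow(mut b: u128, mut e: u128) -> u128 {
    let mut acc = 1;
    while e > 0 {
        if e & 1 == 1 { acc = mul(acc, b); }
        b = mul(b, b);
        e >>= 1;
    }
    acc
}
fn inv(a: u128) -> u128 { pow(a, P - 2) } // Fermat inverse, a != 0

/// Naive inverse DFT: values on the subgroup generated by `w` -> coefficients.
fn interpolate(vals: &[u128], w: u128) -> Vec<u128> {
    let n_inv = inv(vals.len() as u128);
    (0..vals.len() as u128)
        .map(|j| {
            let wj_inv = inv(pow(w, j));
            let mut acc = 0;
            let mut x = 1; // runs through (w^{-j})^i
            for &v in vals {
                acc = (acc + mul(v, x)) % P;
                x = mul(x, wj_inv);
            }
            mul(acc, n_inv)
        })
        .collect()
}

/// Low-degree extension onto the coset g*H', where H' has size `lde_size`.
fn lde(column: &[u128], lde_size: u128) -> Vec<u128> {
    let w = pow(G, (P - 1) / column.len() as u128); // generator of the small subgroup
    let coeffs = interpolate(column, w);
    let w_lde = pow(G, (P - 1) / lde_size); // generator of the big subgroup
    (0..lde_size)
        .map(|i| {
            let x = mul(G, pow(w_lde, i)); // point of the shifted coset
            coeffs.iter().rev().fold(0, |acc, &c| (mul(acc, x) + c) % P) // Horner
        })
        .collect()
}

fn main() {
    let column = [1, 2, 3, 4]; // a single column with n = 2
    println!("{:?}", lde(&column, 32)); // rate 1/8: 8x blowup
}
```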
When configured for zero-knowledge, each _row_ (Merkle leaf) is "blinded" by the addition of `SALT_SIZE = 4` extra random _columns_ (huh?).
Finally, each row is hashed (well, if the number of columns is at most 4, they are left as they are, but this should never happen in practice), and a Merkle tree is built on top of these leaf hashes.
WARNING: before building the Merkle tree, the leaves are _reordered_ by reversing the order of the bits in the index!
So we get a Merkle tree whose leaves correspond to full rows ($2^{n+3}$ leaves).
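For illustration, a minimal sketch of that bit-reversal reordering (plain Rust; `rev_bits` and `reorder` are just illustrative names):

```rust
/// Reverse the lowest `bits` bits of `i`: leaf `i` of the Merkle tree holds
/// row `rev_bits(i, bits)` of the LDE matrix (for 2^bits rows, bits >= 1).
fn rev_bits(i: usize, bits: u32) -> usize {
    i.reverse_bits() >> (usize::BITS - bits)
}

fn reorder<T: Clone>(rows: &[T]) -> Vec<T> {
    let bits = rows.len().trailing_zeros(); // length must be a power of two
    (0..rows.len()).map(|i| rows[rev_bits(i, bits)].clone()).collect()
}

fn main() {
    let rows: Vec<u32> = (0..8).collect();
    assert_eq!(reorder(&rows), vec![0, 4, 2, 6, 1, 5, 3, 7]);
}
```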
Note that in Plonky2 we have in total 4 such matrices, resulting in 4 Merkle caps:
@@ -102,11 +102,18 @@ In practice this is done per batch (see the double sum), because the division is
Remark: In the actual protocol, $F_i$ will be the columns and $y_i$ will be the openings. Combining the batches, we end up with 2 terms:
$$P(X) = P_0(X) + \alpha^{M_0}\cdot P_1(X) =
\frac{G_0(X)-Y_0}{X-\zeta} + \alpha^{M_0}\cdot \frac{G_1(X)-Y_1}{X-\omega\zeta}$$
The pair $(Y_0,Y_1)$ are called "precomputed reduced openings" in the code (calculated from the opening set, involving _two rows_), and $X$ will be substituted with $X\mapsto \eta^{\mathsf{query\_index}}$ (calculated from the "initial tree proofs", involving _one row_). Here $\eta$ is the generator of the LDE subgroup, so $\omega = \eta^{1/\rho}$.
**NOTE1:** while the above would be _the logical version_, here _AGAIN_ Plonky2 does something twisted instead, just because:
$$ P(X) := \alpha^{M_1}\cdot P_0(X) + P_1(X)$$
where $M_0,M_1$ denote the number of terms in the sums in $P_0,P_1$, respectively.
**NOTE2:** in all these sums (well, in the zero-index ones), the terms are _reordered_ so that the lookups are now at the very end. This is frankly **just stupid**, because the two kinds of inputs are now _in different orders_, and in the case of the Merkle tree openings, you couldn't even change that order...
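To make this concrete, here is a toy sketch of evaluating the combined polynomial at a query point $x$ from an opened row and the reduced openings. It works over the Goldilocks base field with made-up numbers (the real protocol works in the extension field $\widetilde{\mathbb{F}}$), follows the twisted order $\alpha^{M_1}\cdot P_0 + P_1$ of NOTE1, and ignores the reordering of NOTE2:

```rust
const P: u128 = 0xffff_ffff_0000_0001; // Goldilocks modulus

fn add(a: u128, b: u128) -> u128 { (a + b) % P }
fn sub(a: u128, b: u128) -> u128 { (a + P - b) % P }
fn mul(a: u128, b: u128) -> u128 { a * b % P }
fn pow(mut b: u128, mut e: u128) -> u128 {
    let mut acc = 1;
    while e > 0 {
        if e & 1 == 1 { acc = mul(acc, b); }
        b = mul(b, b);
        e >>= 1;
    }
    acc
}
fn inv(a: u128) -> u128 { pow(a, P - 2) }

/// One batch: sum_i alpha^i * (f_i(x) - y_i) / (x - zeta), where `row` holds
/// the opened LDE values f_i(x) and `ys` the reduced openings y_i = f_i(zeta).
fn batch(row: &[u128], ys: &[u128], alpha: u128, x: u128, zeta: u128) -> u128 {
    let d_inv = inv(sub(x, zeta));
    let mut acc = 0;
    let mut a_pow = 1;
    for (&fx, &y) in row.iter().zip(ys) {
        acc = add(acc, mul(a_pow, mul(sub(fx, y), d_inv)));
        a_pow = mul(a_pow, alpha);
    }
    acc
}

fn main() {
    let (alpha, x, zeta, omega) = (5, 7, 11, 13); // made-up challenge and points
    let (row0, ys0) = ([1, 2, 3], [4, 5, 6]); // batch 0: openings at zeta
    let (row1, ys1) = ([8, 9], [10, 12]); // batch 1: openings at omega*zeta
    let p0 = batch(&row0, &ys0, alpha, x, zeta);
    let p1 = batch(&row1, &ys1, alpha, x, mul(omega, zeta));
    // the "twisted" combination of NOTE1: alpha^{M_1} * P_0 + P_1
    let combined = add(mul(pow(alpha, row1.len() as u128), p0), p1);
    println!("P(x) = {combined}");
}
```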
#### Commit phase
@@ -116,7 +123,9 @@ The prover then repeatedly "folds" these vectors using the challenges $\beta_i$,
As an example, consider a starting polynomial of degree $2^{13}-1$. With $\rho=1/8$ this gives a codeword of size $2^{16}$. This is committed to (but see the note below!). Then a challenge $\beta_0$ is generated, and we fold this (with an arity of $2^4$), getting a codeword of size $2^{12}$, representing a polynomial of degree $2^9-1$. We commit to this too. Then we generate another challenge $\beta_1$, and fold again with that. Now we get a codeword of size $2^8$; however, this is represented by a polynomial of degree at most $31$, so we just send the 32 coefficients of that instead of a commitment.
Notes: as an optimization, when creating these Merkle trees, we always put _cosets_ of size $2^{\mathsf{arity}}$ on the leaves, as we will have to open them all together anyway. To achieve this grouping, Plonky2 reorders the elements by reversing _the order of bits in the indices_; this way, members of a coset (which were previously far apart) become a contiguous range.
Furthermore, we use _Merkle caps_, so the proof lengths are shorter by the corresponding amount (4 by default, because we have 16 mini-roots in a cap). So the Merkle proofs for an LDE of size $2^k$ have length $k-\mathsf{arity\_bits}-\mathsf{cap\_bits}$, typically $k-8$.
| step | Degree | LDE size | Tree depth |prf. len | fold with |send & absorb |
|------|------------|----------|------------|---------|-----------|--------------|
@@ -153,20 +162,89 @@
Then in each folding step, a whole coset is opened in the "upper layer"; one element of it is already known from the previous step (or, in the very first step, can be computed from the "initial tree proofs" and the openings themselves), and this is checked to match. Then the folded element of the next layer is computed by a small $2^\mathsf{arity}$-sized FFT, and this is repeated until the final step.
### Folding math details
So the combined polynomial $P(x)$ is a polynomial of degree (one less than) $N=2^n/\rho$ over $\widetilde{\mathbb{F}}$. We first commit to the evaluations
$$\big\{P(g\cdot \eta^i)\;:\;0\le i < N\big\}$$
of this polynomial on the coset $gH=\{g\cdot\eta^k\}$, where $\eta$ is a generator of the subgroup of size $N$, and $g\in\mathbb{F}^\times$ is a fixed generator of the whole multiplicative group of the field.
However, the commitment is done not on individual values, but on smaller cosets $\mathcal{C}_k:=\{g\cdot \eta^{i(N/\mathsf{arity})+k}\;:\;0\le i < \mathsf{arity}\}$ of size $\mathsf{arity}=16$.
To achieve this, the above vector of evaluations is reordered by reversing _the bits of the vector indices_.
Then the query index (in each query round) selects a single element of the vector. Note that the bits of the `query_index` are _also reversed_; however, as the Merkle tree leaves are reordered too, this is not apparent at first! So mathematically speaking, the location looks like $x_0 = g\cdot \eta^{\mathrm{rev}(\mathsf{idx})}$...
From that, we can derive its coset, which we open with a Merkle proof. We can check the consistency of the selected element (combined polynomial value) vs. the original polynomial commitments using the formula for the combined polynomial.
```
            0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
  C_k      |           |     |##|  |           |           |
           +-----------+-----------+-----------+-----------+
            \                ^^                           /
             \                                           /
              \                                         /
               \                                       /
                \                                     /
                 +--+-     -+--+--+--+-     -+--+
  C_(k>>4)       |   ...    |##|     ...        |
                 +--+-     -+--+--+--+-     -+--+
                  0          ^^               15
                                (k mod 16)
```
Next, we want to "fold" this coset, so that we can repeat this operation until we get the final polynomial, where we simply check the final folded value against an evaluation of the final polynomial.
On the prover side, the folding step looks like this:
$$ P(x) = \sum_{i=0}^{\mathsf{arity}-1} x^i\cdot P_i(x^\mathsf{arity}) \quad\quad\longrightarrow\quad\quad
P'(y) := \sum_{i=0}^{\mathsf{arity}-1} \beta^i\cdot P_i(y) $$
The folded polynomial $P'$ will then be evaluated on the (shrunken) coset
$$H' := H^{\mathsf{arity}} = \big\{(g\eta^i)^\mathsf{arity}\;:\;i\big\} =\big\{g^\mathsf{arity}(\eta^{\mathsf{arity}})^{i'} \;:\; 0\le i'<N/\mathsf{arity} \big\}$$
and so on.
Now the verifier has the evaluations $\{P(x_0\cdot \eta^{iN/\mathsf{arity}})\}$ on a small coset (of size $\mathsf{arity}$) offset by $x_0$, where $x_0=g\cdot\eta^{\mathrm{rev}(\mathsf{arity}\cdot k)}$ and $k=\lfloor\mathsf{query\_idx}/\mathsf{arity}\rfloor$ (these bit reversals kill me...).
Ok, let's just try and calculate this. I'll use $\mathsf{arity}=8$ for brevity. Let $\omega$ denote a primitive 8-th root of unity, and let's write $P(x)=\sum_i a_ix^i$ in coefficient form. Then
\begin{align*}
P_k(x) &= \sum_j a_{k+8j}x^j \\
P_k(x^8) &= \sum_j a_{k+8j}x^{8j} =
\sum_i a_{i} x^{i-k}\cdot\left\{
\begin{array}{lll}
1 & \textrm{if} & i \equiv k \pmod 8 \\
0 & \textrm{if} & i \not\equiv k \pmod 8
\end{array}\right\} = \\
&= \sum_i a_{i} x^{i-k}\cdot \frac{1}{8} \sum_{l=0}^7\omega^{l(i-k)}
= \frac{1}{8}\sum_{l=0}^7 \omega^{-lk}x^{-k}\sum_i a_{i} x^{i}\omega^{li} = \\
&= \frac{1}{8}\sum_{l=0}^7 (x\omega^l)^{-k}\cdot P(x\omega^l)
\end{align*}
As what we really want to compute is $\sum_k \beta^k P_k(x_0^8)$, we can see that what we really do here is a small inverse FFT (on a coset) of the values we have, followed by combining the results with powers of $\beta$.
This can be reinterpreted as interpolating a degree `arity-1` polynomial from these values, and evaluating it at $\beta$, and indeed this is how Plonky2 implements it.
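A toy sketch of this interpolate-then-evaluate folding step (naive Lagrange interpolation over the Goldilocks base field; Plonky2 folds in the extension field and uses an optimized small interpolation). The assertion in `main` checks the identity derived above for arity 4:

```rust
const P: u128 = 0xffff_ffff_0000_0001; // Goldilocks modulus
const G: u128 = 14293326489335486720; // multiplicative generator

fn add(a: u128, b: u128) -> u128 { (a + b) % P }
fn sub(a: u128, b: u128) -> u128 { (a + P - b) % P }
fn mul(a: u128, b: u128) -> u128 { a * b % P }
fn pow(mut b: u128, mut e: u128) -> u128 {
    let mut acc = 1;
    while e > 0 {
        if e & 1 == 1 { acc = mul(acc, b); }
        b = mul(b, b);
        e >>= 1;
    }
    acc
}
fn inv(a: u128) -> u128 { pow(a, P - 2) }

fn eval(coeffs: &[u128], x: u128) -> u128 {
    coeffs.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c)) // Horner
}

/// Fold one coset: interpolate the polynomial of degree < arity through the
/// points (xs[i], ys[i]) and evaluate it at beta (naive Lagrange).
fn fold_coset(xs: &[u128], ys: &[u128], beta: u128) -> u128 {
    let mut acc = 0;
    for i in 0..xs.len() {
        let (mut num, mut den) = (ys[i], 1);
        for j in 0..xs.len() {
            if i != j {
                num = mul(num, sub(beta, xs[j]));
                den = mul(den, sub(xs[i], xs[j]));
            }
        }
        acc = add(acc, mul(num, inv(den)));
    }
    acc
}

fn main() {
    // toy check with arity 4: P has degree 7, so each slice P_k has degree 1
    let coeffs: Vec<u128> = (1..=8).collect(); // P(x) = 1 + 2x + ... + 8x^7
    let (x0, beta) = (456, 123);
    let w = pow(G, (P - 1) / 4); // primitive 4th root of unity
    let xs: Vec<u128> = (0..4).map(|i| mul(x0, pow(w, i))).collect();
    let ys: Vec<u128> = xs.iter().map(|&x| eval(&coeffs, x)).collect();
    // direct evaluation of sum_k beta^k P_k(x0^4), with P_k(y) = a_k + a_{k+4}*y
    let y0 = pow(x0, 4);
    let expected = (0..4).fold(0, |acc, k| {
        let pk = add(coeffs[k], mul(coeffs[k + 4], y0));
        add(acc, mul(pow(beta, k as u128), pk))
    });
    assert_eq!(fold_coset(&xs, &ys, beta), expected);
    println!("fold value: {expected}");
}
```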
### FRI verification cost
We can try and estimate the cost of the FRI verifier. Presumably the cost will be dominated by the hashing, so let's try to count the hash permutation calls. Let the circuit size be $N=2^n$ rows.
- public input: $\lceil \#\mathsf{PI} / 8 \rceil$ (we hash with a rate 8 sponge)
- challenges: approximately 95-120 calls. The main sources of variation seem to be the number (and size) of the commit-phase Merkle caps, and the size of the final polynomial.
- then for each query round (typically 28 of them), approx 40-100 per round:
  - check the opened LDE row against the 4 matrix commitments ("initial tree proof"):
    - hash a row (typical sizes: 85, 135, 20, 16; resulting in 11, 17, 3 and 2 calls, respectively; in total `11 + 17 + 3 + 2 = 33`)
    - check a Merkle proof (size `n + rate_bits - cap_height = n + 3 - 4 = n-1`)
    - in total `33 + 4(n-1)` calls
  - check the folding steps:
    - for each step, hash the coset (16 $\widetilde{\mathbb{F}}$ elements, that's 4 permutations)
    - then recompute the Merkle root: the first one is `n+3-8` (more precisely: `n + rate_bits - cap_height - arity_bits`), the next is `n+3-12` etc.
For example in the case of a recursive proof of size $N=2^{12}$, we have 114 permutation calls for the challenges, and then $28 \times (77+11+7)$, resulting in total
$$114 + 28\times (77+11+7) = 2774$$
@@ -174,7 +252,7 @@ $$114 + 28\times (77+11+7) = 2774$$
Poseidon permutation calls (with `t=12`), which matches the actual code.
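The estimate above can be reproduced with a few lines of arithmetic. A sketch (the constants 114, 28 and 33 are the typical values quoted above, and the loop's stopping rule is a simplification of the configured FRI reduction strategy):

```rust
/// Rough permutation-call count for the FRI verifier, following the text.
fn fri_verifier_perm_calls(n: i32) -> i32 {
    let (rate_bits, cap_height, arity_bits) = (3, 4, 4);
    let challenges = 114; // typical value quoted above
    let query_rounds = 28;
    // initial tree proofs: hash the 4 opened rows (33 calls in total)
    // plus 4 Merkle paths of length n - 1 each
    let initial = 33 + 4 * (n + rate_bits - cap_height);
    // folding steps: 4 calls to hash the 16-element coset, plus a Merkle
    // path whose length shrinks by arity_bits at every step
    let mut folding = 0;
    let mut path = n + rate_bits - cap_height - arity_bits;
    while path > 0 {
        folding += 4 + path;
        path -= arity_bits;
    }
    challenges + query_rounds * (initial + folding)
}

fn main() {
    assert_eq!(fri_verifier_perm_calls(12), 2774); // matches the count above
}
```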
We can further break this down into sponge vs. compression calls. Let's concentrate on the FRI proof only, as that dominates:
- sponge is (33 + 4 + 4 ...) per round
- compression is 4(n - 1) + (n-5) + (n-9) + ... per round
@@ -187,13 +265,12 @@ It seems also worthwhile to use even wider Merkle caps than the default $2^4$.
Soundness is a bit tricky because of the small field. Plonky2 uses a mixture of sampling from a field extension and repeated challenges to achieve a claimed \~100-bit security:
| Sub-protocol             | Soundness boost            |
| ------------------------ | -------------------------- |
| Permutation argument     | parallel repetition        |
| Combining constraints    | parallel repetition        |
| Lookup argument          | parallel repetition        |
| Polynomial equality test | extension field            |
| FRI protocol             | extension field + grinding |
See [this 2023 paper](https://eprint.iacr.org/2023/1071) for more precise soundness arguments along these lines.


@@ -3,7 +3,7 @@ Recursive proofs
One of Plonky2's main features is its support for efficient recursive proofs. In a recursive proof system, an outer circuit verifies an inner proof by reimplementing the entire verification protocol as arithmetic constraints. This allows us to aggregate many proofs into one final (relatively small) proof. In Plonky2, recursion proceeds with the following steps:
### 1. Virtual Targets
Before the outer (recursive) circuit can verify an inner proof, it must take the inner proof's data as circuit targets. This is done by two main helper functions:
@@ -48,7 +48,7 @@ Before the outer (recursive) circuit can verify an inner proof, it must take the
Together, the `ProofTarget` and `VerifierCircuitTarget` allow the outer circuit to "see" all of the inner proof's components and the associated verifier data, so that it can re-execute the inner verifier's checks as part of its own constraints.
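A minimal sketch of these two allocations, followed by the in-circuit verification (the API shape follows recent plonky2 versions and may differ between releases):

```rust
use plonky2::field::goldilocks_field::GoldilocksField;
use plonky2::plonk::circuit_builder::CircuitBuilder;
use plonky2::plonk::circuit_data::CommonCircuitData;
use plonky2::plonk::config::PoseidonGoldilocksConfig;

type F = GoldilocksField;
type C = PoseidonGoldilocksConfig;
const D: usize = 2;

/// Allocate targets for an inner proof and its verifier data, then re-run
/// the whole verifier in-circuit.
fn verify_inner(builder: &mut CircuitBuilder<F, D>, inner_common: &CommonCircuitData<F, D>) {
    // wire caps, openings, the FRI proof and the public inputs
    let proof_t = builder.add_virtual_proof_with_pis(inner_common);
    // constants/sigmas Merkle cap + circuit digest
    let vd_t = builder.add_virtual_verifier_data(inner_common.config.fri_config.cap_height);
    builder.verify_proof::<C>(&proof_t, &vd_t, inner_common);
}
```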
### 2. Re-Computing Fiat-Shamir Challenges
Verifying an inner proof requires regenerating the random challenges that the inner prover originally got using the Fiat-Shamir heuristic. Plonky2's inner proofs require several challenges (see above in the section Fiat-Shamir Challenges). The outer (recursive) circuit recomputes these challenges from the committed data in the `VerifierCircuitTarget`.
@@ -68,7 +68,8 @@ the challenges produced are:
- And the FRI challenges: **`fri_alpha`**, **`fri_betas`**, **`fri_pow_response`**, and **`fri_query_indices`**. See above in [FRI](FRI.md) section for what these are.
### 3. Proof Verification With the Challenges
Once the challenges are generated, the proof verification proceeds in-circuit (using the circuit builder) in a similar way as outside the circuit, i.e. following these steps (in short):
1. **Evaluating the Vanishing and Quotient Polynomials**
@@ -102,7 +103,7 @@ Once the challenges are generated, the proof verification proceeds in-circuit (u
For more details refer to the section on [FRI](FRI.md).
### The Recursive Workflow in Practice
The overall workflow when building a recursive proof in Plonky2 is as follows:
@@ -120,7 +121,8 @@ The overall workflow when building a recursive proof in Plonky2 is as follows:
- Once all the recursive verification constraints are added to the outer circuit, the outer circuit is built and proved as usual.
### Notes on The VerifierData
The verifier data is an important part of the recursive proof system, because it contains the verification key for the inner circuit, that is, the circuit digest and the commitment to the constant and permutation ("sigma") polynomials. Although the outer circuit only receives the common circuit data (which describes the structure and sizing of the proof components), that data alone does not bind the outer circuit to a specific inner circuit. So a prover could provide any proof and verifier data that matches the common data structure, even if they do not correspond to the intended inner circuit.
To prevent this, the verifier data must be made available to the verifier, either by registering it as public input or by hardcoding it in the outer circuit (the latter option limits the outer circuit to a specific inner circuit).
@@ -131,10 +133,11 @@ Exposing the verifier data as public inputs, requires the verifier to check it "
It is technically possible to hardcode the verifier data in the circuit, but Plonky2 doesn't seem to support this! It might require a lot of refactoring!
### Recursion Tools
Plonky2 implements a limited number of tools (functions) for recursion, mainly the following two:
#### 1. Conditional Recursion
The outer (recursion) circuit can be built to conditionally verify one of two input inner proofs based on a Boolean flag. The helper functions `conditionally_verify_proof` and `select_proof_with_pis` use a Boolean target to select between two sets of proof targets and verifier data. This allows a circuit, for example, to choose between verifying a "real" inner proof or a dummy proof (useful in cyclic recursion base cases). Selecting the proof and verifier data is done by basically going through all the targets and selecting either one based on the condition; in the [code](https://github.com/0xPolygonZero/plonky2/blob/main/plonky2/src/recursion/conditional_recursive_verifier.rs) you will see all the (basic) "select" functions.
@@ -175,7 +178,7 @@ One thing to note is that if the "real" inner proof public input must be set to
Lastly, the verifier data is set to the verifier data from the dummy circuit. Plonky2 automates setting the values for the dummy proof and verifier data targets using `DummyProofGenerator`, so there is no need to keep track of the dummy proof targets.
`DummyProofGenerator` simply implements a `SimpleGenerator` and calls `set_proof_with_pis_target` and `set_verifier_data_target`. This means that all the fields required by the recursive verifier (such as the Merkle cap commitments, openings, and the FRI proof components) are populated with dummy values.
#### 2. Cyclic Recursion
Cyclic recursion is used mainly for what is called IVC (incrementally verifiable computation). Meaning (in simple terms) that you create a circuit `C` that checks the correct computation of a function `f(s)` on the state `s`, and returns an updated state `s'` and a proof $\pi$. Then you feed `s'` and $\pi$ into another circuit that checks $\pi$ and runs the same circuit `C` but with input `s'`, so this is basically `f(f(s))`. Note that the second circuit also ensures that the output `s'` of the first circuit equals the input of the second circuit. This process can be repeated as many times as needed.
@@ -184,10 +187,10 @@ In cyclic recursion, the outer circuit needs to verify either an actual inner pr
The inner verifier data (which includes the `constants_sigmas_cap` and the `circuit_digest`) are registered as public inputs via `add_virtual_verifier_data`. The outer circuit then "connects" (i.e. ensures equal values are set) this verifier data with its own using the function `conditionally_verify_cyclic_proof`. This ensures that every proof in the cycle uses the same verification key. At the verification step, to guarantee this consistency, the outer circuit verifier can invoke the function `check_cyclic_proof_verifier_data`. This function compares the verifier data obtained from the inner proof's public inputs with the expected verifier data.
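A sketch of the out-of-circuit side of this check (assuming the standard `plonky2` API; the exact signatures may differ between versions):

```rust
use anyhow::Result;
use plonky2::field::goldilocks_field::GoldilocksField;
use plonky2::plonk::circuit_data::CircuitData;
use plonky2::plonk::config::PoseidonGoldilocksConfig;
use plonky2::plonk::proof::ProofWithPublicInputs;
use plonky2::recursion::cyclic_recursion::check_cyclic_proof_verifier_data;

type F = GoldilocksField;
type C = PoseidonGoldilocksConfig;
const D: usize = 2;

/// Verify one step of a cyclic chain: first check that the verifier data
/// carried in the proof's public inputs matches this circuit's own key,
/// then run the ordinary verifier.
fn verify_cyclic_step(
    data: &CircuitData<F, C, D>,
    proof: ProofWithPublicInputs<F, C, D>,
) -> Result<()> {
    check_cyclic_proof_verifier_data(&proof, &data.verifier_only, &data.common)?;
    data.verify(proof)
}
```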
### Recursion Examples
#### Example: Simple Recursion
Now to actually do the recursion, we must first have a base circuit. So let's run an example of how this would work:
Let's say we have a circuit with just dummy gates (`NoopGate`) that do nothing. So we first create such a circuit, fill in the witness values (there are no witnesses in our example), and then build it and generate the proof. See below:
@@ -257,7 +260,7 @@ let data = builder.build::<C>();
data.verify(proof.clone())?;
```
#### Example: Conditional Verification
In some cases you may want the recursive circuit to choose between verifying a real inner proof or a dummy one. This is useful in conditional and cyclic recursion. The dummy proof is generated (using the `DummyProofGenerator`) to fill in when no “real” proof is present.
@@ -290,7 +293,7 @@ builder.conditionally_verify_proof_or_dummy::<C>(
);
```
#### Example: Cyclic Recursion with Public Verifier Data
In cyclic recursion every inner proof must use the same verification key. For this purpose the verifier data is made public and then checked for consistency. The function `check_cyclic_proof_verifier_data` is used to enforce that the verifier data from the inner proof (from its public inputs) matches the expected verifier data. See the following example of this:
```rust

Binary file not shown.