update the commentary

This commit is contained in:
Balazs Komuves 2025-01-30 14:54:43 +01:00
parent d07661d5b9
commit eeea733bd5
No known key found for this signature in database
GPG Key ID: F63B7AEF18435562
5 changed files with 146 additions and 27 deletions


@@ -1,9 +1,9 @@
FRI protocol
------------
Plonky2 uses a "wide" FRI commitment (committing to whole rows), and then batched opening proofs for all 4 commitments (namely: constants, witness, running product and quotient polynomial).
### Initial Merkle commitment
To commit to a matrix of size $2^n\times M$, the columns, interpreted as values of polynomials on a multiplicative subgroup, are "low-degree extended", that is, evaluated (via an IFFT-FFT pair) on a (coset of a) larger multiplicative subgroup of size $\mathsf{rate}^{-1}\cdot 2^{n}$. In the standard configuration we have $\mathsf{rate}=1/8$, so we get 8x larger columns, that is, size $2^{n+3}$. The coset Plonky2 uses is the one shifted by the multiplicative generator of the field
@@ -11,15 +11,22 @@ $$ g := \mathtt{0xc65c18b67785d900} = 14293326489335486720\in\mathbb{F} $$
Note: There may be some reordering of the LDE values (bit reversal etc) which I'm unsure about at this point.
When configured for zero-knowledge, each _row_ (Merkle leaf) is "blinded" by the addition of `SALT_SIZE = 4` extra random _columns_ (huh?).
Finally, each row is hashed (well, if the number of columns is at most 4, they are left as they are, but this should never happen in practice), and a Merkle tree is built on top of these leaf hashes.
So we get a Merkle tree whose leaves correspond to full rows ($2^{n+3}$ leaves).
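To make the LDE step concrete, here is a toy version. Everything here is illustrative, not Plonky2's actual values: the field is $\mathbb{F}_{17}$ instead of Goldilocks, the rate is $1/2$, the coset shift is $3$, and naive Lagrange interpolation stands in for the IFFT-FFT pair.

```python
# Toy low-degree extension of one column. Assumptions (NOT Plonky2's values):
# field F_17, subgroup H of size 4, target coset 3*K with |K| = 8 (rate 1/2),
# naive O(n^2) interpolation/evaluation instead of an IFFT/FFT pair.
P = 17

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def interpolate(xs, ys):
    """Lagrange interpolation over F_P; returns coefficients, lowest first."""
    res = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = [1], 1
        for j, xj in enumerate(xs):
            if i != j:
                num = poly_mul(num, [(-xj) % P, 1])   # multiply by (X - xj)
                den = den * (xi - xj) % P
        scale = yi * pow(den, P - 2, P) % P
        for k, c in enumerate(num):
            res[k] = (res[k] + scale * c) % P
    return res

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

H = [pow(4, i, P) for i in range(4)]   # subgroup of order 4 (4 has order 4 mod 17)
column = [3, 1, 4, 1]                  # the column: values of a polynomial on H
coeffs = interpolate(H, column)        # "IFFT": recover the degree-<4 polynomial
K = [pow(2, i, P) for i in range(8)]   # subgroup of order 8
lde = [poly_eval(coeffs, 3 * k % P) for k in K]   # "FFT" on the shifted coset 3*K
```

The `lde` vector is what ends up on the Merkle leaves (together with the other columns of the same row).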
Note that in Plonky2 we have in total 4 such matrices, resulting in 4 Merkle caps:
- constants (including selectors and sigmas)
- the witness (135 columns)
- the wiring (and lookup) protocol's running products (resp. sums)
- the quotient polynomial
#### Merkle caps
Instead of using a single Merkle root to commit, we can use any fixed layer (so the commitment will be $2^k$ hashes instead of just $1$). As the paths become shorter, this is a tradeoff between commitment size and proof size / verification cost. In case we have a lot of Merkle openings for a given tree (like in FRI low-degree proofs here), clearly this makes sense. In the default configuration Plonky2 uses Merkle caps of size $2^4=16$.
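A minimal sketch of the cap idea (hypothetical helper names, and SHA-256 standing in for Poseidon): the commitment is a whole layer of the tree, and openings only need paths up to that layer.

```python
# Merkle tree with a "cap": the commitment is the layer at height cap_height
# (2^cap_height hashes), so opening paths are cap_height steps shorter.
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_layers(leaves):
    layers = [[H(leaf) for leaf in leaves]]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers  # layers[0] = hashed leaves, layers[-1] = [root]

def cap(layers, cap_height):
    return layers[len(layers) - 1 - cap_height]  # 2^cap_height nodes

def open_path(layers, index, cap_height):
    """Merkle path from a leaf up to (but not including) the cap layer."""
    path = []
    for depth in range(len(layers) - 1 - cap_height):
        path.append(layers[depth][index ^ 1])    # sibling hash
        index //= 2
    return path

def verify(leaf, index, path, cap_nodes):
    h = H(leaf)
    for sib in path:
        h = H(sib + h) if index & 1 else H(h + sib)
        index //= 2
    return cap_nodes[index] == h

leaves = [bytes([i]) for i in range(16)]         # 2^4 leaves
layers = merkle_layers(leaves)
```

With 16 leaves and a cap of height 2, paths have length 2 instead of 4.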
### FRI configuration
@@ -34,9 +41,9 @@ struct FriConfig {
}
```
Here the "reduction strategy" defines how to select the layers. For example it can always do 8->1 reduction (instead of the naive 2->1), or optimize and have different layers; also where to stop: if you already reduced to, say, a degree 7 polynomial, it's much more efficient to just send the 8 coefficients than to do 3 more folding steps.
The "default" `standard_recursion_config` uses rate = $1/8$ (rate_bits = 3), Merkle cap height = 4, proof of work (grinding) = 16 bits, query rounds = 28, a reduction strategy of arity $2^4$ (16->1 folding) and a final polynomial of degree (at most) $2^5$. For example, for a recursive proof fitting into $2^{12}$ rows, we have the degree sequence $2^{12}\to 2^{8} \to 2^4$, with the final polynomial having degree $2^4 = 16 \le 2^5$.
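The degree schedule under such a uniform strategy can be reproduced in a few lines (a sketch; parameter names are illustrative, not Plonky2's):

```python
# Degree-bound sequence (as exponents): each folding step with arity
# 2^arity_bits divides the degree bound by 2^arity_bits, and we stop once
# the bound fits within the final polynomial's allowed degree.
def degree_sequence(n_bits, arity_bits=4, final_poly_bits=5):
    seq = [n_bits]
    while seq[-1] > final_poly_bits:
        seq.append(seq[-1] - arity_bits)
    return seq
```

For the $2^{12}$-row example this gives the exponents $12 \to 8 \to 4$.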
For recursion you don't want fancy reduction strategies, it's better to have something uniform.
@@ -76,13 +83,105 @@ See [Challenges.md](Challenges.md) for how these Fiat-Shamir challenges are gene
Remark: **`batch_fri`**: There is also a `batch_fri` subdirectory in the repo; it's not clear to me what it actually does, as it doesn't seem to be used anywhere...
### The FRI protocol
The FRI protocol proves that a committed vector, which is a priori arbitrary, is in fact "close" to a Reed-Solomon codeword (with high probability).
This is done in two phases: the commit phase and the query phase. Note that this "commit" is not the same as the above commitment to the witness etc!
In Plonky2, we want to execute this protocol on many polynomials (remember that each column is a separate polynomial)! That would be very expensive, so instead they (well, not exactly them) are combined by the prover with the random challenge $\alpha$, and the protocol is executed on this combined polynomial.
#### Combined polynomial
So we want to prove that $F_{i}(x_i)=y_{i}$ where $F_{i}(X)$ are a set of (column) polynomials. In our case $\{F_i\}$ consists of two _batches_, and $x_i\in\{\zeta,\omega\zeta\}$ is constant on each batch. The first batch consists of all the column polynomials, the second only of those which need to be evaluated at $\omega\zeta$ too (`"zs"` and the lookup polynomials). We can then form the combined quotient polynomial:
$$ P(X) := \sum_{i} \alpha^{i} \frac{\;F_i(X) - y_i\;}{X-x_i} = \sum_b \frac{\alpha^{k_b}}{X - x_b} \sum_{j=0}^{m_b}
\alpha^j \big[F_{b,j}(X) - y_{b,j}\big]$$
In practice this is done per batch (see the double sum), because the division is more efficient that way. This polynomial $P(X)$ is what we execute the FRI protocol on, proving a degree bound.
Remark: In the actual protocol, $F_i$ will be the columns and $y_i$ will be the openings. Combining the batches, we end up with 2 terms:
$$P(X) = P_0(X) + \alpha^M\cdot P_1(X) =
\frac{G_0(X)-Y_0}{X-\zeta} + \alpha^M\cdot \frac{G_1(X)-Y_1}{X-\omega\zeta}$$
The pair $(Y_0,Y_1)$ are called "precomputed reduced openings" in the code (calculated from the opening set, involving _two rows_), and $X$ will be substituted with $X\mapsto \eta^{\mathsf{query\_index}}$ (calculated from the "initial tree proofs", involving _one row_). Here $\eta$ is the generator of the LDE subgroup, so $\omega = \eta^{1/\rho}$.
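A numeric sketch of this two-batch combination, over a toy prime field ($p=97$ instead of the Goldilocks extension; all values below are made up). It evaluates the per-batch double sum and checks it against the flat single-sum definition of $P(X)$:

```python
# Evaluate P(x) batch by batch: each batch contributes
# (1/(x - x_b)) * sum_j alpha^j (F_j(x) - y_j), scaled by alpha^(batch offset).
P = 97

def batch_term(f_vals, opens, alpha, x, x0):
    inv = pow((x - x0) % P, P - 2, P)
    acc = 0
    for j, (f, y) in enumerate(zip(f_vals, opens)):
        acc = (acc + pow(alpha, j, P) * (f - y)) % P
    return acc * inv % P

alpha, x, zeta, omega_zeta = 5, 11, 29, 31
F0, Y0 = [7, 13, 2], [3, 8, 20]     # batch 0: columns opened at zeta
F1, Z1 = [4, 9], [6, 1]             # batch 1: columns also opened at omega*zeta

P_at_x = (batch_term(F0, Y0, alpha, x, zeta)
          + pow(alpha, len(F0), P) * batch_term(F1, Z1, alpha, x, omega_zeta)) % P
```

Grouping by batch only factors out the single division per batch; the result is identical to the flat sum over all $i$.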
#### Commit phase
Recall that we have an RS codeword of size $2^{n}/\rho$ (encoding the combined polynomial $P(X)$ above), which the prover committed to.
The prover then repeatedly "folds" this vector using the challenges $\beta_i$, until it gets something with low enough degree, then sends the coefficients of the corresponding polynomial in the clear.
As an example, consider a starting polynomial of degree $2^{13}-1$. With $\rho=1/8$ this gives a codeword of size $2^{16}$. This is committed to (but see the note below!). Then a challenge $\beta_0$ is generated, and we fold this (with an arity of $2^4$), getting a codeword of size $2^{12}$, representing a polynomial of degree $2^9-1$. We commit to this too. Then we generate another challenge $\beta_1$, and fold again with that. Now we get a codeword of size $2^8$; however, this is represented by a polynomial of degree at most $31$, so we just send the 32 coefficients of that instead of a commitment.
Note: as an optimization, when creating these Merkle trees, we always put _cosets_ of size $2^{\mathsf{arity}}$ on the leaves, as we will have to open them all together anyway. Furthermore, we use _Merkle caps_, so the proofs are shorter by the corresponding amount (4 by default, because we have 16 mini-roots in a cap). So the Merkle proofs for an LDE of size $2^k$ have length $k-\mathsf{arity\_bits}-\mathsf{cap\_bits}$, typically $k-8$.
| step | Degree | LDE size | Tree depth |prf. len | fold with |send & absorb |
|------|------------|----------|------------|---------|-----------|--------------|
| 0 | $2^{13}-1$ | $2^{16}$ | 12 | 8 | $\beta_0$ | Merkle cap |
| 1 | $2^{9}-1$ | $2^{12}$ | 8 | 4 | $\beta_1$ | Merkle cap |
| 2 | $2^{5}-1$ | $2^8$ | -- | -- | -- | $2^5$ coeffs |
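One folding step can be sketched as follows (toy prime field $p=97$, and arity 2 instead of Plonky2's $2^4$, which shows the same idea): folding the codeword pointwise with $\beta$ agrees with replacing $P$ by $P_{\mathrm{even}} + \beta\, P_{\mathrm{odd}}$, which divides the degree bound by the arity.

```python
# Arity-2 FRI fold: P(X) = Pe(X^2) + X*Po(X^2) is replaced by Pe + beta*Po.
# On the codeword side, the new value at y = x^2 is computed from P(x), P(-x).
P = 97

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def fold_coeffs(coeffs, beta):
    even, odd = coeffs[0::2], coeffs[1::2]
    return [(e + beta * o) % P for e, o in zip(even, odd)]

def fold_point(p_x, p_minus_x, x, beta):
    # (P(x)+P(-x))/2 = Pe(x^2)  and  (P(x)-P(-x))/(2x) = Po(x^2)
    inv2 = pow(2, P - 2, P)
    inv2x = pow(2 * x % P, P - 2, P)
    return ((p_x + p_minus_x) * inv2 + beta * (p_x - p_minus_x) * inv2x) % P

coeffs = [3, 1, 4, 1, 5, 9, 2, 6]    # degree 7
beta = 12
folded = fold_coeffs(coeffs, beta)   # degree 3
```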
#### Grinding
At this point (why here?) the grinding challenge is executed.
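Grinding itself is a plain proof of work; a sketch (SHA-256 standing in for Poseidon, and 8 bits instead of the default 16 to keep the search fast):

```python
# Find a witness w such that hash(challenge || w) starts with `bits` zero bits.
# The prover expects ~2^bits attempts; the verifier re-hashes once to check.
import hashlib

def grind(challenge: bytes, bits: int) -> int:
    witness = 0
    while True:
        digest = hashlib.sha256(challenge + witness.to_bytes(8, "little")).digest()
        if int.from_bytes(digest, "big") >> (256 - bits) == 0:
            return witness
        witness += 1

w = grind(b"fri-challenge", 8)
```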
#### Query phase
This is repeated `num_query_rounds = 28` times (in the default configuration).
A query round consists of two pieces:
- the initial tree proof
- and the folding steps
The initial tree proof consists of a single row of the 4 LDE matrices (the index of this row is determined by the query index challenges), and a Merkle proof against each of the 4 committed Merkle caps.
The steps consist of pairs of evaluations on cosets (of size $2^{\mathsf{arity}}$) and corresponding Merkle proofs against the commit phase Merkle caps.
From the "initial tree proof" values $\{F_j(\eta^k)\}$ and the openings $\{y_j,z_j\}$, we can evaluate the combined polynomial at $\eta^k$, where $k := \mathsf{query\_idx}$:
$$P(\eta^k) =
\frac{1}{\eta^k - \zeta} \sum_{j=0}^{M_1-1}
\alpha^j \big[ F_j(\eta^k) - y_j\big]
+
\frac{1}{\eta^k - \omega\zeta} \sum_{j=M_1}^{M_2-1}
\alpha^{j} \big[ F_j(\eta^k) - z_j\big]
$$
Then in each folding step, a whole coset is opened in the "upper layer". One element of it is already known from the previous step (or, in the very first step, can be computed from the "initial tree proof" and the openings themselves), and this is checked to match. Then the folded element of the next layer is computed by a small FFT of size $2^\mathsf{arity}$, and this is repeated until the final step.
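The per-step check can be sketched like this (toy field $p=97$, arity $4$): interpolate the opened coset values and evaluate the interpolant at $\beta$; the result must equal the next layer's codeword at the folded point. The "small FFT" in the real implementation computes exactly this interpolation, just faster.

```python
# Check one folding step: the 4 opened values on the coset {s, ws, -s, -ws}
# (w of order 4) determine a cubic interpolant L; the folded value is L(beta),
# which must match the next layer's codeword at s^4. Naive Lagrange here.
P = 97
W4 = 22                      # 22^2 = -1 mod 97, so 22 has order 4

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def lagrange_eval(xs, ys, t):
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = num * (t - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

coeffs = [3, 1, 4, 1, 5, 9, 2, 6]                  # layer-i polynomial, degree 7
s, beta = 3, 12
coset = [s * pow(W4, i, P) % P for i in range(4)]  # the opened coset
opened = [poly_eval(coeffs, x) for x in coset]     # values from the Merkle proof

# next layer's polynomial: Q_m = sum_k beta^k * coeffs[4m + k]
folded = [sum(pow(beta, k, P) * coeffs[4 * m + k] for k in range(4)) % P
          for m in range(2)]
check = lagrange_eval(coset, opened, beta) == poly_eval(folded, pow(s, 4, P))
```

Any tampering with an opened value makes the check fail.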
### FRI verifier cost
We can try and estimate the cost of the FRI verifier. Presumably the cost will be dominated by the hashing, so let's try to count the hash permutation calls. Let the circuit size be $N=2^n$ rows.
- public input: $\lceil \#\mathsf{PI} / 8 \rceil$ (we hash with a rate 8 sponge)
- challenges: approximately 95-120. The primary variations seem to be the number (and size) of commit phase Merkle caps, and the size of the final polynomial
- then for each query round (typically 28 of them), approx 40-100 per round:
- check the opened LDE row against the 4 matrix commitments:
- hash a row (typical sizes: 85, 135, 20, 16; resulting in 11, 17, 3 and 2 calls, respectively)
- check a Merkle proof (size `n+3-4 = n-1`)
- in total `33 + 4(n-1)` calls
- check the folding steps
- for each step, hash the coset (16 $\widetilde{\mathbb{F}}$ elements, that's 4 permutations)
- then recompute the Merkle root: the first one is `n+3-8`, the next is `n+3-12` etc
For example in the case of a recursive proof of size $N=2^{12}$, we have 114 permutation calls for the challenges, and then $28 \times (77+11+7)$, resulting in total
$$114 + 28\times (77+11+7) = 2774$$
Poseidon permutation calls (with `t=12`), which matches the actual code.
We can further break this down into sponge vs. compression calls. Let's concentrate on the FRI proof only, as that dominates:
- sponge is (33 + 4 + 4 ...) per round
- compression is 4(n - 1) + (n-5) + (n-9) + ... per round
In case of $n=12$, we have 41 (sponge) vs. 54 (compression). As compression is more than half of the cost, it would make sense to optimize that to `t=8` (say with the Jive construction); on the other hand, use a larger arity (say `t=16`) for the sponge.
It seems also worthwhile to use even wider Merkle caps than the default $2^4$.
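The count above can be cross-checked with a few lines; the per-round constants are taken from the breakdown in this section, and the function hardcodes the two folding steps of the $n=12$ example.

```python
# Poseidon permutation calls of the FRI verifier for N = 2^n rows, default
# config (28 queries, arity 2^4, cap height 4), assuming two folding steps
# as in the n = 12 example above.
def fri_verifier_permutations(n, num_queries=28, challenge_calls=114):
    row_hash = 11 + 17 + 3 + 2            # hashing the 4 opened LDE rows
    initial = row_hash + 4 * (n - 1)      # + 4 Merkle paths of length n-1
    step_cosets = 4 + 4                   # 16 extension elements per coset, twice
    step_paths = (n - 5) + (n - 9)        # commit-phase Merkle paths
    return challenge_calls + num_queries * (initial + step_cosets + step_paths)
```

For $n=12$ this reproduces the $2774$ calls computed above.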
### Soundness


@@ -9,13 +9,13 @@ On the user side, it seems that custom gates are abstracted away behind "gadgets
Unfortunately the actual gate equations never appear explicitly in the code, only routines to calculate them (several ones for different contexts...), which 1) makes it hard to read and debug; and 2) makes the code very non-modular.
However, there is also a good reason for this: The actual equations, if described as (multivariate) polynomials, can be very big and thus inefficient to calculate, especially in the case of the Poseidon gate. This is because of the lack of sharing between intermediate variables. Instead, you need to describe _an efficient algorithm_ to compute these polynomials (think for example Horner evaluation vs. naive polynomial evaluation).
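The Horner-vs-naive point, concretely: both compute the same value, but Horner uses one multiplication per coefficient instead of building each power of $x$ separately.

```python
def eval_naive(coeffs, x):
    # sum_i c_i * x^i, recomputing powers: ~i multiplications for term i
    return sum(c * x**i for i, c in enumerate(coeffs))

def eval_horner(coeffs, x):
    # (((c_n * x + c_{n-1}) * x + ...) * x + c_0): one multiply per step
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc
```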
Note that while in theory a row could contain several non-overlapping gates, the way Plonky2 organizes its gate equations would make this unsound (it would also complicate the system even more). See the details at the [protocol description](Protocol.md).
### List of gates
The default gate set is:
- arithmetic_base
- arithmetic_extension
@@ -38,6 +38,8 @@ The default gates are:
These evaluate the constraint $w = c_0xy + c_1z$, either in the base field or in the quadratic extension field, possibly in many copies, but with shared $c_0,c_1\in\mathbb{F}$ (these seem to be always in the base field?)
Note: in ArithmeticExtensionGate, since the constraints are normally already calculated in the extension field, we in fact compute in a doubly-extended field $\widetilde{\mathbb{F}}[Y]/(Y^2-7)$! Similarly elsewhere when we reason about $\widetilde{\mathbb{F}}$.
### Base sum gate
This evaluates the constraint $x = \sum_{i=0}^k a_i B^i$ (where $B$ is the radix or base). It can be used for example for simple range checks.
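What the constraint enforces, as a sketch (a hypothetical helper, not Plonky2's actual constraint encoding): the claimed digits must recompose to $x$, and each digit must lie in $[0, B)$; the latter is what makes it usable as a range check.

```python
def base_sum_holds(x, digits, B):
    # each digit a_i in [0, B) and x = sum_i a_i * B^i  (digits lowest first)
    if any(not (0 <= a < B) for a in digits):
        return False
    return x == sum(a * B**i for i, a in enumerate(digits))
```

With $B=2$ and $k$ digits this range-checks $x < 2^k$.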
@@ -58,45 +60,49 @@ I'm not convinced this is the right design choice, but probably depends on the c
### Coset interpolation gate
This gate interpolates a set of values $y_i\in\widetilde{\mathbb{F}}$ on a coset $\eta H$ of a small subgroup $H$ (typically of size $2^4$), and evaluates the resulting polynomial at an arbitrary location $\zeta\in\widetilde{\mathbb{F}}$.
The formula is the [barycentric form of Lagrange polynomials](https://en.wikipedia.org/wiki/Lagrange_polynomial#Barycentric_form), but slightly modified to be chunked and to be iterative.
This is used for recursion.
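A sketch of the (non-chunked, non-iterative) barycentric evaluation on a coset $\eta H$, over a toy field $p = 97$ with $|H| = 4$. For these special nodes $x_i = \eta\omega^i$ the barycentric weights collapse to $w_i = \omega^i / (n\,\eta^{n-1})$ and the node polynomial to $\zeta^n - \eta^n$.

```python
P, N = 97, 4
OMEGA = 22                    # order 4 in F_97 (22^2 = -1 mod 97)

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def barycentric_on_coset(eta, ys, zeta):
    xs = [eta * pow(OMEGA, i, P) % P for i in range(N)]
    c = pow(N * pow(eta, N - 1, P) % P, P - 2, P)   # 1 / (n * eta^(n-1))
    acc = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        w = pow(OMEGA, i, P) * c % P                # barycentric weight w_i
        acc = (acc + y * w % P * pow((zeta - x) % P, P - 2, P)) % P
    return (pow(zeta, N, P) - pow(eta, N, P)) * acc % P

eta, zeta = 5, 50
coeffs = [3, 1, 4, 1]         # a degree-3 polynomial
xs = [eta * pow(OMEGA, i, P) % P for i in range(N)]
ys = [poly_eval(coeffs, x) for x in xs]
```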
### Exponentiation
This computes $y=x^k$ using the standard fast exponentiation algorithm, where $k$ is a number fitting into some number of bits (depending on the row width).
I believe it first decomposes $k$ into digits, and then does the normal thing. Though for some reason it claims to be a degree $4$ gate, and I think it should be degree $3$... TODO: sanity check this
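The digit-decomposition plus square-and-multiply pattern (binary digits here; a sketch, not the gate's exact wire layout):

```python
def fast_pow(x, k, p):
    # decompose k into bits, then one squaring + conditional multiply per bit
    bits = [(k >> i) & 1 for i in range(max(k.bit_length(), 1))]
    acc = 1
    for b in reversed(bits):      # most significant bit first
        acc = acc * acc % p
        if b:
            acc = acc * x % p
    return acc
```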
### Lookups
There are two kinds of lookup gates, one containing $(\mathsf{inp},\mathsf{out})$ pairs, and the other containing $(\mathsf{inp},\mathsf{out},\mathsf{mult})$ triples.
Neither imposes any constraint: lookups are different from usual gates, as their behaviour is hardcoded in the protocol.
The 2 gates (`LookupGate` for the one without multiplicities and `LookupTableGate` for the one with) encode the lookup usage and the table(s) themselves, respectively. Plonky2 uses a logarithmic derivative based lookup argument.
See [Lookups.md](Lookups.md) for more details.
### Multiplication extension gate
This is the same as `ArithmeticExtensionGate`, except that it misses the addition. So the constraint is $z = c_0xy \in \widetilde{\mathbb{F}}$. In exchange, you can pack 13 of these into 80 columns instead of 10.
### Noop gate
This doesn't enforce any constraint. It's used as a placeholder so each row corresponds to exactly a single gate; and also lookup tables as implemented require an empty row.
### Poseidon gate
This computes the Poseidon hash (with Plonky2's custom constants and MDS matrix).
The Poseidon gate packs all the (inputs of the) nonlinear S-boxes into a 135-wide row; this results in the standard configuration having 135 advice columns.
Poseidon hash is used for several purposes: hashing the public inputs into 4 field elements; the recursive verification of FRI; and generating challenges in the verifier.
See [Poseidon.md](Poseidon.md) for more details.
### Poseidon MDS gate
This simply computes the multiplication by the 12x12 MDS matrix (so all linear constraints), but with the input vector consisting of field extension elements. It is used in the recursive proof circuit.
### Public input gate
@@ -125,6 +131,14 @@ Or at least that's how I would do it :)
The degree of the gate is $n+1$, so they probably inline the above selector definitions.
In the actual implementation, they repeat this as many times as it fits in the routed wires, followed by (at most) 2 cells used the same way as in the constant gate, finally followed by the bit decompositions.
So the cells look like (for `n=4`): `copies ++ consts ++ bits` with

    copies = [ i,a[i],a0,...,a15; j,b[j],b0,...,b15; ... ]
    consts = [ c0, c1 ]
    bits   = [ i0,i1,i2,i3; j0,j1,j2,j3; ...; 0... ]
### Reducing gates
These compute $y = \sum_{i=0}^k \alpha^i\cdot c_i$ with the coefficients $c_i$ in either the base field or the extension field, however with $\alpha\in\widetilde{\mathbb{F}}$ always in the extension field.
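A sketch of this computation with $\alpha$ in the quadratic extension $\mathbb{F}_p[X]/(X^2-7)$ and base-field coefficients (toy modulus $p = 97$ standing in for Goldilocks), done Horner-style, as a circuit would chain it:

```python
P, W = 97, 7                     # extension F_p[X]/(X^2 - W)

def ext_mul(a, b):
    # (a0 + a1 X)(b0 + b1 X) = (a0 b0 + W a1 b1) + (a0 b1 + a1 b0) X
    return ((a[0] * b[0] + W * a[1] * b[1]) % P,
            (a[0] * b[1] + a[1] * b[0]) % P)

def reduce_with_powers(cs, alpha):
    # Horner: acc <- acc * alpha + c_i, highest index first
    acc = (0, 0)
    for c in reversed(cs):
        acc = ext_mul(acc, alpha)
        acc = ((acc[0] + c) % P, acc[1])
    return acc

alpha = (2, 5)                   # 2 + 5X, an extension-field challenge
cs = [3, 1, 4]                   # base-field coefficients
y = reduce_with_powers(cs, alpha)
```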


@@ -14,6 +14,14 @@ These 4 matrices are committed separately, as you have to commit them in separat
For some of them we also need the "next row" opening at $\omega\cdot\zeta$ (namely: ``"zs"``, ``"lookup_zs"``); for the rest we don't.
      constant              witness              permutation           quotient

    +---+---+--------+  +--------+--------+  +--+-------+------+  +---------+
    | s | k | sigmas |  | routed | advice |  |zs|partial|lookup|  |         |
    | e | s |        |  |        |        |  |  |   R   |      |  |         |
    | l | t |        |  |        |        |  |  |   E   |      |  |         |
    +---+---+--------+  +--------+--------+  +--+-------+------+  +---------+
    #   2        80         80       55       r    9*r    7*r        8*r
### Constant columns
These columns define the structure of the circuit (together with some metadata). They consist of:
@@ -63,5 +71,3 @@ This polynomial (with $8\times 2^n$ coefficients) is then chunked into 8 columns
However, this is also repeated $r$ times, resulting in $(8\cdot r)$ columns.


@@ -46,4 +46,4 @@ The Poseidon gate has degree 7, while the degree limit is 8, so we can only use
The arithmetic gate has degree 3 (because $\deg(c_0xy)=3$: the constant coefficients also count!); the noop has degree 0 and both the constant and public input gates have degree 1. As $4+\max(3,0,1,1)=7\le 9$ this still fits.
The constant columns contain the $c_0,c_1$ constants for the arithmetic gates (they are all 1 here); also the values for the constant gates. For the remaining gates (Poseidon, public input and noop) they are simply set to zero.

Binary file not shown.