* First draft of linking arithmetic Stark into the CTL mechanism.
* Handle {ADD,SUB,MUL}FP254 operations explicitly in `modular.rs`.
* Adjust argument order; add tests.
* Add CTLs for ADD, MUL, SUB, LT and GT.
* Add CTLs for {ADD,MUL,SUB}MOD, DIV and MOD.
* Add CTLs for {ADD,MUL,SUB}FP254 operations.
* Refactor the CPU/arithmetic CTL mapping; add some documentation.
* Minor comment fixes.
* Combine addcy CTLs at the expense of repeated constraint evaluation.
* Merge `*FP254` CTL into main CTL; rename some registers.
* Connect extra argument from CPU in binary ops to facilitate combining with ternary ops.
* Merge modular ops CTL into main CTL.
* Refactor DIV and MOD code into its own module.
* Merge DIV and MOD into arithmetic CTL.
* Clippy.
* Fixes related to merge.
* Simplify register naming.
* Generate u16 BN254 modulus limbs at compile time (see the sketch after this list).
* Clippy.
* Add degree bits ranges for Arithmetic table.
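To illustrate the compile-time limb generation mentioned above, here is a minimal sketch using a plain `const fn`; it is not the actual `modular.rs` code, and the constant names are illustrative:

```rust
// The standard little-endian u64 representation of the BN254 base-field modulus
// 0x30644e72e131a029b85045b68181585d97816a916871ca8d3c208c16d87cfd47.
const BN254_MODULUS_U64: [u64; 4] = [
    0x3c208c16d87cfd47,
    0x97816a916871ca8d,
    0xb85045b68181585d,
    0x30644e72e131a029,
];

const fn bn254_modulus_u16_limbs() -> [u16; 16] {
    let mut limbs = [0u16; 16];
    let mut i = 0;
    while i < 4 {
        let word = BN254_MODULUS_U64[i];
        let mut j = 0;
        while j < 4 {
            // Little-endian: limb k holds bits [16*k, 16*k + 16) of the modulus.
            limbs[4 * i + j] = ((word >> (16 * j)) & 0xffff) as u16;
            j += 1;
        }
        i += 1;
    }
    limbs
}

// Evaluated once at compile time; constraint code can refer to these limbs as constants.
const BN254_MODULUS_U16: [u16; 16] = bn254_modulus_u16_limbs();
```

Since the modulus is fixed, splitting it into limbs once at compile time keeps them available wherever the modular constraints need them.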
This adds an aggregation circuit, which can be used to compress two proofs into one. Each inner proof can be either
- an "EVM root" proof (which typically proves one transaction, though it could be 0 or more)
- another aggregation proof
The goal here is to end up with a single "root" circuit representing any EVM proof. I.e. it must verify each STARK, but be general enough to work with any combination of STARK sizes (within some range of sizes that we chose to support). This root circuit can then be plugged into our aggregation circuit.
In particular, for each STARK, and for each initial `degree_bits` (within a range that we choose to support), this adds a "shrinking chain" of circuits. Such a chain shrinks a STARK proof from that initial `degree_bits` down to a constant, `THRESHOLD_DEGREE_BITS`.
The root circuit then combines these shrunk-to-constant proofs for each table. It's similar to `RecursiveAllProof::verify_circuit`; I adapted the code from there, and I think we can remove that old version afterwards. The main difference is that now, instead of having one verification key per STARK, we have several possible VKs, one per initial `degree_bits`. We bake the list of possible VKs into the root circuit, and have the prover indicate the index of the VK they're actually using.
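To make the bookkeeping concrete, here is a minimal stand-in sketch, assuming each recursion layer removes at least one degree bit; `shrinking_chain`, `select_vk`, `VkDigest`, and the threshold value are illustrative, not the real API:

```rust
/// Illustrative stand-in for the recursion threshold; the real constant may differ.
const THRESHOLD_DEGREE_BITS: usize = 13;

/// One shrinking chain for a given STARK and initial size: the sequence of
/// `degree_bits` after each recursion layer, ending at the constant threshold.
/// (Bookkeeping only; the real chain holds recursion circuits.)
fn shrinking_chain(initial_degree_bits: usize) -> Vec<usize> {
    let mut sizes = Vec::new();
    let mut bits = initial_degree_bits;
    while bits > THRESHOLD_DEGREE_BITS {
        // Assume each recursion layer shaves off at least one degree bit.
        bits = (bits - 1).max(THRESHOLD_DEGREE_BITS);
        sizes.push(bits);
    }
    sizes
}

/// Stand-in for a verifier-key digest baked into the root circuit.
type VkDigest = [u64; 4];

/// The root circuit bakes in one VK per supported initial `degree_bits`; the prover
/// supplies the index of the one they are actually using, and anything out of range
/// is rejected.
fn select_vk(baked_in_vks: &[VkDigest], prover_index: usize) -> Option<VkDigest> {
    baked_in_vks.get(prover_index).copied()
}
```

In the real setup there is one such chain, and one candidate VK, per STARK and per supported initial `degree_bits`.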
This also partially removes the default feature of CTLs. So far we've used filters instead of defaults. Until now it was easy to keep supporting defaults just in case, but here maintaining support would require some more work. E.g. we couldn't use `exp_u64` any more, since the size delta is now dynamic and can't be hardcoded. If there are no concerns, I'll fully remove the feature afterwards.
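For the `exp_u64` point, a stand-in sketch in plain modular arithmetic (not plonky2 gadgets): a hardcoded delta lets the whole multiplication chain be fixed when the circuit is built, while a dynamic delta needs a square-and-multiply bounded by the largest delta we support.

```rust
/// Stand-in modulus (Goldilocks-sized), just for illustration.
const P: u128 = 0xffff_ffff_0000_0001;

fn mul(a: u128, b: u128) -> u128 {
    (a * b) % P
}

/// Delta known up front: the multiplication chain is fixed in advance
/// (the situation `exp_u64`-style helpers assume).
fn exp_fixed_delta(base: u128) -> u128 {
    const DELTA: u32 = 3; // illustrative hardcoded size delta
    (0..DELTA).fold(1, |acc, _| mul(acc, base))
}

/// Delta only known per proof: square-and-multiply over its bits, bounded by the
/// maximum number of bits the delta can take.
fn exp_dynamic_delta(base: u128, delta: u64, max_delta_bits: u32) -> u128 {
    let mut acc = 1u128;
    let mut sq = base;
    for i in 0..max_delta_bits {
        if (delta >> i) & 1 == 1 {
            acc = mul(acc, sq);
        }
        sq = mul(sq, sq);
    }
    acc
}
```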
Lots of little bugs!
- The Keccak sponge table's padding logic was wrong, it was mixing up the number of rows with the number of hashes.
- The Keccak sponge table's Keccak-looking data was wrong: the input to Keccak-f should be the state after xor'ing in the block (see the absorb sketch after this list).
- The Keccak sponge table's logic-looking filter was wrong. We do 5 logic CTLs for any final-block row, even if some of the xors are with 0s from Keccak padding.
- The CPU was using the wrong/outdated output memory channel for its Keccak sponge and logic CTLs.
- The Keccak table just didn't have a way to filter out padding rows. I added a filter column for this.
- The Keccak table wasn't remembering the original preimage of a permutation; lookers were seeing the preimage of the final step. I added columns for the original preimage.
- `ctl_data_logic` was using the wrong memory channel.
- Kernel bootloading generation was using the wrong length for its Keccak sponge CTL, and its `keccak_sponge_log` was seeing the wrong clock since it was called after adding the final bootloading row.
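For the xor bullet above, a minimal sketch of the sponge absorb step (illustrative only, not the actual `KeccakSpongeStark` code):

```rust
/// Rate of Keccak-256: 1088 bits = 17 u64 words (of the 25-word state).
const KECCAK_RATE_U64S: usize = 17;

/// The state the Keccak-f (permutation) table should be looked up with: the incoming
/// state with the current block already xor'ed into the rate portion, not the
/// incoming state itself.
fn keccak_f_input(state: &[u64; 25], block: &[u64; KECCAK_RATE_U64S]) -> [u64; 25] {
    let mut input = *state;
    for (s, b) in input.iter_mut().zip(block.iter()) {
        // Only the first 17 words (the rate) are touched; the capacity is left as-is.
        *s ^= *b;
    }
    input
}
```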
The account-creation code was giving the new account the hash of an empty storage trie, when really it should be a pointer to an empty storage trie. We can use 0 as this pointer, since `@SEGMENT_TRIE_DATA[0] = 0 = @MPT_NODE_EMPTY`.
Also a couple of tweaks that helped me debug, like moving the memory value range checks from the interpreter into `MemoryState`, so they're done during actual witness generation as well as in interpreter tests.
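As a sketch of that last tweak, assuming a simplified layout (the value type, the per-segment bound, and the `set` signature are stand-ins, not the real `MemoryState` API):

```rust
use std::collections::HashMap;

/// Stand-in for the real 256-bit memory value type.
type Value = u128;

#[derive(Default)]
struct MemoryState {
    /// (segment, offset) -> value; a stand-in for the real layout.
    cells: HashMap<(usize, usize), Value>,
}

impl MemoryState {
    /// Writes go through the state itself, so the range check runs during witness
    /// generation as well as in interpreter tests.
    fn set(&mut self, segment: usize, offset: usize, value: Value, segment_max: Value) {
        assert!(
            value <= segment_max,
            "memory value {value} out of range for segment {segment}"
        );
        self.cells.insert((segment, offset), value);
    }
}
```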