* Add corresponding arithmetic operations to shift ones
* Include SHL/SHR in the arithmetic CTL
* Prevent overflow
* Expand documentation for ctl_data_ternops()
* First draft of linking arithmetic Stark into the CTL mechanism.
* Handle {ADD,SUB,MUL}FP254 operations explicitly in `modular.rs`.
* Adjust argument order; add tests.
* Add CTLs for ADD, MUL, SUB, LT and GT.
* Add CTLs for {ADD,MUL,SUB}MOD, DIV and MOD.
* Add CTLs for {ADD,MUL,SUB}FP254 operations.
* Refactor the CPU/arithmetic CTL mapping; add some documentation.
* Minor comment fixes.
* Combine addcy CTLs at the expense of repeated constraint evaluation.
* Combine addcy CTLs at the expense of repeated constraint evaluation.
* Merge `*FP254` CTL into main CTL; rename some registers.
* Connect extra argument from CPU in binary ops to facilitate combining with ternary ops.
* Merge modular ops CTL into main CTL.
* Refactor DIV and MOD code into its own module.
* Merge DIV and MOD into arithmetic CTL.
* Clippy.
* Fixes related to merge.
* Simplify register naming.
* Generate u16 BN254 modulus limbs at compile time.
* Clippy.
* Add degree bits ranges for Arithmetic table.
Lots of little bugs!
- The Keccak sponge table's padding logic was wrong, it was mixing up the number of rows with the number of hashes.
- The Keccak sponge table's Keccak-looking data was wrong - input to Keccak-f should be after xor'ing in the block.
- The Keccak sponge table's logic-looking filter was wrong. We do 5 logic CTLs for any final-block row, even if some of the xors are with 0s from Keccak padding.
- The CPU was using the wrong/outdated output memory channel for its Keccak sponge and logic CTLs.
- The Keccak table just didn't have a way to filter out padding rows. I added a filter column for this.
- The Keccak table wasn't remembering the original preimage of a permutation; lookers were seeing the preimage of the final step. I added columns for the original preimage.
- `ctl_data_logic` was using the wrong memory channel
- Kernel bootloading generation was using the wrong length for its Keccak sponge CTL, and its `keccak_sponge_log` was seeing the wrong clock since it was called after adding the final bootloading row.
* First parts of shift implementation.
* Disable range check errors.
* Tidy up ASM.
* Update comments; fix some .sum() expressions.
* First full draft of shift left/right.
* Missed a +1.
* Clippy.
* Address Jacqui's comments.
* Add comment.
* Fix missing filter.
* Address second round of comments from Jacqui.
* Make jumps, logic, and syscalls read from/write to memory columns
* Change CTL convention (outputs precede inputs)
* Change convention so outputs follow inputs in memory channel order
I was thinking we could have two sets of shared columns:
- First, a set of "core" columns which would contain instruction decoding registers during an execution cycle, or some counter data during a kernel bootloading cycle.
- Second, a set of "general" columns which would be more general-purpose. For now it could contain "looking" columns for most CTLs (Keccak, arithmetic and logic; NOT memory since memory can be used simultaneously with the others). It could potentially be reused for other things too, such as the registers used for `EQ` and `IS_ZERO` (but I know it's nontrivial to share those since we would need to use lower-degree constraints, so I wouldn't bother for now).
This PR implements just the latter. If it looks good I'll proceed with the former afterward.