* Initial implementation of quintic extensions.
* Update to/from_biguint() methods.
* Draft of fast multiplication on quintic extensions over 64-bit base.
* cargo fmt
* Typo.
* Document functions (a bit).
* Refactor reduction step.
* Change multiplication call so that LLVM generates better assembly.
* Use one main accumulator instead of two minor ones; faster reduce.
* Use one main accumulator in square too; clean up redundant code.
* Call faster routines from Mul and Square impls.
* Fix reduction function.
* Fix square calculation.
* Slightly faster reduction.
* Clean up names and types.
* cargo fmt
* Move extension field mul/sqr specialisations to their own file.
* Rename functions to have unique prefix.
* Add faster quadratic multiplication/squaring.
* Faster quartic multiplication and squaring.
* cargo fmt
* clippy
* Alternative reduce160 function.
* Typo.
* Remove alternative reduction function.
* Remove delayed reduction implementation of squaring.
* Enforce assumptions about extension generators.
* Make the accumulation variable a u32 instead of u64.
* Add test to trigger carry branch in reduce160.
* cargo fmt
* Some documentation.
* Clippy; improved comments.
* cargo fmt
* Remove redundant Square specialisations.
* Fix reduce*() visibility.
* Faster reduce160 from Jakub.
* Change mul-by-const functions to operate on 160 bits instead of 128.
* Move code for extensions of GoldilocksField to its own file.
* Implement a mul-add circuit in the ALU
The inputs are assumed to be `u32`s, while the output is encoded as four `u16` limbs. Each output limb is range-checked.
So, our basic mul-add constraint looks like
`out_0 + 2^16 * out_1 + 2^32 * out_2 + 2^48 * out_3 = in_1 * in_2 + in_3`
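For concreteness, here is a minimal standalone sketch of this relation, checked over the integers modulo the Goldilocks prime rather than through the actual constraint system; the function and variable names are illustrative, not the real column layout.

```rust
// Goldilocks prime 2^64 - 2^32 + 1 (assumed, based on the field used elsewhere in this PR).
const P: u128 = 0xFFFF_FFFF_0000_0001;

/// Sketch of the mul-add relation: recombine the four u16 output limbs
/// (little-endian) and compare against in_1 * in_2 + in_3, mod P.
fn mul_add_holds(out: [u16; 4], in_1: u32, in_2: u32, in_3: u32) -> bool {
    // out_0 + 2^16 * out_1 + 2^32 * out_2 + 2^48 * out_3
    let lhs: u128 = out
        .iter()
        .enumerate()
        .map(|(i, &limb)| (limb as u128) << (16 * i))
        .sum();
    // The right-hand side fits in a u128 and stays below P,
    // since u32::MAX * u32::MAX + u32::MAX < P.
    let rhs = (in_1 as u128) * (in_2 as u128) + (in_3 as u128);
    lhs % P == rhs % P
}
```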
The right-hand side will never overflow the field, since `u32::MAX * u32::MAX + u32::MAX < |F|`. However, the left-hand side can exceed `|F|` and wrap around, even though we know each limb is less than `2^16`.
For example, an operation like `0 * 0 + 0` could have two possible outputs, `0` and `|F|` (which reduces to zero in the field), both of which would satisfy the constraint above. To prevent these non-canonical outputs, we need a comparison to enforce that `out < |F|`.
Thankfully, `F::MAX` has all ones in its high 32 bits and all zeros in its low 32 bits, so `x <= F::MAX` is equivalent to `x_lo == 0 || x_hi != u32::MAX`. `x_hi != u32::MAX` can be checked by showing that `u32::MAX - x_hi` has an inverse. If `x_hi != u32::MAX`, the prover provides this (purported) inverse in an advice column.
See @bobbinth's [post](https://hackmd.io/NC-yRmmtRQSvToTHb96e8Q#Checking-element-validity) for details. That post calls the purported inverse column `m`; I named it `canonical_inv` in this code.
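As a rough illustration, the two pieces of this check look as follows over plain integers; `canonical_inv` is modelled as an ordinary function argument rather than an advice column, and the names are illustrative.

```rust
// Goldilocks prime 2^64 - 2^32 + 1, as in the sketch above.
const P: u128 = 0xFFFF_FFFF_0000_0001;

/// x <= F::MAX for x = x_lo + 2^32 * x_hi. Since F::MAX = 2^64 - 2^32 has all
/// ones in its high 32 bits and all zeros in its low 32 bits, this is exactly
/// `x_lo == 0 || x_hi != u32::MAX`.
fn is_canonical(x_lo: u32, x_hi: u32) -> bool {
    x_lo == 0 || x_hi != u32::MAX
}

/// The circuit shows `x_hi != u32::MAX` by exhibiting a multiplicative inverse
/// of u32::MAX - x_hi. If x_hi == u32::MAX, that difference is zero, so no
/// purported inverse can make the product equal 1.
fn inverse_check(x_hi: u32, canonical_inv: u64) -> bool {
    let diff = (u32::MAX - x_hi) as u128;
    diff * (canonical_inv as u128) % P == 1
}
```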
* fix
* PR feedback
* naming
* Batch multiple perm args into one Z and compute Z columnwise
It's slightly complex because we batch `constraint_degree - 1` permutation arguments into a single `Z` polynomial. This is a slight generalization of the [technique](https://zcash.github.io/halo2/design/proving-system/lookup.html) described in the Halo2 book.
Without this batching, we would simply have `num_challenges` random challenges (betas and gammas). With this batching, however, we need to use different randomness for each permutation argument within the same batch. Hence we end up generating `batch_size * num_challenges` challenges for all permutation arguments.
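To make the batching concrete, here is a simplified sketch of building one batched `Z` column row by row: each of the `batch_size` arguments contributes its own factor built from its own `(beta, gamma)` pair, so a single running product carries the whole batch. The column contents, the `beta * value + gamma` combination, and the u128-mod-P field arithmetic are simplifying assumptions, not the real prover code.

```rust
// Goldilocks prime 2^64 - 2^32 + 1, as in the sketches above.
const P: u128 = 0xFFFF_FFFF_0000_0001;

/// Modular exponentiation, used for field inversion via Fermat's little theorem.
fn mod_pow(mut base: u128, mut exp: u128) -> u128 {
    base %= P;
    let mut acc = 1u128;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

fn mod_inv(x: u128) -> u128 {
    mod_pow(x, P - 2)
}

/// One batched Z column, computed columnwise. `batch` holds a (numerator column,
/// denominator column) pair per permutation argument; `challenges` holds one
/// (beta, gamma) pair per argument in the batch.
fn batched_z_column(batch: &[(Vec<u64>, Vec<u64>)], challenges: &[(u64, u64)]) -> Vec<u128> {
    let n = batch[0].0.len();
    let mut z = Vec::with_capacity(n);
    let mut acc = 1u128;
    for row in 0..n {
        z.push(acc);
        for ((f, g), &(beta, gamma)) in batch.iter().zip(challenges) {
            // Each argument uses its own randomness, which is why
            // batch_size * num_challenges challenges are generated overall.
            let num = (beta as u128 * f[row] as u128 + gamma as u128) % P;
            let den = (beta as u128 * g[row] as u128 + gamma as u128) % P;
            acc = acc * num % P * mod_inv(den) % P;
        }
    }
    z
}
```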
* Feedback + updates for recursion code
* Initial implementation of quintic extensions.
* Update to/from_biguint() methods.
* cargo fmt
* Fix call to test suite.
* Small optimisation in try_inverse().
* Replace multiplicative group generator and document requirement.