* Initial implementation of quintic extensions.
* Update to/from_biguint() methods.
* Draft of fast multiplication on quintic extensions over 64-bit base.
* cargo fmt
* Typo.
* Document functions (a bit).
* Refactor reduction step.
* Change multiplication call so that LLVM generates better assembly.
* Use one main accumulator instead of two minor ones; faster reduce.
* Use one main accumulator in square too; clean up redundant code.
* Call faster routines from Mul and Square impls.
* Fix reduction function.
* Fix square calculation.
* Slightly faster reduction.
* Clean up names and types.
* cargo fmt
* Move extension field mul/sqr specialisations to their own file.
* Rename functions to have unique prefix.
* Add faster quadratic multiplication/squaring.
* Faster quartic multiplication and squaring.
* cargo fmt
* clippy
* Alternative reduce160 function.
* Typo.
* Remove alternative reduction function.
* Remove delayed reduction implementation of squaring.
* Enforce assumptions about extension generators.
* Make the accumulation variable a u32 instead of u64.
* Add test to trigger carry branch in reduce160.
* cargo fmt
* Some documentation.
* Clippy; improved comments.
* cargo fmt
* Remove redundant Square specialisations.
* Fix reduce*() visibility.
* Faster reduce160 from Jakub.
* Change mul-by-const functions to operate on 160 bits instead of 128.
* Move code for extensions of GoldilocksField to its own file.
* Implement a mul-add circuit in the ALU
The inputs are assumed to be `u32`s, while the output is encoded as four `u16 limbs`. Each output limb is range-checked.
So, our basic mul-add constraint looks like
out_0 + 2^16 out_1 + 2^32 out_2 + 2^48 out_3 = in_1 * in_2 + in_3
The right hand side will never overflow, since `u32::MAX * u32::MAX + u32::MAX < |F|`. However, the left hand side could overflow, even though we know each limb is less than `2^16`.
For example, an operation like `0 * 0 + 0` could have two possible outputs, 0 and `|F|`, both of which would satisfy the constraint above. To prevent these non-canonical outputs, we need a comparison to enforce that `out < |F|`.
Thankfully, `F::MAX` has all zeros in its low 32 bits, so `x <= F::MAX` is equivalent to `x_lo == 0 || x_hi != u32::MAX`. `x_hi != u32::MAX` can be checked by showing that `u32::MAX - x_hi` has an inverse. If `x_hi != u32::MAX`, the prover provides this (purported) inverse in an advice column.
See @bobbinth's [post](https://hackmd.io/NC-yRmmtRQSvToTHb96e8Q#Checking-element-validity) for details. That post calls the purported inverse column `m`; I named it `canonical_inv` in this code.
* fix
* PR feedback
* naming