* Halo2 style lookup arguments in System Zero
It's a really nice and simple protocol, particularly for the verifier since the constraints are trivial (aside from the underlying batched permutation checks, which we already support). See the [Halo2 book](https://zcash.github.io/halo2/design/proving-system/lookup.html) and this [talk](https://www.youtube.com/watch?v=YlTt12s7vGE&t=5237s) by @daira.
Previously we generated the whole trace in row-wise form, but it's much more efficient to generate these "permuted" columns column-wise. So I changed our STARK framework to accept the trace in column-wise form. STARK impls now have the flexibility to do some generation row-wise and some column-wise (without extra costs; there's a single transpose as before).
* sorting
* fixes
* PR feedback
* into_iter
* timing
* Initial implementation of quintic extensions.
* Update to/from_biguint() methods.
* Draft of fast multiplication on quintic extensions over 64-bit base.
* cargo fmt
* Typo.
* Document functions (a bit).
* Refactor reduction step.
* Change multiplication call so that LLVM generates better assembly.
* Use one main accumulator instead of two minor ones; faster reduce.
* Use one main accumulator in square too; clean up redundant code.
* Call faster routines from Mul and Square impls.
* Fix reduction function.
* Fix square calculation.
* Slightly faster reduction.
* Clean up names and types.
* cargo fmt
* Move extension field mul/sqr specialisations to their own file.
* Rename functions to have unique prefix.
* Add faster quadratic multiplication/squaring.
* Faster quartic multiplication and squaring.
* cargo fmt
* clippy
* Alternative reduce160 function.
* Typo.
* Remove alternative reduction function.
* Remove delayed reduction implementation of squaring.
* Enforce assumptions about extension generators.
* Make the accumulation variable a u32 instead of u64.
* Add test to trigger carry branch in reduce160.
* cargo fmt
* Some documentation.
* Clippy; improved comments.
* cargo fmt
* Remove redundant Square specialisations.
* Fix reduce*() visibility.
* Faster reduce160 from Jakub.
* Change mul-by-const functions to operate on 160 bits instead of 128.
* Move code for extensions of GoldilocksField to its own file.