* Halo2-style lookup arguments in System Zero
It's a really nice and simple protocol, particularly for the verifier since the constraints are trivial (aside from the underlying batched permutation checks, which we already support). See the [Halo2 book](https://zcash.github.io/halo2/design/proving-system/lookup.html) and this [talk](https://www.youtube.com/watch?v=YlTt12s7vGE&t=5237s) by @daira.
Previously we generated the whole trace in row-wise form, but it's much more efficient to generate these "permuted" columns column-wise. So I changed our STARK framework to accept the trace in column-wise form. STARK impls now have the flexibility to do some generation row-wise and some column-wise (without extra costs; there's a single transpose as before).
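A rough sketch of how the "permuted" columns can be built column-wise, following the rule from the Halo2 book: sort the inputs, then arrange the table so that every row either matches the table value or repeats the previous input. The function name and the use of plain `u64` values in place of field elements are assumptions for illustration, not the code in this change.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: returns `(permuted_inputs, permuted_table)`, or `None`
/// if some input value does not appear in the table.
fn permuted_columns(inputs: &[u64], table: &[u64]) -> Option<(Vec<u64>, Vec<u64>)> {
    assert_eq!(inputs.len(), table.len(), "columns must have equal length");

    // Sorting groups equal inputs together.
    let mut permuted_inputs = inputs.to_vec();
    permuted_inputs.sort_unstable();

    // Multiset of table values that have not been placed yet.
    let mut remaining: BTreeMap<u64, usize> = BTreeMap::new();
    for &t in table {
        *remaining.entry(t).or_insert(0) += 1;
    }

    // On the first row of each run of equal inputs, place the matching table value.
    let mut slots: Vec<Option<u64>> = vec![None; inputs.len()];
    for i in 0..permuted_inputs.len() {
        let x = permuted_inputs[i];
        if i == 0 || x != permuted_inputs[i - 1] {
            let count = remaining.get_mut(&x)?;
            *count -= 1;
            slots[i] = Some(x);
        }
    }

    // Fill the other rows with the leftover table values, in any order.
    let mut leftovers = remaining
        .into_iter()
        .flat_map(|(value, count)| std::iter::repeat(value).take(count));
    let permuted_table = slots
        .into_iter()
        .map(|slot| slot.or_else(|| leftovers.next()))
        .collect::<Option<Vec<u64>>>()?;

    Some((permuted_inputs, permuted_table))
}
```

Each row `i` of the result then satisfies `a[i] == s[i]` or `a[i] == a[i - 1]`, which is the local constraint the verifier checks; the permutation checks tie the permuted columns back to the originals.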
* sorting
* fixes
* PR feedback
* into_iter
* timing
* Initial implementation of quintic extensions.
* Update to/from_biguint() methods.
* Draft of fast multiplication on quintic extensions over 64-bit base.
* cargo fmt
* Typo.
* Document functions (a bit).
* Refactor reduction step.
* Change multiplication call so that LLVM generates better assembly.
* Use one main accumulator instead of two minor ones; faster reduce.
* Use one main accumulator in square too; clean up redundant code.
* Call faster routines from Mul and Square impls.
* Fix reduction function.
* Fix square calculation.
* Slightly faster reduction.
* Clean up names and types.
* cargo fmt
* Move extension field mul/sqr specialisations to their own file.
* Rename functions to have unique prefix.
* Add faster quadratic multiplication/squaring.
* Faster quartic multiplication and squaring.
* cargo fmt
* clippy
* Alternative reduce160 function.
* Typo.
* Remove alternative reduction function.
* Remove delayed reduction implementation of squaring.
* Enforce assumptions about extension generators.
* Make the accumulation variable a u32 instead of u64.
* Add test to trigger carry branch in reduce160.
* cargo fmt
* Some documentation.
* Clippy; improved comments.
* cargo fmt
* Remove redundant Square specialisations.
* Fix reduce*() visibility.
* Faster reduce160 from Jakub.
* Change mul-by-const functions to operate on 160 bits instead of 128.
* Move code for extensions of GoldilocksField to its own file.
* Batch multiple perm args into one Z and compute Z columnwise
The implementation is slightly complex because we batch `constraint_degree - 1` permutation arguments into a single `Z` polynomial. This is a slight generalization of the [technique](https://zcash.github.io/halo2/design/proving-system/lookup.html) described in the Halo2 book.
Without this batching, we would simply have `num_challenges` random challenges (betas and gammas). With this batching, however, we need to use different randomness for each permutation argument within the same batch. Hence we end up generating `batch_size * num_challenges` challenges for all permutation arguments.
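To illustrate the batching, here is a toy sketch that folds several permutation arguments into one running product `Z`, with a separate `(beta, gamma)` pair per argument in the batch. Everything here is an assumption for illustration: the small prime `P`, the `Challenge` struct, single-column permutation pairs, and a `Z` with one extra trailing entry. The real code works over the STARK's field and its own column layout.

```rust
/// Toy prime modulus (2^31 - 1), chosen so products fit in a u64.
const P: u64 = 2_147_483_647;

fn mul(a: u64, b: u64) -> u64 {
    a * b % P
}

fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut result = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            result = mul(result, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    result
}

/// Inverse via Fermat's little theorem; the input must be nonzero mod P.
fn inv(a: u64) -> u64 {
    pow(a, P - 2)
}

/// Hypothetical challenge pair; one per permutation argument in the batch.
struct Challenge {
    beta: u64,
    gamma: u64,
}

/// Computes one Z column for a batch of (lhs, rhs) column pairs.
/// z[0] = 1, and z[n] == 1 if every rhs is a permutation of its lhs.
fn batched_z(pairs: &[(Vec<u64>, Vec<u64>)], challenges: &[Challenge]) -> Vec<u64> {
    assert_eq!(pairs.len(), challenges.len());
    let n = pairs.first().map_or(0, |(lhs, _)| lhs.len());
    let mut acc = 1;
    let mut z = vec![acc];
    for row in 0..n {
        for ((lhs, rhs), c) in pairs.iter().zip(challenges) {
            let num = (c.gamma + mul(c.beta, lhs[row])) % P;
            let den = (c.gamma + mul(c.beta, rhs[row])) % P;
            acc = mul(acc, mul(num, inv(den)));
        }
        z.push(acc);
    }
    z
}

/// Number of (beta, gamma) pairs needed, per the count described above.
fn num_challenge_pairs(constraint_degree: usize, num_challenges: usize) -> usize {
    (constraint_degree - 1) * num_challenges
}
```

Using a distinct challenge pair per argument keeps the factors of different arguments from cancelling against each other inside the shared product.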
* Feedback + updates for recursion code
* Initial implementation of quintic extensions.
* Update to/from_biguint() methods.
* cargo fmt
* Fix call to test suite.
* Small optimisation in try_inverse().
* Replace multiplicative group generator and document requirement.
* Column definitions for addition, range checks & lookups
I implemented addition (unsigned for now) as an example of how the arithmetic unit can interact with the 16-bit range check unit.
Range checks and lookups aren't implemented yet.
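As a sketch of the interaction, assuming 32-bit unsigned operands and a result split into two 16-bit limbs plus a carry bit (the operand width, the `AddWitness` struct and its field names are hypothetical, not the actual column layout): the arithmetic unit constrains the sum, and the range check unit is what makes the limb decomposition unique.

```rust
/// Hypothetical witness for one unsigned addition.
struct AddWitness {
    lo: u64,    // low 16-bit limb of the sum
    hi: u64,    // high 16-bit limb of the sum
    carry: u64, // overflow bit
}

/// Witness generation: split a + b into limbs.
fn generate_add(a: u32, b: u32) -> AddWitness {
    let sum = a as u64 + b as u64;
    AddWitness {
        lo: sum & 0xFFFF,
        hi: (sum >> 16) & 0xFFFF,
        carry: sum >> 32,
    }
}

/// What the constraints would enforce, written as a plain check.
fn check_add(a: u32, b: u32, w: &AddWitness) -> bool {
    // Enforced by the 16-bit range check unit.
    let limbs_in_range = w.lo < (1 << 16) && w.hi < (1 << 16);
    // Enforced by the arithmetic unit.
    let carry_is_bit = w.carry <= 1;
    limbs_in_range
        && carry_is_bit
        && a as u64 + b as u64 == w.lo + (w.hi << 16) + (w.carry << 32)
}
```

Without the range checks, a prover could satisfy the sum equation with out-of-range limbs, so the equation alone would not pin down the result.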
* Missing constraints
* Tweaks to get tests passing
* Reorg registers into files
* Minor
* trim_to_len helper function
Seems a little nicer IMO to only remove a certain number of zeros, vs removing all trailing zeros then re-adding some.
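Roughly what such a helper could look like, sketched on a plain `Vec<u64>` of coefficients. The signature and error type are assumptions; the actual helper operates on the polynomial type.

```rust
/// Hypothetical sketch: shrink `coeffs` to exactly `len` entries, but only if
/// every removed entry is zero. On error, `coeffs` is left unchanged.
fn trim_to_len(coeffs: &mut Vec<u64>, len: usize) -> Result<(), String> {
    if coeffs.len() < len {
        return Err(format!("length {} is already below {}", coeffs.len(), len));
    }
    if coeffs[len..].iter().any(|&c| c != 0) {
        return Err("would remove a nonzero coefficient".to_string());
    }
    coeffs.truncate(len);
    Ok(())
}
```

Compared with trimming all trailing zeros and then padding back up, this touches the vector once and fails loudly if the requested length would drop real data.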
* PR feedback
* Rename `PrimeField` -> `Field64`
And add TODOs for moving around various methods which aren't well-defined in their current traits, or would be well-defined in a supertrait.
* fix test
* TODOs as per PR feedback
* Split into crates
I kept other changes to a minimum, so 95% of this is just moving things. One complication that came up is that since `PrimeField` is now outside the plonky2 crate, these two impls now conflict:
```
impl<F: PrimeField> From<HashOut<F>> for Vec<u8> { ... }
impl<F: PrimeField> From<HashOut<F>> for Vec<F> { ... }
```
with this note:
```
note: upstream crates may add a new impl of trait `plonky2_field::field_types::PrimeField` for type `u8` in future versions
```
I worked around this by adding a `GenericHashOut` trait with methods like `to_bytes()` instead of overloading `From`/`Into`. Personally I prefer the explicitness anyway.
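A minimal sketch of the shape of that workaround, using stand-in types so it compiles on its own. The local `PrimeField` trait, the `Toy` field, the four-element `HashOut`, and the `to_vec()` method are assumptions for illustration; only `GenericHashOut` and `to_bytes()` are named above.

```rust
/// Stand-in for the field trait that now lives in another crate.
trait PrimeField: Copy {
    fn to_canonical_u64(&self) -> u64;
}

/// Toy field element, for demonstration only.
#[derive(Clone, Copy)]
struct Toy(u64);

impl PrimeField for Toy {
    fn to_canonical_u64(&self) -> u64 {
        self.0
    }
}

/// Stand-in hash output: four field elements.
struct HashOut<F: PrimeField> {
    elements: [F; 4],
}

/// Explicit conversion methods instead of two overlapping `From` impls.
trait GenericHashOut<F: PrimeField> {
    fn to_bytes(&self) -> Vec<u8>;
    fn to_vec(&self) -> Vec<F>;
}

impl<F: PrimeField> GenericHashOut<F> for HashOut<F> {
    fn to_bytes(&self) -> Vec<u8> {
        // Little-endian encoding of each element's canonical value.
        self.elements
            .iter()
            .flat_map(|x| x.to_canonical_u64().to_le_bytes())
            .collect()
    }

    fn to_vec(&self) -> Vec<F> {
        self.elements.to_vec()
    }
}
```

Because the two conversions are now distinct methods on one trait rather than two `From` impls with different target types, the compiler no longer has to rule out `u8` implementing `PrimeField` upstream.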
* Move out permutation network stuff also
* Fix imports
* Fix import
* Also move out insertion
* Comment
* fmt
* PR feedback