* Halo2 style lookup arguments in System Zero
It's a really nice and simple protocol, particularly for the verifier since the constraints are trivial (aside from the underlying batched permutation checks, which we already support). See the [Halo2 book](https://zcash.github.io/halo2/design/proving-system/lookup.html) and this [talk](https://www.youtube.com/watch?v=YlTt12s7vGE&t=5237s) by @daira.
Previously we generated the whole trace in row-wise form, but it's much more efficient to generate these "permuted" columns column-wise. So I changed our STARK framework to accept the trace in column-wise form. STARK impls now have the flexibility to do some generation row-wise and some column-wise (without extra costs; there's a single transpose as before).
* sorting
* fixes
* PR feedback
* into_iter
* timing
* Implement a mul-add circuit in the ALU
The inputs are assumed to be `u32`s, while the output is encoded as four `u16 limbs`. Each output limb is range-checked.
So, our basic mul-add constraint looks like
out_0 + 2^16 out_1 + 2^32 out_2 + 2^48 out_3 = in_1 * in_2 + in_3
The right hand side will never overflow, since `u32::MAX * u32::MAX + u32::MAX < |F|`. However, the left hand side could overflow, even though we know each limb is less than `2^16`.
For example, an operation like `0 * 0 + 0` could have two possible outputs, 0 and `|F|`, both of which would satisfy the constraint above. To prevent these non-canonical outputs, we need a comparison to enforce that `out < |F|`.
Thankfully, `F::MAX` has all zeros in its low 32 bits, so `x <= F::MAX` is equivalent to `x_lo == 0 || x_hi != u32::MAX`. `x_hi != u32::MAX` can be checked by showing that `u32::MAX - x_hi` has an inverse. If `x_hi != u32::MAX`, the prover provides this (purported) inverse in an advice column.
See @bobbinth's [post](https://hackmd.io/NC-yRmmtRQSvToTHb96e8Q#Checking-element-validity) for details. That post calls the purported inverse column `m`; I named it `canonical_inv` in this code.
* fix
* PR feedback
* naming
* Batch multiple perm args into one Z and compute Z columnwise
It's slightly complex because we batch `constraint_degree - 1` permutation arguments into a single `Z` polynomial. This is a slight generalization of the [technique](https://zcash.github.io/halo2/design/proving-system/lookup.html) described in the Halo2 book.
Without this batching, we would simply have `num_challenges` random challenges (betas and gammas). With this batching, however, we need to use different randomness for each permutation argument within the same batch. Hence we end up generating `batch_size * num_challenges` challenges for all permutation arguments.
* Feedback + updates for recursion code
* Column definitions for addition, range checks & lookups
I implemented addition (unsigned for now) as an example of how the arithmetic unit can interact with the 16-bit range check unit.
Range checks and lookups aren't implemented yet.
* Missing constraints
* Tweaks to get tests passing
* Reorg registers into files
* Minor