As discussed, it seems like the batch opening argument will be a significant cost, and we can reduce that cost by not including shifted openings (except for `Z`s which need them).
... by switching to Rescue Prime (which has a smaller security margin), and precomputing an addition chain for the exponent used in the cubic root calculation. Also adds a benchmark.
I think this is the recommended way to apply Fiat-Shamir, to avoid any possible attacks like taking someone else's proof and using it to prove a slightly different statement.
Does this make sense? I think other libraries tend to include the leaf's index (either as an integer, or a series of bits indicating left/right turns) as part of a "proof". In FRI, the leaf indices are chosen by the verifier, so I thought that approach might be sort of redundant. Let me know what you think though.
... and other minor refactoring.
`bench_recursion` will be the default bin run by `cargo run`; the otheres can be selected with the `--bin` flag.
We could probably delete some of the other binaries later. E.g. `field_search` might not be useful any more. `bench_fft` should maybe be converted to a benchmark (although there are some pros and cons, e.g. the bench framework has a minimum number of runs, and isn't helpful in testing multi-core performance).
When we had a large field, we could just pick random shifts, and get disjoint cosets with high probability. With a 64-bit field, I think the probability of a collision is non-negligible (something like 1 in a million), so we should probably verify that the cosets are disjoint.
If there are any concerns with this method (or if it's just confusing), I think it would also be reasonable to use the brute force approach of explicitly computing the cosets and checking that they're disjoint. I coded that as well, and it took like 80ms, so not really a big deal since it's a one-time preprocessing cost.
Also fixes some overflow bugs in the inversion code.