Before it was storing leaf data and Merkle roots, but nothing in between, since it wasn't yet interacting with intermediate layers (but it will once we hook up the FRI code).
As discussed, it seems like the batch opening argument will be a significant cost, and we can reduce that cost by not including shifted openings (except for `Z`s which need them).
... by switching to Rescue Prime (which has a smaller security margin), and precomputing an addition chain for the exponent used in the cubic root calculation. Also adds a benchmark.
I think this is the recommended way to apply Fiat-Shamir, to avoid any possible attacks like taking someone else's proof and using it to prove a slightly different statement.
Does this make sense? I think other libraries tend to include the leaf's index (either as an integer, or a series of bits indicating left/right turns) as part of a "proof". In FRI, the leaf indices are chosen by the verifier, so I thought that approach might be sort of redundant. Let me know what you think though.
... and other minor refactoring.
`bench_recursion` will be the default bin run by `cargo run`; the otheres can be selected with the `--bin` flag.
We could probably delete some of the other binaries later. E.g. `field_search` might not be useful any more. `bench_fft` should maybe be converted to a benchmark (although there are some pros and cons, e.g. the bench framework has a minimum number of runs, and isn't helpful in testing multi-core performance).
When we had a large field, we could just pick random shifts, and get disjoint cosets with high probability. With a 64-bit field, I think the probability of a collision is non-negligible (something like 1 in a million), so we should probably verify that the cosets are disjoint.
If there are any concerns with this method (or if it's just confusing), I think it would also be reasonable to use the brute force approach of explicitly computing the cosets and checking that they're disjoint. I coded that as well, and it took like 80ms, so not really a big deal since it's a one-time preprocessing cost.
Also fixes some overflow bugs in the inversion code.