If we did it all with `ArithmeticGate`s, the main loop (~101 iterations, each doing a cubing and a couple of additions) would be fairly expensive, so this uses a much smaller custom gate, `GMiMCEvalGate`, which performs all the computations for one iteration of that loop.
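A minimal sketch of why one loop iteration can be this small, assuming the "addition buffer" form of the GMiMC round: instead of adding each round's `f` to all `W - 1` non-active elements, a single running sum is kept and flushed at the end. The field, the width of 12 and the round constants are illustrative assumptions, and whether `GMiMCEvalGate` constrains exactly these values is also an assumption. The test checks the buffered form against the naive one.

```rust
/// Assumed field for illustration: the 64-bit prime p = 2^64 - 9 * 2^28 + 1.
const P: u64 = 0xFFFF_FFFF_7000_0001;

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P as u128 - b as u128) % P as u128) as u64
}

fn cube(x: u64) -> u64 {
    let (x, p) = (x as u128, P as u128);
    (x * x % p * x % p) as u64
}

/// Reference version: every round adds f to each of the W - 1 non-active elements.
fn gmimc_naive<const W: usize>(mut xs: [u64; W], constants: &[u64]) -> [u64; W] {
    for (r, &c) in constants.iter().enumerate() {
        let active = r % W;
        let f = cube(add(xs[active], c));
        for (i, x) in xs.iter_mut().enumerate() {
            if i != active {
                *x = add(*x, f);
            }
        }
    }
    xs
}

/// Buffered version: the true value of element i is xs[i] + buffer, so each round
/// touches only the active element and the buffer (one cube plus a few adds/subs),
/// independent of the width W.
fn gmimc_buffered<const W: usize>(mut xs: [u64; W], constants: &[u64]) -> [u64; W] {
    let mut buffer = 0;
    for (r, &c) in constants.iter().enumerate() {
        let active = r % W;
        let f = cube(add(add(xs[active], buffer), c));
        buffer = add(buffer, f); // f will reach every element via the buffer...
        xs[active] = sub(xs[active], f); // ...so cancel it for the active one
    }
    // Flush the accumulated sum into every element.
    for x in xs.iter_mut() {
        *x = add(*x, buffer);
    }
    xs
}
```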
`absorb_buffered_inputs` is called even when the input buffer is empty. In that case it should be a no-op, but it was instead replenishing the output buffer because of this line:

```rust
self.output_buffer = self.sponge_state[0..SPONGE_RATE].to_vec();
```

The easiest fix is to skip that code when the input buffer is empty.
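A minimal sketch of the fix on a hypothetical, simplified challenger (toy non-cryptographic permutation, made-up `SPONGE_RATE`/`SPONGE_WIDTH`, overwrite-mode absorption; none of this is the real code). In this sketch, without the early return, the second `get_challenge` call would refill the output buffer from the unchanged sponge state and return the same value as the first call.

```rust
/// Hypothetical sizes, not the real parameters.
const SPONGE_RATE: usize = 2;
const SPONGE_WIDTH: usize = 3;

struct Challenger {
    sponge_state: Vec<u64>,
    input_buffer: Vec<u64>,
    output_buffer: Vec<u64>,
}

impl Challenger {
    fn new() -> Self {
        Challenger {
            sponge_state: vec![0; SPONGE_WIDTH],
            input_buffer: Vec::new(),
            output_buffer: Vec::new(),
        }
    }

    /// Toy stand-in for the real permutation (NOT cryptographic).
    fn permute(&mut self) {
        for (i, s) in self.sponge_state.iter_mut().enumerate() {
            *s = s.wrapping_mul(6364136223846793005).wrapping_add(i as u64 + 1);
        }
        self.sponge_state.rotate_left(1);
    }

    fn observe_element(&mut self, x: u64) {
        // New input invalidates any outputs squeezed from the old state.
        self.output_buffer.clear();
        self.input_buffer.push(x);
    }

    fn absorb_buffered_inputs(&mut self) {
        // The fix: with nothing to absorb, leave the output buffer untouched.
        if self.input_buffer.is_empty() {
            return;
        }
        let inputs = std::mem::take(&mut self.input_buffer);
        for chunk in inputs.chunks(SPONGE_RATE) {
            // Overwrite the rate portion with the inputs, then permute.
            for (i, &x) in chunk.iter().enumerate() {
                self.sponge_state[i] = x;
            }
            self.permute();
        }
        self.output_buffer = self.sponge_state[0..SPONGE_RATE].to_vec();
    }

    fn get_challenge(&mut self) -> u64 {
        self.absorb_buffered_inputs();
        if self.output_buffer.is_empty() {
            // Squeeze a fresh batch of outputs.
            self.permute();
            self.output_buffer = self.sponge_state[0..SPONGE_RATE].to_vec();
        }
        self.output_buffer.pop().expect("output buffer was just filled")
    }
}
```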
This is mostly copy/pasted from plonky1, although there are some differences. E.g. in plonky2 virtual targets are not routable, so they're no longer added as partitions.
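A minimal sketch of the partition step as a union-find over wires, assuming a hypothetical `Target` enum and a fixed number of wires per gate: only routable wire targets get a partition, and a copy constraint touching a virtual target is rejected with an assertion. The names and the union-find representation are assumptions for illustration, not the actual plonky2 code.

```rust
/// Hypothetical target type: wires are routable, virtual targets are not.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Target {
    Wire { gate: usize, input: usize },
    Virtual { index: usize },
}

impl Target {
    fn is_routable(&self) -> bool {
        matches!(self, Target::Wire { .. })
    }
}

/// Union-find over routable wires only; virtual targets never get a partition.
struct Partitions {
    parent: Vec<usize>,
    wires_per_gate: usize,
}

impl Partitions {
    fn new(num_gates: usize, wires_per_gate: usize) -> Self {
        // Start with one singleton partition per (gate, wire) pair.
        Partitions { parent: (0..num_gates * wires_per_gate).collect(), wires_per_gate }
    }

    fn index(&self, t: Target) -> usize {
        match t {
            Target::Wire { gate, input } => gate * self.wires_per_gate + input,
            Target::Virtual { .. } => panic!("virtual targets are not routable"),
        }
    }

    fn find(&mut self, mut i: usize) -> usize {
        while self.parent[i] != i {
            self.parent[i] = self.parent[self.parent[i]]; // path halving
            i = self.parent[i];
        }
        i
    }

    fn merge(&mut self, a: Target, b: Target) {
        let (ia, ib) = (self.index(a), self.index(b));
        let (ra, rb) = (self.find(ia), self.find(ib));
        self.parent[ra] = rb;
    }

    fn same(&mut self, a: Target, b: Target) -> bool {
        let (ia, ib) = (self.index(a), self.index(b));
        self.find(ia) == self.find(ib)
    }
}

/// Builds partitions from copy constraints, which must only involve routable targets.
fn build_partitions(
    num_gates: usize,
    wires_per_gate: usize,
    copy_constraints: &[(Target, Target)],
) -> Partitions {
    let mut partitions = Partitions::new(num_gates, wires_per_gate);
    for &(a, b) in copy_constraints {
        assert!(a.is_routable() && b.is_routable(), "copy constraint on a non-routable target");
        partitions.merge(a, b);
    }
    partitions
}
```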
Closes #10. This combines Lagrange interpolation with FFTs, as mentioned there.
I was previously thinking that all our polynomial encodings might as well use power-of-two length vectors, so they'd be "FFT-ready", with no need to trim or pad. This sort of breaks that assumption, though: e.g. I think we'll want to compute interpolants with three coefficients in the batch opening argument.
I think we can still skip trimming/padding in most cases, since the majority of our polynomials will, with high probability, have degrees of the form power-of-two minus one. But we'll now have one or two uses where that's not the case.
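A minimal sketch of why such an interpolant is not FFT-ready: plain O(n^2) Lagrange interpolation over an assumed toy field (p = 2^31 - 1) yields exactly n coefficients, so a three-point interpolant has length 3 and needs zero-padding to length 4 before any FFT. This is not the FFT-combined method from this PR, and all function names are hypothetical.

```rust
/// Assumed toy field for illustration: p = 2^31 - 1. Inputs must be reduced (< P).
const P: u64 = (1 << 31) - 1;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Inverse via Fermat's little theorem (a must be nonzero).
fn inv(a: u64) -> u64 { pow(a, P - 2) }

/// Coefficients (lowest degree first) of the unique polynomial of degree < n
/// through n points with distinct x-coordinates.
fn lagrange_interpolate(points: &[(u64, u64)]) -> Vec<u64> {
    let n = points.len();
    let mut coeffs = vec![0u64; n];
    for (i, &(xi, yi)) in points.iter().enumerate() {
        // basis = prod_{j != i} (X - x_j); denom = prod_{j != i} (x_i - x_j).
        let mut basis = vec![1u64];
        let mut denom = 1u64;
        for (j, &(xj, _)) in points.iter().enumerate() {
            if i == j { continue; }
            let mut next = vec![0u64; basis.len() + 1];
            for (k, &c) in basis.iter().enumerate() {
                next[k + 1] = add(next[k + 1], c); // c * X
                next[k] = sub(next[k], mul(c, xj)); // -x_j * c
            }
            basis = next;
            denom = mul(denom, sub(xi, xj));
        }
        let scale = mul(yi, inv(denom));
        for (k, &c) in basis.iter().enumerate() {
            coeffs[k] = add(coeffs[k], mul(c, scale));
        }
    }
    coeffs
}

/// Zero-pads a coefficient vector to the next power-of-two length.
fn pad_to_power_of_two(mut coeffs: Vec<u64>) -> Vec<u64> {
    let len = coeffs.len().next_power_of_two();
    coeffs.resize(len, 0);
    coeffs
}

/// Horner evaluation.
fn eval(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c))
}
```

For example, the points (1, 2), (2, 5), (3, 10) give the coefficients of X^2 + 1, a length-3 vector that pads to `[1, 0, 1, 0]`.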
Previously it stored leaf data and Merkle roots, but nothing in between, since nothing interacted with the intermediate layers yet (this will change once we hook up the FRI code).
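A minimal sketch, assuming a binary tree over a power-of-two number of leaves, of a Merkle tree that keeps every layer so an authentication path can be read off directly, which is what FRI queries need. `DefaultHasher` stands in for a real cryptographic hash, and all names here are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in hash: NOT a cryptographic commitment, for illustration only.
fn hash_leaf(leaf: &[u64]) -> u64 {
    let mut h = DefaultHasher::new();
    0u8.hash(&mut h); // domain tag: leaf
    leaf.hash(&mut h);
    h.finish()
}

fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    1u8.hash(&mut h); // domain tag: internal node
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// Hypothetical Merkle tree that stores every layer, not just the leaves and the root.
struct MerkleTree {
    leaves: Vec<Vec<u64>>,
    /// layers[0] holds the leaf hashes; the last layer holds only the root.
    layers: Vec<Vec<u64>>,
}

impl MerkleTree {
    fn new(leaves: Vec<Vec<u64>>) -> Self {
        assert!(leaves.len().is_power_of_two(), "leaf count must be a power of two");
        let mut layers = vec![leaves.iter().map(|l| hash_leaf(l)).collect::<Vec<u64>>()];
        while layers.last().unwrap().len() > 1 {
            let next: Vec<u64> = layers.last().unwrap()
                .chunks(2)
                .map(|pair| hash_pair(pair[0], pair[1]))
                .collect();
            layers.push(next);
        }
        MerkleTree { leaves, layers }
    }

    fn leaf(&self, index: usize) -> &[u64] {
        &self.leaves[index]
    }

    fn root(&self) -> u64 {
        self.layers.last().unwrap()[0]
    }

    /// Sibling hashes from leaf to root, read straight from the stored layers.
    fn prove(&self, mut index: usize) -> Vec<u64> {
        let mut siblings = Vec::new();
        for layer in &self.layers[..self.layers.len() - 1] {
            siblings.push(layer[index ^ 1]);
            index >>= 1;
        }
        siblings
    }
}

fn verify(leaf: &[u64], mut index: usize, proof: &[u64], root: u64) -> bool {
    let mut h = hash_leaf(leaf);
    for &sibling in proof {
        h = if index & 1 == 0 { hash_pair(h, sibling) } else { hash_pair(sibling, h) };
        index >>= 1;
    }
    h == root
}
```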
As discussed, the batch opening argument looks like it will be a significant cost, and we can reduce that cost by not including shifted openings (except for the `Z`s, which need them).
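To see where the saving comes from, here is a hedged sketch assuming a DEEP-FRI-style batch opening, in which the prover claims evaluations and then shows that a random linear combination of quotients is low-degree. Take polynomials $f_1, \dots, f_n$ opened at $\zeta$, of which only the permutation polynomials $Z_1, \dots, Z_m$ (themselves among the $f_i$) are also opened at the shifted point $g\zeta$; the symbols and the combination with powers of a random $\alpha$ are illustrative assumptions:

$$
Q(X) = \sum_{i=1}^{n} \alpha^{i-1}\,\frac{f_i(X) - f_i(\zeta)}{X - \zeta}
\;+\; \sum_{j=1}^{m} \alpha^{n+j-1}\,\frac{Z_j(X) - Z_j(g\zeta)}{X - g\zeta}
$$

If every $f_i$ were also opened at $g\zeta$, the second sum would range over all $n$ polynomials, adding $n - m$ claimed evaluations to the proof and $n - m$ terms to every query-point check. The $Z_j$ keep their shifted openings because the permutation argument relates $Z_j(gX)$ to $Z_j(X)$.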
... by switching to Rescue Prime (which has a smaller security margin) and precomputing an addition chain for the exponent used in the cube root calculation. Also adds a benchmark.
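A minimal sketch of the idea, assuming a 64-bit prime field with p = 2^64 - 9·2^28 + 1 (so p ≡ 2 (mod 3) and every element has a unique cube root): the cube root is x^e with e = (2p - 1)/3, since 3e = 2(p - 1) + 1. The chain is built once and then replayed for every cube root. The left-to-right binary chain built here is only a baseline stand-in; a hand-tuned chain for this fixed exponent can use fewer multiplications. All function names are hypothetical.

```rust
/// Assumed field for illustration: p = 2^64 - 9 * 2^28 + 1, with p ≡ 2 (mod 3).
const P: u64 = 0xFFFF_FFFF_7000_0001;

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Each step (i, j) appends values[i] * values[j], where values[0] is the input x,
/// so exponents add: exps[k + 1] = exps[i] + exps[j].
type Chain = Vec<(usize, usize)>;

/// e with 3 * e ≡ 1 (mod p - 1), so x^e is the unique cube root of x.
fn cube_root_exponent() -> u64 {
    ((2 * P as u128 - 1) / 3) as u64
}

/// Baseline chain from the left-to-right binary method (not an optimized chain).
fn binary_chain(e: u64) -> Chain {
    assert!(e > 0, "exponent must be positive");
    let mut chain = Vec::new();
    let mut acc = 0; // index of the running power; values[0] = x^1
    let top_bit = 63 - e.leading_zeros();
    for bit in (0..top_bit).rev() {
        chain.push((acc, acc)); // square
        acc = chain.len();
        if (e >> bit) & 1 == 1 {
            chain.push((acc, 0)); // multiply by x
            acc = chain.len();
        }
    }
    chain
}

/// Exponent computed by the chain (used to sanity-check it).
fn chain_exponent(chain: &[(usize, usize)]) -> u128 {
    let mut exps: Vec<u128> = vec![1];
    for &(i, j) in chain {
        let next = exps[i] + exps[j];
        exps.push(next);
    }
    *exps.last().unwrap()
}

/// Replays a precomputed chain on x, returning x^e.
fn eval_chain(x: u64, chain: &[(usize, usize)]) -> u64 {
    let mut values = vec![x];
    for &(i, j) in chain {
        let v = mul(values[i], values[j]);
        values.push(v);
    }
    *values.last().unwrap()
}
```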