Closes#10. This combines Lagrange interpolation with FFTs as mentioned there.
I was previously thinking that all our polynomial encodings might as well just use power-of-two length vectors, so they'll be "FFT-ready", with no need to trim/pad. This sort of breaks that assumption though, as e.g. I think we'll want to compute interpolants with three coefficients in the batch opening argument.
I think we can still skip trimming/padding in most cases, since it the majority of our polynomials will have power-of-two-minus-1 degrees with high probability. But we'll now have one or two uses where that's not the case.
... by switching to Rescue Prime (which has a smaller security margin), and precomputing an addition chain for the exponent used in the cubic root calculation. Also adds a benchmark.
When we had a large field, we could just pick random shifts, and get disjoint cosets with high probability. With a 64-bit field, I think the probability of a collision is non-negligible (something like 1 in a million), so we should probably verify that the cosets are disjoint.
If there are any concerns with this method (or if it's just confusing), I think it would also be reasonable to use the brute force approach of explicitly computing the cosets and checking that they're disjoint. I coded that as well, and it took like 80ms, so not really a big deal since it's a one-time preprocessing cost.
Also fixes some overflow bugs in the inversion code.