When we had a large field, we could just pick random shifts, and get disjoint cosets with high probability. With a 64-bit field, I think the probability of a collision is non-negligible (something like 1 in a million), so we should probably verify that the cosets are disjoint.
If there are any concerns with this method (or if it's just confusing), I think it would also be reasonable to use the brute force approach of explicitly computing the cosets and checking that they're disjoint. I coded that as well, and it took like 80ms, so not really a big deal since it's a one-time preprocessing cost.
Also fixes some overflow bugs in the inversion code.