`keccak_rust` doesn't seem to be widely used, and it treats `x` as the major axis of its 5x5 input. This is not exactly wrong, since Keccak itself doesn't have a notion of axis order. However, there is a convention for mapping bits of the cube to a flat list of bits:
> The mapping between the bits of `s` and those of `a` is `s[w(5y + x) + z] = a[x][y][z]`.
Obeying this convention would be awkward with `keccak_rust`, since the words in memory would need to be transposed.
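To make the layout issue concrete, here is a minimal sketch of the convention quoted above, assuming 64-bit lanes (`w = 64`, as in Keccak-f[1600]). The function names are hypothetical and are not part of `keccak_rust`; the second helper only illustrates the transposition an `x`-major layout would need.

```rust
/// Lane width in bits (assumption: Keccak-f[1600], so w = 64).
const W: usize = 64;

/// Index into the flat bit string `s` for cube coordinate (x, y, z),
/// following s[w(5y + x) + z] = a[x][y][z].
fn flat_bit_index(x: usize, y: usize, z: usize) -> usize {
    W * (5 * y + x) + z
}

/// Hypothetical helper: convert an x-major lane array (`a[x][y]`) into the
/// conventional flat order, where lane (x, y) sits at index 5y + x.
/// This is a transposition of the 5x5 grid of lanes.
fn x_major_to_flat(a: &[[u64; 5]; 5]) -> [u64; 25] {
    let mut flat = [0u64; 25];
    for x in 0..5 {
        for y in 0..5 {
            flat[5 * y + x] = a[x][y];
        }
    }
    flat
}
```

Under the convention, incrementing `x` moves one lane forward in memory and incrementing `y` moves five lanes forward, which is the opposite of what an `x`-major 5x5 array gives.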
Based on the approach @SyxtonPrime described.
In terms of columns, the changes are:
- Store inputs (`A`) as `u32` limbs, rather than individual bits.
- Remove `C_partial`. It was used to store an intermediate value in a 5-way xor, but we've since realized that the 5-way xor can be done directly.
- Add `C_prime`, an intermediate result used to help verify the relation between `A` and `A'`.
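For readers less familiar with the values involved, the sketch below shows, on native words, what a `u32` limb split and the 5-way xor for the column parities look like. It is only an illustration: the helper names and the low-limb-first order are assumptions, and it does not show how the columns or constraints are actually laid out.

```rust
/// Split a 64-bit lane into two u32 limbs.
/// Assumption: low limb first; the real column order may differ.
fn lane_to_limbs(lane: u64) -> (u32, u32) {
    (lane as u32, (lane >> 32) as u32)
}

/// Column parities from the theta step: C[x] = A[x][0] ^ ... ^ A[x][4].
/// The input uses the flat lane order, with lane (x, y) at index 5y + x.
fn column_parities(a: &[u64; 25]) -> [u64; 5] {
    let mut c = [0u64; 5];
    for x in 0..5 {
        // One 5-way xor per column, with no partial result stored.
        c[x] = a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20];
    }
    c
}
```

On native words the 5-way xor needs no stored intermediate, which mirrors the reason given above for dropping `C_partial`.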