* Bit-order in-place reversal optimizations
* optimization/simplification
* Done modulo documentation and testing on x86
* Minor type fixes on non-ARM
* Minor x86
* Transpose docs
* Docs
* Make rustfmt happy
* Bug fixes + tests
* Minor docs + lints
* Split into crates
I kept other changes to a minimum, so 95% of this is just moving things. One complication that came up is that since `PrimeField` is now outside the plonky2 crate, these two impls now conflict:
```
impl<F: PrimeField> From<HashOut<F>> for Vec<u8> { ... }
impl<F: PrimeField> From<HashOut<F>> for Vec<F> { ... }
```
with this note:
```
note: upstream crates may add a new impl of trait `plonky2_field::field_types::PrimeField` for type `u8` in future versions
```
I worked around this by adding a `GenericHashOut` trait with methods like `to_bytes()` instead of overloading `From`/`Into`. Personally I prefer the explicitness anyway.
* Move out permutation network stuff also
* Fix imports
* Fix import
* Also move out insertion
* Comment
* fmt
* PR feedback