This should improve cache locality - since we generally access several values at a time in a given row, we want themt to be close together in memory.
There are a few steps that make more sense column-wise, though, such as generating the `COUNTER` column. I put those after the transpose.
This adds padding rows which satisfy the ordering checks. To ensure that they also satisfy the value consistency checks, I just copied the address and value from the last operation.
I think this method of padding feels more natural, though it is a bit more code since we need to calculate the max range check in a different way. But on the plus side, the constraints are a bit smaller and simpler.
Also added a few constraints that I think we need for soundness:
- Each `is_channel` flag is bool.
- Sum of `is_channel` flags is bool.
- Dummy operations must be reads (otherwise the prover could put writes in the memory table which aren't in the CPU table).
By no longer storing unsorted operations; they are effectively stored in the CPU table already.
I ran into some issues with sorting, since the existing sort method didn't include `is_channel` columns. Rather than update the existing method, I removed it and added a sort on the `MemoryOp`s, which I think seems cleaner.
* Memory naming tweaks
- Define the channel count and value limbs in a single place, so they're easy to adjust.
- Standardize on "channels" which I think is more explicit, since e.g. `num_mem_ops` used to mean either the channel count or total operation count in a trace.
* feedback
* tweaks
* fmt
It seems redundant in most contexts, e.g. `use plonky2::field::extension_field::Extendable;`. One could import `extension_field`, but it's not that common in Rust, and `field::extension` is now about as short.