This should improve cache locality - since we generally access several values at a time in a given row, we want themt to be close together in memory.
There are a few steps that make more sense column-wise, though, such as generating the `COUNTER` column. I put those after the transpose.
For now it doesn't log filenames, but we can compare against the list of filenames in `combined_kernel`.
Current output:
```
[DEBUG plonky2_evm::cpu::kernel::assembler] Assembled file size: 0 bytes
[DEBUG plonky2_evm::cpu::kernel::assembler] Assembled file size: 49 bytes
[DEBUG plonky2_evm::cpu::kernel::assembler] Assembled file size: 387 bytes
[DEBUG plonky2_evm::cpu::kernel::assembler] Assembled file size: 27365 bytes
[DEBUG plonky2_evm::cpu::kernel::assembler] Assembled file size: 0 bytes
[DEBUG plonky2_evm::cpu::kernel::assembler] Assembled file size: 11 bytes
[DEBUG plonky2_evm::cpu::kernel::assembler] Assembled file size: 7 bytes
[DEBUG plonky2_evm::cpu::kernel::aggregator::tests] Total kernel size: 27819 bytes
```
This shows that most of our kernel code is from `curve_add.asm`, which makes sense since it invovles a couple uses of the large `inverse` macro. Thankfully that will be replaced at some point.
This adds padding rows which satisfy the ordering checks. To ensure that they also satisfy the value consistency checks, I just copied the address and value from the last operation.
I think this method of padding feels more natural, though it is a bit more code since we need to calculate the max range check in a different way. But on the plus side, the constraints are a bit smaller and simpler.
Also added a few constraints that I think we need for soundness:
- Each `is_channel` flag is bool.
- Sum of `is_channel` flags is bool.
- Dummy operations must be reads (otherwise the prover could put writes in the memory table which aren't in the CPU table).
By no longer storing unsorted operations; they are effectively stored in the CPU table already.
I ran into some issues with sorting, since the existing sort method didn't include `is_channel` columns. Rather than update the existing method, I removed it and added a sort on the `MemoryOp`s, which I think seems cleaner.