PoseidonGate's recursive evaluations were using a lot of gates, and the MDS layer was the main culprit.
The other issue is that `constant_layer_recursive` creates a bunch of `ArithmeticGate`s with unique constants. We could either change `ArithmeticGate` to support different constants per operation, or wire in constants from `ConstantGate`, and change `ConstantGate` to support several constants per gate.
This won't really help anything near term since we're still between 2^12 and 2^13, but could have some benefits later, depending on what recursion arities and security settings we end up using.
`PoseidonMdsGate` needs `2 * D * WIDTH = 48` routed wires, and the combination of adding a gate and increasing routed wires slows down the prover a bit. So for now, I kept it at 28 wires, and the old code path is still used.
* Split up `PartitionWitness` data
This addresses two minor inefficiencies:
- Some preprocessed forest data was being cloned during proving.
- Some of the `ForestNode` data (like node sizes) is only needed in preprocessing, not proving. It was taking up cache space during proving because it was interleaved with data that is used during proving (parents, values).
Now `Forest` contains the disjoint-set forest. `PartitionWitness` is now mainly a Vec of target values; it also holds a reference to the (preprocessed) representative map.
On my laptop, this speeds up witness generation ~12%, resulting in an overall ~0.5% speedup.
* Feedback
* No size data (#278)
* No size data
* feedback