* Specialize `InterpolationGate`
To cosets of subgroups of roots of unity. This way
- `InterpolationGate` needs fewer routed wires, bringing our minimum routed wires down from 28 to 25.
- The recursive `compute_evaluation` avoids some multiplications, saving 100~200 gates depending on `num_routed_wires`.
* Update test
* feedback
* More wires for ConstantGate
* fix
* fix
* Optimize recursive Poseidon constraint evaluation
- Avoid `ArithmeticGate`s with unique constants; use `ConstantGate` wires instead
- Avoid an unnecessary squaring in exponentiations
Brings Poseidon evaluation down to a reasonable 273 gates when `num_routed_wires = 48`.
PoseidonGate's recursive evaluations were using a lot of gates, and the MDS layer was the main culprit.
The other issue is that `constant_layer_recursive` creates a bunch of `ArithmeticGate`s with unique constants. We could either change `ArithmeticGate` to support different constants per operation, or wire in constants from `ConstantGate`, and change `ConstantGate` to support several constants per gate.
This won't really help anything near term since we're still between 2^12 and 2^13, but could have some benefits later, depending on what recursion arities and security settings we end up using.
`PoseidonMdsGate` needs `2 * D * WIDTH = 48` routed wires, and the combination of adding a gate and increasing routed wires slows down the prover a bit. So for now, I kept it at 28 wires, and the old code path is still used.