* stash prep for Barrett reduction
* restore benches lost in rebase
* fix vartime reduction
* some improvements and fixes to reduce_vartime
* Fuse reductions when converting to Montgomery form and use window=1 in powMont for small exponents (~2.7x to 3.3x speedup)
* modexp: introduce a no-reduction path when the base and exponent are small compared to the modulus; fix DOS
* optimization for padded exponents
* remove commented-out code [skip ci]
* add missing noInline to allocStackArray
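A minimal sketch of the general idea behind a no-reduction modexp path, written in Python rather than the project's own code. It assumes the shortcut is gated on a bit-length bound: if base < 2^b, then base^e < 2^(b*e), so when b*e is below the modulus bit length the plain power is already smaller than the modulus and needs no reduction. The function name `modexp` and this exact bound are illustrative assumptions, not the commit's implementation. The bound check is also what keeps the unreduced path from materializing an enormous integer for a large exponent. Whether that is what the DOS fix addresses is likewise an assumption.

```python
def modexp(base: int, exponent: int, modulus: int) -> int:
    """Return base**exponent mod modulus (hypothetical sketch).

    When base**exponent is provably smaller than the modulus, compute it
    with plain multiplications and skip modular reduction entirely.
    """
    if modulus < 1 or exponent < 0 or base < 0:
        raise ValueError("expected modulus >= 1, exponent >= 0, base >= 0")
    if modulus == 1:
        return 0  # every integer is 0 mod 1

    b = base.bit_length()
    m_bits = modulus.bit_length()
    # base < 2**b, so base**exponent < 2**(b * exponent).
    # If b * exponent < m_bits, then
    # base**exponent < 2**(m_bits - 1) <= modulus: no reduction needed.
    # Checking the exponent first keeps the unreduced path bounded
    # (assumed guard; b == 0 means base == 0, whose powers are 0 or 1).
    if b == 0 or (exponent < m_bits and b * exponent < m_bits):
        return base ** exponent  # no-reduction path

    return pow(base, exponent, modulus)  # general path with reductions
```

For example, `modexp(3, 4, 1000)` takes the no-reduction path (2 bits times 4 is below the 10-bit modulus) and returns 81, while a large exponent such as `modexp(7, 123456789, 1_000_000_007)` falls back to the reduced path.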