* fix the new div2n1n_vartime on 32-bit - regression from #286
* remove unnecessary defensive programming
* reactivate 32-bit CI to check on #244
* 32-bit: centralize OS, ISA and env variable config
* enable assembly on x86 32-bit
* stash prep for Barrett reduction
* restore benches lost in rebase
* fix vartime reduction
* some improvements and fixes to reduce_vartime
* Fuse reductions when converting to Montgomery form + use window=1 in powMont for small exponents: ~2.7x to 3.3x speedup
* modexp: introduce a no-reduction path when base and exponent are small compared to the modulus. Fix DoS
* optimize for padded exponents
* remove commented out code [skip ci]
* add missing noInline for allocStackArray
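The modexp no-reduction path can be illustrated with a minimal Python sketch. This is not the library's code: `modexp_sketch` is a hypothetical name, and the bit-length guard is an assumed way to detect that `base**exponent` stays below the modulus. Since `base < 2**b` with `b = base.bit_length()`, the guard `b * exponent < modulus.bit_length()` implies `base**exponent < 2**(modulus.bit_length() - 1) <= modulus`, so the plain power needs no reduction and never grows past the modulus size.

```python
def modexp_sketch(base: int, exponent: int, modulus: int) -> int:
    """Hypothetical modexp with a no-reduction fast path (illustration only)."""
    if base < 0 or exponent < 0 or modulus <= 0:
        raise ValueError("expects base >= 0, exponent >= 0, modulus > 0")
    if modulus == 1:
        return 0
    # Assumed guard: for exponent >= 1, base**exponent < 2**(b * exponent).
    # If b * exponent < modulus.bit_length(), the result is below
    # 2**(modulus.bit_length() - 1) <= modulus, so no reduction is needed.
    # (exponent == 0 yields 1, which is also below modulus >= 2.)
    if base.bit_length() * exponent < modulus.bit_length():
        return base ** exponent
    # General path: ordinary modular exponentiation.
    return pow(base, exponent, modulus)


print(modexp_sketch(3, 5, 2**64 + 13))  # 243, taken via the fast path
```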
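For context on the Barrett reduction prep, here is a textbook sketch (the base-2 form of the classic algorithm), assuming inputs below `m*m` such as the product of two reduced operands. `barrett_setup` and `barrett_reduce` are hypothetical names, and this uses plain Python integers rather than the limb-level arithmetic a bigint library would use.

```python
from typing import Tuple


def barrett_setup(m: int) -> Tuple[int, int]:
    """Precompute k = bit length of m and mu = floor(2**(2k) / m)."""
    if m <= 0:
        raise ValueError("modulus must be positive")
    k = m.bit_length()
    mu = (1 << (2 * k)) // m
    return k, mu


def barrett_reduce(x: int, m: int, k: int, mu: int) -> int:
    """Return x mod m for 0 <= x < m*m without dividing by m."""
    # Estimate x // m with two shifts and one multiply; the estimate
    # never exceeds the true quotient and falls short by at most 2.
    q = ((x >> (k - 1)) * mu) >> (k + 1)
    r = x - q * m
    # At most two correction subtractions are needed.
    while r >= m:
        r -= m
    return r


k, mu = barrett_setup(1000003)
x = 123456 * 987654  # product of two values below the modulus
print(barrett_reduce(x, 1000003, k, mu) == x % 1000003)  # True
```

Barrett trades the per-reduction division for multiplications and shifts against a precomputed `mu`, so it pays off when the same modulus is reused across many reductions.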