* Implement double-width field multiplication for double-width towering
* Fp2 mul acceleration via double-width lazy reduction (pure Nim)
* Inline assembly for basic add and sub
* Use 2 registers instead of 12+ for ASM conditional copy
* Prepare assembly for extended multiprecision multiplication support
* Add assembly for mul
* initial implementation of assembly reduction
* stash current progress of assembly reduction
* Fix clobbering issue, only P256 comparison remain buggy
* Fix asm montgomery reduction for NIST P256 as well
* MULX/ADCX/ADOX multi-precision multiplication
* MULX/ADCX/ADOX reduction v1
* Add (deactivated) assembly for double-width substraction + rework benches
* Add bench to nimble and deactivate double-width for now. slower than classic
* Fix x86-32 running out of registers for mul
* Clang needs to be at v9 to support flag output constraints (Xcode 11.4.2 / OSX Catalina)
* 32-bit doesn't have enough registers for ASM mul
* Fix again Travis Clang 9 issues
* LLVM 9 is not whitelisted in travis
* deactivated assembler with travis clang
* syntax error
* another
* ...
* missing space, yeah ...