Mamy Ratsimbazafy
|
83dcd988b3
|
FpDbl revisited (#144) - 7% perf improvement everywhere, up to 30% in double-width primitives
* reorg mul -> limbs_double_width, ConstantineASM CttASM
* Implement squaring specialized scalar path (22% faster than mul)
* Implement "portable" assembly for squaring
* stash part of the changes
* Reorg Montgomery reduction - prepare to introduce Comba optimization
* Implement Comba Montgomery reduction (but it's slower!)
* rename t -> a
* 30% performance improvement by avoiding toOpenArray!
* variable renaming
* Fix 32-bit imports
* slightly better assembly for sub2x
* There is an annoying bottleneck
* use out-of-place Fp assembly instead of in-place
* diffAlias is unneeded now
* cosmetic
* speed up fpDbl sub by 20%
* Fix Fp2 -> Fp6 -> Fp12 towering. It seems 5% faster
* Stash ADCX/ADOX squaring
|
2021-02-01 03:52:27 +01:00 |