Mamy Ratsimbazafy 83dcd988b3
FpDbl revisited (#144) - 7% perf improvement everywhere, up to 30% in double-width primitives
* reorg mul -> limbs_double_width, ConstantineASM CttASM

* Implement squaring specialized scalar path (22% faster than mul)

* Implement "portable" assembly for squaring

* stash part of the changes

* Reorg montgomery reduction - prepare to introduce Comba optimization

* Implement comba Montgomery reduce (but it's slower!)

* rename t -> a

* 30% performance improvement by avoiding toOpenArray!

* variable renaming

* Fix 32-bit imports

* slightly better assembly for sub2x

* There is an annoying bottleneck

* use out-of-place Fp assembly instead of in-place

* diffAlias is unneeded now

* cosmetic

* speedup fpDbl sub by 20%

* Fix Fp2 -> Fp6 -> Fp12 towering. It seems 5% faster

* Stash ADCX/ADOX squaring
2021-02-01 03:52:27 +01:00
..
2020-12-15 19:18:36 +01:00
2020-10-02 00:01:09 +02:00
2020-10-02 00:01:09 +02:00
2020-09-27 13:13:45 +02:00
2020-09-27 13:13:45 +02:00
2020-09-27 09:15:14 +02:00
2021-01-21 21:25:42 +01:00
2020-10-11 21:33:59 +02:00