* add more Fp tests for Twisted Edwards curves
* add fused sqrt+division bench
* Significant fused sqrt+division improvement for any prime field over the algorithm described in "High-speed high-security signatures", Bernstein et al., p. 15 "Fast decompression", https://ed25519.cr.yp.to/ed25519-20110705.pdf
* Activate secp256k1 field benches + spring renaming of field multiplication
* addition chains for inversion and sqrt of Curve25519
* Make isSquare use addition chains
* add double-prec mul/square bench for <256-bit prime fields.
* consistent naming for dbl-width
* Isolate double-width Fp2 mul
* Implement double-width complex multiplication
* Lay out Fp4 double-width mul
* Off by p in square Fp4 as well :/
* fewer copies and less stack space in addition chains
* Address https://github.com/mratsim/constantine/issues/154 partly
* Fix #154, faster Fp4 square: less non-residue, no Mul, only square (bit more ops total)
* Fix typo
* better assembly scheduling for add/sub
* Double-width -> Double-precision
* Unred -> Unr
* double-precision modular addition
* Replace canUseNoCarryMontyMul and canUseNoCarryMontySquare by getSpareBits
* Complete the double-precision implementation
* Use double-precision path for Fp4 squaring and mul
* remove mixin annotations
* Lazy reduction in Fp4 prod
* Fix assembly for sum2xMod
* Assembly for double-precision negation
* reduce whitespace in pairing benchmarks
* ADX implies BMI2