* Fix 8x bigger than necessary encoding size of miniscalars in scalar mul
* initial windowed GLV-SAC implementation
* Simplify table encoding to match k0 without flipping bits
* Sage: Lattice decomp script fixes from anonymous reviewer
* update recoding mini test and add recoding primitives
* Update the GLV recoding
* update comments on positive/negative recoding [skip ci]
* sprinkle some {.noInit.} where possible
* Implement double-width field multiplication for double-width towering
* Fp2 mul acceleration via double-width lazy reduction (pure Nim)
* Inline assembly for basic add and sub
* Use 2 registers instead of 12+ for ASM conditional copy
* Prepare assembly for extended multiprecision multiplication support
* Add assembly for mul
* initial implementation of assembly reduction
* stash current progress of assembly reduction
* Fix clobbering issue, only P256 comparison remains buggy
* Fix asm montgomery reduction for NIST P256 as well
* MULX/ADCX/ADOX multi-precision multiplication
* MULX/ADCX/ADOX reduction v1
* Add (deactivated) assembly for double-width subtraction + rework benches
* Add bench to nimble and deactivate double-width for now: slower than classic
* Fix x86-32 running out of registers for mul
* Clang needs to be at v9 to support flag output constraints (Xcode 11.4.2 / OSX Catalina)
* 32-bit doesn't have enough registers for ASM mul
* Fix again Travis Clang 9 issues
* LLVM 9 is not whitelisted in travis
* deactivated assembler with travis clang
* syntax error
* another
* ...
* missing space, yeah ...
* Proof-of-Concept Assembly code generator
* Tag inline per procedure so we can easily track the tradeoff on tower fields
* Implement Assembly for modular addition (but very curious off-by-one)
* Fix off-by-one for moduli whose MSB is not set
* Stash (super fast) alternative but still off by carry
* Fix GCC optimizing ASM away
* Save 1 register to allow compiling for BLS12-381 (in the GMP test)
* The compiler cannot find enough registers if the ASM file is not compiled with -O3
* Add modsub
* Add field negation
* Implement no-carry Assembly optimized field multiplication
* Expose UseX86ASM to the EC benchmark
* omit frame pointer to save registers instead of hardcoding -O3. Also ensure early clobber constraints for Clang
* Prepare for assembly fallback
* Implement fallback for CPUs that don't support ADX and BMI2
* Add CPU runtime detection
* Update README, closes #66
* Remove commented out code
* Add test case for #30 - Euler's criterion doesn't return 1 for a square
* Detect #42 in the test suite
* Detect #43 in the test suite
* comment in sqrt tests
* Add #67 to the anti-regression suite
* Add #61 to the anti-regression suite
* Add #62 to anti-regression suite
* Add #60 to the anti-regression suite
* Add #64 to the test suite
* Add #65 - case 1
* Add #65 - case 2
* Add #65 - case 3
* Add debug check to isSquare/Euler's Criterion/Legendre Symbol
* Make sure our primitives are correct
* For now deactivate montySquare CIOS, fix #61, fix #62
* Narrow down #42 and #43 to powinv on 32-bit
* Detect #42, #43 at the fast squaring level
* More #42, #43 tests, Use multiplication instead of squaring as a temporary workaround, see https://github.com/mratsim/constantine/issues/68
* Prevent regression of #67 now that squaring is "fixed"
* Split elliptic curve tests to better use parallel testing
* Add support for printing points on G2
* Implement multiplication and division by optimal sextic non-residue (BLS12-381)
* Implement modular square root in 𝔽p2
* Support EC add and EC double on G2 (for BLS12-381)
* Support G2 divisive twists with non-unit sextic non-residue like BN254 snarks
* Add EC G2 bench
* cleanup some unused warnings
* Reorg the tests for parallelization and to avoid instantiating huge files
* Add MultiScalar recoding from "Efficient and Secure Algorithms for GLV-Based Scalar Multiplication" by Faz et al.
* precompute cube root of unity - Add VM precomputation of Fp - workaround upstream bug https://github.com/nim-lang/Nim/issues/14585
* Add the φ-accelerated lookup table builder
* Add a dedicated bithacks file
* cosmetic import consistency
* Build the φ precompute table with n-1 EC additions instead of 2^(n-1) additions
* remove binary
* Add the GLV precomputations to the sage scripts
* You can't avoid it, bigint multiplication is needed at one point
* Add bigint multiplication discarding some low words
* Implement the lattice decomposition in sage
* Proper decomposition for BN254
* Prepare the code for a new scalar mul
* We compile, and now the debugging hunt begins
* More helpers to debug GLV scalar Mul
* Fix conditional negation
* Endomorphism accelerated scalar mul working for BN254 curve
* Implement endomorphism acceleration for BLS12-381 (needed cofactor clearing of the point)
* fix nimble test script after bench rename
* Add sage script for BN254
* Implement (failing) scalar multiplication tests
* Add a first test against sagemath
* Finish the tests against SAGE for BN254
* Add significant test coverage of scalar multiplication with reference checks for BN254_Snarks and BLS12_381
* Mention that the inverse of 0 is 0 (TODO tests)
* Introduce "Higher-Kinded tower extensions"
* rename isComplexExtension -> fromComplexExtension
* update benchmarks with the new tower scheme
* Try to recover some speed on mul/squaring for an optimal tower (but this was not it)