* Implement double-width field multiplication for double-width towering
* Fp2 mul acceleration via double-width lazy reduction (pure Nim)
* Inline assembly for basic add and sub
* Use 2 registers instead of 12+ for ASM conditional copy
* Prepare assembly for extended multiprecision multiplication support
* Add assembly for mul
* initial implementation of assembly reduction
* stash current progress of assembly reduction
* Fix clobbering issue, only P256 comparison remain buggy
* Fix asm montgomery reduction for NIST P256 as well
* MULX/ADCX/ADOX multi-precision multiplication
* MULX/ADCX/ADOX reduction v1
* Add (deactivated) assembly for double-width substraction + rework benches
* Add bench to nimble and deactivate double-width for now. slower than classic
* Fix x86-32 running out of registers for mul
* Clang needs to be at v9 to support flag output constraints (Xcode 11.4.2 / OSX Catalina)
* 32-bit doesn't have enough registers for ASM mul
* Fix again Travis Clang 9 issues
* LLVM 9 is not whitelisted in travis
* deactivated assembler with travis clang
* syntax error
* another
* ...
* missing space, yeah ...
* Proof-of-Concept Assembly code generator
* Tag inline per procedure so we can easily track the tradeoff on tower fields
* Implement Assembly for modular addition (but very curious off-by-one)
* Fix off-by one for moduli with non msb set
* Stash (super fast) alternative but still off by carry
* Fix GCC optimizing ASM away
* Save 1 register to allow compiling for BLS12-381 (in the GMP test)
* The compiler cannot find enough registers if the ASM file is not compiled with -O3
* Add modsub
* Add field negation
* Implement no-carry Assembly optimized field multiplication
* Expose UseX86ASM to the EC benchmark
* omit frame pointer to save registers instead of hardcoding -O3. Also ensure early clobber constraints for Clang
* Prepare for assembly fallback
* Implement fallback for CPU that don't support ADX and BMI2
* Add CPU runtime detection
* Update README closes#66
* Remove commented out code
* Split elliptic curve tests to better use parallel testing
* Add support for printing points on G2
* Implement multiplication and division by optimal sextic non-residue (BLS12-381)
* Implement modular square root in 𝔽p2
* Support EC add and EC double on G2 (for BLS12-381)
* Support G2 divisive twists with non-unit sextic-non-residue like BN254 snarks
* Add EC G2 bench
* cleanup some unused warnings
* Reorg the tests for parallelization and to avoid instantiating huge files
* Add MultiScalar recoding from "Efficient and Secure Algorithms for GLV-Based Scalar Multiplication" by Faz et al
* precompute cube root of unity - Add VM precomputation of Fp - workaround upstream bug https://github.com/nim-lang/Nim/issues/14585
* Add the φ-accelerated lookup table builder
* Add a dedicated bithacks file
* cosmetic import consistency
* Build the φ precompute table with n-1 EC additions instead of 2^(n-1) additions
* remove binary
* Add the GLV precomputations to the sage scripts
* You can't avoid it, bigint multiplication is needed at one point
* Add bigint multiplication discarding some low words
* Implement the lattice decomposition in sage
* Proper decomposition for BN254
* Prepare the code for a new scalar mul
* We compile, and now debugging hunt
* More helpers to debug GLV scalar Mul
* Fix conditional negation
* Endomorphism accelerated scalar mul working for BN254 curve
* Implement endomorphism acceleration for BLS12-381 (needed cofactor clearing of the point)
* fix nimble test script after bench rename
* Add sage script for BN254
* Implement (failing) scalar multiplication tests
* Add a first test against sagemath
* Finish the tests against SAGE for BN254
* Add significant test coverage of scalar multiplication with reference checks for BN254_Snarks and BLS12_381