* unoptimized msm
* MSM: reorder loops
* add a signed windowed recoding technique
* improve wNAF table access
* use batchAffine
* revamp EC tests
* MSM signed digit support
* refactor MSM: recode signed ahead of time
* missing test vector
* refactor allocs and Alloca sideeffect
* add an endomorphism threshold
* Add Jacobian extended coordinates
* refactor recodings, prepare for parallelizable on-the-fly signed recoding
* recoding changes, introduce proper NAF for pairings
* more pairings refactoring, introduce miller accumulator for EVM
* some optim to the addchain miller loop
* start optimizing multi-pairing
* finish multi-miller loop refactoring
* minor tuning
* MSM: signed encoding suitable for parallelism (no precompute)
* cleanup signed window encoding
* add prefetching
* add metering
* properly init result to infinity
* comment on prefetching
* introduce vartime inversion for batch additions
* fix JacExt infinity conversion
* add batchAffine for MSM, though slower than JacExtended at the moment
* add a batch affine scheduler for MSM
* Add Multi-Scalar-Multiplication endomorphism acceleration
* some tuning
* signed integer fixes + 32-bit + tuning
* Some more tuning
* common msm bench + don't use affine for c < 9
* nit
* add Sage for constant time tonelli shanks
* Fused sqrt and invsqrt via Tonelli Shanks
* isolate sqrt in their own folder
* Implement constant-time Tonelli Shanks for any prime
* Implement Fp2 sqrt for any non-residue
* Add tests for BLS12_377
* Lattice decomposition script for BLS12_377 G1
* BLS12-377 G1 GLV ok, G2 GLV issue
* Proper endomorphism acceleration support for BLS12-377
* Add naive pairing support for BLS12-377
* Activate more bench for BLS12-377
* Fix MSB computation
* Optimize final exponentiation + add benches
* Proof-of-Concept Assembly code generator
* Tag inline per procedure so we can easily track the tradeoff on tower fields
* Implement Assembly for modular addition (but very curious off-by-one)
* Fix off-by one for moduli with non msb set
* Stash (super fast) alternative but still off by carry
* Fix GCC optimizing ASM away
* Save 1 register to allow compiling for BLS12-381 (in the GMP test)
* The compiler cannot find enough registers if the ASM file is not compiled with -O3
* Add modsub
* Add field negation
* Implement no-carry Assembly optimized field multiplication
* Expose UseX86ASM to the EC benchmark
* omit frame pointer to save registers instead of hardcoding -O3. Also ensure early clobber constraints for Clang
* Prepare for assembly fallback
* Implement fallback for CPU that don't support ADX and BMI2
* Add CPU runtime detection
* Update README closes#66
* Remove commented out code
* Mention that the inverse of 0 is 0 (TODO tests)
* Introduce "Higher-Kinded tower extensions"
* rename isCOmplexExtension -> fromComplexExtension
* update benchmarks with the new tower scheme
* Try to recover some speed on mul/squaring for an optimal tower (but this was not it)