15 Commits

Author SHA1 Message Date
Mamy Ratsimbazafy
5b1d280486
Fix 50% perf regression (2x with GCC) on binary GCD based inversion (#135)
* Fix 50% perf regresion Revert part of #95, fix #134

* Deactivate inversion via addition chain for BW6-761. 2x slower than Euclid
2021-01-23 21:44:22 +01:00
Mamy André-Ratsimbazafy
c89c78d2d9
Typo Borrow instead of Carry in return type 2020-12-13 18:57:23 +01:00
Mamy Ratsimbazafy
986245b5c1
Jacobian coordinates (#95)
* Add projective-> affine bench

* Add conditional copy and div2 benches

* Fp4 benchmarks

* Constant-time Jacobian addition

* Jacobian doubling

* Use a simpler Add+Dbl complete formula

* Update tests

* Fix conditional negate

* Rollaback complete addition, we were only handling curve coef a == 0
2020-10-02 00:01:09 +02:00
Mamy André-Ratsimbazafy
3f48a590e8
Move assembly to their own folder 2020-09-27 17:25:21 +02:00
Mamy Ratsimbazafy
0e4dbfe400
BLS12-377 (#91)
* add Sage for constant time tonelli shanks

* Fused sqrt and invsqrt via Tonelli Shanks

* isolate sqrt in their own folder

* Implement constant-time Tonelli Shanks for any prime

* Implement Fp2 sqrt for any non-residue

* Add tests for BLS12_377

* Lattice decomposition script for BLS12_377 G1

* BLS12-377 G1 GLV ok, G2 GLV issue

* Proper endomorphism acceleration support for BLS12-377

* Add naive pairing support for BLS12-377

* Activate more bench for BLS12-377

* Fix MSB computation

* Optimize final exponentiation + add benches
2020-09-27 09:15:14 +02:00
Mamy Ratsimbazafy
85d365359d
Endomorphism G2 (#79)
* Clear cofactor in BN254 G2 testgen and frobenius

* Implement G2 endomorphism acceleration in Sage

* Somewhat working accelerated scalar mul G2 (2.2x) faster
- OK for BN254_Snarks
- Some test failing for BLS12-381

* Fix negative miniscalars by adding an extra bit of encoding

* Cleanup accel params

* Small recoding optimizations
2020-09-03 23:10:48 +02:00
Mamy Ratsimbazafy
eee0f4f0fc
Lattice decomposition fixes (#71)
* Sage: Lattice decomp script fixes from anonymous reviewer

* update recoding mini test and add recoding primitives

* Update the GLV recoding

* update comments on positive/negative recoding [skip ci]

* sprinkle some {.noInit.} where possible
2020-08-22 19:45:44 +02:00
Mamy Ratsimbazafy
d41c653c8a
Double-width tower extension part 1 (#72)
* Implement double-width field multiplication for double-width towering

* Fp2 mul acceleration via double-width lazy reduction (pure Nim)

* Inline assembly for basic add and sub

* Use 2 registers instead of 12+ for ASM conditional copy

* Prepare assembly for extended multiprecision multiplication support

* Add assembly for mul

* initial implementation of assembly reduction

* stash current progress of assembly reduction

* Fix clobbering issue, only P256 comparison remain buggy

* Fix asm montgomery reduction for NIST P256 as well

* MULX/ADCX/ADOX multi-precision multiplication

* MULX/ADCX/ADOX reduction v1

* Add (deactivated) assembly for double-width substraction + rework benches

* Add bench to nimble and deactivate double-width for now. slower than classic

* Fix x86-32 running out of registers for mul

* Clang needs to be at v9 to support flag output constraints (Xcode 11.4.2 / OSX Catalina)

* 32-bit doesn't have enough registers for ASM mul

* Fix again Travis Clang 9 issues

* LLVM 9 is not whitelisted in travis

* deactivated assembler with travis clang

* syntax error

* another

* ...

* missing space, yeah ...
2020-08-20 10:21:39 +02:00
Mamy André-Ratsimbazafy
5e8b1870a6
Rename files 2020-07-24 23:08:00 +02:00
Mamy Ratsimbazafy
d97bc9b61c
Assembly backend (#69)
* Proof-of-Concept Assembly code generator

* Tag inline per procedure so we can easily track the tradeoff on tower fields

* Implement Assembly for modular addition (but very curious off-by-one)

* Fix off-by one for moduli with non msb set

* Stash (super fast) alternative but still off by carry

* Fix GCC optimizing ASM away

* Save 1 register to allow compiling for BLS12-381 (in the GMP test)

* The compiler cannot find enough registers if the ASM file is not compiled with -O3

* Add modsub

* Add field negation

* Implement no-carry Assembly optimized field multiplication

* Expose UseX86ASM to the EC benchmark

* omit frame pointer to save registers instead of hardcoding -O3. Also ensure early clobber constraints for Clang

* Prepare for assembly fallback

* Implement fallback for CPU that don't support ADX and BMI2

* Add CPU runtime detection

* Update README closes #66

* Remove commented out code
2020-07-24 22:02:30 +02:00
Mamy Ratsimbazafy
2613356281
Endomorphism acceleration for Scalar Multiplication (#44)
* Add MultiScalar recoding from "Efficient and Secure Algorithms for GLV-Based Scalar Multiplication" by Faz et al

* precompute cube root of unity - Add VM precomputation of Fp - workaround upstream bug https://github.com/nim-lang/Nim/issues/14585

* Add the φ-accelerated lookup table builder

* Add a dedicated bithacks file

* cosmetic import consistency

* Build the φ precompute table with n-1 EC additions instead of 2^(n-1) additions

* remove binary

* Add the GLV precomputations to the sage scripts

* You can't avoid it, bigint multiplication is needed at one point

* Add bigint multiplication discarding some low words

* Implement the lattice decomposition in sage

* Proper decomposition for BN254

* Prepare the code for a new scalar mul

* We compile, and now debugging hunt

* More helpers to debug GLV scalar Mul

* Fix conditional negation

* Endomorphism accelerated scalar mul working for BN254 curve

* Implement endomorphism acceleration for BLS12-381 (needed cofactor clearing of the point)

* fix nimble test script after bench rename
2020-06-14 15:39:06 +02:00
Mamy André-Ratsimbazafy
8a9cb9287c Highlight that bools and words are "Secret" in the codebase 2020-04-15 00:04:44 +02:00
Mamy André-Ratsimbazafy
c40bc1977d
Inverse in cubic extension field 𝔽p6 = 𝔽p2[∛(1 + 𝑖)] 2020-03-21 23:47:43 +01:00
Mamy André-Ratsimbazafy
bde619155b
30% faster constant-time inversion 2020-03-20 23:03:52 +01:00
Mamy Ratsimbazafy
4ff0e3d90b
Internals refactor + renewed focus on perf (#17)
* Lay out the refactoring objectives and tradeoffs

* Refactor the 32 and 64-bit primitives [skip ci]

* BigInts and Modular BigInts compile

* Make the bigints test compile

* Fix modular reduction

* Fix reduction tests vs GMP

* Implement montegomery mul, pow, inverse, WIP finite field compilation

* Make FiniteField compile

* Fix exponentiation compilation

* Fix Montgomery magic constant computation  for 2^64 words

* Fix typo in non-optimized CIOS - passing finite fields IO tests

* Add limbs comparisons [skip ci]

* Fix on precomputation of the Montgomery magic constant

* Passing all tests including 𝔽p2

* modular addition, the test for mersenne prime was wrong

* update benches

* Fix "nimble test" + typo on out-of-place field addition

* bigint division, normalization is needed: https://travis-ci.com/github/mratsim/constantine/jobs/298359743

* missing conversion in subborrow non-x86 fallback - https://travis-ci.com/github/mratsim/constantine/jobs/298359744

* Fix little-endian serialization

* Constantine32 flag to run 32-bit constantine on 64-bit machines

* IO Field test, ensure that BaseType is used instead of uint64 when the prime can field in uint32

* Implement proper addcarry and subborrow fallback for the compile-time VM

* Fix export issue when the logical wordbitwidth == physical wordbitwidth - passes all tests (32-bit and 64-bit)

* Fix uint128 on ARM

* Fix C++ conditional copy and ARM addcarry/subborrow

* Add investigation for SIGFPE in Travis

* Fix debug display for unsafeDiv2n1n

* multiplexer typo

* moveMem bug in glibc of Ubuntu 16.04?

* Was probably missing an early clobbered register annotation on conditional mov

* Note on Montgomery-friendly moduli

* Strongly suspect a GCC before GCC 7 codegen bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87139)

* hex conversion was (for debugging) not taking requested order into account + inlining comment

* Use 32-bit limbs on ARM64, uint128 builtin __udivti4 bug?

* Revert "Use 32-bit limbs on ARM64, uint128 builtin __udivti4 bug?"

This reverts commit 087f9aa7fb40bbd058d05cbd8eec7fc082911f49.

* Fix subborrow fallback for non-x86 (need to maks the borrow)
2020-03-16 16:33:51 +01:00