47 Commits

Author SHA1 Message Date
mratsim
1383aae105 Remove outdated TODOs [skip ci]
- noinline consts: https://github.com/nim-lang/RFCs/issues/257
2020-10-11 21:33:59 +02:00
Mamy Ratsimbazafy
6530596032
Endomorphism acceleration for BN254-Nogami (#102) 2020-10-10 18:53:48 +02:00
Mamy Ratsimbazafy
a2f46f77b7
Sage constants & tests codegen (#101)
* Implement a Sage codegenerator for frobenius constants

* Sage codegen for pairings

* Autogen of endomorphism acceleration constants

* The autogen fixed a copy-paste bug in lattice decomposition. We can use conditional negation now and save an add+dbl in scalar mul

* small fixes

* sage code for square root bls12-377 is not old

* readme updates

* Provide test suggestions for derive_frobenius

* indentation + add equation form to sage

* Sage test vector generator

* Use the json vectors
- includes type system workaround: generic sandwich https://github.com/nim-lang/Nim/issues/11225
- converting NimNode to typedesc: https://github.com/nim-lang/Nim/issues/6785

* Delete old sage code

* Install nim-serialization and nim-json-serialization in CI

* CI nimble install force yes
2020-10-10 16:19:23 +02:00
Mamy Ratsimbazafy
71bb4c799a
BW6-761 part 1 (#100)
* Add Fp, Fp2, Fp6 support for BW6-761

* Add G1 for BW6-761

* Prepare to support G2 twists on the same field as G1

* Remove a useless dependent type for lines

* Implement G2 for BW6-761

* Fix Line leftover
2020-10-09 07:51:47 +02:00
Mamy Ratsimbazafy
986245b5c1
Jacobian coordinates (#95)
* Add projective-> affine bench

* Add conditional copy and div2 benches

* Fp4 benchmarks

* Constant-time Jacobian addition

* Jacobian doubling

* Use a simpler Add+Dbl complete formula

* Update tests

* Fix conditional negate

* Rollaback complete addition, we were only handling curve coef a == 0
2020-10-02 00:01:09 +02:00
Mamy André-Ratsimbazafy
0effd66dbd
SWei -> SHortW, weierstrass -> shortweierstrass 2020-09-27 23:02:48 +02:00
Mamy André-Ratsimbazafy
92183c8b05
Remove unused curves 2020-09-27 13:13:45 +02:00
Mamy Ratsimbazafy
0e4dbfe400
BLS12-377 (#91)
* add Sage for constant time tonelli shanks

* Fused sqrt and invsqrt via Tonelli Shanks

* isolate sqrt in their own folder

* Implement constant-time Tonelli Shanks for any prime

* Implement Fp2 sqrt for any non-residue

* Add tests for BLS12_377

* Lattice decomposition script for BLS12_377 G1

* BLS12-377 G1 GLV ok, G2 GLV issue

* Proper endomorphism acceleration support for BLS12-377

* Add naive pairing support for BLS12-377

* Activate more bench for BLS12-377

* Fix MSB computation

* Optimize final exponentiation + add benches
2020-09-27 09:15:14 +02:00
Mamy Ratsimbazafy
6ecbedbd09
Mixed addition (#90)
* ptrettier comments

* Implement mixed addition on G1

* Test for mixed addition in G2 and use it for Miller Loop
2020-09-26 09:16:29 +02:00
Mamy Ratsimbazafy
03ecb31c57
Pairings for BN254-Nogami and BN254-Snarks (#86)
* Implement optimized final exponentiation for BN254-Nogami

* And BN254 Snarks support

* Optimize D-Twist sparse Fp12 x line multiplication

* Move quadruple/octuple and add to Github issues: https://github.com/mratsim/constantine/issues/88 [skip ci]
2020-09-25 21:58:20 +02:00
Mamy Ratsimbazafy
f78ed23dad
Pairing optim (#85)
* Fix fp12 Frobenius map

* Implement cyclotomic subgroup acceleration

* make cyclotomic squaring in-place

* Add back out-place cycl squaring and add cyclotomic inverse

* Implement state-of-the-art BLS12-381 final exponentiation

* save a cyclotomic squaring

* Accelerate sparse line multiplication in Miller loop

* Add pairing bench

* fix comments
2020-09-24 17:18:23 +02:00
Mamy Ratsimbazafy
d84edcd217
Naive pairings + Naive cofactor clearing (#82)
* Pairing - initial commit
- line functions
- sparse Fp12 functions

* Small fixes:
- Line parametrized by twist for generic algorithm
- Add a conjugate operator for quadratic extensions
- Have frobenius use it
- Create an Affine coordinate type for elliptic curve

* Implement (failing) pairing test

* Stash pairing debug session, temp switch Fp12 over Fp4

* Proper naive pairing on BLS12-381

* Frobenius map

* Implement naive pairing for BN curves

* Add pairing tests to CI + reduce time spent on lower-level tests

* Test without assembler in Github Actions + less base layers test iterations
2020-09-21 23:24:00 +02:00
Mamy Ratsimbazafy
28e83e7b49
Faster inversion with addition chains (#80) 2020-09-04 19:04:32 +02:00
Mamy Ratsimbazafy
85d365359d
Endomorphism G2 (#79)
* Clear cofactor in BN254 G2 testgen and frobenius

* Implement G2 endomorphism acceleration in Sage

* Somewhat working accelerated scalar mul G2 (2.2x) faster
- OK for BN254_Snarks
- Some test failing for BLS12-381

* Fix negative miniscalars by adding an extra bit of encoding

* Cleanup accel params

* Small recoding optimizations
2020-09-03 23:10:48 +02:00
Mamy Ratsimbazafy
6ac974d65e
Windowed GLV acceleration - 25% faster signing on G1 (#74)
* Fix 8x bigger than necessary encoding size of miniscalars in scalar mul

* initial windowed GLV-SAC implementation

* Simplify table encoding to match k0 without flipping bits
2020-08-25 00:02:30 +02:00
Mamy Ratsimbazafy
d41c653c8a
Double-width tower extension part 1 (#72)
* Implement double-width field multiplication for double-width towering

* Fp2 mul acceleration via double-width lazy reduction (pure Nim)

* Inline assembly for basic add and sub

* Use 2 registers instead of 12+ for ASM conditional copy

* Prepare assembly for extended multiprecision multiplication support

* Add assembly for mul

* initial implementation of assembly reduction

* stash current progress of assembly reduction

* Fix clobbering issue, only P256 comparison remain buggy

* Fix asm montgomery reduction for NIST P256 as well

* MULX/ADCX/ADOX multi-precision multiplication

* MULX/ADCX/ADOX reduction v1

* Add (deactivated) assembly for double-width substraction + rework benches

* Add bench to nimble and deactivate double-width for now. slower than classic

* Fix x86-32 running out of registers for mul

* Clang needs to be at v9 to support flag output constraints (Xcode 11.4.2 / OSX Catalina)

* 32-bit doesn't have enough registers for ASM mul

* Fix again Travis Clang 9 issues

* LLVM 9 is not whitelisted in travis

* deactivated assembler with travis clang

* syntax error

* another

* ...

* missing space, yeah ...
2020-08-20 10:21:39 +02:00
Mamy Ratsimbazafy
d97bc9b61c
Assembly backend (#69)
* Proof-of-Concept Assembly code generator

* Tag inline per procedure so we can easily track the tradeoff on tower fields

* Implement Assembly for modular addition (but very curious off-by-one)

* Fix off-by one for moduli with non msb set

* Stash (super fast) alternative but still off by carry

* Fix GCC optimizing ASM away

* Save 1 register to allow compiling for BLS12-381 (in the GMP test)

* The compiler cannot find enough registers if the ASM file is not compiled with -O3

* Add modsub

* Add field negation

* Implement no-carry Assembly optimized field multiplication

* Expose UseX86ASM to the EC benchmark

* omit frame pointer to save registers instead of hardcoding -O3. Also ensure early clobber constraints for Clang

* Prepare for assembly fallback

* Implement fallback for CPU that don't support ADX and BMI2

* Add CPU runtime detection

* Update README closes #66

* Remove commented out code
2020-07-24 22:02:30 +02:00
Mamy Ratsimbazafy
a2a2495351
Github Action CI (without GMP) (#29)
* Github Action CI (without GMP)

* Deactivate MacOS, spurious failures: https://github.com/actions/virtual-environments/issues/841

* force install with nimble

* Add badge

* Don"t include Nim 1.2.x https://github.com/mratsim/constantine/pull/20#issuecomment-646327952

* Action branch mistake

* Add back OSX? https://github.com/actions/virtual-environments/issues/841, https://github.com/actions/virtual-environments/issues/969

* fix MacOS target

* comment out RDTSC on i386

* Add initialization canaries

* Add more verbose output to debug windows failures

* spurious windows i386 test

* For now only activate Linux and mac

* missed include
2020-06-19 22:08:15 +02:00
Mamy André-Ratsimbazafy
d22d981e9e
Implement fused sqrt invsqrt on Fp: Accelerate sqrt on Fp2 by 20% (hashToG2 and property-based testing bottleneck, 4 times slower than inversion and 87 times slower than Fp2 multiplication) 2020-06-17 22:44:52 +02:00
Mamy Ratsimbazafy
d376f08d1b
G2 / Operations on the twisted curve E'(Fp2) (#51)
* Split elliptic curve tests to better use parallel testing

* Add support for printing points on G2

* Implement multiplication and division by optimal sextic non-residue (BLS12-381)

* Implement modular square root in 𝔽p2

* Support EC add and EC double on G2 (for BLS12-381)

* Support G2 divisive twists with non-unit sextic-non-residue like BN254 snarks

* Add EC G2 bench

* cleanup some unused warnings

* Reorg the tests for parallelization and to avoid instantiating huge files
2020-06-15 22:58:56 +02:00
Mamy Ratsimbazafy
2613356281
Endomorphism acceleration for Scalar Multiplication (#44)
* Add MultiScalar recoding from "Efficient and Secure Algorithms for GLV-Based Scalar Multiplication" by Faz et al

* precompute cube root of unity - Add VM precomputation of Fp - workaround upstream bug https://github.com/nim-lang/Nim/issues/14585

* Add the φ-accelerated lookup table builder

* Add a dedicated bithacks file

* cosmetic import consistency

* Build the φ precompute table with n-1 EC additions instead of 2^(n-1) additions

* remove binary

* Add the GLV precomputations to the sage scripts

* You can't avoid it, bigint multiplication is needed at one point

* Add bigint multiplication discarding some low words

* Implement the lattice decomposition in sage

* Proper decomposition for BN254

* Prepare the code for a new scalar mul

* We compile, and now debugging hunt

* More helpers to debug GLV scalar Mul

* Fix conditional negation

* Endomorphism accelerated scalar mul working for BN254 curve

* Implement endomorphism acceleration for BLS12-381 (needed cofactor clearing of the point)

* fix nimble test script after bench rename
2020-06-14 15:39:06 +02:00
Mamy Ratsimbazafy
3d1b1fab98
Fix benchmark on ARM (#31) 2020-06-04 22:09:30 +02:00
Mamy Ratsimbazafy
82ceca6e3b
Scalar mul tests (#28)
* Add sage script for BN254

* Implement (failing) scalar multiplication tests

* Add a first test against sagemath

* Finish the tests against SAGE for BN254

* Add significant test coverage of scalar multiplication with reference checks for BN254_Snarks and BLS12_381
2020-06-04 20:37:29 +02:00
Mamy André-Ratsimbazafy
44350d08af
Add elliptic doubling in projective coordinates 2020-04-15 22:23:46 +02:00
Mamy André-Ratsimbazafy
7ae0f51000
benchmarking skips cycle counting for ARM 2020-04-15 21:24:18 +02:00
Mamy André-Ratsimbazafy
e0c1e0b1c8
Add EC bench on G1 + Add throughput to benches 2020-04-15 19:38:02 +02:00
Mamy André-Ratsimbazafy
aff44f4d8e
Implement constant-time div2 on finite and extension fields 2020-04-15 02:12:45 +02:00
Mamy Ratsimbazafy
c04721a04e
Refactor: Higher-Kinded Tower of Extension Fields (#25)
* Mention that the inverse of 0 is 0 (TODO tests)

* Introduce "Higher-Kinded tower extensions"

* rename isCOmplexExtension -> fromComplexExtension

* update benchmarks with the new tower scheme

* Try to recover some speed on mul/squaring for an optimal tower (but this was not it)
2020-04-14 02:05:42 +02:00
Mamy André-Ratsimbazafy
33314fe725
Properly distinguish between Nogami and Snark/Ethereum BN254 closes #19 2020-04-12 03:01:50 +02:00
Mamy André-Ratsimbazafy
a6e4517be2
Implement 𝔽p12 inversion, enable 𝔽p12 tests and bench 2020-04-09 14:28:01 +02:00
Mamy André-Ratsimbazafy
8b7374f405
Cleanup in Montgomery Mul, Square, Pow 2020-03-22 13:24:37 +01:00
Mamy André-Ratsimbazafy
c40bc1977d
Inverse in cubic extension field 𝔽p6 = 𝔽p2[∛(1 + 𝑖)] 2020-03-21 23:47:43 +01:00
Mamy André-Ratsimbazafy
ff4a54daba
Add multiplication in 𝔽p6 = 𝔽p2[∛(1+𝑖)] 2020-03-21 19:03:57 +01:00
Mamy André-Ratsimbazafy
1855d14497
Add more curves for testing: Curve25519, BLS12-377, BN446, FKM-447, BLS12-461, BN462 2020-03-21 13:05:58 +01:00
Mamy André-Ratsimbazafy
9e78cd5d6d
Benchmark template for 𝔽p, 𝔽p2, 𝔽p6 2020-03-21 02:31:31 +01:00
Mamy André-Ratsimbazafy
bde619155b
30% faster constant-time inversion 2020-03-20 23:03:52 +01:00
Mamy Ratsimbazafy
4ff0e3d90b
Internals refactor + renewed focus on perf (#17)
* Lay out the refactoring objectives and tradeoffs

* Refactor the 32 and 64-bit primitives [skip ci]

* BigInts and Modular BigInts compile

* Make the bigints test compile

* Fix modular reduction

* Fix reduction tests vs GMP

* Implement montegomery mul, pow, inverse, WIP finite field compilation

* Make FiniteField compile

* Fix exponentiation compilation

* Fix Montgomery magic constant computation  for 2^64 words

* Fix typo in non-optimized CIOS - passing finite fields IO tests

* Add limbs comparisons [skip ci]

* Fix on precomputation of the Montgomery magic constant

* Passing all tests including 𝔽p2

* modular addition, the test for mersenne prime was wrong

* update benches

* Fix "nimble test" + typo on out-of-place field addition

* bigint division, normalization is needed: https://travis-ci.com/github/mratsim/constantine/jobs/298359743

* missing conversion in subborrow non-x86 fallback - https://travis-ci.com/github/mratsim/constantine/jobs/298359744

* Fix little-endian serialization

* Constantine32 flag to run 32-bit constantine on 64-bit machines

* IO Field test, ensure that BaseType is used instead of uint64 when the prime can field in uint32

* Implement proper addcarry and subborrow fallback for the compile-time VM

* Fix export issue when the logical wordbitwidth == physical wordbitwidth - passes all tests (32-bit and 64-bit)

* Fix uint128 on ARM

* Fix C++ conditional copy and ARM addcarry/subborrow

* Add investigation for SIGFPE in Travis

* Fix debug display for unsafeDiv2n1n

* multiplexer typo

* moveMem bug in glibc of Ubuntu 16.04?

* Was probably missing an early clobbered register annotation on conditional mov

* Note on Montgomery-friendly moduli

* Strongly suspect a GCC before GCC 7 codegen bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87139)

* hex conversion was (for debugging) not taking requested order into account + inlining comment

* Use 32-bit limbs on ARM64, uint128 builtin __udivti4 bug?

* Revert "Use 32-bit limbs on ARM64, uint128 builtin __udivti4 bug?"

This reverts commit 087f9aa7fb40bbd058d05cbd8eec7fc082911f49.

* Fix subborrow fallback for non-x86 (need to maks the borrow)
2020-03-16 16:33:51 +01:00
Mamy André-Ratsimbazafy
191bb7710c
Add a warmup to the Fp bench to deal with CPU scaling 2020-03-15 21:02:17 +01:00
Mamy André-Ratsimbazafy
b810422486
Add benchmark for Ethereum 1 and Ethereum 2 curves 2020-03-15 20:54:14 +01:00
Mamy André-Ratsimbazafy
dc0c1c181c
enable substraction benchmarks 2020-03-07 12:23:46 +01:00
Mamy André-Ratsimbazafy
472823b749
more comprehensive benchmark of Fp 2020-03-06 17:44:30 +01:00
Mamy André-Ratsimbazafy
1fdb1df80a
Add benchmark clock timers 2020-02-29 19:36:35 +01:00
Mamy André-Ratsimbazafy
ca817fcb69
Use Assembly cmov on x86 2020-02-29 18:27:20 +01:00
Mamy André-Ratsimbazafy
05bce529b4
1st experiment at accelerating montgomery multiplication (665 lines of specialized duplicated ASM code for some reason, monomorphization is probably better than that) 2020-02-28 22:46:20 +01:00
Mamy André-Ratsimbazafy
ddce056bb4
make bench compile 2020-02-25 03:07:42 +01:00
Mamy André-Ratsimbazafy
8cbbd40a0c
Add benchmark of constant-time vs unsafe powmod 2020-02-22 18:39:29 +01:00
Mamy André-Ratsimbazafy
10346d83a4
Benchmark: BigInt -> Montgomery conversion:
- shlAddMod (with assembly division) is already 4x slower than Montgomery Multiplication based.
- constant-time division will be even slower
- use montgomery-multiplication based conversion
2020-02-16 01:43:17 +01:00