Commit Graph

393 Commits

Author SHA1 Message Date
Mamy Ratsimbazafy 8b5d5089cb
forgot to commit function sig change (#177) 2022-02-14 17:12:30 +01:00
Mamy Ratsimbazafy 5db30ef68d
Low-level refactor part 2 (#176) 2022-02-14 14:38:22 +01:00
Mamy Ratsimbazafy 14af7e8724
Low-level refactoring (#175)
* Add specific fromMont conversion routine. Rename montyResidue to getMont

* missed test file

* Add x86_64 ASM for fromMont

* Add x86_64 MULX/ADCX/ADOX for fromMont

* rework Montgomery Multiplication with prefetch/latency hiding techniques

* Fix ADX autodetection, closes #174. Rollback faster mul_mont attempt, no improvement and debug pain.

* finalSub in fromMont & adx_bmi -> adx

* Some {.noInit.} to avoid Nim zeroMem (which should be optimized away but who knows)

* Uniformize name 'op+domain': mulmod - mulmont

* Fix asm codegen bug "0x0000555555565930 <+896>:   sbb    0x20(%r8),%r8" with Clang in final substraction

* Prepare for skipping final substraction

* Don't forget to copy the result when we skip the final substraction

* Seems like we need to stash the idea of skipping the final substraction for now, needs bounds analysis https://eprint.iacr.org/2017/1057.pdf

* fix condition for ASM 32-bit

* optim modular addition when sparebit is available
2022-02-14 00:16:55 +01:00
Mamy Ratsimbazafy 53c4db7ead
Fast modular inversion (#172)
* split modular inversion in its own file

* Stash fast GCD inversion https://eprint.iacr.org/2020/972.pdf

* Stash Pornin's bingcd -> issue with inner modular reduction

* Implement Bernstein-Yang inversion

* Avoid Nim checks on signed integers (32-bit runtime issue)

* cleanup: remove old inversion impls

* cleanup: static moduli, move div2

* small comments (skip ci)

* comment cleanup (skip ci)

* fix total iterations on 32-bit

* Add batch conversion to affine coordinates using simultaneous inversion trick

* fix conditional setZero and batchAffine conversion

* cleanup unneeded branches following affine conversion unification

* Fix batchAffine with zero inputs and add fuzz failure to test suite
2022-02-10 14:05:07 +01:00
Mamy Ratsimbazafy c02e6bdf84
Tag vartime the bithacks that are not constant-time 2022-02-06 18:36:02 +01:00
Mamy Ratsimbazafy 404a966601
^k to ᵏ (skip ci) 2022-02-06 15:38:26 +01:00
Mamy Ratsimbazafy 50717d8de6
Test GT-subgroup for BW6-761 (#171) 2022-01-08 17:30:26 +01:00
Mamy Ratsimbazafy f6c02fe075
Optimized subgroup checks and cofactor clearing (#169)
* Move cofactor clearing to dedicated per-curve subgroups file

* Add BLS12-381 fast subgroup checks

* Implement fast cofactor clearing for BN254_snarks

* Add fast subgroup check to BN254Snarks

* add BLS12_377 optimized cofactor and subgroup functions

* Add BN254_Nogami

* Add GT-subgroup tests

* Use the new subgroup checks for Eth1 EVM precompiles
2022-01-03 14:12:58 +01:00
Mamy Ratsimbazafy c42e2a0251
Rename NotOnTwist/OnTwist => subgroup G1 and G2 2022-01-01 19:17:04 +01:00
Mamy Ratsimbazafy 86a67013dd
glv filename -> endomorphisms 2022-01-01 17:49:26 +01:00
Mamy Ratsimbazafy bea798e27c
Field sqrt optimization (#168)
* add more Fp tests for Twisted Edwards curves

* add fused sqrt+division bench

* Significant fused sqrt+division improvement for any prime field over algorithm described in  "High-Speed High-Security Signature", Bernstein et al, p15 "Fast decompression", https://ed25519.cr.yp.to/ed25519-20110705.pdf

* Activate secp256k1 field benches + spring renaming of field multiplication

* addition chains for inversion and sqrt of Curve25519

* Make isSquare use addition chains

* add double-prec mul/square bench for <256-bit prime fields.
2022-01-01 16:19:35 +01:00
Mamy Ratsimbazafy 53f9708c2b
Initial support for Twisted Edwards curves (#167)
* Point decoding: optimized sqrt for p ≡ 5 (mod 8) (Curve25519)

* Implement fused sqrt(u/v) for twisted edwards point deserialization

* Introduce twisted edwards affine

* Allow declaration of curve field elements (and fight against recursive dependencies

* Twisted edwards group law + tests

* Add support for jubjub and bandersnatch #162

* test twisted edwards scalar mul
2021-12-29 01:54:17 +01:00
Mamy Ratsimbazafy 1195e5e980
Eth1 evm precompiles (#166)
* Prepare support for Eth1 EVM

* Implement EIP 196 (Ethereum BN254 add/mul)

* Implement ETH1 pairing precompile

* Accelerate isOnCurve for G2 with precomputation
2021-12-15 00:02:11 +01:00
Mamy Ratsimbazafy f5c0b6245d
Multipairing (#165)
* Productionize multipairings for BLS12-381

* typo

* arg order + benchmark

* Introduce mul_3way_sparse_sparse

* cleanup MultiMiller loop

* fix init sparse optimization in multimiller loop [skip ci]
2021-08-16 22:22:51 +02:00
Mamy Ratsimbazafy 979d183657
Tests for the eth2 BLS signature protocol (BLS12-381, pubkeys G1, signatures G2) using low-level primitives (#164) 2021-08-15 11:41:46 +02:00
Mamy Ratsimbazafy 0bc228126a
hash-to-curve BLS12-381 perf (#163)
* fp square noasm split from non-4 non-6 limbs fallback (40% speedup)

* optimized cofactor clearing for BLS12-381 G2

* Support jacobian isogenies and point_add on isogenies

* fuse addition and isogeny map

* {.noInit.} and sparseMul

* poly_eval_horner init

* dedicated invsqrt + cleanup square root file

* hash to field: reduce copy overhead and don't return arrays

* h2c isogeny jacobian reuse pow 3 precomputed value

* Fix sqrt bench
2021-08-14 21:01:50 +02:00
Mamy Ratsimbazafy 499f9605b2
Hash to curve - BLS12-381 (#110)
* Hash to Curve: impl expand_message_xmd

* Try to precompute part of hash to curve at compile-time

* sha256 bench - use the new hashes module

* [WIP] smoke test hash to field

* Implement hash_to_field with expected output

* unoptimized hash-to-curve G2 for BLS12-381

* Don't run sanitizer on hash to field as it uses GC-ed strings
2021-08-13 22:07:26 +02:00
Mamy André-Ratsimbazafy 5404437d18
CI: don't cancel master 2021-07-25 13:21:38 +02:00
Mamy André-Ratsimbazafy c2d716b056
update the benches in README 2021-03-06 09:20:56 +01:00
Mamy Ratsimbazafy afb33a5a77
Assembly for Fp2 (#161)
* Assembly for Fp2

* fix import
2021-02-20 15:21:23 +01:00
Mamy Ratsimbazafy aefd40f455
Square ADX (#160)
* Add MULX/ADOX/ADCX assembly for squaring 4 limbs

* Add squarings for 6 limbs

* Use the new square assembly where relevant

* Fix 32-bit register name and calling convention

* typo

* Disable MontRed ASM for 2 limbs or less
2021-02-20 13:18:49 +01:00
Mamy André-Ratsimbazafy 8a7c35af59
Cleanup: consolidate extensions and instantiation + reorg extension module 2021-02-15 22:00:15 +01:00
Mamy André-Ratsimbazafy 8918cabb56
Cleanup: introduce clobbered registers, remove explicit rax, rdx for multiplication (minus 30-50 lines for related assembly files) 2021-02-15 20:38:12 +01:00
Mamy André-Ratsimbazafy 18069e54d3
unrolled SHA256 (for 32B faster only if using ssse3) 2021-02-15 18:43:35 +01:00
Mamy André-Ratsimbazafy 976edb64bb
Move pairing_bw6_761 to staging area 2021-02-14 18:35:20 +01:00
Mamy André-Ratsimbazafy e9a1ef91fb
[Research] KZG polynomial commit and verify 2021-02-14 17:59:52 +01:00
Mamy André-Ratsimbazafy 2242650d38
move the multipairing file to research [skip ci] 2021-02-14 17:18:42 +01:00
Mamy André-Ratsimbazafy 799b6530f8
[research] Polynomial evaluation and verification [skip ci] 2021-02-14 17:14:33 +01:00
Mamy André-Ratsimbazafy 3e977488a9
add bench whole summary for curves 2021-02-14 14:24:48 +01:00
Mamy Ratsimbazafy 9ac9862401
Optimize Miller Loop and prepare Multi-pairing (#159)
* Pairing with affine: align API to BLST and Gurvy and common use-case.

* Implement multi-pairing / aggregate verif for BLS12-381 (+2% pairing perf)

* Generalize the optimized miller loop for single pairing

* Immplement the miller loop addchain for BLS12-377

* Miller addition chain for BN254-Nogami

* no Miller adchain for BN254-Snarks

* Update the line test with new tower https://github.com/mratsim/constantine/pull/153

* Somewhat sparse for Fp2 M-Twist

* Implement line by line multiplication for Fp12 D-Twist

* Somewhat sparse Mul for Fp12 D-Twist

* Finish the sparse and somewhat sparse multiplications
2021-02-14 13:06:57 +01:00
Mamy André-Ratsimbazafy 0e43c12095
Cleanup cyclotomic square, 2 less temporaries and support aliasing 2021-02-12 23:16:57 +01:00
Mamy Ratsimbazafy e7296a78a8
Double-precision cubic towering + pairing (#158)
* Double-precision cubic towering 5% perf+

* Lazy Cubic squaring, yet another 3% boost.

* Implement lazy reduced inverse (but inclusive perf boost)

* Double precision sparse multiplication for D-Twist ~ 2% for BN254 Nogami and Snarks curves

* Implement lazy sparse mul for M-twist

* Try to introduce more laziness but need bound proofs
2021-02-12 21:27:58 +01:00
Mamy André-Ratsimbazafy 0e02524225
What is this, printing constant-time values? Oh no you don't. 2021-02-11 20:27:31 +01:00
Mamy Ratsimbazafy 6a2b172bbc
CI revamp (#157)
* Azure try to fix certificates on windows

* Split ASM / no asm

* switch to stable in travis

* Split ASM tests in all CI

* workflow name

* typo

* PPC fails at download or when compiling Nim for unknown reason

* try to fix curl
2021-02-10 22:21:02 +01:00
Mamy Ratsimbazafy 5806cc4638
Double-Precision towering (#155)
* consistent naming for dbl-width

* Isolate double-width Fp2 mul

* Implement double-width complex multiplication

* Lay out Fp4 double-width mul

* Off by p in square Fp4 as well :/

* less copies and stack space in addition chains

* Address https://github.com/mratsim/constantine/issues/154 partly

* Fix #154, faster Fp4 square: less non-residue, no Mul, only square (bit more ops total)

* Fix typo

* better assembly scheduling for add/sub

* Double-width -> Double-precision

* Unred -> Unr

* double-precision modular addition

* Replace canUseNoCarryMontyMul and canUseNoCarryMontySquare by getSpareBits

* Complete the double-precision implementation

* Use double-precision path for Fp4 squaring and mul

* remove mixin annotations

* Lazy reduction in Fp4 prod

* Fix assembly for sum2xMod

* Assembly for double-precision negation

* reduce white spaces in pairing benchmarks

* ADX implies BMI2
2021-02-09 22:57:45 +01:00
Mamy Ratsimbazafy 491b4d4d21
Drop nim-json-serialization for testing (#156) 2021-02-09 22:10:16 +01:00
Mamy André-Ratsimbazafy c4a2dee42d
Fix to test Fp12 towering: Fp4 vs Fp6 2021-02-07 14:10:06 +01:00
Mamy Ratsimbazafy e23f990280
Tower drop concepts (#153)
* Fix affine instantiation

* drop concept from the codebase

* Remove alignment requirement, this cases problem in sequences on 32-bit for t_fp12_anti_regression

* slight sparse optim
2021-02-07 14:03:56 +01:00
Mamy André-Ratsimbazafy ffc77cd087
Fix cofactor in BW6-761 naive final exp (but still buggy - see #152) 2021-02-07 10:24:52 +01:00
Mamy Ratsimbazafy 258e7e516f
[WIP] Pairings for bw6 761 (#108)
* Prepare BW6-761 pairing constants

* Extract the basic miller loop from pairings

* template and method call syntax issue

* Layout pairing for BW6-761

* Fix rebasing woes

* Try to match the paper (still buggy)

* Stash BW6-761
2021-02-07 09:46:41 +01:00
Mamy Ratsimbazafy 54887b1777
[Research] KZG polynomial commitment - part 1 FFT (#151)
* FFT compiles, now on to debugging ... [skip CI]

* Fix FFT and add bench [skip ci]

* rename + add KZG resources

* rename fft_fr

* Implement FFT on elliptic curves =)

* FFT G1 bench
2021-02-06 22:11:17 +01:00
Mamy Ratsimbazafy 94419db783
Arg aliasing in elliptic curves (#150)
* Jacobian addition can handle aliasing fine

* handle aliasing and use less mem for Jacobian double

* Handle aliasing for Projective EC
2021-02-06 19:32:44 +01:00
Mamy André-Ratsimbazafy 5710a961a1
Rename ECP_ShortW_Proj -> ECP_ShortW_Prj 2021-02-06 16:29:53 +01:00
Mamy Ratsimbazafy c312210878
Rework towering (#148)
* naive removal of out-of-place mul by non residue

* Use {.inline.} in a consistent manner across the codebase

* Handle aliasing for quadratic multiplication

* reorg optimization

* Handle aliasing for quadratic squaring

* handle aliasing in mul_sparse_complex_by_0y

* Rework multiplication by nonresidue, assume tower and twist use same non-residue

* continue rework

* continue on non-residues

* Remove "NonResidue *" calls

* handle aliasing in Chung-Hasan SQR2

* Handla aliasing in Chung-Hasan SQR3

* Use one less temporary in Chung Hasan sqr2

* handle aliasing in cubic extensions

* merge extension tower in the same file to reduce duplicate proc and allow better inlining

* handle aliasing in cubic inversion

* drop out-of-place proc from BigInt and finite fields as well

* less copies in line_projective

* remove a copy in fp12 by lines
2021-02-06 16:28:38 +01:00
Mamy André-Ratsimbazafy 2c5e12d5f8
Workaround aliasing in Fp12[BLS12-377] inversion, fix #147 2021-02-02 12:53:36 +01:00
Mamy Ratsimbazafy 83dcd988b3
FpDbl revisited (#144) - 7% perf improvement everywhere, up to 30% in double-width primitives
* reorg mul -> limbs_double_width, ConstantineASM CttASM

* Implement squaring specialized scalar path (22% faster than mul)

* Implement "portable" assembly for squaring

* stash part of the changes

* Reorg montgomery reduction - prepare to introduce Comba optimization

* Implement comba Montgomery reduce (but it's slower!)

* rename t -> a

* 30% performance improvement by avoiding toOpenArray!

* variable renaming

* Fix 32-bit imports

* slightly better assembly for sub2x

* There is an annoying bottleneck

* use out-of-place Fp assembly instead of in-place

* diffAlias is unneeded now

* cosmetic

* speedup fpDbl sub by 20%

* Fix Fp2 -> Fp6 -> Fp12 towering. It seems 5% faster

* Stash ADCX/ADOX squaring
2021-02-01 03:52:27 +01:00
Mamy Ratsimbazafy d12d5faf21
Implement Jacobian mixed addition (#142) 2021-01-30 14:21:55 +01:00
Mamy Ratsimbazafy b91ec1cb15
Metering (#140)
* Add metering facilities

* Metering reporting

* Add example report on metering BLS12-381 pairings
2021-01-29 22:21:19 +01:00
Mamy Ratsimbazafy 95e23339b2
Decimal conversion (#139)
* Add constant-time fromDecimal conversion. Add warnings on intended purposes of hex/decimals

* introduce setuint + cosmetic fixes Wordbitsize -> Wordbitwidth in comments

* Add decimal conversion (non-constant-time)

* fix comments [skip ci]
2021-01-29 20:42:36 +01:00
Mamy André-Ratsimbazafy 47daefde1f
forgot an import 2021-01-24 13:55:18 +01:00