24 Commits

Author SHA1 Message Date
Mamy Ratsimbazafy
e5612f5705
Multi-Scalar-Multiplication / Linear combination (#220)
* unoptimized msm

* MSM: reorder loops

* add a signed windowed recoding technique

* improve wNAF table access

* use batchAffine

* revamp EC tests

* MSM signed digit support

* refactor MSM: recode signed ahead of time

* missing test vector

* refactor allocs and Alloca sideeffect

* add an endomorphism threshold

* Add Jacobian extended coordinates

* refactor recodings, prepare for parallelizable on-the-fly signed recoding

* recoding changes, introduce proper NAF for pairings

* more pairings refactoring, introduce miller accumulator for EVM

* some optim to the addchain miller loop

* start optimizing multi-pairing

* finish multi-miller loop refactoring

* minor tuning

* MSM: signed encoding suitable for parallelism (no precompute)

* cleanup signed window encoding

* add prefetching

* add metering

* properly init result to infinity

* comment on prefetching

* introduce vartime inversion for batch additions

* fix JacExt infinity conversion

* add batchAffine for MSM, though slower than JacExtended at the moment

* add a batch affine scheduler for MSM

* Add Multi-Scalar-Multiplication endomorphism acceleration

* some tuning

* signed integer fixes + 32-bit + tuning

* Some more tuning

* common msm bench + don't use affine for c < 9

* nit
2023-02-16 12:45:05 +01:00
Mamy Ratsimbazafy
7c5421ffdc
move staticFor to the inner repo, not helpers/ for unblocking nimble install (#216) 2023-02-07 13:11:44 +01:00
Mamy Ratsimbazafy
9770b3108c
Fp12 over fp6 (#201)
* introduce sumprod for direct fp6_mul

* change curves -> constants

* forgotten constants

* Full pairing using Fp2->Fp6->Fp12 towering
2022-08-14 09:48:10 +02:00
Mamy Ratsimbazafy
74a23244d2
bench isSquare 2022-08-07 19:50:28 +02:00
Mamy Ratsimbazafy
39a8a413de
Pasta curves (#191)
* Pasta curves field arithmetic

* implement elliptic curve arith for the Pasta curves
2022-04-27 00:58:48 +02:00
Mamy Ratsimbazafy
ffacf61e8a
Don't dump all in "backend" (#184)
* backend -> math

* towers -> extension fields

* move ISA and compiler specific code out of math/

* fix export
2022-02-27 01:49:08 +01:00
Mamy Ratsimbazafy
fe500a6a79
Productionize: move protocols top-level vs backend (#179)
* Productionize: move protocols top-level vs backend

* fix path

* import fix

* the last one

* benches as well
2022-02-21 01:04:53 +01:00
Mamy Ratsimbazafy
dc73c71801
Pairings optimizations (#178)
* bench for cyclotomic square, exp and rename cyclotomic exp + multipairings for BLS12-377

* refactor/unify lines and cyclotomic functions

* Add Karabina's compressed squaring

* Use compressed squarings in final exponentiation

* Weighted addchain for bn254_snarks

* Add new towering options and cost functions

* Rearrange bench summaries

* fix BW6-761
2022-02-20 20:15:20 +01:00
Mamy Ratsimbazafy
14af7e8724
Low-level refactoring (#175)
* Add specific fromMont conversion routine. Rename montyResidue to getMont

* missed test file

* Add x86_64 ASM for fromMont

* Add x86_64 MULX/ADCX/ADOX for fromMont

* rework Montgomery Multiplication with prefetch/latency hiding techniques

* Fix ADX autodetection, closes #174. Rollback faster mul_mont attempt, no improvement and debug pain.

* finalSub in fromMont & adx_bmi -> adx

* Some {.noInit.} to avoid Nim zeroMem (which should be optimized away but who knows)

* Uniformize name 'op+domain': mulmod - mulmont

* Fix asm codegen bug "0x0000555555565930 <+896>:   sbb    0x20(%r8),%r8" with Clang in final substraction

* Prepare for skipping final substraction

* Don't forget to copy the result when we skip the final substraction

* Seems like we need to stash the idea of skipping the final substraction for now, needs bounds analysis https://eprint.iacr.org/2017/1057.pdf

* fix condition for ASM 32-bit

* optim modular addition when sparebit is available
2022-02-14 00:16:55 +01:00
Mamy Ratsimbazafy
53c4db7ead
Fast modular inversion (#172)
* split modular inversion in its own file

* Stash fast GCD inversion https://eprint.iacr.org/2020/972.pdf

* Stash Pornin's bingcd -> issue with inner modular reduction

* Implement Bernstein-Yang inversion

* Avoid Nim checks on signed integers (32-bit runtime issue)

* cleanup: remove old inversion impls

* cleanup: static moduli, move div2

* small comments (skip ci)

* comment cleanup (skip ci)

* fix total iterations on 32-bit

* Add batch conversion to affine coordinates using simultaneous inversion trick

* fix conditional setZero and batchAffine conversion

* cleanup unneeded branches following affine conversion unification

* Fix batchAffine with zero inputs and add fuzz failure to test suite
2022-02-10 14:05:07 +01:00
Mamy Ratsimbazafy
bea798e27c
Field sqrt optimization (#168)
* add more Fp tests for Twisted Edwards curves

* add fused sqrt+division bench

* Significant fused sqrt+division improvement for any prime field over algorithm described in  "High-Speed High-Security Signature", Bernstein et al, p15 "Fast decompression", https://ed25519.cr.yp.to/ed25519-20110705.pdf

* Activate secp256k1 field benches + spring renaming of field multiplication

* addition chains for inversion and sqrt of Curve25519

* Make isSquare use addition chains

* add double-prec mul/square bench for <256-bit prime fields.
2022-01-01 16:19:35 +01:00
Mamy Ratsimbazafy
82819b1b10
Square Root & Inversion addition chains - 20% perf increase (#132)
* Addition chain for sqrt BLS12-381: 20% perf improvement

* sqrt addchain for BN254_Snarks - 20% perf improvement as well

* Fix operation count [skip ci]

* BLS12-377 sqrt - 10% perf improvement

* sqrt addition chain for BW6-761 - 6% speedup

* BN254_Nogami inversion addchain

* sqrt addchain for BN254_Nogami

* Inversion addchain for BLS12-377

* inversion ddition chain for BW6-761
2021-01-23 20:55:40 +01:00
Mamy Ratsimbazafy
986245b5c1
Jacobian coordinates (#95)
* Add projective-> affine bench

* Add conditional copy and div2 benches

* Fp4 benchmarks

* Constant-time Jacobian addition

* Jacobian doubling

* Use a simpler Add+Dbl complete formula

* Update tests

* Fix conditional negate

* Rollaback complete addition, we were only handling curve coef a == 0
2020-10-02 00:01:09 +02:00
Mamy André-Ratsimbazafy
92183c8b05
Remove unused curves 2020-09-27 13:13:45 +02:00
Mamy Ratsimbazafy
0e4dbfe400
BLS12-377 (#91)
* add Sage for constant time tonelli shanks

* Fused sqrt and invsqrt via Tonelli Shanks

* isolate sqrt in their own folder

* Implement constant-time Tonelli Shanks for any prime

* Implement Fp2 sqrt for any non-residue

* Add tests for BLS12_377

* Lattice decomposition script for BLS12_377 G1

* BLS12-377 G1 GLV ok, G2 GLV issue

* Proper endomorphism acceleration support for BLS12-377

* Add naive pairing support for BLS12-377

* Activate more bench for BLS12-377

* Fix MSB computation

* Optimize final exponentiation + add benches
2020-09-27 09:15:14 +02:00
Mamy Ratsimbazafy
28e83e7b49
Faster inversion with addition chains (#80) 2020-09-04 19:04:32 +02:00
Mamy Ratsimbazafy
d97bc9b61c
Assembly backend (#69)
* Proof-of-Concept Assembly code generator

* Tag inline per procedure so we can easily track the tradeoff on tower fields

* Implement Assembly for modular addition (but very curious off-by-one)

* Fix off-by one for moduli with non msb set

* Stash (super fast) alternative but still off by carry

* Fix GCC optimizing ASM away

* Save 1 register to allow compiling for BLS12-381 (in the GMP test)

* The compiler cannot find enough registers if the ASM file is not compiled with -O3

* Add modsub

* Add field negation

* Implement no-carry Assembly optimized field multiplication

* Expose UseX86ASM to the EC benchmark

* omit frame pointer to save registers instead of hardcoding -O3. Also ensure early clobber constraints for Clang

* Prepare for assembly fallback

* Implement fallback for CPU that don't support ADX and BMI2

* Add CPU runtime detection

* Update README closes #66

* Remove commented out code
2020-07-24 22:02:30 +02:00
Mamy André-Ratsimbazafy
d22d981e9e
Implement fused sqrt invsqrt on Fp: Accelerate sqrt on Fp2 by 20% (hashToG2 and property-based testing bottleneck, 4 times slower than inversion and 87 times slower than Fp2 multiplication) 2020-06-17 22:44:52 +02:00
Mamy André-Ratsimbazafy
e0c1e0b1c8
Add EC bench on G1 + Add throughput to benches 2020-04-15 19:38:02 +02:00
Mamy Ratsimbazafy
c04721a04e
Refactor: Higher-Kinded Tower of Extension Fields (#25)
* Mention that the inverse of 0 is 0 (TODO tests)

* Introduce "Higher-Kinded tower extensions"

* rename isCOmplexExtension -> fromComplexExtension

* update benchmarks with the new tower scheme

* Try to recover some speed on mul/squaring for an optimal tower (but this was not it)
2020-04-14 02:05:42 +02:00
Mamy André-Ratsimbazafy
33314fe725
Properly distinguish between Nogami and Snark/Ethereum BN254 closes #19 2020-04-12 03:01:50 +02:00
Mamy André-Ratsimbazafy
8b7374f405
Cleanup in Montgomery Mul, Square, Pow 2020-03-22 13:24:37 +01:00
Mamy André-Ratsimbazafy
1855d14497
Add more curves for testing: Curve25519, BLS12-377, BN446, FKM-447, BLS12-461, BN462 2020-03-21 13:05:58 +01:00
Mamy André-Ratsimbazafy
9e78cd5d6d
Benchmark template for 𝔽p, 𝔽p2, 𝔽p6 2020-03-21 02:31:31 +01:00