Commit Graph

33 Commits

Author SHA1 Message Date
Mamy Ratsimbazafy b9c911ba37
Accelerate FFT - endomorphism + wNAF vartime scalar mul (#258)
* accel FFT by 30+% with vartime endomorphism support

* silly error fix

* endomorphism + wNAF, closes #253, FFT 20% speedup

* vartime EC addition for all repr

* implement vartime EC add

* finishing touches, renam to fft_vartime
2023-09-04 10:19:14 +02:00
Mamy Ratsimbazafy 0eba593951
Pasta / Halo2 MSM bench (#243)
* Pasta bench

* cleanup env variables

* [MSM]: generate benchmark coef-points pairs in parallel

* try to fix windows Ci

* add diagnostic info

* fix old test for new codecs/io primitives

* Ensure the projective point at infinity is not all zeros, but (0, 1, 0)
2023-06-04 17:41:54 +02:00
Mamy Ratsimbazafy 1c5341fd7e
Perf quick wins - 10% Fp12 mul (#235)
* improve FP12_mul  perf by 10%

* update README [skip ci]
2023-04-28 11:31:17 +02:00
Mamy Ratsimbazafy 6c48975aee
Parallel Multi-Scalar-Multiplication (#226)
* try parallel reduction in batch add, but alas it's slower than custom chunking. Except maybe on arch with performance/efficiency cores

* initial impl of parallel MSM - scaling to debug, threads not woken fast enough

* improve comment [skip ci]

* skip top window when c divides the number of bits

* for some reason parallel-for loops scale on 5+ threads while spawn only on 2x threads. Thread wakeup issue?

* Add counters and timers to audit threadpool bottlenecks

* metrics and profiling fixes, (slower) latency hiding, activate tests

* fix thief thread trying to wake another before canceling its own sleep

* easier to sort metrics and parallel endomorphism application

* selective endomorphism acceleration

* some tuning

* spawn can handle compile-time literals, static and type parameters. Also introduce spawnAwaitable to await void procs

* improve MSM overview [skip ci]

* bench cleanup
2023-04-10 23:30:14 +02:00
Mamy Ratsimbazafy e5612f5705
Multi-Scalar-Multiplication / Linear combination (#220)
* unoptimized msm

* MSM: reorder loops

* add a signed windowed recoding technique

* improve wNAF table access

* use batchAffine

* revamp EC tests

* MSM signed digit support

* refactor MSM: recode signed ahead of time

* missing test vector

* refactor allocs and Alloca sideeffect

* add an endomorphism threshold

* Add Jacobian extended coordinates

* refactor recodings, prepare for parallelizable on-the-fly signed recoding

* recoding changes, introduce proper NAF for pairings

* more pairings refactoring, introduce miller accumulator for EVM

* some optim to the addchain miller loop

* start optimizing multi-pairing

* finish multi-miller loop refactoring

* minor tuning

* MSM: signed encoding suitable for parallelism (no precompute)

* cleanup signed window encoding

* add prefetching

* add metering

* properly init result to infinity

* comment on prefetching

* introduce vartime inversion for batch additions

* fix JacExt infinity conversion

* add batchAffine for MSM, though slower than JacExtended at the moment

* add a batch affine scheduler for MSM

* Add Multi-Scalar-Multiplication endomorphism acceleration

* some tuning

* signed integer fixes + 32-bit + tuning

* Some more tuning

* common msm bench + don't use affine for c < 9

* nit
2023-02-16 12:45:05 +01:00
Mamy Ratsimbazafy 082cd1deb9
MSB-to-LSB minimum Hamming Weight Recoding (#219)
* signed recoding

* use recoding
2023-02-07 16:27:53 +01:00
Mamy Ratsimbazafy 495ef4497b
Parallel batchadd (#215)
* [Threadpool] Fix syncAll releasing while a thread was attempting to steal + force no exception in tasks

* fix unguarded access on MacOS barriers

* parallel batchadd

* moved import
2023-01-29 01:06:37 +01:00
Mamy Ratsimbazafy 928f515582
Batch additions (#207)
* Batch elliptic curve addition

* accelerate chained muls

* jac mixed add handle doubling. jac additions handle aliasing when adding infinity

* properly skip sanitizer on BLS signature test

* properly skip sanitizer² on BLS signature test
2022-10-29 22:43:40 +02:00
Mamy Ratsimbazafy 99c9730793
Self-contained bindings generation (#196)
* First draft at bindings generation

* finite field bindings PoC

* support openarray, export NimMain

* PoC extension fields and elliptic curve bindings

* Pasta

* expose more bindings, remove nimZeroMem, remove tracer when unused, codegen name_mangling`gensym issue

* workaround bad C gensym codegen with {.inline.} pragma in non-dirty template nested in generic proc instantiated by template
2022-08-06 19:05:54 +02:00
Mamy Ratsimbazafy ffacf61e8a
Don't dump all in "backend" (#184)
* backend -> math

* towers -> extension fields

* move ISA and compiler specific code out of math/

* fix export
2022-02-27 01:49:08 +01:00
Mamy Ratsimbazafy fe500a6a79
Productionize: move protocols top-level vs backend (#179)
* Productionize: move protocols top-level vs backend

* fix path

* import fix

* the last one

* benches as well
2022-02-21 01:04:53 +01:00
Mamy Ratsimbazafy 53c4db7ead
Fast modular inversion (#172)
* split modular inversion in its own file

* Stash fast GCD inversion https://eprint.iacr.org/2020/972.pdf

* Stash Pornin's bingcd -> issue with inner modular reduction

* Implement Bernstein-Yang inversion

* Avoid Nim checks on signed integers (32-bit runtime issue)

* cleanup: remove old inversion impls

* cleanup: static moduli, move div2

* small comments (skip ci)

* comment cleanup (skip ci)

* fix total iterations on 32-bit

* Add batch conversion to affine coordinates using simultaneous inversion trick

* fix conditional setZero and batchAffine conversion

* cleanup unneeded branches following affine conversion unification

* Fix batchAffine with zero inputs and add fuzz failure to test suite
2022-02-10 14:05:07 +01:00
Mamy Ratsimbazafy c42e2a0251
Rename NotOnTwist/OnTwist => subgroup G1 and G2 2022-01-01 19:17:04 +01:00
Mamy André-Ratsimbazafy 5710a961a1
Rename ECP_ShortW_Proj -> ECP_ShortW_Prj 2021-02-06 16:29:53 +01:00
Mamy Ratsimbazafy d12d5faf21
Implement Jacobian mixed addition (#142) 2021-01-30 14:21:55 +01:00
Mamy André-Ratsimbazafy e89429e822
SHA256 Hash function 2020-12-15 19:18:36 +01:00
Mamy Ratsimbazafy 6530596032
Endomorphism acceleration for BN254-Nogami (#102) 2020-10-10 18:53:48 +02:00
Mamy Ratsimbazafy a2f46f77b7
Sage constants & tests codegen (#101)
* Implement a Sage codegenerator for frobenius constants

* Sage codegen for pairings

* Autogen of endomorphism acceleration constants

* The autogen fixed a copy-paste bug in lattice decomposition. We can use conditional negation now and save an add+dbl in scalar mul

* small fixes

* sage code for square root bls12-377 is not old

* readme updates

* Provide test suggestions for derive_frobenius

* indentation + add equation form to sage

* Sage test vector generator

* Use the json vectors
- includes type system workaround: generic sandwich https://github.com/nim-lang/Nim/issues/11225
- converting NimNode to typedesc: https://github.com/nim-lang/Nim/issues/6785

* Delete old sage code

* Install nim-serialization and nim-json-serialization in CI

* CI nimble install force yes
2020-10-10 16:19:23 +02:00
Mamy Ratsimbazafy 71bb4c799a
BW6-761 part 1 (#100)
* Add Fp, Fp2, Fp6 support for BW6-761

* Add G1 for BW6-761

* Prepare to support G2 twists on the same field as G1

* Remove a useless dependent type for lines

* Implement G2 for BW6-761

* Fix Line leftover
2020-10-09 07:51:47 +02:00
Mamy Ratsimbazafy 986245b5c1
Jacobian coordinates (#95)
* Add projective-> affine bench

* Add conditional copy and div2 benches

* Fp4 benchmarks

* Constant-time Jacobian addition

* Jacobian doubling

* Use a simpler Add+Dbl complete formula

* Update tests

* Fix conditional negate

* Rollaback complete addition, we were only handling curve coef a == 0
2020-10-02 00:01:09 +02:00
Mamy André-Ratsimbazafy 0effd66dbd
SWei -> SHortW, weierstrass -> shortweierstrass 2020-09-27 23:02:48 +02:00
Mamy Ratsimbazafy 6ecbedbd09
Mixed addition (#90)
* ptrettier comments

* Implement mixed addition on G1

* Test for mixed addition in G2 and use it for Miller Loop
2020-09-26 09:16:29 +02:00
Mamy Ratsimbazafy 85d365359d
Endomorphism G2 (#79)
* Clear cofactor in BN254 G2 testgen and frobenius

* Implement G2 endomorphism acceleration in Sage

* Somewhat working accelerated scalar mul G2 (2.2x) faster
- OK for BN254_Snarks
- Some test failing for BLS12-381

* Fix negative miniscalars by adding an extra bit of encoding

* Cleanup accel params

* Small recoding optimizations
2020-09-03 23:10:48 +02:00
Mamy Ratsimbazafy 6ac974d65e
Windowed GLV acceleration - 25% faster signing on G1 (#74)
* Fix 8x bigger than necessary encoding size of miniscalars in scalar mul

* initial windowed GLV-SAC implementation

* Simplify table encoding to match k0 without flipping bits
2020-08-25 00:02:30 +02:00
Mamy Ratsimbazafy d41c653c8a
Double-width tower extension part 1 (#72)
* Implement double-width field multiplication for double-width towering

* Fp2 mul acceleration via double-width lazy reduction (pure Nim)

* Inline assembly for basic add and sub

* Use 2 registers instead of 12+ for ASM conditional copy

* Prepare assembly for extended multiprecision multiplication support

* Add assembly for mul

* initial implementation of assembly reduction

* stash current progress of assembly reduction

* Fix clobbering issue, only P256 comparison remain buggy

* Fix asm montgomery reduction for NIST P256 as well

* MULX/ADCX/ADOX multi-precision multiplication

* MULX/ADCX/ADOX reduction v1

* Add (deactivated) assembly for double-width substraction + rework benches

* Add bench to nimble and deactivate double-width for now. slower than classic

* Fix x86-32 running out of registers for mul

* Clang needs to be at v9 to support flag output constraints (Xcode 11.4.2 / OSX Catalina)

* 32-bit doesn't have enough registers for ASM mul

* Fix again Travis Clang 9 issues

* LLVM 9 is not whitelisted in travis

* deactivated assembler with travis clang

* syntax error

* another

* ...

* missing space, yeah ...
2020-08-20 10:21:39 +02:00
Mamy Ratsimbazafy d97bc9b61c
Assembly backend (#69)
* Proof-of-Concept Assembly code generator

* Tag inline per procedure so we can easily track the tradeoff on tower fields

* Implement Assembly for modular addition (but very curious off-by-one)

* Fix off-by one for moduli with non msb set

* Stash (super fast) alternative but still off by carry

* Fix GCC optimizing ASM away

* Save 1 register to allow compiling for BLS12-381 (in the GMP test)

* The compiler cannot find enough registers if the ASM file is not compiled with -O3

* Add modsub

* Add field negation

* Implement no-carry Assembly optimized field multiplication

* Expose UseX86ASM to the EC benchmark

* omit frame pointer to save registers instead of hardcoding -O3. Also ensure early clobber constraints for Clang

* Prepare for assembly fallback

* Implement fallback for CPU that don't support ADX and BMI2

* Add CPU runtime detection

* Update README closes #66

* Remove commented out code
2020-07-24 22:02:30 +02:00
Mamy Ratsimbazafy d376f08d1b
G2 / Operations on the twisted curve E'(Fp2) (#51)
* Split elliptic curve tests to better use parallel testing

* Add support for printing points on G2

* Implement multiplication and division by optimal sextic non-residue (BLS12-381)

* Implement modular square root in 𝔽p2

* Support EC add and EC double on G2 (for BLS12-381)

* Support G2 divisive twists with non-unit sextic-non-residue like BN254 snarks

* Add EC G2 bench

* cleanup some unused warnings

* Reorg the tests for parallelization and to avoid instantiating huge files
2020-06-15 22:58:56 +02:00
Mamy Ratsimbazafy 2613356281
Endomorphism acceleration for Scalar Multiplication (#44)
* Add MultiScalar recoding from "Efficient and Secure Algorithms for GLV-Based Scalar Multiplication" by Faz et al

* precompute cube root of unity - Add VM precomputation of Fp - workaround upstream bug https://github.com/nim-lang/Nim/issues/14585

* Add the φ-accelerated lookup table builder

* Add a dedicated bithacks file

* cosmetic import consistency

* Build the φ precompute table with n-1 EC additions instead of 2^(n-1) additions

* remove binary

* Add the GLV precomputations to the sage scripts

* You can't avoid it, bigint multiplication is needed at one point

* Add bigint multiplication discarding some low words

* Implement the lattice decomposition in sage

* Proper decomposition for BN254

* Prepare the code for a new scalar mul

* We compile, and now debugging hunt

* More helpers to debug GLV scalar Mul

* Fix conditional negation

* Endomorphism accelerated scalar mul working for BN254 curve

* Implement endomorphism acceleration for BLS12-381 (needed cofactor clearing of the point)

* fix nimble test script after bench rename
2020-06-14 15:39:06 +02:00
Mamy Ratsimbazafy 3d1b1fab98
Fix benchmark on ARM (#31) 2020-06-04 22:09:30 +02:00
Mamy Ratsimbazafy 82ceca6e3b
Scalar mul tests (#28)
* Add sage script for BN254

* Implement (failing) scalar multiplication tests

* Add a first test against sagemath

* Finish the tests against SAGE for BN254

* Add significant test coverage of scalar multiplication with reference checks for BN254_Snarks and BLS12_381
2020-06-04 20:37:29 +02:00
Mamy André-Ratsimbazafy 44350d08af
Add elliptic doubling in projective coordinates 2020-04-15 22:23:46 +02:00
Mamy André-Ratsimbazafy 7ae0f51000
benchmarking skips cycle counting for ARM 2020-04-15 21:24:18 +02:00
Mamy André-Ratsimbazafy e0c1e0b1c8
Add EC bench on G1 + Add throughput to benches 2020-04-15 19:38:02 +02:00