Commit Graph

103 Commits

Author SHA1 Message Date
Mamy Ratsimbazafy 9a7137466e
C API for Ethereum BLS signatures (#228)
* [testsuite] Rework parallel test runner to buffer beyond 65536 chars and properly wait for process exit

* [testsuite] improve error reporting

* rework openArray[byte/char] for BLS signature C API

* Prepare for optimized library and bindings

* properly link to constantine

* Compiler fixes, global sanitizers, GCC bug with --opt:size

* workaround/fix #229: don't inline field reduction in Fp2

* fix clang running out of registers with LTO

* [C API] missed length parameters for ctt_eth_bls_fast_aggregate_verify

* double-precision asm is too large for inlining, try to fix Linux and MacOS woes at https://github.com/mratsim/constantine/pull/228#issuecomment-1512773460

* Use FORTIFY_SOURCE for testing

* Fix #230 - gcc miscompiles Fp6 mul with LTO

* disable LTO for now, PR is too long
2023-04-18 22:02:23 +02:00
Mamy Ratsimbazafy 6c48975aee
Parallel Multi-Scalar-Multiplication (#226)
* try parallel reduction in batch add, but alas it's slower than custom chunking. Except maybe on arch with performance/efficiency cores

* initial impl of parallel MSM - scaling to debug, threads not woken fast enough

* improve comment [skip ci]

* skip top window when c divides the number of bits

* for some reason parallel-for loops scale on 5+ threads while spawn only on 2x threads. Thread wakeup issue?

* Add counters and timers to audit threadpool bottlenecks

* metrics and profiling fixes, (slower) latency hiding, activate tests

* fix thief thread trying to wake another before canceling its own sleep

* easier to sort metrics and parallel endomorphism application

* selective endomorphism acceleration

* some tuning

* spawn can handle compile-time literals, static and type parameters. Also introduce spawnAwaitable to await void procs

* improve MSM overview [skip ci]

* bench cleanup
2023-04-10 23:30:14 +02:00
Mamy Ratsimbazafy 4dc2610557
Bindings "filesystem" (#225)
* bindings structure

* missed some renaming

* add back the headers

* path fixes

* need to sleep at night

* windows path mystery is unfathomable
2023-03-01 12:59:06 +01:00
Mamy Ratsimbazafy 1dfbb8bd4f
[Threadpool] Remove reserve threads (#223)
* remove reserve threads

* recover last perf diff: 1. don't import primitives, cpu features detection globals are noticeable, 2. noinit + conditional zeroMem are unnecessary when sync is inline 3. inline 'newSpawn' and don't init the loop part

* avoid syscalls if possible if thred is awake but idle

* renaming eventLoop

* remove unused code: steal-half

* renaming

* no need for 0-init sync, T can be large in cryptography
2023-02-24 17:36:04 +01:00
Mamy Ratsimbazafy bf32c2d408
Parallel for (#222)
* introduce reserve threads to minimize latency and maximize throughput when awaiting a future

* introduce a ceilDiv proc

* threadpool: implement parallel-for loops

* 10x perf improvement by not waking reserveBackoff on syncAll

* bench overhead: new reserve system might introduce too much wakeup latency, 2x slower, for fine-grained parallelism

* add parallelForStrided

* Threadpool: Implement parallel reductions

* refactor parallel loop codegen: introduce descriptor, parsing and codegen stages

* parallel strided, test transpose bench

* tight loop is faster when backoff is not inline

* no POSIX stuff on windows, larger types for histogram bench

* fix tests

* max RSS overflow?

* missed an undefined var

* exit histogram on 32-bit

* forgot to return early dor 32-bit
2023-02-24 09:47:36 +01:00
Mamy Ratsimbazafy e5612f5705
Multi-Scalar-Multiplication / Linear combination (#220)
* unoptimized msm

* MSM: reorder loops

* add a signed windowed recoding technique

* improve wNAF table access

* use batchAffine

* revamp EC tests

* MSM signed digit support

* refactor MSM: recode signed ahead of time

* missing test vector

* refactor allocs and Alloca sideeffect

* add an endomorphism threshold

* Add Jacobian extended coordinates

* refactor recodings, prepare for parallelizable on-the-fly signed recoding

* recoding changes, introduce proper NAF for pairings

* more pairings refactoring, introduce miller accumulator for EVM

* some optim to the addchain miller loop

* start optimizing multi-pairing

* finish multi-miller loop refactoring

* minor tuning

* MSM: signed encoding suitable for parallelism (no precompute)

* cleanup signed window encoding

* add prefetching

* add metering

* properly init result to infinity

* comment on prefetching

* introduce vartime inversion for batch additions

* fix JacExt infinity conversion

* add batchAffine for MSM, though slower than JacExtended at the moment

* add a batch affine scheduler for MSM

* Add Multi-Scalar-Multiplication endomorphism acceleration

* some tuning

* signed integer fixes + 32-bit + tuning

* Some more tuning

* common msm bench + don't use affine for c < 9

* nit
2023-02-16 12:45:05 +01:00
Mamy Ratsimbazafy 082cd1deb9
MSB-to-LSB minimum Hamming Weight Recoding (#219)
* signed recoding

* use recoding
2023-02-07 16:27:53 +01:00
Mamy Ratsimbazafy a11fca9c60
panics:on (#218) 2023-02-07 13:11:15 +01:00
Mamy Ratsimbazafy 495ef4497b
Parallel batchadd (#215)
* [Threadpool] Fix syncAll releasing while a thread was attempting to steal + force no exception in tasks

* fix unguarded access on MacOS barriers

* parallel batchadd

* moved import
2023-01-29 01:06:37 +01:00
Mamy Ratsimbazafy 7c01affe24
speedup test suite, focus on "integration" tests 2023-01-25 05:47:57 +01:00
Mamy Ratsimbazafy 2931913b67
Add a threadpool (#213)
* Implement a threadpool

* int and SomeUnsignedInt ...

* Type conversion for windows SynchronizationBarrier

* Use the latest MacOS 11, Big Sur API (jan 2021) for MacOS futexes, Github action offers MacOS 12 and can test them

* bench need posix timer not available on windows and darwin futex

* Windows: nimble exec empty line is an error, Mac: use defined(osx) instead of defined(macos)

* file rename

* okay, that's the last one hopefully

* deactivate stealHalf for now
2023-01-24 02:32:28 +01:00
Mamy Ratsimbazafy 1f4bb174a3
[Backend] Add support for Nvidia GPUs (#210)
* Add PoC of JIT exec on Nvidia GPUs [skip ci]

* Split GPU bindings into low-level (ABI) and high-level [skip ci]

* small typedef reorg [skip ci]

* refine LLVM IR/Nvidia GPU hello worlds

* [Nvidia GPU] PoC implementation of field addition [skip ci]

* prod-ready field addition + tests on Nvidia GPUs via LLVM codegen
2023-01-12 01:01:57 +01:00
Mamy Ratsimbazafy 928f515582
Batch additions (#207)
* Batch elliptic curve addition

* accelerate chained muls

* jac mixed add handle doubling. jac additions handle aliasing when adding infinity

* properly skip sanitizer on BLS signature test

* properly skip sanitizer² on BLS signature test
2022-10-29 22:43:40 +02:00
Mamy Ratsimbazafy 93654d580e
pararun: Ignore error #259, sha256: add back a paper 2022-09-19 09:11:16 +02:00
Mamy Ratsimbazafy 7c7290115f
nimble: fix bench_poly1305, improve reporting in pararun 2022-09-19 00:41:37 +02:00
Mamy Ratsimbazafy 2f6144fb7a
add missing benches compilation to test suite 2022-09-18 15:26:42 +02:00
Mamy Ratsimbazafy df048112c3
Example+Test C API vs GMP (#203)
* Example+Test C API vs GMP

* Create build directory for bindings test

* --nimMainPrefix is 1.6 only

* Add libdl for  dynamic loading

* absolute paths

* add static link test

* Fix man main, rename Nimmain to init_NimMain

* Deal with MacOS annoying linker w.r.t. static libraries

* use .exe extension to satisfy windows (?)

* annoying GCC which doesn't create paths

* Try skipping DLL test on windows

* windows extensions ...

* no lib prefix on windows
2022-09-15 17:11:57 +02:00
Mamy Ratsimbazafy 962e7ccf49
CI: enable GMP tests on Windows and Linux 32-bit and fix caching (#204)
* Try to compile with GMP on windows and 32-bit linux

* remove leftover msys shell

* Don't use GMP Mersenne Twister, bad randomness and untested Nim wrapper

* properly cache nim

* fix path after cache

* run pacman in msys2 env

* rework msys2 ... again

* shell compat for file clearing

* shell compat try-again for file clearing

* force bash for clearing parallel builds on windows

* Use nimscript directly (why didn't it work last time?)

* Avoid IO redirection to support any shell

* Avoid IO redirection v2 to support any shell

* add debug data

* add debug again

* Introduce pararun, a parallel test runner to remove need of GNU parallel

* pararun: style
2022-09-15 09:33:34 +02:00
Mamy Ratsimbazafy 094445482b
Eip2333 (#202)
* HMAC-SHA256

* EIP2333

* activate EIP2333 tests and faster random test case generation
2022-08-16 12:07:57 +02:00
Mamy Ratsimbazafy 9770b3108c
Fp12 over fp6 (#201)
* introduce sumprod for direct fp6_mul

* change curves -> constants

* forgotten constants

* Full pairing using Fp2->Fp6->Fp12 towering
2022-08-14 09:48:10 +02:00
Mamy Ratsimbazafy 99c9730793
Self-contained bindings generation (#196)
* First draft at bindings generation

* finite field bindings PoC

* support openarray, export NimMain

* PoC extension fields and elliptic curve bindings

* Pasta

* expose more bindings, remove nimZeroMem, remove tracer when unused, codegen name_mangling`gensym issue

* workaround bad C gensym codegen with {.inline.} pragma in non-dirty template nested in generic proc instantiated by template
2022-08-06 19:05:54 +02:00
Mamy Ratsimbazafy e29e529f18
Add multipairing for BN curves (#194) 2022-05-08 19:01:23 +02:00
Mamy Ratsimbazafy 39a8a413de
Pasta curves (#191)
* Pasta curves field arithmetic

* implement elliptic curve arith for the Pasta curves
2022-04-27 00:58:48 +02:00
Mamy Ratsimbazafy e9e7a1809c
BN254 - Hash-to-Curve (SVDW method) (#190)
* Hash to BN254-Snarks

* Test SVDW code path with old v7 vectors for BLS12-381

* add benches
2022-04-26 21:24:07 +02:00
Mamy Ratsimbazafy 742cecce08
Poly1305 Message Authentication Code (#186)
* Groundwork for Poly1305 MAC

* Implement fast reduction for Poly1305

* don't import assembly files when compiling without assembly
2022-03-05 23:39:24 +01:00
Mamy Ratsimbazafy c2eb42b769
Add ChaCha20 stream cipher 2022-03-02 01:18:47 +01:00
Mamy Ratsimbazafy ffacf61e8a
Don't dump all in "backend" (#184)
* backend -> math

* towers -> extension fields

* move ISA and compiler specific code out of math/

* fix export
2022-02-27 01:49:08 +01:00
Mamy Ratsimbazafy 5bc6d1d426
BLS signatures for Ethereum (BLS sig on BLS12-381 G2 with SHA256) (#183)
* Finally add the (Ethereum) bls signatures (on BLS12-381 G2)

* fix test path and remove old low-level signature test
2022-02-26 21:22:34 +01:00
Mamy Ratsimbazafy fe500a6a79
Productionize: move protocols top-level vs backend (#179)
* Productionize: move protocols top-level vs backend

* fix path

* import fix

* the last one

* benches as well
2022-02-21 01:04:53 +01:00
Mamy Ratsimbazafy 81acfb1626
Nim 1.6 in CI (#170)
* try 1.6 CI

* Try CI with 1.6 and windows.

* Bend the knee

* have fun debugging CI

* have fun debugging CI

* more CI spam

* branch -> nim_version

* fight or flight

* properly detect windows

* Fix galore

* 🐍 🐍 snake:

* meh give up on parallelizing windows and dealing with windows PATH issues

* ¯\_ (ツ)_/¯
2022-02-20 23:44:00 +01:00
Mamy Ratsimbazafy dc73c71801
Pairings optimizations (#178)
* bench for cyclotomic square, exp and rename cyclotomic exp + multipairings for BLS12-377

* refactor/unify lines and cyclotomic functions

* Add Karabina's compressed squaring

* Use compressed squarings in final exponentiation

* Weighted addchain for bn254_snarks

* Add new towering options and cost functions

* Rearrange bench summaries

* fix BW6-761
2022-02-20 20:15:20 +01:00
Mamy Ratsimbazafy 53c4db7ead
Fast modular inversion (#172)
* split modular inversion in its own file

* Stash fast GCD inversion https://eprint.iacr.org/2020/972.pdf

* Stash Pornin's bingcd -> issue with inner modular reduction

* Implement Bernstein-Yang inversion

* Avoid Nim checks on signed integers (32-bit runtime issue)

* cleanup: remove old inversion impls

* cleanup: static moduli, move div2

* small comments (skip ci)

* comment cleanup (skip ci)

* fix total iterations on 32-bit

* Add batch conversion to affine coordinates using simultaneous inversion trick

* fix conditional setZero and batchAffine conversion

* cleanup unneeded branches following affine conversion unification

* Fix batchAffine with zero inputs and add fuzz failure to test suite
2022-02-10 14:05:07 +01:00
Mamy Ratsimbazafy 50717d8de6
Test GT-subgroup for BW6-761 (#171) 2022-01-08 17:30:26 +01:00
Mamy Ratsimbazafy f6c02fe075
Optimized subgroup checks and cofactor clearing (#169)
* Move cofactor clearing to dedicated per-curve subgroups file

* Add BLS12-381 fast subgroup checks

* Implement fast cofactor clearing for BN254_snarks

* Add fast subgroup check to BN254Snarks

* add BLS12_377 optimized cofactor and subgroup functions

* Add BN254_Nogami

* Add GT-subgroup tests

* Use the new subgroup checks for Eth1 EVM precompiles
2022-01-03 14:12:58 +01:00
Mamy Ratsimbazafy 53f9708c2b
Initial support for Twisted Edwards curves (#167)
* Point decoding: optimized sqrt for p ≡ 5 (mod 8) (Curve25519)

* Implement fused sqrt(u/v) for twisted edwards point deserialization

* Introduce twisted edwards affine

* Allow declaration of curve field elements (and fight against recursive dependencies

* Twisted edwards group law + tests

* Add support for jubjub and bandersnatch #162

* test twisted edwards scalar mul
2021-12-29 01:54:17 +01:00
Mamy Ratsimbazafy 1195e5e980
Eth1 evm precompiles (#166)
* Prepare support for Eth1 EVM

* Implement EIP 196 (Ethereum BN254 add/mul)

* Implement ETH1 pairing precompile

* Accelerate isOnCurve for G2 with precomputation
2021-12-15 00:02:11 +01:00
Mamy Ratsimbazafy f5c0b6245d
Multipairing (#165)
* Productionize multipairings for BLS12-381

* typo

* arg order + benchmark

* Introduce mul_3way_sparse_sparse

* cleanup MultiMiller loop

* fix init sparse optimization in multimiller loop [skip ci]
2021-08-16 22:22:51 +02:00
Mamy Ratsimbazafy 979d183657
Tests for the eth2 BLS signature protocol (BLS12-381, pubkeys G1, signatures G2) using low-level primitives (#164) 2021-08-15 11:41:46 +02:00
Mamy Ratsimbazafy 0bc228126a
hash-to-curve BLS12-381 perf (#163)
* fp square noasm split from non-4 non-6 limbs fallback (40% speedup)

* optimized cofactor clearing for BLS12-381 G2

* Support jacobian isogenies and point_add on isogenies

* fuse addition and isogeny map

* {.noInit.} and sparseMul

* poly_eval_horner init

* dedicated invsqrt + cleanup square root file

* hash to field: reduce copy overhead and don't return arrays

* h2c isogeny jacobian reuse pow 3 precomputed value

* Fix sqrt bench
2021-08-14 21:01:50 +02:00
Mamy Ratsimbazafy 499f9605b2
Hash to curve - BLS12-381 (#110)
* Hash to Curve: impl expand_message_xmd

* Try to precompute part of hash to curve at compile-time

* sha256 bench - use the new hashes module

* [WIP] smoke test hash to field

* Implement hash_to_field with expected output

* unoptimized hash-to-curve G2 for BLS12-381

* Don't run sanitizer on hash to field as it uses GC-ed strings
2021-08-13 22:07:26 +02:00
Mamy André-Ratsimbazafy 3e977488a9
add bench whole summary for curves 2021-02-14 14:24:48 +01:00
Mamy Ratsimbazafy 9ac9862401
Optimize Miller Loop and prepare Multi-pairing (#159)
* Pairing with affine: align API to BLST and Gurvy and common use-case.

* Implement multi-pairing / aggregate verif for BLS12-381 (+2% pairing perf)

* Generalize the optimized miller loop for single pairing

* Immplement the miller loop addchain for BLS12-377

* Miller addition chain for BN254-Nogami

* no Miller adchain for BN254-Snarks

* Update the line test with new tower https://github.com/mratsim/constantine/pull/153

* Somewhat sparse for Fp2 M-Twist

* Implement line by line multiplication for Fp12 D-Twist

* Somewhat sparse Mul for Fp12 D-Twist

* Finish the sparse and somewhat sparse multiplications
2021-02-14 13:06:57 +01:00
Mamy Ratsimbazafy 5806cc4638
Double-Precision towering (#155)
* consistent naming for dbl-width

* Isolate double-width Fp2 mul

* Implement double-width complex multiplication

* Lay out Fp4 double-width mul

* Off by p in square Fp4 as well :/

* less copies and stack space in addition chains

* Address https://github.com/mratsim/constantine/issues/154 partly

* Fix #154, faster Fp4 square: less non-residue, no Mul, only square (bit more ops total)

* Fix typo

* better assembly scheduling for add/sub

* Double-width -> Double-precision

* Unred -> Unr

* double-precision modular addition

* Replace canUseNoCarryMontyMul and canUseNoCarryMontySquare by getSpareBits

* Complete the double-precision implementation

* Use double-precision path for Fp4 squaring and mul

* remove mixin annotations

* Lazy reduction in Fp4 prod

* Fix assembly for sum2xMod

* Assembly for double-precision negation

* reduce white spaces in pairing benchmarks

* ADX implies BMI2
2021-02-09 22:57:45 +01:00
Mamy André-Ratsimbazafy 2c5e12d5f8
Workaround aliasing in Fp12[BLS12-377] inversion, fix #147 2021-02-02 12:53:36 +01:00
Mamy Ratsimbazafy 83dcd988b3
FpDbl revisited (#144) - 7% perf improvement everywhere, up to 30% in double-width primitives
* reorg mul -> limbs_double_width, ConstantineASM CttASM

* Implement squaring specialized scalar path (22% faster than mul)

* Implement "portable" assembly for squaring

* stash part of the changes

* Reorg montgomery reduction - prepare to introduce Comba optimization

* Implement comba Montgomery reduce (but it's slower!)

* rename t -> a

* 30% performance improvement by avoiding toOpenArray!

* variable renaming

* Fix 32-bit imports

* slightly better assembly for sub2x

* There is an annoying bottleneck

* use out-of-place Fp assembly instead of in-place

* diffAlias is unneeded now

* cosmetic

* speedup fpDbl sub by 20%

* Fix Fp2 -> Fp6 -> Fp12 towering. It seems 5% faster

* Stash ADCX/ADOX squaring
2021-02-01 03:52:27 +01:00
Mamy Ratsimbazafy d12d5faf21
Implement Jacobian mixed addition (#142) 2021-01-30 14:21:55 +01:00
Mamy Ratsimbazafy 7e97cd4ac5
Fuzz fix - non-unique modular representation after Assembly negate (#137)
* Fix #114 - Negating 0 left the prime modulus, which is working most of the time for everything except for comparison. (also somehow triggers and workaround weird compiler bug where exceptions tracking is activated in macros and all the curve enums were stringified as their ordinal value)

* https://github.com/mratsim/constantine/issues/136 was also fixed, add to anti-regression

* add comment in test

* Fix the pure Nim fallback as well
2021-01-24 12:35:27 +01:00
Mamy Ratsimbazafy 82819b1b10
Square Root & Inversion addition chains - 20% perf increase (#132)
* Addition chain for sqrt BLS12-381: 20% perf improvement

* sqrt addchain for BN254_Snarks - 20% perf improvement as well

* Fix operation count [skip ci]

* BLS12-377 sqrt - 10% perf improvement

* sqrt addition chain for BW6-761 - 6% speedup

* BN254_Nogami inversion addchain

* sqrt addchain for BN254_Nogami

* Inversion addchain for BLS12-377

* inversion ddition chain for BW6-761
2021-01-23 20:55:40 +01:00
Mamy Ratsimbazafy 638cb71e16
Fr: Finite Field parametrized by the curve order (#115)
* Introduce Fr type: finite field over curve order. Need workaround for https://github.com/nim-lang/Nim/issues/16774

* Split curve properties into core and derived

* Attach field properties to an instantiated field instead of the curve enum

* Workaround https://github.com/nim-lang/Nim/issues/14021, yet another "working with types in macros" is difficult https://github.com/nim-lang/RFCs/issues/44

* Implement finite field over prime order of a curve subgroup

* skip OpenSSL tests on windows
2021-01-22 00:09:52 +01:00
Mamy Ratsimbazafy ac6300555a
Fix test suite (#116)
* Pin nim-serialization. Workaround #113 and https://github.com/status-im/nim-serialization/issues/33

* Need to workaround nimble installing dependency multiple times

* non-interactive

* UB sanitizer missing on mingw

* Fix OpenSSL benchmark on non-Linux platforms

* Accelerate CI:
- Skip 32-bit on 64-bit tests
- Only test leaf functionality.

* Don't define -fstack-protector-all with MinGW

* skip line functions and cyclotomic tests (already tested in pairing) + only compile the benches don't run them.
2021-01-21 21:25:42 +01:00