Commit Graph

103 Commits

Author SHA1 Message Date
Mamy Ratsimbazafy 3ed57d3690
add modexp/modmul benches vs GMP 2023-09-09 10:09:47 +02:00
Mamy Ratsimbazafy 15757557b4
modexp: 2.5x accel on small exponent (#268)
* add metering to modexp

* modexp: accel exponent = 1

* modexp: improve runtime Montgomery constants compute. 2.49x faster on DOS vectors
2023-09-09 09:21:05 +02:00
Mamy Ratsimbazafy b645d68e1a
update bench for modexp (#265) 2023-09-06 17:18:07 +02:00
Mamy Ratsimbazafy b9c911ba37
Accelerate FFT - endomorphism + wNAF vartime scalar mul (#258)
* accel FFT by 30+% with vartime endomorphism support

* silly error fix

* endomorphism + wNAF, closes #253, FFT 20% speedup

* vartime EC addition for all repr

* implement vartime EC add

* finishing touches, renam to fft_vartime
2023-09-04 10:19:14 +02:00
Mamy Ratsimbazafy f57d071f11
Ethereum KZG polynomial commitments / EIP-4844 (part 1) (#239)
* common error model for serialization of BLS signatures and KZG objects

* [KZG] add Ethereum's test vectors [skip ci]

* dump progress on KZG

* Stash: trusted setup generator

* implement cache optimized bit-reversal-permutation

* Add generator for the Ethereum test trusted setups

* implement naive deserialization for the trusted setup interchange format

* implement verify_kzg_proof

* Add test skeleton of verify KZG proof

* rebase import fixes
2023-08-13 15:08:04 +02:00
Mamy Ratsimbazafy b7687ddc4a
Accelerate eth_evm_modexp by 25x by dividing input size by 8 (#249)
* Accelerate eth_evm_modexp by 25x by dividing input size by 8 (scales quadratically)

* instant exponentiation by power of 2 depending on trailing zeroes

* improve bench report

* rename

* rewrite the pow2k even/trailingZero accel

* eth_evm_modexp: remove leftover TimeEffect
2023-07-03 01:45:36 +02:00
Mamy Ratsimbazafy 0eba593951
Pasta / Halo2 MSM bench (#243)
* Pasta bench

* cleanup env variables

* [MSM]: generate benchmark coef-points pairs in parallel

* try to fix windows Ci

* add diagnostic info

* fix old test for new codecs/io primitives

* Ensure the projective point at infinity is not all zeros, but (0, 1, 0)
2023-06-04 17:41:54 +02:00
Mamy Ratsimbazafy b1ef2682d6
Modular exponentiation (arbitrary output) and EIP-198 (#242)
* implement arbitrary precision modular exponentiation (prerequisite EIP-198)

* [modexp] implement exponentiation modulo 2ᵏ

* add inversion (mod 2ᵏ)

* [modexp] High-level wrapper for powmod with odd modulus

* [modexp] faster exponentiation (mod 2ᵏ) for even case and Euler's totient function odd case

* [modexp] implement general fast modular exponentiation

* Fix modular reduction with 64-bit modulus + fuzz powmod vs GMP

* add benchmark

* add EIP-198 support

* fixups following self review

* fix test paths
2023-06-01 23:38:41 +02:00
Mamy Ratsimbazafy d996ccd5d8
Path reorgs (#240)
* move tests

* move threadpool to root path

* fix hints and warnings, print nim versions for tests for debugging the new strange issue in CI

* print nim version

* mixup on branches

* mixup on branches reloaded
2023-05-29 20:14:30 +02:00
Mamy Ratsimbazafy 1c5341fd7e
Perf quick wins - 10% Fp12 mul (#235)
* improve FP12_mul  perf by 10%

* update README [skip ci]
2023-04-28 11:31:17 +02:00
Mamy Ratsimbazafy c6d9a213f2
Rework assembly to be compatible with LTO (#231)
* rework assembler register/mem and constraint declarations

* Introduce constraint UnmutatedPointerToWriteMem

* Create invidual memory cell operands

* [Assembly] fully support indirect memory addressing

* fix calling convention for exported procs

* Prepare for switch to intel syntax to avoid clang constant propagation asm symbol name interfering OR pointer+offset addressing

* use modifiers to prevent bad string mixin fo assembler to linker of propagated consts

* Assembly: switch to intel syntax

* with working memory operand - now works with LTO on both GCC and clang and constant folding

* use memory operand in more places

* remove some inline now that we have lto

* cleanup compiler config and benches

* tracer shouldn't force dependencies when unused

* fix cc on linux

* nimble fixes

* update README [skip CI]

* update MacOS CI with Homebrew Clang

* oops nimble bindings disappeared

* more nimble fixes

* fix sha256 exported symbol

* improve constraints on modular addition

* Add extra constraint to force reloading of pointer in reg inputs

* Fix LLVM gold linker running out of registers

* workaround MinGW64 GCC 12.2 bad codegen in t_pairing_cyclotomic_subgroup with LTO
2023-04-26 06:58:31 +02:00
Mamy Ratsimbazafy 9a7137466e
C API for Ethereum BLS signatures (#228)
* [testsuite] Rework parallel test runner to buffer beyond 65536 chars and properly wait for process exit

* [testsuite] improve error reporting

* rework openArray[byte/char] for BLS signature C API

* Prepare for optimized library and bindings

* properly link to constantine

* Compiler fixes, global sanitizers, GCC bug with --opt:size

* workaround/fix #229: don't inline field reduction in Fp2

* fix clang running out of registers with LTO

* [C API] missed length parameters for ctt_eth_bls_fast_aggregate_verify

* double-precision asm is too large for inlining, try to fix Linux and MacOS woes at https://github.com/mratsim/constantine/pull/228#issuecomment-1512773460

* Use FORTIFY_SOURCE for testing

* Fix #230 - gcc miscompiles Fp6 mul with LTO

* disable LTO for now, PR is too long
2023-04-18 22:02:23 +02:00
Mamy Ratsimbazafy 93dac2503c
MSM tuning for high core count (#227)
* tune for high core count

* reentrancy: allow nesting of parallel functions by introducing precise scoped barriers

* increase collision queue depth
2023-04-14 20:02:59 +02:00
Mamy Ratsimbazafy 6c48975aee
Parallel Multi-Scalar-Multiplication (#226)
* try parallel reduction in batch add, but alas it's slower than custom chunking. Except maybe on arch with performance/efficiency cores

* initial impl of parallel MSM - scaling to debug, threads not woken fast enough

* improve comment [skip ci]

* skip top window when c divides the number of bits

* for some reason parallel-for loops scale on 5+ threads while spawn only on 2x threads. Thread wakeup issue?

* Add counters and timers to audit threadpool bottlenecks

* metrics and profiling fixes, (slower) latency hiding, activate tests

* fix thief thread trying to wake another before canceling its own sleep

* easier to sort metrics and parallel endomorphism application

* selective endomorphism acceleration

* some tuning

* spawn can handle compile-time literals, static and type parameters. Also introduce spawnAwaitable to await void procs

* improve MSM overview [skip ci]

* bench cleanup
2023-04-10 23:30:14 +02:00
Mamy Ratsimbazafy e5612f5705
Multi-Scalar-Multiplication / Linear combination (#220)
* unoptimized msm

* MSM: reorder loops

* add a signed windowed recoding technique

* improve wNAF table access

* use batchAffine

* revamp EC tests

* MSM signed digit support

* refactor MSM: recode signed ahead of time

* missing test vector

* refactor allocs and Alloca sideeffect

* add an endomorphism threshold

* Add Jacobian extended coordinates

* refactor recodings, prepare for parallelizable on-the-fly signed recoding

* recoding changes, introduce proper NAF for pairings

* more pairings refactoring, introduce miller accumulator for EVM

* some optim to the addchain miller loop

* start optimizing multi-pairing

* finish multi-miller loop refactoring

* minor tuning

* MSM: signed encoding suitable for parallelism (no precompute)

* cleanup signed window encoding

* add prefetching

* add metering

* properly init result to infinity

* comment on prefetching

* introduce vartime inversion for batch additions

* fix JacExt infinity conversion

* add batchAffine for MSM, though slower than JacExtended at the moment

* add a batch affine scheduler for MSM

* Add Multi-Scalar-Multiplication endomorphism acceleration

* some tuning

* signed integer fixes + 32-bit + tuning

* Some more tuning

* common msm bench + don't use affine for c < 9

* nit
2023-02-16 12:45:05 +01:00
Mamy Ratsimbazafy 082cd1deb9
MSB-to-LSB minimum Hamming Weight Recoding (#219)
* signed recoding

* use recoding
2023-02-07 16:27:53 +01:00
Mamy Ratsimbazafy 7c5421ffdc
move staticFor to the inner repo, not helpers/ for unblocking nimble install (#216) 2023-02-07 13:11:44 +01:00
Mamy Ratsimbazafy 495ef4497b
Parallel batchadd (#215)
* [Threadpool] Fix syncAll releasing while a thread was attempting to steal + force no exception in tasks

* fix unguarded access on MacOS barriers

* parallel batchadd

* moved import
2023-01-29 01:06:37 +01:00
Mamy Ratsimbazafy ff8c26c1fe
BLS Aggregate and Batch verify (#214)
* pairing -> pairings, and use alloca arrays instead of static arrays

* aggregate and batched BLS signature

* DLL generation broken by path changes
2023-01-27 00:42:12 +01:00
Mamy Ratsimbazafy 4be89d309f
chore: remove stew/byteutils dependencies and unneeded imports 2023-01-12 20:25:57 +01:00
Mamy Ratsimbazafy 928f515582
Batch additions (#207)
* Batch elliptic curve addition

* accelerate chained muls

* jac mixed add handle doubling. jac additions handle aliasing when adding infinity

* properly skip sanitizer on BLS signature test

* properly skip sanitizer² on BLS signature test
2022-10-29 22:43:40 +02:00
Mamy Ratsimbazafy 351a3f6bd2
Sha256 refactor (#206)
* sha256: separate message scheduling and state updates to help implement specific use-cases like #205; also implement SSSE3 acceleration (2006, Intel Core 2 Duo)

* sha256: simplify update flow, store less metadata in context

* sha256: Fix reworked update function

* Implement x86 hardware SHA acceleration

* typo
2022-09-19 02:02:57 +02:00
Mamy Ratsimbazafy 094445482b
Eip2333 (#202)
* HMAC-SHA256

* EIP2333

* activate EIP2333 tests and faster random test case generation
2022-08-16 12:07:57 +02:00
Mamy Ratsimbazafy 9770b3108c
Fp12 over fp6 (#201)
* introduce sumprod for direct fp6_mul

* change curves -> constants

* forgotten constants

* Full pairing using Fp2->Fp6->Fp12 towering
2022-08-14 09:48:10 +02:00
Mamy Ratsimbazafy 74a23244d2
bench isSquare 2022-08-07 19:50:28 +02:00
Mamy Ratsimbazafy 99c9730793
Self-contained bindings generation (#196)
* First draft at bindings generation

* finite field bindings PoC

* support openarray, export NimMain

* PoC extension fields and elliptic curve bindings

* Pasta

* expose more bindings, remove nimZeroMem, remove tracer when unused, codegen name_mangling`gensym issue

* workaround bad C gensym codegen with {.inline.} pragma in non-dirty template nested in generic proc instantiated by template
2022-08-06 19:05:54 +02:00
Mamy Ratsimbazafy e29e529f18
Add multipairing for BN curves (#194) 2022-05-08 19:01:23 +02:00
Mamy Ratsimbazafy 39a8a413de
Pasta curves (#191)
* Pasta curves field arithmetic

* implement elliptic curve arith for the Pasta curves
2022-04-27 00:58:48 +02:00
Mamy Ratsimbazafy e9e7a1809c
BN254 - Hash-to-Curve (SVDW method) (#190)
* Hash to BN254-Snarks

* Test SVDW code path with old v7 vectors for BLS12-381

* add benches
2022-04-26 21:24:07 +02:00
Mamy Ratsimbazafy 062ae56867
Try to use hash-to-curve for BN254_Snarks but no low-degree isogeny [skip ci] 2022-04-12 23:40:07 +02:00
Mamy Ratsimbazafy 65eedd1cf7
Hash-to-Curve BLS12-381 G1 (#189)
* Skeleton of hash to curve for BLS12-381 G1

* Remove isodegree parameter

* Fix polynomial evaluation of hashToG1

* Optimize hash_to_curve and add bench for hash to G1

* slight optim of jacobian isomap + v7 test vectors
2022-04-11 00:57:16 +02:00
Mamy Ratsimbazafy bde4f97b56
Line refactor (#188)
* Align line evaluations to papers notations

* Adjust line fusion op

* precompute G2 b' for costly D-Twists
2022-04-04 10:10:36 +02:00
Mamy Ratsimbazafy 742cecce08
Poly1305 Message Authentication Code (#186)
* Groundwork for Poly1305 MAC

* Implement fast reduction for Poly1305

* don't import assembly files when compiling without assembly
2022-03-05 23:39:24 +01:00
Mamy Ratsimbazafy ffacf61e8a
Don't dump all in "backend" (#184)
* backend -> math

* towers -> extension fields

* move ISA and compiler specific code out of math/

* fix export
2022-02-27 01:49:08 +01:00
Mamy Ratsimbazafy 5bc6d1d426
BLS signatures for Ethereum (BLS sig on BLS12-381 G2 with SHA256) (#183)
* Finally add the (Ethereum) bls signatures (on BLS12-381 G2)

* fix test path and remove old low-level signature test
2022-02-26 21:22:34 +01:00
Mamy Ratsimbazafy fe500a6a79
Productionize: move protocols top-level vs backend (#179)
* Productionize: move protocols top-level vs backend

* fix path

* import fix

* the last one

* benches as well
2022-02-21 01:04:53 +01:00
Mamy Ratsimbazafy dc73c71801
Pairings optimizations (#178)
* bench for cyclotomic square, exp and rename cyclotomic exp + multipairings for BLS12-377

* refactor/unify lines and cyclotomic functions

* Add Karabina's compressed squaring

* Use compressed squarings in final exponentiation

* Weighted addchain for bn254_snarks

* Add new towering options and cost functions

* Rearrange bench summaries

* fix BW6-761
2022-02-20 20:15:20 +01:00
Mamy Ratsimbazafy 14af7e8724
Low-level refactoring (#175)
* Add specific fromMont conversion routine. Rename montyResidue to getMont

* missed test file

* Add x86_64 ASM for fromMont

* Add x86_64 MULX/ADCX/ADOX for fromMont

* rework Montgomery Multiplication with prefetch/latency hiding techniques

* Fix ADX autodetection, closes #174. Rollback faster mul_mont attempt, no improvement and debug pain.

* finalSub in fromMont & adx_bmi -> adx

* Some {.noInit.} to avoid Nim zeroMem (which should be optimized away but who knows)

* Uniformize name 'op+domain': mulmod - mulmont

* Fix asm codegen bug "0x0000555555565930 <+896>:   sbb    0x20(%r8),%r8" with Clang in final substraction

* Prepare for skipping final substraction

* Don't forget to copy the result when we skip the final substraction

* Seems like we need to stash the idea of skipping the final substraction for now, needs bounds analysis https://eprint.iacr.org/2017/1057.pdf

* fix condition for ASM 32-bit

* optim modular addition when sparebit is available
2022-02-14 00:16:55 +01:00
Mamy Ratsimbazafy 53c4db7ead
Fast modular inversion (#172)
* split modular inversion in its own file

* Stash fast GCD inversion https://eprint.iacr.org/2020/972.pdf

* Stash Pornin's bingcd -> issue with inner modular reduction

* Implement Bernstein-Yang inversion

* Avoid Nim checks on signed integers (32-bit runtime issue)

* cleanup: remove old inversion impls

* cleanup: static moduli, move div2

* small comments (skip ci)

* comment cleanup (skip ci)

* fix total iterations on 32-bit

* Add batch conversion to affine coordinates using simultaneous inversion trick

* fix conditional setZero and batchAffine conversion

* cleanup unneeded branches following affine conversion unification

* Fix batchAffine with zero inputs and add fuzz failure to test suite
2022-02-10 14:05:07 +01:00
Mamy Ratsimbazafy f6c02fe075
Optimized subgroup checks and cofactor clearing (#169)
* Move cofactor clearing to dedicated per-curve subgroups file

* Add BLS12-381 fast subgroup checks

* Implement fast cofactor clearing for BN254_snarks

* Add fast subgroup check to BN254Snarks

* add BLS12_377 optimized cofactor and subgroup functions

* Add BN254_Nogami

* Add GT-subgroup tests

* Use the new subgroup checks for Eth1 EVM precompiles
2022-01-03 14:12:58 +01:00
Mamy Ratsimbazafy c42e2a0251
Rename NotOnTwist/OnTwist => subgroup G1 and G2 2022-01-01 19:17:04 +01:00
Mamy Ratsimbazafy bea798e27c
Field sqrt optimization (#168)
* add more Fp tests for Twisted Edwards curves

* add fused sqrt+division bench

* Significant fused sqrt+division improvement for any prime field over algorithm described in  "High-Speed High-Security Signature", Bernstein et al, p15 "Fast decompression", https://ed25519.cr.yp.to/ed25519-20110705.pdf

* Activate secp256k1 field benches + spring renaming of field multiplication

* addition chains for inversion and sqrt of Curve25519

* Make isSquare use addition chains

* add double-prec mul/square bench for <256-bit prime fields.
2022-01-01 16:19:35 +01:00
Mamy Ratsimbazafy f5c0b6245d
Multipairing (#165)
* Productionize multipairings for BLS12-381

* typo

* arg order + benchmark

* Introduce mul_3way_sparse_sparse

* cleanup MultiMiller loop

* fix init sparse optimization in multimiller loop [skip ci]
2021-08-16 22:22:51 +02:00
Mamy Ratsimbazafy 0bc228126a
hash-to-curve BLS12-381 perf (#163)
* fp square noasm split from non-4 non-6 limbs fallback (40% speedup)

* optimized cofactor clearing for BLS12-381 G2

* Support jacobian isogenies and point_add on isogenies

* fuse addition and isogeny map

* {.noInit.} and sparseMul

* poly_eval_horner init

* dedicated invsqrt + cleanup square root file

* hash to field: reduce copy overhead and don't return arrays

* h2c isogeny jacobian reuse pow 3 precomputed value

* Fix sqrt bench
2021-08-14 21:01:50 +02:00
Mamy Ratsimbazafy 499f9605b2
Hash to curve - BLS12-381 (#110)
* Hash to Curve: impl expand_message_xmd

* Try to precompute part of hash to curve at compile-time

* sha256 bench - use the new hashes module

* [WIP] smoke test hash to field

* Implement hash_to_field with expected output

* unoptimized hash-to-curve G2 for BLS12-381

* Don't run sanitizer on hash to field as it uses GC-ed strings
2021-08-13 22:07:26 +02:00
Mamy André-Ratsimbazafy 18069e54d3
unrolled SHA256 (for 32B faster only if using ssse3) 2021-02-15 18:43:35 +01:00
Mamy André-Ratsimbazafy 3e977488a9
add bench whole summary for curves 2021-02-14 14:24:48 +01:00
Mamy Ratsimbazafy 9ac9862401
Optimize Miller Loop and prepare Multi-pairing (#159)
* Pairing with affine: align API to BLST and Gurvy and common use-case.

* Implement multi-pairing / aggregate verif for BLS12-381 (+2% pairing perf)

* Generalize the optimized miller loop for single pairing

* Immplement the miller loop addchain for BLS12-377

* Miller addition chain for BN254-Nogami

* no Miller adchain for BN254-Snarks

* Update the line test with new tower https://github.com/mratsim/constantine/pull/153

* Somewhat sparse for Fp2 M-Twist

* Implement line by line multiplication for Fp12 D-Twist

* Somewhat sparse Mul for Fp12 D-Twist

* Finish the sparse and somewhat sparse multiplications
2021-02-14 13:06:57 +01:00
Mamy Ratsimbazafy 5806cc4638
Double-Precision towering (#155)
* consistent naming for dbl-width

* Isolate double-width Fp2 mul

* Implement double-width complex multiplication

* Lay out Fp4 double-width mul

* Off by p in square Fp4 as well :/

* less copies and stack space in addition chains

* Address https://github.com/mratsim/constantine/issues/154 partly

* Fix #154, faster Fp4 square: less non-residue, no Mul, only square (bit more ops total)

* Fix typo

* better assembly scheduling for add/sub

* Double-width -> Double-precision

* Unred -> Unr

* double-precision modular addition

* Replace canUseNoCarryMontyMul and canUseNoCarryMontySquare by getSpareBits

* Complete the double-precision implementation

* Use double-precision path for Fp4 squaring and mul

* remove mixin annotations

* Lazy reduction in Fp4 prod

* Fix assembly for sum2xMod

* Assembly for double-precision negation

* reduce white spaces in pairing benchmarks

* ADX implies BMI2
2021-02-09 22:57:45 +01:00
Mamy André-Ratsimbazafy 5710a961a1
Rename ECP_ShortW_Proj -> ECP_ShortW_Prj 2021-02-06 16:29:53 +01:00