constantine

Commit Graph

Author	SHA1	Message	Date
Mamy Ratsimbazafy	9a7137466e	C API for Ethereum BLS signatures (#228 ) * [testsuite] Rework parallel test runner to buffer beyond 65536 chars and properly wait for process exit * [testsuite] improve error reporting * rework openArray[byte/char] for BLS signature C API * Prepare for optimized library and bindings * properly link to constantine * Compiler fixes, global sanitizers, GCC bug with --opt:size * workaround/fix #229: don't inline field reduction in Fp2 * fix clang running out of registers with LTO * [C API] missed length parameters for ctt_eth_bls_fast_aggregate_verify * double-precision asm is too large for inlining, try to fix Linux and MacOS woes at https://github.com/mratsim/constantine/pull/228#issuecomment-1512773460 * Use FORTIFY_SOURCE for testing * Fix #230 - gcc miscompiles Fp6 mul with LTO * disable LTO for now, PR is too long	2023-04-18 22:02:23 +02:00
Mamy Ratsimbazafy	93dac2503c	MSM tuning for high core count (#227 ) * tune for high core count * reentrancy: allow nesting of parallel functions by introducing precise scoped barriers * increase collision queue depth	2023-04-14 20:02:59 +02:00
Mamy Ratsimbazafy	6c48975aee	Parallel Multi-Scalar-Multiplication (#226 ) * try parallel reduction in batch add, but alas it's slower than custom chunking. Except maybe on arch with performance/efficiency cores * initial impl of parallel MSM - scaling to debug, threads not woken fast enough * improve comment [skip ci] * skip top window when c divides the number of bits * for some reason parallel-for loops scale on 5+ threads while spawn only on 2x threads. Thread wakeup issue? * Add counters and timers to audit threadpool bottlenecks * metrics and profiling fixes, (slower) latency hiding, activate tests * fix thief thread trying to wake another before canceling its own sleep * easier to sort metrics and parallel endomorphism application * selective endomorphism acceleration * some tuning * spawn can handle compile-time literals, static and type parameters. Also introduce spawnAwaitable to await void procs * improve MSM overview [skip ci] * bench cleanup	2023-04-10 23:30:14 +02:00
Mamy Ratsimbazafy	e5612f5705	Multi-Scalar-Multiplication / Linear combination (#220 ) * unoptimized msm * MSM: reorder loops * add a signed windowed recoding technique * improve wNAF table access * use batchAffine * revamp EC tests * MSM signed digit support * refactor MSM: recode signed ahead of time * missing test vector * refactor allocs and Alloca sideeffect * add an endomorphism threshold * Add Jacobian extended coordinates * refactor recodings, prepare for parallelizable on-the-fly signed recoding * recoding changes, introduce proper NAF for pairings * more pairings refactoring, introduce miller accumulator for EVM * some optim to the addchain miller loop * start optimizing multi-pairing * finish multi-miller loop refactoring * minor tuning * MSM: signed encoding suitable for parallelism (no precompute) * cleanup signed window encoding * add prefetching * add metering * properly init result to infinity * comment on prefetching * introduce vartime inversion for batch additions * fix JacExt infinity conversion * add batchAffine for MSM, though slower than JacExtended at the moment * add a batch affine scheduler for MSM * Add Multi-Scalar-Multiplication endomorphism acceleration * some tuning * signed integer fixes + 32-bit + tuning * Some more tuning * common msm bench + don't use affine for c < 9 * nit	2023-02-16 12:45:05 +01:00
Mamy Ratsimbazafy	082cd1deb9	MSB-to-LSB minimum Hamming Weight Recoding (#219 ) * signed recoding * use recoding	2023-02-07 16:27:53 +01:00
Mamy Ratsimbazafy	7c5421ffdc	move staticFor to the inner repo, not helpers/ for unblocking nimble install (#216 )	2023-02-07 13:11:44 +01:00
Mamy Ratsimbazafy	495ef4497b	Parallel batchadd (#215 ) * [Threadpool] Fix syncAll releasing while a thread was attempting to steal + force no exception in tasks * fix unguarded access on MacOS barriers * parallel batchadd * moved import	2023-01-29 01:06:37 +01:00
Mamy Ratsimbazafy	ff8c26c1fe	BLS Aggregate and Batch verify (#214 ) * pairing -> pairings, and use alloca arrays instead of static arrays * aggregate and batched BLS signature * DLL generation broken by path changes	2023-01-27 00:42:12 +01:00
Mamy Ratsimbazafy	4be89d309f	chore: remove stew/byteutils dependencies and unneeded imports	2023-01-12 20:25:57 +01:00
Mamy Ratsimbazafy	928f515582	Batch additions (#207 ) * Batch elliptic curve addition * accelerate chained muls * jac mixed add handle doubling. jac additions handle aliasing when adding infinity * properly skip sanitizer on BLS signature test * properly skip sanitizer² on BLS signature test	2022-10-29 22:43:40 +02:00
Mamy Ratsimbazafy	351a3f6bd2	Sha256 refactor (#206 ) * sha256: separate message scheduling and state updates to help implement specific use-cases like #205; also implement SSSE3 acceleration (2006, Intel Core 2 Duo) * sha256: simplify update flow, store less metadata in context * sha256: Fix reworked update function * Implement x86 hardware SHA acceleration * typo	2022-09-19 02:02:57 +02:00
Mamy Ratsimbazafy	094445482b	Eip2333 (#202 ) * HMAC-SHA256 * EIP2333 * activate EIP2333 tests and faster random test case generation	2022-08-16 12:07:57 +02:00
Mamy Ratsimbazafy	9770b3108c	Fp12 over fp6 (#201 ) * introduce sumprod for direct fp6_mul * change curves -> constants * forgotten constants * Full pairing using Fp2->Fp6->Fp12 towering	2022-08-14 09:48:10 +02:00
Mamy Ratsimbazafy	74a23244d2	bench isSquare	2022-08-07 19:50:28 +02:00
Mamy Ratsimbazafy	99c9730793	Self-contained bindings generation (#196 ) * First draft at bindings generation * finite field bindings PoC * support openarray, export NimMain * PoC extension fields and elliptic curve bindings * Pasta * expose more bindings, remove nimZeroMem, remove tracer when unused, codegen name_mangling`gensym issue * workaround bad C gensym codegen with {.inline.} pragma in non-dirty template nested in generic proc instantiated by template	2022-08-06 19:05:54 +02:00
Mamy Ratsimbazafy	e29e529f18	Add multipairing for BN curves (#194 )	2022-05-08 19:01:23 +02:00
Mamy Ratsimbazafy	39a8a413de	Pasta curves (#191 ) * Pasta curves field arithmetic * implement elliptic curve arith for the Pasta curves	2022-04-27 00:58:48 +02:00
Mamy Ratsimbazafy	e9e7a1809c	BN254 - Hash-to-Curve (SVDW method) (#190 ) * Hash to BN254-Snarks * Test SVDW code path with old v7 vectors for BLS12-381 * add benches	2022-04-26 21:24:07 +02:00
Mamy Ratsimbazafy	062ae56867	Try to use hash-to-curve for BN254_Snarks but no low-degree isogeny [skip ci]	2022-04-12 23:40:07 +02:00
Mamy Ratsimbazafy	65eedd1cf7	Hash-to-Curve BLS12-381 G1 (#189 ) * Skeleton of hash to curve for BLS12-381 G1 * Remove isodegree parameter * Fix polynomial evaluation of hashToG1 * Optimize hash_to_curve and add bench for hash to G1 * slight optim of jacobian isomap + v7 test vectors	2022-04-11 00:57:16 +02:00
Mamy Ratsimbazafy	bde4f97b56	Line refactor (#188 ) * Align line evaluations to papers notations * Adjust line fusion op * precompute G2 b' for costly D-Twists	2022-04-04 10:10:36 +02:00
Mamy Ratsimbazafy	742cecce08	Poly1305 Message Authentication Code (#186 ) * Groundwork for Poly1305 MAC * Implement fast reduction for Poly1305 * don't import assembly files when compiling without assembly	2022-03-05 23:39:24 +01:00
Mamy Ratsimbazafy	ffacf61e8a	Don't dump all in "backend" (#184 ) * backend -> math * towers -> extension fields * move ISA and compiler specific code out of math/ * fix export	2022-02-27 01:49:08 +01:00
Mamy Ratsimbazafy	5bc6d1d426	BLS signatures for Ethereum (BLS sig on BLS12-381 G2 with SHA256) (#183 ) * Finally add the (Ethereum) bls signatures (on BLS12-381 G2) * fix test path and remove old low-level signature test	2022-02-26 21:22:34 +01:00
Mamy Ratsimbazafy	fe500a6a79	Productionize: move protocols top-level vs backend (#179 ) * Productionize: move protocols top-level vs backend * fix path * import fix * the last one * benches as well	2022-02-21 01:04:53 +01:00
Mamy Ratsimbazafy	dc73c71801	Pairings optimizations (#178 ) * bench for cyclotomic square, exp and rename cyclotomic exp + multipairings for BLS12-377 * refactor/unify lines and cyclotomic functions * Add Karabina's compressed squaring * Use compressed squarings in final exponentiation * Weighted addchain for bn254_snarks * Add new towering options and cost functions * Rearrange bench summaries * fix BW6-761	2022-02-20 20:15:20 +01:00
Mamy Ratsimbazafy	14af7e8724	Low-level refactoring (#175) * Add specific fromMont conversion routine. Rename montyResidue to getMont * missed test file * Add x86_64 ASM for fromMont * Add x86_64 MULX/ADCX/ADOX for fromMont * rework Montgomery Multiplication with prefetch/latency hiding techniques * Fix ADX autodetection, closes #174. Rollback faster mul_mont attempt, no improvement and debug pain. * finalSub in fromMont & adx_bmi -> adx * Some {.noInit.} to avoid Nim zeroMem (which should be optimized away but who knows) * Uniformize name 'op+domain': mulmod - mulmont * Fix asm codegen bug "0x0000555555565930 <+896>: sbb 0x20(%r8),%r8" with Clang in final subtraction * Prepare for skipping final subtraction * Don't forget to copy the result when we skip the final subtraction * Seems like we need to stash the idea of skipping the final subtraction for now, needs bounds analysis https://eprint.iacr.org/2017/1057.pdf * fix condition for ASM 32-bit * optim modular addition when sparebit is available	2022-02-14 00:16:55 +01:00
Mamy Ratsimbazafy	53c4db7ead	Fast modular inversion (#172 ) * split modular inversion in its own file * Stash fast GCD inversion https://eprint.iacr.org/2020/972.pdf * Stash Pornin's bingcd -> issue with inner modular reduction * Implement Bernstein-Yang inversion * Avoid Nim checks on signed integers (32-bit runtime issue) * cleanup: remove old inversion impls * cleanup: static moduli, move div2 * small comments (skip ci) * comment cleanup (skip ci) * fix total iterations on 32-bit * Add batch conversion to affine coordinates using simultaneous inversion trick * fix conditional setZero and batchAffine conversion * cleanup unneeded branches following affine conversion unification * Fix batchAffine with zero inputs and add fuzz failure to test suite	2022-02-10 14:05:07 +01:00
Mamy Ratsimbazafy	f6c02fe075	Optimized subgroup checks and cofactor clearing (#169 ) * Move cofactor clearing to dedicated per-curve subgroups file * Add BLS12-381 fast subgroup checks * Implement fast cofactor clearing for BN254_snarks * Add fast subgroup check to BN254Snarks * add BLS12_377 optimized cofactor and subgroup functions * Add BN254_Nogami * Add GT-subgroup tests * Use the new subgroup checks for Eth1 EVM precompiles	2022-01-03 14:12:58 +01:00
Mamy Ratsimbazafy	c42e2a0251	Rename NotOnTwist/OnTwist => subgroup G1 and G2	2022-01-01 19:17:04 +01:00
Mamy Ratsimbazafy	bea798e27c	Field sqrt optimization (#168 ) * add more Fp tests for Twisted Edwards curves * add fused sqrt+division bench * Significant fused sqrt+division improvement for any prime field over algorithm described in "High-Speed High-Security Signature", Bernstein et al, p15 "Fast decompression", https://ed25519.cr.yp.to/ed25519-20110705.pdf * Activate secp256k1 field benches + spring renaming of field multiplication * addition chains for inversion and sqrt of Curve25519 * Make isSquare use addition chains * add double-prec mul/square bench for <256-bit prime fields.	2022-01-01 16:19:35 +01:00
Mamy Ratsimbazafy	f5c0b6245d	Multipairing (#165 ) * Productionize multipairings for BLS12-381 * typo * arg order + benchmark * Introduce mul_3way_sparse_sparse * cleanup MultiMiller loop * fix init sparse optimization in multimiller loop [skip ci]	2021-08-16 22:22:51 +02:00
Mamy Ratsimbazafy	0bc228126a	hash-to-curve BLS12-381 perf (#163 ) * fp square noasm split from non-4 non-6 limbs fallback (40% speedup) * optimized cofactor clearing for BLS12-381 G2 * Support jacobian isogenies and point_add on isogenies * fuse addition and isogeny map * {.noInit.} and sparseMul * poly_eval_horner init * dedicated invsqrt + cleanup square root file * hash to field: reduce copy overhead and don't return arrays * h2c isogeny jacobian reuse pow 3 precomputed value * Fix sqrt bench	2021-08-14 21:01:50 +02:00
Mamy Ratsimbazafy	499f9605b2	Hash to curve - BLS12-381 (#110 ) * Hash to Curve: impl expand_message_xmd * Try to precompute part of hash to curve at compile-time * sha256 bench - use the new hashes module * [WIP] smoke test hash to field * Implement hash_to_field with expected output * unoptimized hash-to-curve G2 for BLS12-381 * Don't run sanitizer on hash to field as it uses GC-ed strings	2021-08-13 22:07:26 +02:00
Mamy André-Ratsimbazafy	18069e54d3	unrolled SHA256 (for 32B faster only if using ssse3)	2021-02-15 18:43:35 +01:00
Mamy André-Ratsimbazafy	3e977488a9	add bench whole summary for curves	2021-02-14 14:24:48 +01:00
Mamy Ratsimbazafy	9ac9862401	Optimize Miller Loop and prepare Multi-pairing (#159) * Pairing with affine: align API to BLST and Gurvy and common use-case. * Implement multi-pairing / aggregate verif for BLS12-381 (+2% pairing perf) * Generalize the optimized miller loop for single pairing * Implement the miller loop addchain for BLS12-377 * Miller addition chain for BN254-Nogami * no Miller addchain for BN254-Snarks * Update the line test with new tower https://github.com/mratsim/constantine/pull/153 * Somewhat sparse for Fp2 M-Twist * Implement line by line multiplication for Fp12 D-Twist * Somewhat sparse Mul for Fp12 D-Twist * Finish the sparse and somewhat sparse multiplications	2021-02-14 13:06:57 +01:00
Mamy Ratsimbazafy	5806cc4638	Double-Precision towering (#155 ) * consistent naming for dbl-width * Isolate double-width Fp2 mul * Implement double-width complex multiplication * Lay out Fp4 double-width mul * Off by p in square Fp4 as well :/ * less copies and stack space in addition chains * Address https://github.com/mratsim/constantine/issues/154 partly * Fix #154, faster Fp4 square: less non-residue, no Mul, only square (bit more ops total) * Fix typo * better assembly scheduling for add/sub * Double-width -> Double-precision * Unred -> Unr * double-precision modular addition * Replace canUseNoCarryMontyMul and canUseNoCarryMontySquare by getSpareBits * Complete the double-precision implementation * Use double-precision path for Fp4 squaring and mul * remove mixin annotations * Lazy reduction in Fp4 prod * Fix assembly for sum2xMod * Assembly for double-precision negation * reduce white spaces in pairing benchmarks * ADX implies BMI2	2021-02-09 22:57:45 +01:00
Mamy André-Ratsimbazafy	5710a961a1	Rename ECP_ShortW_Proj -> ECP_ShortW_Prj	2021-02-06 16:29:53 +01:00
Mamy Ratsimbazafy	83dcd988b3	FpDbl revisited (#144 ) - 7% perf improvement everywhere, up to 30% in double-width primitives * reorg mul -> limbs_double_width, ConstantineASM CttASM * Implement squaring specialized scalar path (22% faster than mul) * Implement "portable" assembly for squaring * stash part of the changes * Reorg montgomery reduction - prepare to introduce Comba optimization * Implement comba Montgomery reduce (but it's slower!) * rename t -> a * 30% performance improvement by avoiding toOpenArray! * variable renaming * Fix 32-bit imports * slightly better assembly for sub2x * There is an annoying bottleneck * use out-of-place Fp assembly instead of in-place * diffAlias is unneeded now * cosmetic * speedup fpDbl sub by 20% * Fix Fp2 -> Fp6 -> Fp12 towering. It seems 5% faster * Stash ADCX/ADOX squaring	2021-02-01 03:52:27 +01:00
Mamy Ratsimbazafy	d12d5faf21	Implement Jacobian mixed addition (#142 )	2021-01-30 14:21:55 +01:00
Mamy Ratsimbazafy	82819b1b10	Square Root & Inversion addition chains - 20% perf increase (#132) * Addition chain for sqrt BLS12-381: 20% perf improvement * sqrt addchain for BN254_Snarks - 20% perf improvement as well * Fix operation count [skip ci] * BLS12-377 sqrt - 10% perf improvement * sqrt addition chain for BW6-761 - 6% speedup * BN254_Nogami inversion addchain * sqrt addchain for BN254_Nogami * Inversion addchain for BLS12-377 * inversion addition chain for BW6-761	2021-01-23 20:55:40 +01:00
Mamy Ratsimbazafy	638cb71e16	Fr: Finite Field parametrized by the curve order (#115 ) * Introduce Fr type: finite field over curve order. Need workaround for https://github.com/nim-lang/Nim/issues/16774 * Split curve properties into core and derived * Attach field properties to an instantiated field instead of the curve enum * Workaround https://github.com/nim-lang/Nim/issues/14021, yet another "working with types in macros" is difficult https://github.com/nim-lang/RFCs/issues/44 * Implement finite field over prime order of a curve subgroup * skip OpenSSL tests on windows	2021-01-22 00:09:52 +01:00
Mamy Ratsimbazafy	ac6300555a	Fix test suite (#116 ) * Pin nim-serialization. Workaround #113 and https://github.com/status-im/nim-serialization/issues/33 * Need to workaround nimble installing dependency multiple times * non-interactive * UB sanitizer missing on mingw * Fix OpenSSL benchmark on non-Linux platforms * Accelerate CI: - Skip 32-bit on 64-bit tests - Only test leaf functionality. * Don't define -fstack-protector-all with MinGW * skip line functions and cyclotomic tests (already tested in pairing) + only compile the benches don't run them.	2021-01-21 21:25:42 +01:00
Mamy André-Ratsimbazafy	e89429e822	SHA256 Hash function	2020-12-15 19:18:36 +01:00
mratsim	1383aae105	Remove outdated TODOs [skip ci] - noinline consts: https://github.com/nim-lang/RFCs/issues/257	2020-10-11 21:33:59 +02:00
Mamy Ratsimbazafy	6530596032	Endomorphism acceleration for BN254-Nogami (#102 )	2020-10-10 18:53:48 +02:00
Mamy Ratsimbazafy	a2f46f77b7	Sage constants & tests codegen (#101 ) * Implement a Sage codegenerator for frobenius constants * Sage codegen for pairings * Autogen of endomorphism acceleration constants * The autogen fixed a copy-paste bug in lattice decomposition. We can use conditional negation now and save an add+dbl in scalar mul * small fixes * sage code for square root bls12-377 is not old * readme updates * Provide test suggestions for derive_frobenius * indentation + add equation form to sage * Sage test vector generator * Use the json vectors - includes type system workaround: generic sandwich https://github.com/nim-lang/Nim/issues/11225 - converting NimNode to typedesc: https://github.com/nim-lang/Nim/issues/6785 * Delete old sage code * Install nim-serialization and nim-json-serialization in CI * CI nimble install force yes	2020-10-10 16:19:23 +02:00
Mamy Ratsimbazafy	71bb4c799a	BW6-761 part 1 (#100 ) * Add Fp, Fp2, Fp6 support for BW6-761 * Add G1 for BW6-761 * Prepare to support G2 twists on the same field as G1 * Remove a useless dependent type for lines * Implement G2 for BW6-761 * Fix Line leftover	2020-10-09 07:51:47 +02:00
Mamy Ratsimbazafy	986245b5c1	Jacobian coordinates (#95) * Add projective-> affine bench * Add conditional copy and div2 benches * Fp4 benchmarks * Constant-time Jacobian addition * Jacobian doubling * Use a simpler Add+Dbl complete formula * Update tests * Fix conditional negate * Rollback complete addition, we were only handling curve coef a == 0	2020-10-02 00:01:09 +02:00


92 Commits