constantine

Commit Graph

Author	SHA1	Message	Date
Mamy Ratsimbazafy	ffacf61e8a	Don't dump all in "backend" (#184 ) * backend -> math * towers -> extension fields * move ISA and compiler specific code out of math/ * fix export	2022-02-27 01:49:08 +01:00
Mamy Ratsimbazafy	5bc6d1d426	BLS signatures for Ethereum (BLS sig on BLS12-381 G2 with SHA256) (#183 ) * Finally add the (Ethereum) bls signatures (on BLS12-381 G2) * fix test path and remove old low-level signature test	2022-02-26 21:22:34 +01:00
Mamy Ratsimbazafy	fe500a6a79	Productionize: move protocols top-level vs backend (#179 ) * Productionize: move protocols top-level vs backend * fix path * import fix * the last one * benches as well	2022-02-21 01:04:53 +01:00
Mamy Ratsimbazafy	dc73c71801	Pairings optimizations (#178 ) * bench for cyclotomic square, exp and rename cyclotomic exp + multipairings for BLS12-377 * refactor/unify lines and cyclotomic functions * Add Karabina's compressed squaring * Use compressed squarings in final exponentiation * Weighted addchain for bn254_snarks * Add new towering options and cost functions * Rearrange bench summaries * fix BW6-761	2022-02-20 20:15:20 +01:00
Mamy Ratsimbazafy	14af7e8724	Low-level refactoring (#175 ) * Add specific fromMont conversion routine. Rename montyResidue to getMont * missed test file * Add x86_64 ASM for fromMont * Add x86_64 MULX/ADCX/ADOX for fromMont * rework Montgomery Multiplication with prefetch/latency hiding techniques * Fix ADX autodetection, closes #174. Rollback faster mul_mont attempt, no improvement and debug pain. * finalSub in fromMont & adx_bmi -> adx * Some {.noInit.} to avoid Nim zeroMem (which should be optimized away but who knows) * Uniformize name 'op+domain': mulmod - mulmont * Fix asm codegen bug "0x0000555555565930 <+896>: sbb 0x20(%r8),%r8" with Clang in final substraction * Prepare for skipping final substraction * Don't forget to copy the result when we skip the final substraction * Seems like we need to stash the idea of skipping the final substraction for now, needs bounds analysis https://eprint.iacr.org/2017/1057.pdf * fix condition for ASM 32-bit * optim modular addition when sparebit is available	2022-02-14 00:16:55 +01:00
Mamy Ratsimbazafy	53c4db7ead	Fast modular inversion (#172 ) * split modular inversion in its own file * Stash fast GCD inversion https://eprint.iacr.org/2020/972.pdf * Stash Pornin's bingcd -> issue with inner modular reduction * Implement Bernstein-Yang inversion * Avoid Nim checks on signed integers (32-bit runtime issue) * cleanup: remove old inversion impls * cleanup: static moduli, move div2 * small comments (skip ci) * comment cleanup (skip ci) * fix total iterations on 32-bit * Add batch conversion to affine coordinates using simultaneous inversion trick * fix conditional setZero and batchAffine conversion * cleanup unneeded branches following affine conversion unification * Fix batchAffine with zero inputs and add fuzz failure to test suite	2022-02-10 14:05:07 +01:00
Mamy Ratsimbazafy	f6c02fe075	Optimized subgroup checks and cofactor clearing (#169 ) * Move cofactor clearing to dedicated per-curve subgroups file * Add BLS12-381 fast subgroup checks * Implement fast cofactor clearing for BN254_snarks * Add fast subgroup check to BN254Snarks * add BLS12_377 optimized cofactor and subgroup functions * Add BN254_Nogami * Add GT-subgroup tests * Use the new subgroup checks for Eth1 EVM precompiles	2022-01-03 14:12:58 +01:00
Mamy Ratsimbazafy	c42e2a0251	Rename NotOnTwist/OnTwist => subgroup G1 and G2	2022-01-01 19:17:04 +01:00
Mamy Ratsimbazafy	bea798e27c	Field sqrt optimization (#168 ) * add more Fp tests for Twisted Edwards curves * add fused sqrt+division bench * Significant fused sqrt+division improvement for any prime field over algorithm described in "High-Speed High-Security Signature", Bernstein et al, p15 "Fast decompression", https://ed25519.cr.yp.to/ed25519-20110705.pdf * Activate secp256k1 field benches + spring renaming of field multiplication * addition chains for inversion and sqrt of Curve25519 * Make isSquare use addition chains * add double-prec mul/square bench for <256-bit prime fields.	2022-01-01 16:19:35 +01:00
Mamy Ratsimbazafy	f5c0b6245d	Multipairing (#165 ) * Productionize multipairings for BLS12-381 * typo * arg order + benchmark * Introduce mul_3way_sparse_sparse * cleanup MultiMiller loop * fix init sparse optimization in multimiller loop [skip ci]	2021-08-16 22:22:51 +02:00
Mamy Ratsimbazafy	0bc228126a	hash-to-curve BLS12-381 perf (#163 ) * fp square noasm split from non-4 non-6 limbs fallback (40% speedup) * optimized cofactor clearing for BLS12-381 G2 * Support jacobian isogenies and point_add on isogenies * fuse addition and isogeny map * {.noInit.} and sparseMul * poly_eval_horner init * dedicated invsqrt + cleanup square root file * hash to field: reduce copy overhead and don't return arrays * h2c isogeny jacobian reuse pow 3 precomputed value * Fix sqrt bench	2021-08-14 21:01:50 +02:00
Mamy Ratsimbazafy	499f9605b2	Hash to curve - BLS12-381 (#110 ) * Hash to Curve: impl expand_message_xmd * Try to precompute part of hash to curve at compile-time * sha256 bench - use the new hashes module * [WIP] smoke test hash to field * Implement hash_to_field with expected output * unoptimized hash-to-curve G2 for BLS12-381 * Don't run sanitizer on hash to field as it uses GC-ed strings	2021-08-13 22:07:26 +02:00
Mamy André-Ratsimbazafy	18069e54d3	unrolled SHA256 (for 32B faster only if using ssse3)	2021-02-15 18:43:35 +01:00
Mamy André-Ratsimbazafy	3e977488a9	add bench whole summary for curves	2021-02-14 14:24:48 +01:00
Mamy Ratsimbazafy	9ac9862401	Optimize Miller Loop and prepare Multi-pairing (#159 ) * Pairing with affine: align API to BLST and Gurvy and common use-case. * Implement multi-pairing / aggregate verif for BLS12-381 (+2% pairing perf) * Generalize the optimized miller loop for single pairing * Implement the miller loop addchain for BLS12-377 * Miller addition chain for BN254-Nogami * no Miller addchain for BN254-Snarks * Update the line test with new tower https://github.com/mratsim/constantine/pull/153 * Somewhat sparse for Fp2 M-Twist * Implement line by line multiplication for Fp12 D-Twist * Somewhat sparse Mul for Fp12 D-Twist * Finish the sparse and somewhat sparse multiplications	2021-02-14 13:06:57 +01:00
Mamy Ratsimbazafy	5806cc4638	Double-Precision towering (#155 ) * consistent naming for dbl-width * Isolate double-width Fp2 mul * Implement double-width complex multiplication * Lay out Fp4 double-width mul * Off by p in square Fp4 as well :/ * less copies and stack space in addition chains * Address https://github.com/mratsim/constantine/issues/154 partly * Fix #154, faster Fp4 square: less non-residue, no Mul, only square (bit more ops total) * Fix typo * better assembly scheduling for add/sub * Double-width -> Double-precision * Unred -> Unr * double-precision modular addition * Replace canUseNoCarryMontyMul and canUseNoCarryMontySquare by getSpareBits * Complete the double-precision implementation * Use double-precision path for Fp4 squaring and mul * remove mixin annotations * Lazy reduction in Fp4 prod * Fix assembly for sum2xMod * Assembly for double-precision negation * reduce white spaces in pairing benchmarks * ADX implies BMI2	2021-02-09 22:57:45 +01:00
Mamy André-Ratsimbazafy	5710a961a1	Rename ECP_ShortW_Proj -> ECP_ShortW_Prj	2021-02-06 16:29:53 +01:00
Mamy Ratsimbazafy	83dcd988b3	FpDbl revisited (#144 ) - 7% perf improvement everywhere, up to 30% in double-width primitives * reorg mul -> limbs_double_width, ConstantineASM CttASM * Implement squaring specialized scalar path (22% faster than mul) * Implement "portable" assembly for squaring * stash part of the changes * Reorg montgomery reduction - prepare to introduce Comba optimization * Implement comba Montgomery reduce (but it's slower!) * rename t -> a * 30% performance improvement by avoiding toOpenArray! * variable renaming * Fix 32-bit imports * slightly better assembly for sub2x * There is an annoying bottleneck * use out-of-place Fp assembly instead of in-place * diffAlias is unneeded now * cosmetic * speedup fpDbl sub by 20% * Fix Fp2 -> Fp6 -> Fp12 towering. It seems 5% faster * Stash ADCX/ADOX squaring	2021-02-01 03:52:27 +01:00
Mamy Ratsimbazafy	d12d5faf21	Implement Jacobian mixed addition (#142 )	2021-01-30 14:21:55 +01:00
Mamy Ratsimbazafy	82819b1b10	Square Root & Inversion addition chains - 20% perf increase (#132 ) * Addition chain for sqrt BLS12-381: 20% perf improvement * sqrt addchain for BN254_Snarks - 20% perf improvement as well * Fix operation count [skip ci] * BLS12-377 sqrt - 10% perf improvement * sqrt addition chain for BW6-761 - 6% speedup * BN254_Nogami inversion addchain * sqrt addchain for BN254_Nogami * Inversion addchain for BLS12-377 * inversion addition chain for BW6-761	2021-01-23 20:55:40 +01:00
Mamy Ratsimbazafy	638cb71e16	Fr: Finite Field parametrized by the curve order (#115 ) * Introduce Fr type: finite field over curve order. Need workaround for https://github.com/nim-lang/Nim/issues/16774 * Split curve properties into core and derived * Attach field properties to an instantiated field instead of the curve enum * Workaround https://github.com/nim-lang/Nim/issues/14021, yet another "working with types in macros" is difficult https://github.com/nim-lang/RFCs/issues/44 * Implement finite field over prime order of a curve subgroup * skip OpenSSL tests on windows	2021-01-22 00:09:52 +01:00
Mamy Ratsimbazafy	ac6300555a	Fix test suite (#116 ) * Pin nim-serialization. Workaround #113 and https://github.com/status-im/nim-serialization/issues/33 * Need to workaround nimble installing dependency multiple times * non-interactive * UB sanitizer missing on mingw * Fix OpenSSL benchmark on non-Linux platforms * Accelerate CI: - Skip 32-bit on 64-bit tests - Only test leaf functionality. * Don't define -fstack-protector-all with MinGW * skip line functions and cyclotomic tests (already tested in pairing) + only compile the benches don't run them.	2021-01-21 21:25:42 +01:00
Mamy André-Ratsimbazafy	e89429e822	SHA256 Hash function	2020-12-15 19:18:36 +01:00
mratsim	1383aae105	Remove outdated TODOs [skip ci] - noinline consts: https://github.com/nim-lang/RFCs/issues/257	2020-10-11 21:33:59 +02:00
Mamy Ratsimbazafy	6530596032	Endomorphism acceleration for BN254-Nogami (#102 )	2020-10-10 18:53:48 +02:00
Mamy Ratsimbazafy	a2f46f77b7	Sage constants & tests codegen (#101 ) * Implement a Sage codegenerator for frobenius constants * Sage codegen for pairings * Autogen of endomorphism acceleration constants * The autogen fixed a copy-paste bug in lattice decomposition. We can use conditional negation now and save an add+dbl in scalar mul * small fixes * sage code for square root bls12-377 is not old * readme updates * Provide test suggestions for derive_frobenius * indentation + add equation form to sage * Sage test vector generator * Use the json vectors - includes type system workaround: generic sandwich https://github.com/nim-lang/Nim/issues/11225 - converting NimNode to typedesc: https://github.com/nim-lang/Nim/issues/6785 * Delete old sage code * Install nim-serialization and nim-json-serialization in CI * CI nimble install force yes	2020-10-10 16:19:23 +02:00
Mamy Ratsimbazafy	71bb4c799a	BW6-761 part 1 (#100 ) * Add Fp, Fp2, Fp6 support for BW6-761 * Add G1 for BW6-761 * Prepare to support G2 twists on the same field as G1 * Remove a useless dependent type for lines * Implement G2 for BW6-761 * Fix Line leftover	2020-10-09 07:51:47 +02:00
Mamy Ratsimbazafy	986245b5c1	Jacobian coordinates (#95 ) * Add projective-> affine bench * Add conditional copy and div2 benches * Fp4 benchmarks * Constant-time Jacobian addition * Jacobian doubling * Use a simpler Add+Dbl complete formula * Update tests * Fix conditional negate * Rollback complete addition, we were only handling curve coef a == 0	2020-10-02 00:01:09 +02:00
Mamy André-Ratsimbazafy	0effd66dbd	SWei -> SHortW, weierstrass -> shortweierstrass	2020-09-27 23:02:48 +02:00
Mamy André-Ratsimbazafy	92183c8b05	Remove unused curves	2020-09-27 13:13:45 +02:00
Mamy Ratsimbazafy	0e4dbfe400	BLS12-377 (#91 ) * add Sage for constant time tonelli shanks * Fused sqrt and invsqrt via Tonelli Shanks * isolate sqrt in their own folder * Implement constant-time Tonelli Shanks for any prime * Implement Fp2 sqrt for any non-residue * Add tests for BLS12_377 * Lattice decomposition script for BLS12_377 G1 * BLS12-377 G1 GLV ok, G2 GLV issue * Proper endomorphism acceleration support for BLS12-377 * Add naive pairing support for BLS12-377 * Activate more bench for BLS12-377 * Fix MSB computation * Optimize final exponentiation + add benches	2020-09-27 09:15:14 +02:00
Mamy Ratsimbazafy	6ecbedbd09	Mixed addition (#90 ) * prettier comments * Implement mixed addition on G1 * Test for mixed addition in G2 and use it for Miller Loop	2020-09-26 09:16:29 +02:00
Mamy Ratsimbazafy	03ecb31c57	Pairings for BN254-Nogami and BN254-Snarks (#86 ) * Implement optimized final exponentiation for BN254-Nogami * And BN254 Snarks support * Optimize D-Twist sparse Fp12 x line multiplication * Move quadruple/octuple and add to Github issues: https://github.com/mratsim/constantine/issues/88 [skip ci]	2020-09-25 21:58:20 +02:00
Mamy Ratsimbazafy	f78ed23dad	Pairing optim (#85 ) * Fix fp12 Frobenius map * Implement cyclotomic subgroup acceleration * make cyclotomic squaring in-place * Add back out-place cycl squaring and add cyclotomic inverse * Implement state-of-the-art BLS12-381 final exponentiation * save a cyclotomic squaring * Accelerate sparse line multiplication in Miller loop * Add pairing bench * fix comments	2020-09-24 17:18:23 +02:00
Mamy Ratsimbazafy	d84edcd217	Naive pairings + Naive cofactor clearing (#82 ) * Pairing - initial commit - line functions - sparse Fp12 functions * Small fixes: - Line parametrized by twist for generic algorithm - Add a conjugate operator for quadratic extensions - Have frobenius use it - Create an Affine coordinate type for elliptic curve * Implement (failing) pairing test * Stash pairing debug session, temp switch Fp12 over Fp4 * Proper naive pairing on BLS12-381 * Frobenius map * Implement naive pairing for BN curves * Add pairing tests to CI + reduce time spent on lower-level tests * Test without assembler in Github Actions + less base layers test iterations	2020-09-21 23:24:00 +02:00
Mamy Ratsimbazafy	28e83e7b49	Faster inversion with addition chains (#80 )	2020-09-04 19:04:32 +02:00
Mamy Ratsimbazafy	85d365359d	Endomorphism G2 (#79 ) * Clear cofactor in BN254 G2 testgen and frobenius * Implement G2 endomorphism acceleration in Sage * Somewhat working accelerated scalar mul G2 (2.2x) faster - OK for BN254_Snarks - Some test failing for BLS12-381 * Fix negative miniscalars by adding an extra bit of encoding * Cleanup accel params * Small recoding optimizations	2020-09-03 23:10:48 +02:00
Mamy Ratsimbazafy	6ac974d65e	Windowed GLV acceleration - 25% faster signing on G1 (#74 ) * Fix 8x bigger than necessary encoding size of miniscalars in scalar mul * initial windowed GLV-SAC implementation * Simplify table encoding to match k0 without flipping bits	2020-08-25 00:02:30 +02:00
Mamy Ratsimbazafy	d41c653c8a	Double-width tower extension part 1 (#72 ) * Implement double-width field multiplication for double-width towering * Fp2 mul acceleration via double-width lazy reduction (pure Nim) * Inline assembly for basic add and sub * Use 2 registers instead of 12+ for ASM conditional copy * Prepare assembly for extended multiprecision multiplication support * Add assembly for mul * initial implementation of assembly reduction * stash current progress of assembly reduction * Fix clobbering issue, only P256 comparison remain buggy * Fix asm montgomery reduction for NIST P256 as well * MULX/ADCX/ADOX multi-precision multiplication * MULX/ADCX/ADOX reduction v1 * Add (deactivated) assembly for double-width substraction + rework benches * Add bench to nimble and deactivate double-width for now. slower than classic * Fix x86-32 running out of registers for mul * Clang needs to be at v9 to support flag output constraints (Xcode 11.4.2 / OSX Catalina) * 32-bit doesn't have enough registers for ASM mul * Fix again Travis Clang 9 issues * LLVM 9 is not whitelisted in travis * deactivated assembler with travis clang * syntax error * another * ... * missing space, yeah ...	2020-08-20 10:21:39 +02:00
Mamy Ratsimbazafy	d97bc9b61c	Assembly backend (#69 ) * Proof-of-Concept Assembly code generator * Tag inline per procedure so we can easily track the tradeoff on tower fields * Implement Assembly for modular addition (but very curious off-by-one) * Fix off-by one for moduli with non msb set * Stash (super fast) alternative but still off by carry * Fix GCC optimizing ASM away * Save 1 register to allow compiling for BLS12-381 (in the GMP test) * The compiler cannot find enough registers if the ASM file is not compiled with -O3 * Add modsub * Add field negation * Implement no-carry Assembly optimized field multiplication * Expose UseX86ASM to the EC benchmark * omit frame pointer to save registers instead of hardcoding -O3. Also ensure early clobber constraints for Clang * Prepare for assembly fallback * Implement fallback for CPU that don't support ADX and BMI2 * Add CPU runtime detection * Update README closes #66 * Remove commented out code	2020-07-24 22:02:30 +02:00
Mamy Ratsimbazafy	a2a2495351	Github Action CI (without GMP) (#29 ) * Github Action CI (without GMP) * Deactivate MacOS, spurious failures: https://github.com/actions/virtual-environments/issues/841 * force install with nimble * Add badge * Don"t include Nim 1.2.x https://github.com/mratsim/constantine/pull/20#issuecomment-646327952 * Action branch mistake * Add back OSX? https://github.com/actions/virtual-environments/issues/841, https://github.com/actions/virtual-environments/issues/969 * fix MacOS target * comment out RDTSC on i386 * Add initialization canaries * Add more verbose output to debug windows failures * spurious windows i386 test * For now only activate Linux and mac * missed include	2020-06-19 22:08:15 +02:00
Mamy André-Ratsimbazafy	d22d981e9e	Implement fused sqrt invsqrt on Fp: Accelerate sqrt on Fp2 by 20% (hashToG2 and property-based testing bottleneck, 4 times slower than inversion and 87 times slower than Fp2 multiplication)	2020-06-17 22:44:52 +02:00
Mamy Ratsimbazafy	d376f08d1b	G2 / Operations on the twisted curve E'(Fp2) (#51 ) * Split elliptic curve tests to better use parallel testing * Add support for printing points on G2 * Implement multiplication and division by optimal sextic non-residue (BLS12-381) * Implement modular square root in 𝔽p2 * Support EC add and EC double on G2 (for BLS12-381) * Support G2 divisive twists with non-unit sextic-non-residue like BN254 snarks * Add EC G2 bench * cleanup some unused warnings * Reorg the tests for parallelization and to avoid instantiating huge files	2020-06-15 22:58:56 +02:00
Mamy Ratsimbazafy	2613356281	Endomorphism acceleration for Scalar Multiplication (#44 ) * Add MultiScalar recoding from "Efficient and Secure Algorithms for GLV-Based Scalar Multiplication" by Faz et al * precompute cube root of unity - Add VM precomputation of Fp - workaround upstream bug https://github.com/nim-lang/Nim/issues/14585 * Add the φ-accelerated lookup table builder * Add a dedicated bithacks file * cosmetic import consistency * Build the φ precompute table with n-1 EC additions instead of 2^(n-1) additions * remove binary * Add the GLV precomputations to the sage scripts * You can't avoid it, bigint multiplication is needed at one point * Add bigint multiplication discarding some low words * Implement the lattice decomposition in sage * Proper decomposition for BN254 * Prepare the code for a new scalar mul * We compile, and now debugging hunt * More helpers to debug GLV scalar Mul * Fix conditional negation * Endomorphism accelerated scalar mul working for BN254 curve * Implement endomorphism acceleration for BLS12-381 (needed cofactor clearing of the point) * fix nimble test script after bench rename	2020-06-14 15:39:06 +02:00
Mamy Ratsimbazafy	3d1b1fab98	Fix benchmark on ARM (#31 )	2020-06-04 22:09:30 +02:00
Mamy Ratsimbazafy	82ceca6e3b	Scalar mul tests (#28 ) * Add sage script for BN254 * Implement (failing) scalar multiplication tests * Add a first test against sagemath * Finish the tests against SAGE for BN254 * Add significant test coverage of scalar multiplication with reference checks for BN254_Snarks and BLS12_381	2020-06-04 20:37:29 +02:00
Mamy André-Ratsimbazafy	44350d08af	Add elliptic doubling in projective coordinates	2020-04-15 22:23:46 +02:00
Mamy André-Ratsimbazafy	7ae0f51000	benchmarking skips cycle counting for ARM	2020-04-15 21:24:18 +02:00
Mamy André-Ratsimbazafy	e0c1e0b1c8	Add EC bench on G1 + Add throughput to benches	2020-04-15 19:38:02 +02:00
Mamy André-Ratsimbazafy	aff44f4d8e	Implement constant-time `div2` on finite and extension fields	2020-04-15 02:12:45 +02:00

70 Commits