constantine

Commit Graph

Author	SHA1	Message	Date
Mamy Ratsimbazafy	39a8a413de	Pasta curves (#191) * Pasta curves field arithmetic * implement elliptic curve arith for the Pasta curves	2022-04-27 00:58:48 +02:00
Mamy Ratsimbazafy	ffacf61e8a	Don't dump all in "backend" (#184) * backend -> math * towers -> extension fields * move ISA and compiler specific code out of math/ * fix export	2022-02-27 01:49:08 +01:00
Mamy Ratsimbazafy	fe500a6a79	Productionize: move protocols top-level vs backend (#179) * Productionize: move protocols top-level vs backend * fix path * import fix * the last one * benches as well	2022-02-21 01:04:53 +01:00
Mamy Ratsimbazafy	dc73c71801	Pairings optimizations (#178) * bench for cyclotomic square, exp and rename cyclotomic exp + multipairings for BLS12-377 * refactor/unify lines and cyclotomic functions * Add Karabina's compressed squaring * Use compressed squarings in final exponentiation * Weighted addchain for bn254_snarks * Add new towering options and cost functions * Rearrange bench summaries * fix BW6-761	2022-02-20 20:15:20 +01:00
Mamy Ratsimbazafy	14af7e8724	Low-level refactoring (#175) * Add specific fromMont conversion routine. Rename montyResidue to getMont * missed test file * Add x86_64 ASM for fromMont * Add x86_64 MULX/ADCX/ADOX for fromMont * rework Montgomery Multiplication with prefetch/latency hiding techniques * Fix ADX autodetection, closes #174. Rollback faster mul_mont attempt, no improvement and debug pain. * finalSub in fromMont & adx_bmi -> adx * Some {.noInit.} to avoid Nim zeroMem (which should be optimized away but who knows) * Uniformize name 'op+domain': mulmod - mulmont * Fix asm codegen bug "0x0000555555565930 <+896>: sbb 0x20(%r8),%r8" with Clang in final subtraction * Prepare for skipping final subtraction * Don't forget to copy the result when we skip the final subtraction * Seems like we need to stash the idea of skipping the final subtraction for now, needs bounds analysis https://eprint.iacr.org/2017/1057.pdf * fix condition for ASM 32-bit * optim modular addition when sparebit is available	2022-02-14 00:16:55 +01:00
Mamy Ratsimbazafy	53c4db7ead	Fast modular inversion (#172) * split modular inversion in its own file * Stash fast GCD inversion https://eprint.iacr.org/2020/972.pdf * Stash Pornin's bingcd -> issue with inner modular reduction * Implement Bernstein-Yang inversion * Avoid Nim checks on signed integers (32-bit runtime issue) * cleanup: remove old inversion impls * cleanup: static moduli, move div2 * small comments (skip ci) * comment cleanup (skip ci) * fix total iterations on 32-bit * Add batch conversion to affine coordinates using simultaneous inversion trick * fix conditional setZero and batchAffine conversion * cleanup unneeded branches following affine conversion unification * Fix batchAffine with zero inputs and add fuzz failure to test suite	2022-02-10 14:05:07 +01:00
Mamy Ratsimbazafy	bea798e27c	Field sqrt optimization (#168) * add more Fp tests for Twisted Edwards curves * add fused sqrt+division bench * Significant fused sqrt+division improvement for any prime field over algorithm described in "High-Speed High-Security Signature", Bernstein et al, p15 "Fast decompression", https://ed25519.cr.yp.to/ed25519-20110705.pdf * Activate secp256k1 field benches + spring renaming of field multiplication * addition chains for inversion and sqrt of Curve25519 * Make isSquare use addition chains * add double-prec mul/square bench for <256-bit prime fields.	2022-01-01 16:19:35 +01:00
Mamy Ratsimbazafy	82819b1b10	Square Root & Inversion addition chains - 20% perf increase (#132) * Addition chain for sqrt BLS12-381: 20% perf improvement * sqrt addchain for BN254_Snarks - 20% perf improvement as well * Fix operation count [skip ci] * BLS12-377 sqrt - 10% perf improvement * sqrt addition chain for BW6-761 - 6% speedup * BN254_Nogami inversion addchain * sqrt addchain for BN254_Nogami * Inversion addchain for BLS12-377 * inversion addition chain for BW6-761	2021-01-23 20:55:40 +01:00
Mamy Ratsimbazafy	986245b5c1	Jacobian coordinates (#95) * Add projective-> affine bench * Add conditional copy and div2 benches * Fp4 benchmarks * Constant-time Jacobian addition * Jacobian doubling * Use a simpler Add+Dbl complete formula * Update tests * Fix conditional negate * Roll back complete addition, we were only handling curve coef a == 0	2020-10-02 00:01:09 +02:00
Mamy André-Ratsimbazafy	92183c8b05	Remove unused curves	2020-09-27 13:13:45 +02:00
Mamy Ratsimbazafy	0e4dbfe400	BLS12-377 (#91) * add Sage for constant time tonelli shanks * Fused sqrt and invsqrt via Tonelli Shanks * isolate sqrt in their own folder * Implement constant-time Tonelli Shanks for any prime * Implement Fp2 sqrt for any non-residue * Add tests for BLS12_377 * Lattice decomposition script for BLS12_377 G1 * BLS12-377 G1 GLV ok, G2 GLV issue * Proper endomorphism acceleration support for BLS12-377 * Add naive pairing support for BLS12-377 * Activate more bench for BLS12-377 * Fix MSB computation * Optimize final exponentiation + add benches	2020-09-27 09:15:14 +02:00
Mamy Ratsimbazafy	28e83e7b49	Faster inversion with addition chains (#80)	2020-09-04 19:04:32 +02:00
Mamy Ratsimbazafy	d97bc9b61c	Assembly backend (#69) * Proof-of-Concept Assembly code generator * Tag inline per procedure so we can easily track the tradeoff on tower fields * Implement Assembly for modular addition (but very curious off-by-one) * Fix off-by-one for moduli with non msb set * Stash (super fast) alternative but still off by carry * Fix GCC optimizing ASM away * Save 1 register to allow compiling for BLS12-381 (in the GMP test) * The compiler cannot find enough registers if the ASM file is not compiled with -O3 * Add modsub * Add field negation * Implement no-carry Assembly optimized field multiplication * Expose UseX86ASM to the EC benchmark * omit frame pointer to save registers instead of hardcoding -O3. Also ensure early clobber constraints for Clang * Prepare for assembly fallback * Implement fallback for CPUs that don't support ADX and BMI2 * Add CPU runtime detection * Update README closes #66 * Remove commented out code	2020-07-24 22:02:30 +02:00
Mamy André-Ratsimbazafy	d22d981e9e	Implement fused sqrt invsqrt on Fp: Accelerate sqrt on Fp2 by 20% (hashToG2 and property-based testing bottleneck, 4 times slower than inversion and 87 times slower than Fp2 multiplication)	2020-06-17 22:44:52 +02:00
Mamy André-Ratsimbazafy	e0c1e0b1c8	Add EC bench on G1 + Add throughput to benches	2020-04-15 19:38:02 +02:00
Mamy Ratsimbazafy	c04721a04e	Refactor: Higher-Kinded Tower of Extension Fields (#25) * Mention that the inverse of 0 is 0 (TODO tests) * Introduce "Higher-Kinded tower extensions" * rename isCOmplexExtension -> fromComplexExtension * update benchmarks with the new tower scheme * Try to recover some speed on mul/squaring for an optimal tower (but this was not it)	2020-04-14 02:05:42 +02:00
Mamy André-Ratsimbazafy	33314fe725	Properly distinguish between Nogami and Snark/Ethereum BN254 closes #19	2020-04-12 03:01:50 +02:00
Mamy André-Ratsimbazafy	8b7374f405	Cleanup in Montgomery Mul, Square, Pow	2020-03-22 13:24:37 +01:00
Mamy André-Ratsimbazafy	1855d14497	Add more curves for testing: Curve25519, BLS12-377, BN446, FKM-447, BLS12-461, BN462	2020-03-21 13:05:58 +01:00
Mamy André-Ratsimbazafy	9e78cd5d6d	Benchmark template for 𝔽p, 𝔽p2, 𝔽p6	2020-03-21 02:31:31 +01:00

20 Commits