constantine

Commit Graph

Author	SHA1	Message	Date
mratsim	1383aae105	Remove outdated TODOs [skip ci] - noinline consts: https://github.com/nim-lang/RFCs/issues/257	2020-10-11 21:33:59 +02:00
Mamy Ratsimbazafy	6530596032	Endomorphism acceleration for BN254-Nogami (#102 )	2020-10-10 18:53:48 +02:00
Mamy Ratsimbazafy	a2f46f77b7	Sage constants & tests codegen (#101 ) * Implement a Sage codegenerator for frobenius constants * Sage codegen for pairings * Autogen of endomorphism acceleration constants * The autogen fixed a copy-paste bug in lattice decomposition. We can use conditional negation now and save an add+dbl in scalar mul * small fixes * sage code for square root bls12-377 is not old * readme updates * Provide test suggestions for derive_frobenius * indentation + add equation form to sage * Sage test vector generator * Use the json vectors - includes type system workaround: generic sandwich https://github.com/nim-lang/Nim/issues/11225 - converting NimNode to typedesc: https://github.com/nim-lang/Nim/issues/6785 * Delete old sage code * Install nim-serialization and nim-json-serialization in CI * CI nimble install force yes	2020-10-10 16:19:23 +02:00
Mamy Ratsimbazafy	71bb4c799a	BW6-761 part 1 (#100 ) * Add Fp, Fp2, Fp6 support for BW6-761 * Add G1 for BW6-761 * Prepare to support G2 twists on the same field as G1 * Remove a useless dependent type for lines * Implement G2 for BW6-761 * Fix Line leftover	2020-10-09 07:51:47 +02:00
Mamy Ratsimbazafy	986245b5c1	Jacobian coordinates (#95 ) * Add projective-> affine bench * Add conditional copy and div2 benches * Fp4 benchmarks * Constant-time Jacobian addition * Jacobian doubling * Use a simpler Add+Dbl complete formula * Update tests * Fix conditional negate * Rollback complete addition, we were only handling curve coef a == 0	2020-10-02 00:01:09 +02:00
Mamy André-Ratsimbazafy	0effd66dbd	SWei -> SHortW, weierstrass -> shortweierstrass	2020-09-27 23:02:48 +02:00
Mamy André-Ratsimbazafy	92183c8b05	Remove unused curves	2020-09-27 13:13:45 +02:00
Mamy Ratsimbazafy	0e4dbfe400	BLS12-377 (#91 ) * add Sage for constant time tonelli shanks * Fused sqrt and invsqrt via Tonelli Shanks * isolate sqrt in their own folder * Implement constant-time Tonelli Shanks for any prime * Implement Fp2 sqrt for any non-residue * Add tests for BLS12_377 * Lattice decomposition script for BLS12_377 G1 * BLS12-377 G1 GLV ok, G2 GLV issue * Proper endomorphism acceleration support for BLS12-377 * Add naive pairing support for BLS12-377 * Activate more bench for BLS12-377 * Fix MSB computation * Optimize final exponentiation + add benches	2020-09-27 09:15:14 +02:00
Mamy Ratsimbazafy	6ecbedbd09	Mixed addition (#90 ) * prettier comments * Implement mixed addition on G1 * Test for mixed addition in G2 and use it for Miller Loop	2020-09-26 09:16:29 +02:00
Mamy Ratsimbazafy	03ecb31c57	Pairings for BN254-Nogami and BN254-Snarks (#86 ) * Implement optimized final exponentiation for BN254-Nogami * And BN254 Snarks support * Optimize D-Twist sparse Fp12 x line multiplication * Move quadruple/octuple and add to Github issues: https://github.com/mratsim/constantine/issues/88 [skip ci]	2020-09-25 21:58:20 +02:00
Mamy Ratsimbazafy	f78ed23dad	Pairing optim (#85 ) * Fix fp12 Frobenius map * Implement cyclotomic subgroup acceleration * make cyclotomic squaring in-place * Add back out-place cycl squaring and add cyclotomic inverse * Implement state-of-the-art BLS12-381 final exponentiation * save a cyclotomic squaring * Accelerate sparse line multiplication in Miller loop * Add pairing bench * fix comments	2020-09-24 17:18:23 +02:00
Mamy Ratsimbazafy	d84edcd217	Naive pairings + Naive cofactor clearing (#82 ) * Pairing - initial commit - line functions - sparse Fp12 functions * Small fixes: - Line parametrized by twist for generic algorithm - Add a conjugate operator for quadratic extensions - Have frobenius use it - Create an Affine coordinate type for elliptic curve * Implement (failing) pairing test * Stash pairing debug session, temp switch Fp12 over Fp4 * Proper naive pairing on BLS12-381 * Frobenius map * Implement naive pairing for BN curves * Add pairing tests to CI + reduce time spent on lower-level tests * Test without assembler in Github Actions + less base layers test iterations	2020-09-21 23:24:00 +02:00
Mamy Ratsimbazafy	28e83e7b49	Faster inversion with addition chains (#80 )	2020-09-04 19:04:32 +02:00
Mamy Ratsimbazafy	85d365359d	Endomorphism G2 (#79 ) * Clear cofactor in BN254 G2 testgen and frobenius * Implement G2 endomorphism acceleration in Sage * Somewhat working accelerated scalar mul G2 (2.2x) faster - OK for BN254_Snarks - Some test failing for BLS12-381 * Fix negative miniscalars by adding an extra bit of encoding * Cleanup accel params * Small recoding optimizations	2020-09-03 23:10:48 +02:00
Mamy Ratsimbazafy	6ac974d65e	Windowed GLV acceleration - 25% faster signing on G1 (#74 ) * Fix 8x bigger than necessary encoding size of miniscalars in scalar mul * initial windowed GLV-SAC implementation * Simplify table encoding to match k0 without flipping bits	2020-08-25 00:02:30 +02:00
Mamy Ratsimbazafy	d41c653c8a	Double-width tower extension part 1 (#72 ) * Implement double-width field multiplication for double-width towering * Fp2 mul acceleration via double-width lazy reduction (pure Nim) * Inline assembly for basic add and sub * Use 2 registers instead of 12+ for ASM conditional copy * Prepare assembly for extended multiprecision multiplication support * Add assembly for mul * initial implementation of assembly reduction * stash current progress of assembly reduction * Fix clobbering issue, only P256 comparison remain buggy * Fix asm montgomery reduction for NIST P256 as well * MULX/ADCX/ADOX multi-precision multiplication * MULX/ADCX/ADOX reduction v1 * Add (deactivated) assembly for double-width subtraction + rework benches * Add bench to nimble and deactivate double-width for now. slower than classic * Fix x86-32 running out of registers for mul * Clang needs to be at v9 to support flag output constraints (Xcode 11.4.2 / OSX Catalina) * 32-bit doesn't have enough registers for ASM mul * Fix again Travis Clang 9 issues * LLVM 9 is not whitelisted in travis * deactivated assembler with travis clang * syntax error * another * ... * missing space, yeah ...	2020-08-20 10:21:39 +02:00
Mamy Ratsimbazafy	d97bc9b61c	Assembly backend (#69 ) * Proof-of-Concept Assembly code generator * Tag inline per procedure so we can easily track the tradeoff on tower fields * Implement Assembly for modular addition (but very curious off-by-one) * Fix off-by one for moduli with non msb set * Stash (super fast) alternative but still off by carry * Fix GCC optimizing ASM away * Save 1 register to allow compiling for BLS12-381 (in the GMP test) * The compiler cannot find enough registers if the ASM file is not compiled with -O3 * Add modsub * Add field negation * Implement no-carry Assembly optimized field multiplication * Expose UseX86ASM to the EC benchmark * omit frame pointer to save registers instead of hardcoding -O3. Also ensure early clobber constraints for Clang * Prepare for assembly fallback * Implement fallback for CPU that don't support ADX and BMI2 * Add CPU runtime detection * Update README closes #66 * Remove commented out code	2020-07-24 22:02:30 +02:00
Mamy Ratsimbazafy	a2a2495351	Github Action CI (without GMP) (#29 ) * Github Action CI (without GMP) * Deactivate MacOS, spurious failures: https://github.com/actions/virtual-environments/issues/841 * force install with nimble * Add badge * Don't include Nim 1.2.x https://github.com/mratsim/constantine/pull/20#issuecomment-646327952 * Action branch mistake * Add back OSX? https://github.com/actions/virtual-environments/issues/841, https://github.com/actions/virtual-environments/issues/969 * fix MacOS target * comment out RDTSC on i386 * Add initialization canaries * Add more verbose output to debug windows failures * spurious windows i386 test * For now only activate Linux and mac * missed include	2020-06-19 22:08:15 +02:00
Mamy André-Ratsimbazafy	d22d981e9e	Implement fused sqrt invsqrt on Fp: Accelerate sqrt on Fp2 by 20% (hashToG2 and property-based testing bottleneck, 4 times slower than inversion and 87 times slower than Fp2 multiplication)	2020-06-17 22:44:52 +02:00
Mamy Ratsimbazafy	d376f08d1b	G2 / Operations on the twisted curve E'(Fp2) (#51 ) * Split elliptic curve tests to better use parallel testing * Add support for printing points on G2 * Implement multiplication and division by optimal sextic non-residue (BLS12-381) * Implement modular square root in 𝔽p2 * Support EC add and EC double on G2 (for BLS12-381) * Support G2 divisive twists with non-unit sextic-non-residue like BN254 snarks * Add EC G2 bench * cleanup some unused warnings * Reorg the tests for parallelization and to avoid instantiating huge files	2020-06-15 22:58:56 +02:00
Mamy Ratsimbazafy	2613356281	Endomorphism acceleration for Scalar Multiplication (#44 ) * Add MultiScalar recoding from "Efficient and Secure Algorithms for GLV-Based Scalar Multiplication" by Faz et al * precompute cube root of unity - Add VM precomputation of Fp - workaround upstream bug https://github.com/nim-lang/Nim/issues/14585 * Add the φ-accelerated lookup table builder * Add a dedicated bithacks file * cosmetic import consistency * Build the φ precompute table with n-1 EC additions instead of 2^(n-1) additions * remove binary * Add the GLV precomputations to the sage scripts * You can't avoid it, bigint multiplication is needed at one point * Add bigint multiplication discarding some low words * Implement the lattice decomposition in sage * Proper decomposition for BN254 * Prepare the code for a new scalar mul * We compile, and now debugging hunt * More helpers to debug GLV scalar Mul * Fix conditional negation * Endomorphism accelerated scalar mul working for BN254 curve * Implement endomorphism acceleration for BLS12-381 (needed cofactor clearing of the point) * fix nimble test script after bench rename	2020-06-14 15:39:06 +02:00
Mamy Ratsimbazafy	3d1b1fab98	Fix benchmark on ARM (#31 )	2020-06-04 22:09:30 +02:00
Mamy Ratsimbazafy	82ceca6e3b	Scalar mul tests (#28 ) * Add sage script for BN254 * Implement (failing) scalar multiplication tests * Add a first test against sagemath * Finish the tests against SAGE for BN254 * Add significant test coverage of scalar multiplication with reference checks for BN254_Snarks and BLS12_381	2020-06-04 20:37:29 +02:00
Mamy André-Ratsimbazafy	44350d08af	Add elliptic doubling in projective coordinates	2020-04-15 22:23:46 +02:00
Mamy André-Ratsimbazafy	7ae0f51000	benchmarking skips cycle counting for ARM	2020-04-15 21:24:18 +02:00
Mamy André-Ratsimbazafy	e0c1e0b1c8	Add EC bench on G1 + Add throughput to benches	2020-04-15 19:38:02 +02:00
Mamy André-Ratsimbazafy	aff44f4d8e	Implement constant-time `div2` on finite and extension fields	2020-04-15 02:12:45 +02:00
Mamy Ratsimbazafy	c04721a04e	Refactor: Higher-Kinded Tower of Extension Fields (#25 ) * Mention that the inverse of 0 is 0 (TODO tests) * Introduce "Higher-Kinded tower extensions" * rename isCOmplexExtension -> fromComplexExtension * update benchmarks with the new tower scheme * Try to recover some speed on mul/squaring for an optimal tower (but this was not it)	2020-04-14 02:05:42 +02:00
Mamy André-Ratsimbazafy	33314fe725	Properly distinguish between Nogami and Snark/Ethereum BN254 closes #19	2020-04-12 03:01:50 +02:00
Mamy André-Ratsimbazafy	a6e4517be2	Implement 𝔽p12 inversion, enable 𝔽p12 tests and bench	2020-04-09 14:28:01 +02:00
Mamy André-Ratsimbazafy	8b7374f405	Cleanup in Montgomery Mul, Square, Pow	2020-03-22 13:24:37 +01:00
Mamy André-Ratsimbazafy	c40bc1977d	Inverse in cubic extension field 𝔽p6 = 𝔽p2[∛(1 + 𝑖)]	2020-03-21 23:47:43 +01:00
Mamy André-Ratsimbazafy	ff4a54daba	Add multiplication in 𝔽p6 = 𝔽p2[∛(1+𝑖)]	2020-03-21 19:03:57 +01:00
Mamy André-Ratsimbazafy	1855d14497	Add more curves for testing: Curve25519, BLS12-377, BN446, FKM-447, BLS12-461, BN462	2020-03-21 13:05:58 +01:00
Mamy André-Ratsimbazafy	9e78cd5d6d	Benchmark template for 𝔽p, 𝔽p2, 𝔽p6	2020-03-21 02:31:31 +01:00
Mamy André-Ratsimbazafy	bde619155b	30% faster constant-time inversion	2020-03-20 23:03:52 +01:00
Mamy Ratsimbazafy	4ff0e3d90b	Internals refactor + renewed focus on perf (#17 ) * Lay out the refactoring objectives and tradeoffs * Refactor the 32 and 64-bit primitives [skip ci] * BigInts and Modular BigInts compile * Make the bigints test compile * Fix modular reduction * Fix reduction tests vs GMP * Implement Montgomery mul, pow, inverse, WIP finite field compilation * Make FiniteField compile * Fix exponentiation compilation * Fix Montgomery magic constant computation for 2^64 words * Fix typo in non-optimized CIOS - passing finite fields IO tests * Add limbs comparisons [skip ci] * Fix on precomputation of the Montgomery magic constant * Passing all tests including 𝔽p2 * modular addition, the test for mersenne prime was wrong * update benches * Fix "nimble test" + typo on out-of-place field addition * bigint division, normalization is needed: https://travis-ci.com/github/mratsim/constantine/jobs/298359743 * missing conversion in subborrow non-x86 fallback - https://travis-ci.com/github/mratsim/constantine/jobs/298359744 * Fix little-endian serialization * Constantine32 flag to run 32-bit constantine on 64-bit machines * IO Field test, ensure that BaseType is used instead of uint64 when the prime can fit in uint32 * Implement proper addcarry and subborrow fallback for the compile-time VM * Fix export issue when the logical wordbitwidth == physical wordbitwidth - passes all tests (32-bit and 64-bit) * Fix uint128 on ARM * Fix C++ conditional copy and ARM addcarry/subborrow * Add investigation for SIGFPE in Travis * Fix debug display for unsafeDiv2n1n * multiplexer typo * moveMem bug in glibc of Ubuntu 16.04? * Was probably missing an early clobbered register annotation on conditional mov * Note on Montgomery-friendly moduli * Strongly suspect a GCC before GCC 7 codegen bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87139) * hex conversion was (for debugging) not taking requested order into account + inlining comment * Use 32-bit limbs on ARM64, uint128 builtin __udivti4 bug? * Revert "Use 32-bit limbs on ARM64, uint128 builtin __udivti4 bug?" This reverts commit 087f9aa7fb40bbd058d05cbd8eec7fc082911f49. * Fix subborrow fallback for non-x86 (need to mask the borrow)	2020-03-16 16:33:51 +01:00
Mamy André-Ratsimbazafy	191bb7710c	Add a warmup to the Fp bench to deal with CPU scaling	2020-03-15 21:02:17 +01:00
Mamy André-Ratsimbazafy	b810422486	Add benchmark for Ethereum 1 and Ethereum 2 curves	2020-03-15 20:54:14 +01:00
Mamy André-Ratsimbazafy	dc0c1c181c	enable subtraction benchmarks	2020-03-07 12:23:46 +01:00
Mamy André-Ratsimbazafy	472823b749	more comprehensive benchmark of Fp	2020-03-06 17:44:30 +01:00
Mamy André-Ratsimbazafy	1fdb1df80a	Add benchmark clock timers	2020-02-29 19:36:35 +01:00
Mamy André-Ratsimbazafy	ca817fcb69	Use Assembly cmov on x86	2020-02-29 18:27:20 +01:00
Mamy André-Ratsimbazafy	05bce529b4	1st experiment at accelerating montgomery multiplication (665 lines of specialized duplicated ASM code for some reason, monomorphization is probably better than that)	2020-02-28 22:46:20 +01:00
Mamy André-Ratsimbazafy	ddce056bb4	make bench compile	2020-02-25 03:07:42 +01:00
Mamy André-Ratsimbazafy	8cbbd40a0c	Add benchmark of constant-time vs unsafe powmod	2020-02-22 18:39:29 +01:00
Mamy André-Ratsimbazafy	10346d83a4	Benchmark: BigInt -> Montgomery conversion: - shlAddMod (with assembly division) is already 4x slower than Montgomery Multiplication based. - constant-time division will be even slower - use montgomery-multiplication based conversion	2020-02-16 01:43:17 +01:00

47 Commits