diff --git a/README.md b/README.md index 9a7c4fb..90efd2a 100644 --- a/README.md +++ b/README.md @@ -9,8 +9,20 @@ This library provides constant-time implementation of elliptic curve cryptography. -> Warning ⚠️: The library is in development state and cannot be used at the moment -> except as a showcase or to start a discussion on modular big integers internals. +The implementation is accompanied by SAGE code used as a reference implementation and test vector generator before the high-speed implementation is written. + +> The library is in development and high-level wrappers or example protocols are not available yet. + +## Target audience + +The library aims to be a portable, compact and hardened library for elliptic curve cryptography needs, in particular for blockchain protocols and zero-knowledge proof systems. + +The library focuses on the following properties: +- constant-time (not leaking secret data via side-channels) +- performance +- generated code size, datatype size and stack usage + +in this order ## Installation @@ -31,17 +43,6 @@ This can be deactivated with `"-d:ConstantineASM=false"`: - at misssed opportunity on recent CPUs that support MULX/ADCX/ADOX instructions (~60% faster than Clang). - There is a 2.4x perf ratio between using plain GCC vs GCC with inline assembly. -## Target audience - -The library aims to be a portable, compact and hardened library for elliptic curve cryptography needs, in particular for blockchain protocols and zero-knowledge proofs system. 
- -The library focuses on following properties: -- constant-time (not leaking secret data via side-channels) -- performance -- generated code size, datatype size and stack usage - -in this order - ## Curves supported At the moment the following curves are supported, adding a new curve only requires adding the prime modulus @@ -49,11 +50,9 @@ and its bitsize in [constantine/config/curves.nim](constantine/config/curves_dec The following curves are configured: -> Note: At the moment, finite field arithmetic is fully supported -> but elliptic curve arithmetic is work-in-progress. - -### ECDH / ECDSA curves +### ECDH / ECDSA / EdDSA curves +WIP: - NIST P-224 - Curve25519 - NIST P-256 / Secp256r1 @@ -61,20 +60,22 @@ The following curves are configured: ### Pairing-Friendly curves +Supports: +- [x] Field arithmetic +- [x] Curve arithmetic +- [x] Pairing +- [ ] Multi-Pairing +- [ ] Hash-To-Curve + Families: -- BN: Barreto-Naerig +- BN: Barreto-Naehrig - BLS: Barreto-Lynn-Scott -- FKM: Fotiadis-Konstantinou-Martindale Curves: - BN254_Nogami - BN254_Snarks (Zero-Knowledge Proofs, Snarks, Starks, Zcash, Ethereum 1) - BLS12-377 (Zexe) - BLS12-381 (Algorand, Chia Networks, Dfinity, Ethereum 2, Filecoin, Zcash Sapling) -- BN446 -- FKM12-447 -- BLS12-461 -- BN462 ## Security @@ -141,73 +142,72 @@ The previous implementation was 15x slower and one of the key optimizations was changing the elliptic curve cryptography backend. It had a direct implication on hardware cost and/or cloud computing resources required. 
-## Measuring performance +### Measuring performance To measure the performance of Constantine ```bash git clone https://github.com/mratsim/constantine -nimble bench_fp # Using Assembly (+ GCC) -nimble bench_fp_clang # Using Clang only -nimble bench_fp_gcc # Using Clang only (very slow) +nimble bench_fp # Using default compiler + Assembly +nimble bench_fp_clang # Using Clang + Assembly (recommended) +nimble bench_fp_gcc # Using GCC + Assembly (very slow) +nimble bench_fp_clang_noasm # Using Clang only +nimble bench_fp_gcc_noasm # Using GCC only (slowest) nimble bench_fp2 # ... nimble bench_ec_g1 nimble bench_ec_g2 +nimble bench_pairing_bn254_nogami +nimble bench_pairing_bn254_snarks +nimble bench_pairing_bls12_377 +nimble bench_pairing_bls12_381 ``` +"Unsafe" lines use a non-constant-time algorithm. + As mentioned in the [Compiler caveats](#compiler-caveats) section, GCC is up to 2x slower than Clang due to mishandling of carries and register usage. -On my machine, for selected benchmarks on the prime field for popular pairing-friendly curves. +On my machine (i9-9980XE), for selected benchmarks with Clang + Assembly: ``` -Compiled with GCC -Optimization level => - no optimization: false - release: true - danger: true - inline assembly: true -Using Constantine with 64-bit limbs -Running on Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz - -⚠️ Cycles measurements are approximate and use the CPU nominal clock: Turbo-Boost and overclocking will skew them. -i.e. 
a 20% overclock will be about 20% off (assuming no dynamic frequency scaling) - -================================================================================================================= - -------------------------------------------------------------------------------------------------------------------------------------------------- -Addition Fp[BN254_Snarks] 333333333.333 ops/s 3 ns/op 9 CPU cycles (approx) -Substraction Fp[BN254_Snarks] 500000000.000 ops/s 2 ns/op 8 CPU cycles (approx) -Negation Fp[BN254_Snarks] 1000000000.000 ops/s 1 ns/op 3 CPU cycles (approx) -Multiplication Fp[BN254_Snarks] 71428571.429 ops/s 14 ns/op 44 CPU cycles (approx) -Squaring Fp[BN254_Snarks] 71428571.429 ops/s 14 ns/op 44 CPU cycles (approx) -Inversion (constant-time Euclid) Fp[BN254_Snarks] 122579.063 ops/s 8158 ns/op 24474 CPU cycles (approx) -Inversion via exponentiation p-2 (Little Fermat) Fp[BN254_Snarks] 153822.489 ops/s 6501 ns/op 19504 CPU cycles (approx) -Square Root + square check (constant-time) Fp[BN254_Snarks] 153491.942 ops/s 6515 ns/op 19545 CPU cycles (approx) -Exp curve order (constant-time) - 254-bit Fp[BN254_Snarks] 104580.632 ops/s 9562 ns/op 28687 CPU cycles (approx) -Exp curve order (Leak exponent bits) - 254-bit Fp[BN254_Snarks] 153798.831 ops/s 6502 ns/op 19506 CPU cycles (approx) -------------------------------------------------------------------------------------------------------------------------------------------------- -Addition Fp[BLS12_381] 250000000.000 ops/s 4 ns/op 14 CPU cycles (approx) -Substraction Fp[BLS12_381] 250000000.000 ops/s 4 ns/op 13 CPU cycles (approx) -Negation Fp[BLS12_381] 1000000000.000 ops/s 1 ns/op 4 CPU cycles (approx) -Multiplication Fp[BLS12_381] 35714285.714 ops/s 28 ns/op 84 CPU cycles (approx) -Squaring Fp[BLS12_381] 35714285.714 ops/s 28 ns/op 85 CPU cycles (approx) -Inversion (constant-time Euclid) Fp[BLS12_381] 43763.676 ops/s 22850 ns/op 68552 CPU cycles (approx) -Inversion via exponentiation p-2 (Little Fermat) 
Fp[BLS12_381] 63983.620 ops/s 15629 ns/op 46889 CPU cycles (approx) -Square Root + square check (constant-time) Fp[BLS12_381] 63856.960 ops/s 15660 ns/op 46982 CPU cycles (approx) -Exp curve order (constant-time) - 255-bit Fp[BLS12_381] 68535.399 ops/s 14591 ns/op 43775 CPU cycles (approx) -Exp curve order (Leak exponent bits) - 255-bit Fp[BLS12_381] 93222.709 ops/s 10727 ns/op 32181 CPU cycles (approx) -------------------------------------------------------------------------------------------------------------------------------------------------- -Notes: - - Compilers: - Compilers are severely limited on multiprecision arithmetic. - Inline Assembly is used by default (nimble bench_fp). - Bench without assembly can use "nimble bench_fp_gcc" or "nimble bench_fp_clang". - GCC is significantly slower than Clang on multiprecision arithmetic due to catastrophic handling of carries. - - The simplest operations might be optimized away by the compiler. - - Fast Squaring and Fast Multiplication are possible if there are spare bits in the prime representation (i.e. 
the prime uses 254 bits out of 256 bits) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +Line double BLS12_381 649350.649 ops/s 1540 ns/op 4617 CPU cycles (approx) +Line add BLS12_381 482858.522 ops/s 2071 ns/op 6211 CPU cycles (approx) +Mul 𝔽p12 by line xy000z BLS12_381 543478.261 ops/s 1840 ns/op 5518 CPU cycles (approx) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +Final Exponentiation Easy BLS12_381 39411.973 ops/s 25373 ns/op 76119 CPU cycles (approx) +Final Exponentiation Hard BLS12 BLS12_381 2141.603 ops/s 466940 ns/op 1400833 CPU cycles (approx) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +Miller Loop BLS12 BLS12_381 2731.576 ops/s 366089 ns/op 1098278 CPU cycles (approx) +Final Exponentiation BLS12 BLS12_381 2033.045 ops/s 491873 ns/op 1475634 CPU cycles (approx) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +Pairing BLS12 BLS12_381 1131.391 ops/s 883868 ns/op 2651631 CPU cycles (approx) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` +``` +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +EC Add 
G1 ECP_SWei_Proj[Fp[BLS12_381]] 2118644.068 ops/s 472 ns/op 1416 CPU cycles (approx) +EC Mixed Addition G1 ECP_SWei_Proj[Fp[BLS12_381]] 2439024.390 ops/s 410 ns/op 1232 CPU cycles (approx) +EC Double G1 ECP_SWei_Proj[Fp[BLS12_381]] 3448275.862 ops/s 290 ns/op 871 CPU cycles (approx) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +EC ScalarMul G1 (unsafe reference DoubleAdd) ECP_SWei_Proj[Fp[BLS12_381]] 7147.094 ops/s 139917 ns/op 419756 CPU cycles (approx) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +EC ScalarMul Generic G1 (window = 2, scratchsize = 4) ECP_SWei_Proj[Fp[BLS12_381]] 5048.975 ops/s 198060 ns/op 594188 CPU cycles (approx) +EC ScalarMul Generic G1 (window = 3, scratchsize = 8) ECP_SWei_Proj[Fp[BLS12_381]] 7148.269 ops/s 139894 ns/op 419685 CPU cycles (approx) +EC ScalarMul Generic G1 (window = 4, scratchsize = 16) ECP_SWei_Proj[Fp[BLS12_381]] 8112.735 ops/s 123263 ns/op 369791 CPU cycles (approx) +EC ScalarMul Generic G1 (window = 5, scratchsize = 32) ECP_SWei_Proj[Fp[BLS12_381]] 8464.534 ops/s 118140 ns/op 354424 CPU cycles (approx) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +EC ScalarMul G1 (endomorphism accelerated) ECP_SWei_Proj[Fp[BLS12_381]] 9679.418 ops/s 103312 ns/op 309939 CPU cycles (approx) +EC ScalarMul Window-2 G1 (endomorphism accelerated) ECP_SWei_Proj[Fp[BLS12_381]] 13089.348 ops/s 76398 ns/op 229195 CPU cycles (approx) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +``` + + + + ### Compiler caveats Unfortunately compilers and in particular GCC are not very good at optimizing big integers and/or cryptographic code even when using intrinsics like `addcarry_u64`. diff --git a/benchmarks/bench_ec_g1.nim b/benchmarks/bench_ec_g1.nim index 6376d1a..6f090cc 100644 --- a/benchmarks/bench_ec_g1.nim +++ b/benchmarks/bench_ec_g1.nim @@ -37,10 +37,6 @@ const AvailableCurves = [ # Secp256k1, BLS12_377, BLS12_381, - # BN446, - # FKM12_447, - # BLS12_461, - # BN462 ] proc main() = diff --git a/benchmarks/bench_ec_g2.nim b/benchmarks/bench_ec_g2.nim index 3054107..dee5bd5 100644 --- a/benchmarks/bench_ec_g2.nim +++ b/benchmarks/bench_ec_g2.nim @@ -38,10 +38,6 @@ const AvailableCurves = [ # Secp256k1, BLS12_377, BLS12_381, - # BN446, - # FKM12_447, - # BLS12_461, - # BN462 ] proc main() = diff --git a/benchmarks/bench_fp.nim b/benchmarks/bench_fp.nim index 2da3501..08a1dd8 100644 --- a/benchmarks/bench_fp.nim +++ b/benchmarks/bench_fp.nim @@ -35,10 +35,6 @@ const AvailableCurves = [ # Secp256k1, BLS12_377, BLS12_381, - # BN446, - # FKM12_447, - # BLS12_461, - # BN462 ] proc main() = diff --git a/benchmarks/bench_fp12.nim b/benchmarks/bench_fp12.nim index 0d0b7da..2a35eed 100644 --- a/benchmarks/bench_fp12.nim +++ b/benchmarks/bench_fp12.nim @@ -31,10 +31,6 @@ const AvailableCurves = [ BN254_Snarks, BLS12_377, BLS12_381 - # BN446, - # FKM12_447, - # BLS12_461, - # BN462 ] proc main() = diff --git a/benchmarks/bench_fp2.nim b/benchmarks/bench_fp2.nim index cce0550..a128bd2 100644 --- a/benchmarks/bench_fp2.nim +++ b/benchmarks/bench_fp2.nim @@ -31,10 +31,6 @@ const AvailableCurves = [ BN254_Snarks, BLS12_377, BLS12_381 - # BN446, - # FKM12_447, - # BLS12_461, - # BN462 ] proc main() = diff --git a/benchmarks/bench_fp6.nim b/benchmarks/bench_fp6.nim index 
715c90e..6693860 100644 --- a/benchmarks/bench_fp6.nim +++ b/benchmarks/bench_fp6.nim @@ -30,11 +30,7 @@ const AvailableCurves = [ BN254_Nogami, BN254_Snarks, BLS12_377, - BLS12_381 - # BN446, - # FKM12_447, - # BLS12_461, - # BN462 + BLS12_381, ] proc main() = diff --git a/constantine/config/curves_declaration.nim b/constantine/config/curves_declaration.nim index 3e860d2..b54539f 100644 --- a/constantine/config/curves_declaration.nim +++ b/constantine/config/curves_declaration.nim @@ -175,53 +175,3 @@ declareCurves: sexticTwist: M_Twist sexticNonResidue_fp2: (1, 1) # 1+𝑖 - - curve BN446: - bitwidth: 446 - modulus: "0x2400000000000000002400000002d00000000d800000021c0000001800000000870000000b0400000057c00000015c000000132000000067" - family: BarretoNaehrig - # u = 2^110 + 2^36 + 1 - curve FKM12_447: # Fotiadis-Konstantinou-Martindale - bitwidth: 447 - modulus: "0x4ce300001338c00001c08180000f20cfffffe5a8bffffd08a000000f228000007e8ffffffaddfffffffdc00000009efffffffca000000007" - # TNFS Resistant Families of Pairing-Friendly Elliptic Curves - # Georgios Fotiadis and Elisavet Konstantinou, 2018 - # https://eprint.iacr.org/2018/1017 - # - # Family 17 choice b of - # Optimal TNFS-secure pairings on elliptic curves with composite embedding degree - # Georgios Fotiadis and Chloe Martindale, 2019 - # https://eprint.iacr.org/2019/555 - # - # A short-list of pairing-friendly curves resistant toSpecial TNFS at the 128-bit security level - # Aurore Guillevic - # https://hal.inria.fr/hal-02396352v2/document - # - # p(x) = 1728x^6 + 2160x^5 + 1548x^4 + 756x^3 + 240x^2 + 54x + 7 - # t(x) = −6x² + 1, r(x) = 36x^4 + 36x^3 + 18x^2 + 6x + 1. - # Choice (b):u=−2^72 − 2^71 − 2^36 - # - # Note the paper mentions 446-bit but it's 447 - curve BLS12_461: - # Updating Key Size Estimations for Pairings - # Barbulescu, R. and S. 
Duquesne, 2018 - # https://hal.archives-ouvertes.fr/hal-01534101/file/main.pdf - bitwidth: 461 - modulus: "0x15555545554d5a555a55d69414935fbd6f1e32d8bacca47b14848b42a8dffa5c1cc00f26aa91557f00400020000555554aaaaaac0000aaaaaaab" - # u = −2^77 + 2^50 + 2^33 - # p = (u - 1)^2 (u^4 - u^2 + 1)/3 + u - - # Note there is another BLS12-461 proposed here: - # https://tools.ietf.org/id/draft-yonezawa-pairing-friendly-curves-00.html#rfc.section.4.2 - curve BN462: - # Pairing-Friendly Curves - # IETF Draft - # https://tools.ietf.org/id/draft-irtf-cfrg-pairing-friendly-curves-02.html - - # Updating Key Size Estimations for Pairings - # Barbulescu, R. and S. Duquesne, 2018 - # https://hal.archives-ouvertes.fr/hal-01534101/file/main.pdf - bitwidth: 462 - modulus: "0x240480360120023ffffffffff6ff0cf6b7d9bfca0000000000d812908f41c8020ffffffffff6ff66fc6ff687f640000000002401b00840138013" - family: BarretoNaehrig - # u = 2^114 + 2^101 - 2^14 - 1 diff --git a/tests/t_bigints_mod_vs_gmp.nim b/tests/t_bigints_mod_vs_gmp.nim index 6a9c654..241df7c 100644 --- a/tests/t_bigints_mod_vs_gmp.nim +++ b/tests/t_bigints_mod_vs_gmp.nim @@ -33,10 +33,15 @@ const CryptoModSizes = [ # Barreto-Naehrig 254, # BN254 # Barreto-Lynn-Scott + 377, # BLS12-377 381, # BLS12-381 - 383, # BLS12-383 - 461, # BLS12-461 - 480, # BLS24-480 + # Brezing-Weng + 761, # BW6-761 + # Cocks-Pinch + 782, # CP6-782 + # Miyaji-Nakabayashi-Takano + 298, # MNT4-298, MNT6-298 + 753, # MNT4-753, MNT6-753 # NIST recommended curves for US Federal Government (FIPS) # https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf 192, diff --git a/tests/t_finite_fields_powinv.nim b/tests/t_finite_fields_powinv.nim index 2608b91..9271c78 100644 --- a/tests/t_finite_fields_powinv.nim +++ b/tests/t_finite_fields_powinv.nim @@ -198,10 +198,6 @@ proc main() = testRandomDiv2 Secp256k1 testRandomDiv2 BLS12_377 testRandomDiv2 BLS12_381 - testRandomDiv2 BN446 - testRandomDiv2 FKM12_447 - testRandomDiv2 BLS12_461 - testRandomDiv2 BN462 suite "Modular 
inversion over prime fields" & " [" & $WordBitwidth & "-bit mode]": test "Specific tests on Fp[BLS12_381]": @@ -289,10 +285,6 @@ proc main() = testRandomInv Secp256k1 testRandomInv BLS12_377 testRandomInv BLS12_381 - testRandomInv BN446 - testRandomInv FKM12_447 - testRandomInv BLS12_461 - testRandomInv BN462 main() diff --git a/tests/t_finite_fields_sqrt.nim b/tests/t_finite_fields_sqrt.nim index 9387f3b..93436e3 100644 --- a/tests/t_finite_fields_sqrt.nim +++ b/tests/t_finite_fields_sqrt.nim @@ -129,17 +129,13 @@ proc main() = randomSqrtCheck Secp256k1 randomSqrtCheck BLS12_377 # p ≢ 3 (mod 4) randomSqrtCheck BLS12_381 - randomSqrtCheck BN446 - randomSqrtCheck FKM12_447 - randomSqrtCheck BLS12_461 - randomSqrtCheck BN462 suite "Modular square root - 32-bit bugs highlighted by property-based testing " & " [" & $WordBitwidth & "-bit mode]": - test "FKM12_447 - #30": - var a: Fp[FKM12_447] - a.fromHex"0x406e5e74ee09c84fa0c59f2db3ac814a4937e2f57ecd3c0af4265e04598d643c5b772a6549a2d9b825445c34b8ba100fe8d912e61cfda43d" - a.square() - check: bool a.isSquare() + # test "FKM12_447 - #30": - Deactivated, we don't support the curve as no one uses it. 
+ # var a: Fp[FKM12_447] + # a.fromHex"0x406e5e74ee09c84fa0c59f2db3ac814a4937e2f57ecd3c0af4265e04598d643c5b772a6549a2d9b825445c34b8ba100fe8d912e61cfda43d" + # a.square() + # check: bool a.isSquare() test "Fused modular square root on 32-bit - inconsistent with isSquare - #42": var a: Fp[BLS12_381] diff --git a/tests/t_fp12_frobenius.nim b/tests/t_fp12_frobenius.nim index c858bf1..b0d6126 100644 --- a/tests/t_fp12_frobenius.nim +++ b/tests/t_fp12_frobenius.nim @@ -18,10 +18,6 @@ const TestCurves = [ BN254_Snarks, BLS12_377, BLS12_381, - # BN446 - # FKM12_447 - # BLS12_461 - # BN462 ] runFrobeniusTowerTests( diff --git a/tests/t_fp2.nim b/tests/t_fp2.nim index 6f97a87..6d942cb 100644 --- a/tests/t_fp2.nim +++ b/tests/t_fp2.nim @@ -14,14 +14,10 @@ import ./t_fp_tower_template const TestCurves = [ - # BN254_Nogami + BN254_Nogami, BN254_Snarks, BLS12_377, BLS12_381, - # BN446 - # FKM12_447 - # BLS12_461 - # BN462 ] runTowerTests( diff --git a/tests/t_fp2_frobenius.nim b/tests/t_fp2_frobenius.nim index 2123959..8b5847e 100644 --- a/tests/t_fp2_frobenius.nim +++ b/tests/t_fp2_frobenius.nim @@ -18,10 +18,6 @@ const TestCurves = [ BN254_Snarks, BLS12_377, BLS12_381, - # BN446 - # FKM12_447 - # BLS12_461 - # BN462 ] runFrobeniusTowerTests( diff --git a/tests/t_fp4_frobenius.nim b/tests/t_fp4_frobenius.nim index c07efbd..fd505ab 100644 --- a/tests/t_fp4_frobenius.nim +++ b/tests/t_fp4_frobenius.nim @@ -16,12 +16,8 @@ import const TestCurves = [ BN254_Nogami, BN254_Snarks, - # BLS12_377, + BLS12_377, BLS12_381, - # BN446 - # FKM12_447 - # BLS12_461 - # BN462 ] runFrobeniusTowerTests( diff --git a/tests/t_fp6_frobenius.nim b/tests/t_fp6_frobenius.nim index 0c54354..99b0e45 100644 --- a/tests/t_fp6_frobenius.nim +++ b/tests/t_fp6_frobenius.nim @@ -16,12 +16,8 @@ import const TestCurves = [ BN254_Nogami, BN254_Snarks, - # BLS12_377, + BLS12_377, BLS12_381, - # BN446 - # FKM12_447 - # BLS12_461 - # BN462 ] runFrobeniusTowerTests(