From a02dd19d36c5253175a4bc9d4243ec95e2d04fe6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mamy=20Andr=C3=A9-Ratsimbazafy?=
Date: Sat, 23 Jan 2021 15:46:41 +0100
Subject: [PATCH] Compendium of pairing-based cryptography optimizations

---
 docs/optimizations.md | 206 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 206 insertions(+)
 create mode 100644 docs/optimizations.md

diff --git a/docs/optimizations.md b/docs/optimizations.md
new file mode 100644
index 0000000..eeba714
--- /dev/null
+++ b/docs/optimizations.md
@@ -0,0 +1,206 @@
+# Optimizations
+
+This document lists the optimizations relevant to an elliptic curve or pairing-based cryptography library and whether Constantine implements them.
+
+The optimizations can be algebraic, algorithmic or "implementation details" in nature. Non-constant-time code is always an option; it is listed only if the speedup is significant.
+
+## Big Integers
+
+- Conditional copy
+  - [x] Loop unrolling
+  - [x] x86: Conditional Mov
+  - [x] x86: Full Assembly implementation
+  - [ ] SIMD instructions
+- Add/Sub
+  - [x] int128
+  - [x] add-with-carry, sub-with-borrow intrinsics
+  - [x] loop unrolling
+  - [x] x86: Full Assembly implementation
+- Multiplication
+  - [x] int128
+  - [x] loop unrolling
+  - [x] Comba multiplication / product scanning
+  - [ ] Karatsuba
+  - [ ] Karatsuba + Comba
+  - [x] x86: Full Assembly implementation
+  - [x] x86: MULX, ADCX, ADOX instructions
+  - [x] Fused Multiply + Shift-right by word (for Barrett Reduction and for approximating multiplication by a fractional constant)
+- Squaring
+  - [ ] Dedicated squaring functions
+  - [ ] int128
+  - [ ] loop unrolling
+  - [ ] x86: Full Assembly implementation
+  - [ ] x86: MULX, ADCX, ADOX instructions
+
+## Finite Fields & Modular Arithmetic
+
+- Representation
+  - [x] Montgomery Representation
+  - [ ] Barrett Reduction
+  - [ ] Unsaturated Representation
+    - [ ] Mersenne Prime (2^k - 1)
+    - [ ] Generalized Mersenne Prime (NIST Prime P256: 2^256 - 2^224 + 2^192 + 2^96 - 1)
+    - [ ] Pseudo-Mersenne Prime (2^m - k, for example Curve25519: 2^255 - 19)
+    - [ ] Golden Primes (φ^2 - φ - 1 with φ = 2^k, for example Ed448-Goldilocks: 2^448 - 2^224 - 1)
+    - [ ] any prime modulus (lazy carry)
+
+- Montgomery Reduction
+  - [x] int128
+  - [x] loop unrolling
+  - [x] x86: Full Assembly implementation
+  - [x] x86: MULX, ADCX, ADOX instructions
+
+- Addition/subtraction
+  - [x] int128
+  - [x] add-with-carry, sub-with-borrow intrinsics
+  - [x] loop unrolling
+  - [x] x86: Full Assembly implementation
+  - [x] Addition-chain for small constants
+
+- Montgomery Multiplication
+  - [x] Fused multiply + reduce
+  - [x] int128
+  - [x] loop unrolling
+  - [x] x86: Full Assembly implementation
+  - [x] x86: MULX, ADCX, ADOX instructions
+  - [x] no-carry optimization for CIOS (Coarsely Integrated Operand Scanning)
+  - [x] FIPS (Finely Integrated Product Scanning)
+
+- Montgomery Squaring
+  - [ ] Dedicated squaring functions
+  - [ ] Fused multiply + reduce
+  - [ ] int128
+  - [ ] loop unrolling
+  - [ ] x86: Full Assembly implementation
+  - [ ] x86: MULX, ADCX, ADOX instructions
+  - [ ] no-carry optimization for CIOS (Coarsely Integrated Operand Scanning)
+
+- Exponentiation
+  - [x] variable-time exponentiation
+  - [x] fixed window optimization _(sliding windows are not constant-time)_
+  - [ ] NAF recoding
+  - [ ] windowed-NAF recoding
+  - [ ] SIMD vectorized select in window algorithm
+  - [ ] Almost Montgomery Multiplication, https://eprint.iacr.org/2011/239.pdf
+  - [ ] Pippenger multi-exponentiation (variable-time)
+    - [ ] parallelized Pippenger
+
+- Inversion (constant-time baseline, Little-Fermat inversion via a^(p-2))
+  - [x] Constant-time binary GCD algorithm by Möller, algorithm 5 in https://link.springer.com/content/pdf/10.1007%2F978-3-642-40588-4_10.pdf
+  - [x] Addition-chain for a^(p-2)
+  - [ ] Constant-time binary GCD algorithm by Bernstein-Yang, https://eprint.iacr.org/2019/266
+  - [ ] Constant-time binary GCD algorithm by Pornin, https://eprint.iacr.org/2020/972
+  - [ ] Simultaneous inversion
+
+- Square Root (constant-time)
+  - [x] baseline sqrt via Little-Fermat for `p ≡ 3 (mod 4)`
+  - [ ] baseline sqrt via Little-Fermat for `p ≡ 5 (mod 8)`
+  - [ ] baseline sqrt via Little-Fermat for `p ≡ 9 (mod 16)`
+  - [x] baseline sqrt via Tonelli-Shanks for any prime
+  - [x] sqrt via addition-chain
+  - [x] Fused sqrt + testIfSquare (Euler's criterion, Legendre symbol or Kronecker symbol)
+  - [x] Fused sqrt + 1/sqrt
+  - [x] Fused sqrt + 1/sqrt + testIfSquare
+
+## Extension Fields
+
+- [ ] Lazy reduction via double-width base fields
+- [x] Sparse multiplication
+- Fp2
+  - [x] complex multiplication
+  - [x] complex squaring
+  - [x] sqrt via the constant-time complex method (Adj et al.)
+  - [ ] sqrt using addition chain
+  - [x] fused complex method sqrt by rotating in the complex plane
+- Cubic extension fields
+  - [x] Toom-Cook polynomial multiplication (Chung-Hasan)
+
+## Elliptic curve
+
+- Weierstrass curves:
+  - [x] Affine coordinates
+  - [x] Homogeneous projective coordinates
+    - [x] Projective complete formulae
+    - [x] Mixed addition
+  - [x] Jacobian projective coordinates
+    - [x] Jacobian complete formulae
+    - [x] Mixed addition
+    - [ ] Conjugate Mixed Addition
+    - [ ] Composite Double-Add 2P+Q, tripling, quadrupling, quintupling, octupling
+
+- [x] scalar multiplication
+  - [x] fixed window optimization
+  - [ ] constant-time NAF recoding
+  - [ ] constant-time windowed-NAF recoding
+  - [ ] SIMD vectorized select in window algorithm
+  - [x] constant-time endomorphism acceleration
+    - [ ] using NAF recoding
+    - [x] using GLV-SAC recoding
+  - [x] constant-time windowed-endomorphism acceleration
+    - [ ] using wNAF recoding
+    - [x] using windowed GLV-SAC recoding
+    - [ ] SIMD vectorized select in window algorithm
+  - [ ] Fixed-base scalar mul
+
+- [ ] Multi-scalar-mul
+  - [ ] Strauss multi-scalar-mul
+  - [ ] Bos-Coster multi-scalar-mul
+  - [ ] Pippenger multi-scalar-mul (variable-time)
+    - [ ] parallelized Pippenger
+
+## Pairings
+
+- Frobenius maps
+  - [x] Sparse Frobenius coefficients
+  - [x] Coalesced Frobenius in towered fields
+  - [x] Coalesced Frobenius powers
+
+- Line functions
+  - [x] Homogeneous projective coordinates
+    - [x] D-Twist
+    - [x] M-Twist
+    - [x] Fused line add + elliptic curve add
+    - [x] Fused line double + elliptic curve double
+  - [ ] Jacobian projective coordinates
+    - [ ] D-Twist
+    - [ ] M-Twist
+    - [ ] Fused line add + elliptic curve add
+    - [ ] Fused line double + elliptic curve double
+  - [x] Sparse multiplication line * Gₜ element
+    - [x] 6-way sparse
+    - [ ] Pseudo 8-sparse
+    - [x] D-Twist
+    - [x] M-Twist
+
+- Miller Loop
+  - [x] NAF recoding
+  - [ ] Quadruple-and-add and Octuple-and-add
+  - [ ] addition chain
+
+- Final exponentiation
+  - [x] Cyclotomic squaring
+    - [ ] Karabina's compressed cyclotomic squarings
+  - [x] Addition-chain for exponentiation by curve parameter
+  - [x] BN curves: Fuentes-Castañeda
+  - [ ] BN curves: Duquesne, Ghammam
+  - [ ] BLS curves: Ghammam, Fouotsa
+  - [x] BLS curves: Hayashida, Hayasaka, Teruya
+
+- [ ] Multi-pairing
+  - [ ] Line accumulation
+  - [ ] Parallel Multi-Pairing
+
+## Hash-to-curve
+
+- Clear cofactor
+  - [x] BLS G1: Wahby-Boneh
+  - [ ] BLS G2: Scott et al.
+  - [ ] BLS G2: Fuentes-Castañeda
+  - [x] BLS G2: Budroni et al., endomorphism accelerated
+  - [ ] BN G2
+  - [ ] BW6-761 G1
+  - [ ] BW6-761 G2
+
+- Subgroup check
+  - [ ] BLS G1: Bowe, endomorphism accelerated
+  - [ ] BLS G2: Bowe, endomorphism accelerated
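The "Conditional copy" entries in the Big Integers checklist are about selecting between two limb arrays without a secret-dependent branch. Here is a minimal Python sketch of the usual mask-and-XOR pattern, assuming 64-bit limbs. The name `ccopy` is hypothetical. Python gives no timing guarantees, so this only shows the data flow that a CMOV or SIMD implementation would follow.

```python
WORD_MASK = (1 << 64) - 1  # assumption: 64-bit limbs


def ccopy(dst, src, ctl):
    """Overwrite dst with src if ctl == 1; leave dst unchanged if ctl == 0.

    Precondition: ctl is exactly 0 or 1, and every limb is below 2^64.
    No branch depends on ctl: it is turned into an all-ones or all-zeros mask.
    """
    mask = (-ctl) & WORD_MASK  # 0xFFFF_FFFF_FFFF_FFFF when ctl == 1, else 0
    for i in range(len(dst)):
        # (dst ^ src) & mask is 0 when mask == 0, and dst ^ src when mask is all ones
        dst[i] ^= (dst[i] ^ src[i]) & mask
```

Every limb is read and written whatever the value of `ctl`, which is the property the loop-unrolled and CMOV variants preserve.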
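Comba multiplication (product scanning), checked under Big Integers > Multiplication, builds each output limb from every partial product `a[i] * b[j]` with `i + j = k`, column by column, instead of accumulating row by row. The sketch below assumes 64-bit limbs stored least significant first. `to_limbs`, `from_limbs` and `comba_mul` are illustrative names, and Python's unbounded integers stand in for the three-word accumulator that native code would keep in registers.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def to_limbs(x, n):
    """Split a non-negative integer into n 64-bit limbs, least significant first."""
    return [(x >> (WORD_BITS * i)) & WORD_MASK for i in range(n)]


def from_limbs(limbs):
    """Inverse of to_limbs."""
    return sum(limb << (WORD_BITS * i) for i, limb in enumerate(limbs))


def comba_mul(a, b):
    """Product scanning: output limb k gathers every a[i] * b[j] with i + j == k."""
    n, m = len(a), len(b)
    out = [0] * (n + m)
    acc = 0  # column accumulator (a 3-word register triple in native code)
    for k in range(n + m - 1):
        # i ranges over indices with 0 <= i < n and 0 <= k - i < m
        for i in range(max(0, k - m + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]
        out[k] = acc & WORD_MASK  # emit the low word of the column sum
        acc >>= WORD_BITS  # carry the rest into the next column
    out[n + m - 1] = acc  # the final carry fits in one word
    return out
```

Each output word is written once, which is why product scanning pairs well with the MULX/ADCX/ADOX carry chains listed in the checklist.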
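"Fused multiply + reduce" under Montgomery Multiplication refers to interleaving the product and the reduction. CIOS (Coarsely Integrated Operand Scanning) does this by adding one row `a * b[i]` and then performing one word-sized reduction step. Below is a word-level Python sketch of the textbook CIOS loop, assuming 64-bit limbs and an odd modulus. `mont_mul_cios` and `m0ninv` are illustrative names. The "no-carry" variant from the checklist is not shown, and the final subtraction is simplified to a branch.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def to_limbs(x, n):
    """Split a non-negative integer into n 64-bit limbs, least significant first."""
    return [(x >> (WORD_BITS * i)) & WORD_MASK for i in range(n)]


def from_limbs(limbs):
    return sum(limb << (WORD_BITS * i) for i, limb in enumerate(limbs))


def mont_mul_cios(a, b, N, m0ninv):
    """Return a * b * R^-1 mod N, with R = 2^(64 * len(N)), for a, b < N and N odd.

    m0ninv must be -N^-1 mod 2^64 (only the lowest limb of N matters).
    """
    n = len(N)
    t = [0] * (n + 2)
    for i in range(n):
        # Multiplication step: t += a * b[i]
        carry = 0
        for j in range(n):
            s = t[j] + a[j] * b[i] + carry
            t[j], carry = s & WORD_MASK, s >> WORD_BITS
        s = t[n] + carry
        t[n], t[n + 1] = s & WORD_MASK, s >> WORD_BITS
        # Reduction step: adding m * N zeroes the lowest limb, then shift by one limb
        m = (t[0] * m0ninv) & WORD_MASK
        carry = (t[0] + m * N[0]) >> WORD_BITS
        for j in range(1, n):
            s = t[j] + m * N[j] + carry
            t[j - 1], carry = s & WORD_MASK, s >> WORD_BITS
        s = t[n] + carry
        t[n - 1], carry = s & WORD_MASK, s >> WORD_BITS
        t[n] = t[n + 1] + carry
    # Here t < 2N. Simplified final subtraction: a constant-time version would
    # subtract N with borrow and choose the result with a mask instead of branching.
    r = from_limbs(t[: n + 1])
    modulus = from_limbs(N)
    if r >= modulus:
        r -= modulus
    return to_limbs(r, n)
```

With operands kept in Montgomery form (`x * R mod N`), the output stays in Montgomery form, so chains of multiplications need no conversion in between.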
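The "fixed window optimization" entry (under Exponentiation, and in the same shape under scalar multiplication) processes the exponent in fixed-size digits. Every window costs the same number of squarings and exactly one multiplication, and the precomputed table is read with a full scan so that the memory access pattern does not reveal the digit. The sketch below is Python and assumes a 4-bit window by default. `ct_select`, `pow_fixed_window` and the `nbits` parameter are hypothetical names, and Python's big-integer arithmetic is not itself constant-time.

```python
def ct_select(table, index):
    """Return table[index] after reading every entry, whatever the index."""
    result = 0
    for i, entry in enumerate(table):
        mask = -int(i == index)  # -1 (all ones) if i == index, else 0
        result |= entry & mask
    return result


def pow_fixed_window(a, e, p, window=4, nbits=None):
    """Left-to-right fixed-window exponentiation: a^e mod p.

    nbits must be >= e.bit_length(); a constant-time caller would pass the
    fixed public size of the exponent instead of deriving it from e.
    """
    size = 1 << window
    # Precompute a^0, a^1, ..., a^(2^window - 1)
    table = [1 % p]
    for _ in range(1, size):
        table.append(table[-1] * a % p)
    if nbits is None:
        nbits = max(e.bit_length(), 1)
    nwindows = (nbits + window - 1) // window
    acc = 1 % p
    for w in reversed(range(nwindows)):
        # Same work for every window: `window` squarings, then one multiplication
        for _ in range(window):
            acc = acc * acc % p
        digit = (e >> (w * window)) & (size - 1)
        acc = acc * ct_select(table, digit) % p  # multiplies by a^0 = 1 when digit == 0
    return acc
```

Sliding windows would skip runs of zero bits and so vary the operation sequence with the exponent, which is why the checklist marks them as not constant-time.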
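The Inversion entries name two ideas that fit in a few lines. The first is the Little-Fermat baseline `a^(p-2)`. The second is "Simultaneous inversion", which is still unchecked and usually refers to Montgomery's batch-inversion trick: a prefix product, one inversion, then a backward sweep. The Python sketch below uses the built-in three-argument `pow`, which is variable-time. It stands in for the constant-time addition chain, and the function names are illustrative.

```python
def inv_fermat(a, p):
    """a^-1 mod p via Fermat's little theorem: a^(p-2), for prime p and a not divisible by p."""
    return pow(a, p - 2, p)


def batch_inverse(xs, p):
    """Invert every element of xs with a single modular inversion (Montgomery's trick).

    Costs one inversion plus about 3(n - 1) multiplications; all xs must be nonzero mod p.
    """
    if not xs:
        return []
    # prefix[i] = xs[0] * xs[1] * ... * xs[i] mod p
    prefix = []
    acc = 1
    for x in xs:
        acc = acc * x % p
        prefix.append(acc)
    inv = inv_fermat(acc, p)  # inverse of the whole product
    out = [0] * len(xs)
    for i in range(len(xs) - 1, 0, -1):
        out[i] = inv * prefix[i - 1] % p  # (x0...xi)^-1 * (x0...x(i-1)) = xi^-1
        inv = inv * xs[i] % p  # now inv = (x0...x(i-1))^-1
    out[0] = inv
    return out
```

Because one inversion costs far more than a multiplication, the trick pays off whenever many field elements must be inverted together, for instance when converting a batch of projective points to affine coordinates.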
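For `p ≡ 3 (mod 4)`, the "baseline sqrt via Little-Fermat" candidate is `r = a^((p+1)/4)`. Squaring it gives `r^2 = a * a^((p-1)/2)`, which is `a` when `a` is a square (Euler's criterion gives `+1`) and `-a` otherwise. A single comparison `r^2 == a` therefore answers the "testIfSquare" question without a separate Legendre-symbol exponentiation. The following Python sketch of that fused form is an illustration only: `sqrt_3mod4` is a hypothetical name and the built-in `pow` replaces an addition chain.

```python
def sqrt_3mod4(a, p):
    """Square root for prime p ≡ 3 (mod 4), fused with the "is a square" test.

    Returns (r, is_square). When is_square is True, r * r ≡ a (mod p).
    """
    if p % 4 != 3:
        raise ValueError("requires p ≡ 3 (mod 4)")
    r = pow(a, (p + 1) // 4, p)  # candidate root: r^2 = a * a^((p-1)/2)
    is_square = r * r % p == a % p  # a^((p-1)/2) = -1 would give r^2 = -a instead
    return r, is_square
```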
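"Baseline sqrt via Tonelli-Shanks for any prime" covers moduli where the `(p+1)/4` shortcut does not apply. The sketch below is the textbook, variable-time Tonelli-Shanks loop in Python. The checklist refers to a constant-time variant, which would instead run a fixed number of iterations and use masked selections. `tonelli_shanks` is an illustrative name, and the search for a non-residue would normally be a precomputed constant.

```python
def tonelli_shanks(a, p):
    """Square root of a modulo an odd prime p, or None if a is not a square (variable-time)."""
    a %= p
    if a == 0:
        return 0
    if pow(a, (p - 1) // 2, p) != 1:  # Euler's criterion
        return None
    # Write p - 1 = q * 2^s with q odd
    q, s = p - 1, 0
    while q % 2 == 0:
        q //= 2
        s += 1
    # Find a quadratic non-residue z (precomputed per field in practice)
    z = 2
    while pow(z, (p - 1) // 2, p) != p - 1:
        z += 1
    m, c, t, r = s, pow(z, q, p), pow(a, q, p), pow(a, (q + 1) // 2, p)
    # Invariant: r^2 = a * t, and t has order dividing 2^(m-1)
    while t != 1:
        # Least i with t^(2^i) == 1
        i, t2i = 0, t
        while t2i != 1:
            t2i = t2i * t2i % p
            i += 1
        b = pow(c, 1 << (m - i - 1), p)
        m, c, t, r = i, b * b % p, t * b * b % p, r * b % p
    return r
```

The number of loop iterations depends on `a`, which is exactly the leak a constant-time version removes.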
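The Fp2 "complex multiplication" and "complex squaring" entries exploit the shape `Fp2 = Fp[i] / (i^2 + 1)`. This sketch assumes that representation, i.e. `i^2 = -1`, which is valid when -1 is not a square mod p (p ≡ 3 mod 4). Multiplication then needs 3 base-field products instead of 4 (Karatsuba), and squaring needs only 2. Elements are `(a0, a1)` tuples meaning `a0 + a1*i`, and the function names are illustrative.

```python
def fp2_mul(x, y, p):
    """(a0 + a1 i)(b0 + b1 i) with i^2 = -1, using 3 base-field multiplications."""
    a0, a1 = x
    b0, b1 = y
    v0 = a0 * b0 % p
    v1 = a1 * b1 % p
    c0 = (v0 - v1) % p  # real part: a0 b0 - a1 b1
    c1 = ((a0 + a1) * (b0 + b1) - v0 - v1) % p  # imaginary part: a0 b1 + a1 b0
    return (c0, c1)


def fp2_sqr(x, p):
    """(a0 + a1 i)^2 = (a0 + a1)(a0 - a1) + 2 a0 a1 i, using 2 base-field multiplications."""
    a0, a1 = x
    c0 = (a0 + a1) * (a0 - a1) % p  # equals a0^2 - a1^2
    c1 = 2 * a0 * a1 % p
    return (c0, c1)
```

Squaring is the dominant operation in exponentiation-heavy code such as the final exponentiation, so saving one more multiplication there matters.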
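NAF (non-adjacent form) recoding appears under Exponentiation, scalar multiplication and the Miller Loop. It rewrites an integer with digits in {-1, 0, 1} so that no two adjacent digits are nonzero, which cuts the average number of additions or multiplications. Here is a minimal Python sketch of the standard recoding. `naf` is an illustrative name. Because the loop inspects the bits of `k`, this form suits public values such as a pairing's curve parameter. The "constant-time NAF recoding" entries would need a different, fixed-length formulation.

```python
def naf(k):
    """Non-adjacent form of k >= 0, least significant digit first (variable-time)."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k & 3)  # k ≡ 1 (mod 4) gives +1, k ≡ 3 (mod 4) gives -1
            k -= d  # k is now divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits
```

For example, 7 (binary 111, three nonzero digits) becomes 8 - 1, which has two nonzero digits.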