# Optimizations

This document lists the optimizations relevant to an elliptic curve or pairing-based cryptography library and whether Constantine has them implemented.

The optimizations can be algebraic, algorithmic, or "implementation details" in nature. Non-constant-time code is always an option; it is listed only where the speedup is significant.

## Big Integers

- Conditional copy
  - [x] Loop unrolling
  - [x] x86: Conditional Mov
  - [x] x86: Full Assembly implementation
  - [ ] SIMD instructions
- Add/Sub
  - [x] int128
  - [x] add-with-carry, sub-with-borrow intrinsics
  - [x] loop unrolling
  - [x] x86: Full Assembly implementation
- Multiplication
  - [x] int128
  - [x] loop unrolling
  - [x] Comba multiplication / product scanning
  - [ ] Karatsuba
  - [ ] Karatsuba + Comba
  - [x] x86: Full Assembly implementation
  - [x] x86: MULX, ADCX, ADOX instructions
  - [x] Fused Multiply + Shift-right by word (for Barrett Reduction and approximating multiplication by fractional constant)
- Squaring
  - [ ] Dedicated squaring functions
  - [ ] int128
  - [ ] loop unrolling
  - [ ] x86: Full Assembly implementation
  - [ ] x86: MULX, ADCX, ADOX instructions

## Finite Fields & Modular Arithmetic

- Representation
  - [x] Montgomery Representation
  - [ ] Barrett Reduction
  - [ ] Unsaturated Representation
    - [ ] Mersenne Prime (2^k - 1)
    - [ ] Generalized Mersenne Prime (NIST Prime P256: 2^256 - 2^224 + 2^192 + 2^96 - 1)
    - [ ] Pseudo-Mersenne Prime (2^m - k, for example Curve25519: 2^255 - 19)
    - [ ] Golden Primes (φ^2 - φ - 1 with φ = 2^k, for example Ed448-Goldilocks: 2^448 - 2^224 - 1)
    - [ ] any prime modulus (lazy carry)
- Montgomery Reduction
  - [x] int128
  - [x] loop unrolling
  - [x] x86: Full Assembly implementation
  - [x] x86: MULX, ADCX, ADOX instructions
- Addition/subtraction
  - [x] int128
  - [x] add-with-carry, sub-with-borrow intrinsics
  - [x] loop unrolling
  - [x] x86: Full Assembly implementation
  - [x] Addition-chain for small constants
- Montgomery Multiplication
  - [x] Fused multiply + reduce
  - [x] int128
  - [x] loop unrolling
  - [x] x86: Full Assembly implementation
  - [x] x86: MULX, ADCX, ADOX instructions
  - [x] no-carry optimization for CIOS (Coarsely Integrated Operand Scanning)
  - [x] FIPS (Finely Integrated Operand Scanning)
- Montgomery Squaring
  - [ ] Dedicated squaring functions
  - [ ] Fused multiply + reduce
  - [ ] int128
  - [ ] loop unrolling
  - [ ] x86: Full Assembly implementation
  - [ ] x86: MULX, ADCX, ADOX instructions
  - [ ] no-carry optimization for CIOS (Coarsely Integrated Operand Scanning)
- Exponentiation
  - [x] variable-time exponentiation
  - [x] fixed window optimization _(sliding windows are not constant-time)_
  - [ ] NAF recoding
  - [ ] windowed-NAF recoding
  - [ ] SIMD vectorized select in window algorithm
  - [ ] Almost Montgomery Multiplication, https://eprint.iacr.org/2011/239.pdf
  - [ ] Pippenger multi-exponentiation (variable-time)
    - [ ] parallelized Pippenger
- Inversion (constant-time baseline, Little-Fermat inversion via a^(p-2))
  - [x] Constant-time binary GCD algorithm by Möller, algorithm 5 in https://link.springer.com/content/pdf/10.1007%2F978-3-642-40588-4_10.pdf
  - [x] Addition-chain for a^(p-2)
  - [ ] Constant-time binary GCD algorithm by Bernstein-Young, https://eprint.iacr.org/2019/266
  - [ ] Constant-time binary GCD algorithm by Pornin, https://eprint.iacr.org/2020/972
  - [ ] Simultaneous inversion
- Square Root (constant-time)
  - [x] baseline sqrt via Little-Fermat for `p ≡ 3 (mod 4)`
  - [ ] baseline sqrt via Little-Fermat for `p ≡ 5 (mod 8)`
  - [ ] baseline sqrt via Little-Fermat for `p ≡ 9 (mod 16)`
  - [x] baseline sqrt via Tonelli-Shanks for any prime
  - [x] sqrt via addition-chain
  - [x] Fused sqrt + testIfSquare (Euler Criterion or Legendre symbol or Kronecker symbol)
  - [x] Fused sqrt + 1/sqrt
  - [x] Fused sqrt + 1/sqrt + testIfSquare

## Extension Fields

- [ ] Lazy reduction via double-width base fields
- [x] Sparse multiplication
- Fp2
  - [x] complex multiplication
  - [x] complex squaring
  - [x] sqrt via the constant-time complex method (Adj et al)
  - [ ] sqrt using addition chain
  - [x] fused complex method sqrt by rotating in complex plane
- Cubic extension fields
  - [x] Toom-Cook polynomial multiplication (Chung-Hasan)

## Elliptic curve

- Weierstrass curves:
  - [x] Affine coordinates
  - [x] Homogeneous projective coordinates
    - [x] Projective complete formulae
    - [x] Mixed addition
  - [x] Jacobian projective coordinates
    - [x] Jacobian complete formulae
    - [x] Mixed addition
    - [ ] Conjugate Mixed Addition
    - [ ] Composites Double-Add 2P+Q, tripling, quadrupling, quintupling, octupling
- [x] scalar multiplication
  - [x] fixed window optimization
  - [ ] constant-time NAF recoding
  - [ ] constant-time windowed-NAF recoding
  - [ ] SIMD vectorized select in window algorithm
  - [x] constant-time endomorphism acceleration
    - [ ] using NAF recoding
    - [x] using GLV-SAC recoding
  - [x] constant-time windowed-endomorphism acceleration
    - [ ] using wNAF recoding
    - [x] using windowed GLV-SAC recoding
    - [ ] SIMD vectorized select in window algorithm
  - [ ] Fixed-base scalar mul
- [ ] Multi-scalar-mul
  - [ ] Strauss multi-scalar-mul
  - [ ] Bos-Coster multi-scalar-mul
  - [ ] Pippenger multi-scalar-mul (variable-time)
    - [ ] parallelized Pippenger

## Pairings

- Frobenius maps
  - [x] Sparse Frobenius coefficients
  - [x] Coalesced Frobenius in towered Fields
  - [x] Coalesced Frobenius powers
- Line functions
  - [x] Homogeneous projective coordinates
    - [x] D-Twist
    - [x] M-Twist
    - [x] Fused line add + elliptic curve add
    - [x] Fused line double + elliptic curve double
  - [ ] Jacobian projective coordinates
    - [ ] D-Twist
    - [ ] M-Twist
    - [ ] Fused line add + elliptic curve add
    - [ ] Fused line double + elliptic curve double
  - [x] Sparse multiplication line * Gₜ element
    - [x] 6-way sparse
    - [ ] Pseudo 8-sparse
    - [x] D-Twist
    - [x] M-Twist
- Miller Loop
  - [x] NAF recoding
  - [ ] Quadruple-and-add and Octuple-and-add
  - [ ] addition chain
- Final exponentiation
  - [x] Cyclotomic squaring
  - [ ] Karabina's compressed cyclotomic squarings
  - [x] Addition-chain for exponentiation by curve parameter
  - [x] BN curves: Fuentes-Castañeda
  - [ ] BN curves: Duquesne, Ghammam
  - [ ] BLS curves: Ghammam, Fouotsa
  - [x] BLS curves: Hayashida, Hayasaka, Teruya
- [ ] Multi-pairing
  - [ ] Line accumulation
  - [ ] Parallel Multi-Pairing

## Hash-to-curve

- Clear cofactor
  - [x] BLS G1: Wahby-Boneh
  - [ ] BLS G2: Scott et al
  - [ ] BLS G2: Fuentes-Castañeda
  - [x] BLS G2: Budroni et al, endomorphism accelerated
  - [ ] BN G2
  - [ ] BW6-761 G1
  - [ ] BW6-761 G2
- Subgroup check
  - [ ] BLS G1: Bowe, endomorphism accelerated
  - [ ] BLS G2: Bowe, endomorphism accelerated
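
For readers unfamiliar with the Montgomery entries listed under "Finite Fields & Modular Arithmetic", the following is a minimal Python sketch of Montgomery representation, reduction, and multiplication. It is not Constantine's implementation: it works on Python big integers rather than limbs, it is not constant-time, and every function name (`montgomery_params`, `redc`, `mont_mul`, `to_mont`, `from_mont`) is hypothetical and chosen only for this illustration. It assumes Python 3.8+ for modular inverses via `pow`.

```python
"""Illustrative Montgomery arithmetic on Python integers (not constant-time)."""


def montgomery_params(p: int, word_bits: int = 64, num_words: int = 4):
    """Return (R, R^2 mod p, -p^-1 mod R) for an odd modulus p < R = 2^(w*n)."""
    r = 1 << (word_bits * num_words)
    if p % 2 == 0 or not 2 < p < r:
        raise ValueError("p must be odd and satisfy 2 < p < R")
    neg_p_inv = (-pow(p, -1, r)) % r  # modular inverse via pow needs Python 3.8+
    return r, (r * r) % p, neg_p_inv


def redc(t: int, p: int, r: int, neg_p_inv: int) -> int:
    """Montgomery reduction: return t * R^-1 mod p, for 0 <= t < p * R."""
    m = ((t % r) * neg_p_inv) % r  # chosen so that t + m*p is divisible by R
    u = (t + m * p) // r           # exact division, result lies in [0, 2p)
    # Final conditional subtraction. A constant-time library would use a
    # masked select here instead of this data-dependent branch.
    return u - p if u >= p else u


def mont_mul(a: int, b: int, p: int, r: int, neg_p_inv: int) -> int:
    """Multiply two values in Montgomery form: (aR)(bR) -> (ab)R mod p."""
    return redc(a * b, p, r, neg_p_inv)


def to_mont(a: int, p: int, r: int, r2: int, neg_p_inv: int) -> int:
    """Convert a to Montgomery form a*R mod p by multiplying with R^2 mod p."""
    return mont_mul(a % p, r2, p, r, neg_p_inv)


def from_mont(a_mont: int, p: int, r: int, neg_p_inv: int) -> int:
    """Convert back from Montgomery form: a*R -> a mod p."""
    return redc(a_mont, p, r, neg_p_inv)
```

A short usage example, using the Curve25519 prime 2^255 - 19 mentioned in the list purely as a convenient odd modulus:

```python
p = 2**255 - 19
r, r2, neg_p_inv = montgomery_params(p)
x = to_mont(3, p, r, r2, neg_p_inv)
y = to_mont(5, p, r, r2, neg_p_inv)
product = from_mont(mont_mul(x, y, p, r, neg_p_inv), p, r, neg_p_inv)
assert product == 15
```

The sketch performs the multiplication and the reduction as two separate steps; the "Fused multiply + reduce", CIOS, and FIPS entries refer to word-level schedules that interleave them, which this big-integer version does not attempt to show.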