constantine/docs/optimizations.md
Mamy Ratsimbazafy 9ac9862401
Optimize Miller Loop and prepare Multi-pairing (#159)
* Pairing with affine: align API to BLST and Gurvy and common use-case.

* Implement multi-pairing / aggregate verif for BLS12-381 (+2% pairing perf)

* Generalize the optimized miller loop for single pairing

* Implement the miller loop addchain for BLS12-377

* Miller addition chain for BN254-Nogami

* no Miller addchain for BN254-Snarks

* Update the line test with new tower https://github.com/mratsim/constantine/pull/153

* Somewhat sparse for Fp2 M-Twist

* Implement line by line multiplication for Fp12 D-Twist

* Somewhat sparse Mul for Fp12 D-Twist

* Finish the sparse and somewhat sparse multiplications
2021-02-14 13:06:57 +01:00

Optimizations

This document lists the optimizations relevant to an elliptic curve or pairing-based cryptography library and whether Constantine has them implemented.

The optimizations can be algebraic, algorithmic, or implementation-level in nature. Non-constant-time code is always possible; it is listed only if the speedup is significant.

Big Integers

  • Conditional copy
    • Loop unrolling
    • x86: Conditional Mov
    • x86: Full Assembly implementation
    • SIMD instructions
  • Add/Sub
    • int128
    • add-with-carry, sub-with-borrow intrinsics
    • loop unrolling
    • x86: Full Assembly implementation
  • Multiplication
    • int128
    • loop unrolling
    • Comba multiplication / product scanning
    • Karatsuba
    • Karatsuba + Comba
    • x86: Full Assembly implementation
    • x86: MULX, ADCX, ADOX instructions
    • Fused Multiply + Shift-right by word (for Barrett Reduction and approximating multiplication by fractional constant)
  • Squaring
    • Dedicated squaring functions
    • int128
    • loop unrolling
    • x86: Full Assembly implementation
    • x86: MULX, ADCX, ADOX instructions
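
To make the "Comba multiplication / product scanning" entry concrete, here is a minimal sketch on 64-bit limbs. The helper names (`to_limbs`, `from_limbs`, `comba_mul`) are hypothetical and are not Constantine's API; Python integers stand in for machine words, so the accumulator `acc` models the three-word accumulator a real implementation would keep in registers.

```python
W = 64                 # assumed word size in bits
MASK = (1 << W) - 1

def to_limbs(x, n):
    """Split a non-negative integer into n little-endian W-bit limbs."""
    return [(x >> (W * i)) & MASK for i in range(n)]

def from_limbs(limbs):
    """Recombine little-endian limbs into an integer."""
    return sum(v << (W * i) for i, v in enumerate(limbs))

def comba_mul(a, b):
    """Product-scanning multiplication: compute one output word at a time.

    Column k sums every partial product a[i] * b[k - i]; only then is the
    low word stored and the rest carried into the next column.
    """
    n, m = len(a), len(b)
    r = [0] * (n + m)
    acc = 0  # stands in for a 3-word accumulator
    for k in range(n + m - 1):
        for i in range(max(0, k - m + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]
        r[k] = acc & MASK
        acc >>= W
    r[n + m - 1] = acc  # the final carry fits in one word
    return r

x = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
y = 0x123456789ABCDEF0FEDCBA9876543210
assert from_limbs(comba_mul(to_limbs(x, 2), to_limbs(y, 2))) == x * y
```

Compared with operand scanning, each output word is written exactly once, which is what makes the method attractive for fully unrolled or assembly implementations.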

Finite Fields & Modular Arithmetic

  • Representation

    • Montgomery Representation
    • Barrett Reduction
    • Unsaturated Representation
      • Mersenne Prime (2^k - 1)
      • Generalized Mersenne Prime (for example NIST P-256: 2^256 - 2^224 + 2^192 + 2^96 - 1)
      • Pseudo-Mersenne Prime (2^m - k, for example Curve25519: 2^255 - 19)
      • Golden Primes (φ^2 - φ - 1 with φ = 2^k, for example Ed448-Goldilocks: 2^448 - 2^224 - 1)
      • any prime modulus (lazy carry)
  • Montgomery Reduction

    • int128
    • loop unrolling
    • x86: Full Assembly implementation
    • x86: MULX, ADCX, ADOX instructions
  • Addition/subtraction

    • int128
    • add-with-carry, sub-with-borrow intrinsics
    • loop unrolling
    • x86: Full Assembly implementation
    • Addition-chain for small constants
  • Montgomery Multiplication

    • Fused multiply + reduce
    • int128
    • loop unrolling
    • x86: Full Assembly implementation
    • x86: MULX, ADCX, ADOX instructions
    • no-carry optimization for CIOS (Coarsely Integrated Operand Scanning)
    • FIPS (Finely Integrated Product Scanning)
  • Montgomery Squaring

    • Dedicated squaring functions
    • Fused multiply + reduce
    • int128
    • loop unrolling
    • x86: Full Assembly implementation
    • x86: MULX, ADCX, ADOX instructions
    • no-carry optimization for CIOS (Coarsely Integrated Operand Scanning)
  • Addition chains

    • unreduced squarings/multiplications in addition chains
  • Exponentiation

    • variable-time exponentiation
    • fixed window optimization (sliding windows are not constant-time)
    • NAF recoding
    • windowed-NAF recoding
    • SIMD vectorized select in window algorithm
    • Almost Montgomery Multiplication, https://eprint.iacr.org/2011/239.pdf
    • Pippenger multi-exponentiation (variable-time)
      • parallelized Pippenger
  • Inversion (constant-time baseline, Little-Fermat inversion via a^(p-2))

  • Square Root (constant-time)

    • baseline sqrt via Little-Fermat for p ≡ 3 (mod 4)
    • baseline sqrt via Little-Fermat for p ≡ 5 (mod 8)
    • baseline sqrt via Little-Fermat for p ≡ 9 (mod 16)
    • baseline sqrt via Tonelli-Shanks for any prime
    • sqrt via addition-chain
    • Fused sqrt + testIfSquare (Euler Criterion or Legendre symbol or Kronecker symbol)
    • Fused sqrt + 1/sqrt
    • Fused sqrt + 1/sqrt + testIfSquare
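
As a rough illustration of the "Fused multiply + reduce" and CIOS entries under Montgomery Multiplication, the sketch below implements baseline CIOS on 64-bit limbs, without the no-carry optimization. All names (`mont_mul_cios`, `neg_inv_word`, the limb helpers) are hypothetical and not Constantine's API. Python integers stand in for machine words, and the final conditional subtraction uses a plain `if`, so this version is not constant-time.

```python
W = 64                 # assumed word size in bits
MASK = (1 << W) - 1

def to_limbs(x, n):
    return [(x >> (W * i)) & MASK for i in range(n)]

def from_limbs(limbs):
    return sum(v << (W * i) for i, v in enumerate(limbs))

def neg_inv_word(modulus):
    """-modulus^(-1) mod 2^W, the per-modulus Montgomery constant."""
    return (-pow(modulus, -1, 1 << W)) & MASK

def mont_mul_cios(a, b, n, n0):
    """Return a * b * R^(-1) mod N as limbs, with R = 2^(W * len(n)).

    a, b, n are little-endian limb lists, a and b reduced mod N, N odd.
    Each outer iteration interleaves one row of the multiplication with
    one word of Montgomery reduction.
    """
    s = len(n)
    t = [0] * (s + 2)
    for i in range(s):
        # Multiplication step: t += a * b[i]
        carry = 0
        for j in range(s):
            acc = t[j] + a[j] * b[i] + carry
            t[j], carry = acc & MASK, acc >> W
        acc = t[s] + carry
        t[s], t[s + 1] = acc & MASK, acc >> W
        # Reduction step: t = (t + m * N) / 2^W, exact by choice of m
        m = (t[0] * n0) & MASK
        carry = (t[0] + m * n[0]) >> W
        for j in range(1, s):
            acc = t[j] + m * n[j] + carry
            t[j - 1], carry = acc & MASK, acc >> W
        acc = t[s] + carry
        t[s - 1] = acc & MASK
        t[s] = t[s + 1] + (acc >> W)
    r, modulus = from_limbs(t[:s + 1]), from_limbs(n)
    if r >= modulus:   # NOT constant-time; real code uses a conditional copy
        r -= modulus
    return to_limbs(r, s)

N = 2**255 - 19        # any odd modulus works; this one is only an example
R = 1 << 256
a, b = 123456789, 987654321
out = mont_mul_cios(to_limbs(a * R % N, 4), to_limbs(b * R % N, 4),
                    to_limbs(N, 4), neg_inv_word(N))
assert from_limbs(out) == a * b * R % N   # product stays in Montgomery form
```

The result of multiplying two Montgomery-form operands is again in Montgomery form, which is why the representation pays off across long chains of multiplications.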

Extension Fields

  • Lazy reduction via double-precision base fields
  • Sparse multiplication
  • Fp2
    • complex multiplication
    • complex squaring
    • sqrt via the constant-time complex method (Adj et al.)
    • sqrt using addition chain
    • fused complex method sqrt by rotating in complex plane
  • Cubic extension fields
    • Toom-Cook polynomial multiplication (Chung-Hasan)
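
The "complex multiplication" and "complex squaring" entries for Fp2 can be sketched as follows, assuming Fp2 = Fp[i]/(i^2 + 1), which requires p ≡ 3 (mod 4). The prime 2^127 - 1 and the function names are illustrative choices, not taken from Constantine. Elements are pairs `(c0, c1)` standing for c0 + c1·i.

```python
P = 2**127 - 1   # example prime with P % 4 == 3, so -1 is a non-residue

def fp2_mul(a, b):
    """Complex (Karatsuba-style) multiplication: 3 base-field products
    instead of the 4 needed by the schoolbook formula."""
    a0, a1 = a
    b0, b1 = b
    v0 = a0 * b0 % P
    v1 = a1 * b1 % P
    c0 = (v0 - v1) % P                              # real part, using i^2 = -1
    c1 = ((a0 + a1) * (b0 + b1) - v0 - v1) % P      # equals a0*b1 + a1*b0
    return (c0, c1)

def fp2_sqr(a):
    """Complex squaring: 2 base-field products."""
    a0, a1 = a
    c0 = (a0 + a1) * (a0 - a1) % P                  # a0^2 - a1^2
    c1 = 2 * a0 * a1 % P
    return (c0, c1)

x = (12345678901234567890, 98765432109876543210)
assert fp2_sqr(x) == fp2_mul(x, x)
assert fp2_mul((0, 1), (0, 1)) == (P - 1, 0)        # i * i == -1
```

With lazy reduction over double-precision base-field values, the intermediate `% P` operations above would be deferred so that each output coordinate is reduced only once.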

Elliptic curve

  • Weierstrass curves:

    • Affine coordinates
    • Homogeneous projective coordinates
      • Projective complete formulae
      • Mixed addition
    • Jacobian projective coordinates
      • Jacobian complete formulae
      • Mixed addition
      • Conjugate Mixed Addition
      • Composite operations: double-and-add (2P+Q), tripling, quadrupling, quintupling, octupling
  • scalar multiplication

    • fixed window optimization
    • constant-time NAF recoding
    • constant-time windowed-NAF recoding
      • SIMD vectorized select in window algorithm
    • constant-time endomorphism acceleration
      • using NAF recoding
      • using GLV-SAC recoding
    • constant-time windowed-endomorphism acceleration
      • using wNAF recoding
      • using windowed GLV-SAC recoding
      • SIMD vectorized select in window algorithm
    • Fixed-base scalar mul
  • Multi-scalar-mul

    • Strauss multi-scalar-mul
    • Bos-Coster multi-scalar-mul
    • Pippenger multi-scalar-mul (variable-time)
      • parallelized Pippenger
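
The bucket method behind "Pippenger multi-scalar-mul (variable-time)" can be sketched generically. The code below only assumes an abelian group given by an `add` function and a neutral element; to stay self-contained it is exercised on integers modulo 1009 under addition as a stand-in group, not on real curve points. The function name and parameters are hypothetical.

```python
def pippenger_msm(scalars, points, add, zero, c=4):
    """Compute sum(scalars[i] * points[i]) with c-bit windows and buckets.

    Variable-time: the bucket index depends directly on scalar bits.
    """
    bits = max(k.bit_length() for k in scalars) or 1
    windows = (bits + c - 1) // c
    result = zero
    for w in reversed(range(windows)):
        # Shift the accumulated result left by one window (c doublings).
        for _ in range(c):
            result = add(result, result)
        # buckets[d - 1] collects every point whose digit in this window is d.
        buckets = [zero] * ((1 << c) - 1)
        for k, pt in zip(scalars, points):
            d = (k >> (w * c)) & ((1 << c) - 1)
            if d:
                buckets[d - 1] = add(buckets[d - 1], pt)
        # Running-sum trick: computes sum(d * bucket_d) using additions only.
        running, window_sum = zero, zero
        for bucket in reversed(buckets):
            running = add(running, bucket)
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result

# Stand-in group for demonstration: integers mod 1009 under addition.
ORDER = 1009
add_mod = lambda x, y: (x + y) % ORDER
ks = [5, 0, 255, 1000, 77]
pts = [3, 10, 999, 500, 1]
assert pippenger_msm(ks, pts, add_mod, 0) == sum(k * p for k, p in zip(ks, pts)) % ORDER
```

Any point type with an addition routine and a neutral element could be plugged in the same way. The per-window bucket passes are independent of one another, which is what the "parallelized Pippenger" entry exploits.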

Pairings

  • Frobenius maps

    • Sparse Frobenius coefficients
    • Coalesced Frobenius in towered Fields
    • Coalesced Frobenius powers
  • Line functions

    • Homogeneous projective coordinates
      • D-Twist
        • Fused line add + elliptic curve add
        • Fused line double + elliptic curve double
      • M-Twist
        • Fused line add + elliptic curve add
        • Fused line double + elliptic curve double
      • 6-way sparse multiplication line * Gₜ element
    • Jacobian projective coordinates
      • D-Twist
        • Fused line add + elliptic curve add
        • Fused line double + elliptic curve double
      • M-Twist
        • Fused line add + elliptic curve add
        • Fused line double + elliptic curve double
      • 6-way sparse multiplication line * Gₜ element
    • Affine coordinates
      • 7-way sparse multiplication line * Gₜ element
      • Pseudo-8 sparse multiplication line * Gₜ element
  • Miller Loop

    • NAF recoding
    • Quadruple-and-add and Octuple-and-add
    • addition chains
  • Final exponentiation

    • Cyclotomic squaring
      • Karabina's compressed cyclotomic squarings
    • Addition-chain for exponentiation by curve parameter
    • BN curves: Fuentes-Castañeda
    • BN curves: Duquesne, Ghammam
    • BLS curves: Ghammam, Fouotsa
    • BLS curves: Hayashida, Hayasaka, Teruya
  • Multi-pairing

    • Line accumulation
    • Parallel Multi-Pairing
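
For the "NAF recoding" entry under Miller Loop: every non-zero digit of the loop parameter costs a line addition on top of the line doubling, so a signed-digit form with fewer non-zero digits can shorten the loop. The sketch below is the textbook non-adjacent form recoding; it is variable-time, which is acceptable here because the loop parameter is a public curve constant. The function names are hypothetical, and whether NAF actually saves additions depends on the curve parameter.

```python
def naf(k):
    """Non-adjacent form of k > 0 as little-endian digits in {-1, 0, 1}.

    No two consecutive digits are non-zero.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k & 3)   # k % 4 == 1 gives +1, k % 4 == 3 gives -1
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits

def weight(digits):
    """Number of non-zero digits, i.e. additions in a double-and-add loop."""
    return sum(1 for d in digits if d != 0)

assert naf(7) == [-1, 0, 0, 1]          # 7 = 8 - 1: two additions instead of three

# Absolute value of the BLS12-381 curve parameter, used as the loop count.
x = 0xD201000000010000
digits = naf(x)
assert sum(d << i for i, d in enumerate(digits)) == x
assert weight(digits) <= bin(x).count("1")
```

For parameters that are already sparse in binary, the NAF brings little or no gain, which is one reason hand-tuned addition chains appear as a separate entry.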

Hash-to-curve

  • Clear cofactor

    • BLS G1: Wahby-Boneh
    • BLS G2: Scott et al
    • BLS G2: Fuentes-Castañeda
    • BLS G2: Budroni et al, endomorphism accelerated
    • BN G2
    • BW6-761 G1
    • BW6-761 G2
  • Subgroup check

    • BLS G1: Bowe, endomorphism accelerated
    • BLS G2: Bowe, endomorphism accelerated
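
The methods listed above are all faster replacements for two naive operations: clearing the cofactor by computing [h]P, and checking subgroup membership by testing [r]P = O. As a baseline for comparison, here is a sketch of those naive versions on a toy curve, y^2 = x^3 + x + 1 over F_23, which has 28 points, so r = 7 and h = 4. The curve, the constants, and the function names are illustrative assumptions with no cryptographic value and no relation to Constantine's API.

```python
P_MOD, A, B = 23, 1, 1      # toy curve y^2 = x^3 + A*x + B over F_23
R_ORDER, H = 7, 4           # prime subgroup order and cofactor (28 = 7 * 4)

def ec_add(p, q):
    """Affine addition; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # p + (-p) = infinity
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def ec_mul(k, p):
    """Variable-time double-and-add scalar multiplication."""
    acc = None
    while k > 0:
        if k & 1:
            acc = ec_add(acc, p)
        p = ec_add(p, p)
        k >>= 1
    return acc

def clear_cofactor(p):
    """Naive cofactor clearing: map any curve point into the order-r subgroup."""
    return ec_mul(H, p)

def in_subgroup(p):
    """Naive subgroup check: [r]P must be the point at infinity."""
    return ec_mul(R_ORDER, p) is None

# Enumerate all affine points by brute force (only feasible on a toy curve).
POINTS = [(x, y) for x in range(P_MOD) for y in range(P_MOD)
          if (y * y - x**3 - A * x - B) % P_MOD == 0]
assert len(POINTS) + 1 == R_ORDER * H
assert all(in_subgroup(clear_cofactor(pt)) for pt in POINTS)
assert not in_subgroup((4, 0))   # a point of order 2, outside the subgroup
```

On pairing-friendly curves, h and r are large, so both operations are full-size scalar multiplications; the endomorphism-accelerated methods listed above aim to reduce that cost.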