mirror of
https://github.com/codex-storage/constantine.git
synced 2025-01-15 21:44:15 +00:00
Optimizations
This document lists the optimizations relevant to an elliptic curve or pairing-based cryptography library and whether Constantine has them implemented.
The optimizations can be algebraic, algorithmic, or "implementation detail" in nature. Non-constant-time code is always an option; it is listed when the speedup is significant.
Big Integers
- Conditional copy
- Loop unrolling
- x86: Conditional Mov
- x86: Full Assembly implementation
- SIMD instructions
- Add/Sub
- int128
- add-with-carry, sub-with-borrow intrinsics
- loop unrolling
- x86: Full Assembly implementation
- Multiplication
- int128
- loop unrolling
- Comba multiplication / product Scanning
- Karatsuba
- Karatsuba + Comba
- x86: Full Assembly implementation
- x86: MULX, ADCX, ADOX instructions
- Fused Multiply + Shift-right by word (for Barrett Reduction and approximating multiplication by fractional constant)
- Squaring
- Dedicated squaring functions
- int128
- loop unrolling
- x86: Full Assembly implementation
- x86: MULX, ADCX, ADOX instructions
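The Comba (product-scanning) item above can be illustrated with a minimal Python sketch. It assumes 64-bit little-endian limbs; the names `to_limbs`, `from_limbs` and `comba_mul` are hypothetical helpers for this illustration and are not Constantine's API. A Python integer stands in for the three-word column accumulator that a fixed-width implementation would keep in registers.

```python
MASK = (1 << 64) - 1  # assumed limb width: 64 bits


def to_limbs(x, n):
    """Split a non-negative integer into n little-endian 64-bit limbs."""
    return [(x >> (64 * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Recombine little-endian 64-bit limbs into an integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def comba_mul(a, b):
    """Product scanning: compute the result one output column at a time."""
    n = len(a)
    r = [0] * (2 * n)
    acc = 0  # column accumulator, carries into the next column
    for k in range(2 * n - 1):
        # All partial products a[i] * b[k - i] that land in column k.
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]
        r[k] = acc & MASK  # each output limb is written exactly once
        acc >>= 64
    r[2 * n - 1] = acc
    return r
```

Compared with operand scanning, each output limb is stored once, which is why the technique pairs well with loop unrolling.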
Finite Fields & Modular Arithmetic
- Representation
- Montgomery Representation
- Barrett Reduction
- Unsaturated Representation
- Mersenne Prime (2ᵏ - 1)
- Generalized Mersenne Prime (NIST Prime P256: 2^256 - 2^224 + 2^192 + 2^96 - 1)
- Pseudo-Mersenne Prime (2^m - k for example Edwards25519: 2^255 - 19)
- Golden Primes (φ^2 - φ - 1 with φ = 2ᵏ for example Ed448-Goldilocks: 2^448 - 2^224 - 1)
- any prime modulus (lazy carry)
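As an illustration of why special prime shapes matter, here is a minimal sketch of reduction modulo the pseudo-Mersenne prime 2^255 - 19 listed above. It relies on 2^255 ≡ 19 (mod p), so the high bits can be folded back with a small multiplication instead of a division. The name `reduce_25519` is hypothetical, and the final branch would be a masked conditional subtraction in constant-time code.

```python
P25519 = 2**255 - 19
LOW_MASK = (1 << 255) - 1


def reduce_25519(x):
    """Reduce 0 <= x < 2^510 modulo 2^255 - 19 by folding the high bits."""
    # First fold: result is below 20 * 2^255.
    x = (x & LOW_MASK) + 19 * (x >> 255)
    # Second fold: the high part is now below 20, so result < 2^255 + 380.
    x = (x & LOW_MASK) + 19 * (x >> 255)
    # x < 2p, so one conditional subtraction finishes the reduction.
    # (Branch shown for clarity only; it is not constant-time.)
    if x >= P25519:
        x -= P25519
    return x
```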
- Montgomery Reduction
- int128
- loop unrolling
- x86: Full Assembly implementation
- x86: MULX, ADCX, ADOX instructions
- Addition/subtraction
- int128
- add-with-carry, sub-with-borrow intrinsics
- loop unrolling
- x86: Full Assembly implementation
- Addition-chain for small constants
- Montgomery Multiplication
- Fused multiply + reduce
- int128
- loop unrolling
- x86: Full Assembly implementation
- x86: MULX, ADCX, ADOX instructions
- no-carry optimization for CIOS (Coarsely Integrated Operand Scanning)
- FIPS (Finely Integrated Operand Scanning)
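For reference, the fused multiply + reduce structure of CIOS can be sketched as follows. This is the plain textbook variant with 64-bit limbs, without the no-carry optimization, and it is not Constantine's implementation; `neg_inv_word` and `mont_mul_cios` are hypothetical names. Inputs are assumed to be below the odd modulus, and the result is a·b·R⁻¹ mod p with R = 2^(64·n).

```python
W = 64
MASK = (1 << W) - 1


def to_limbs(x, n):
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    return sum(limb << (W * i) for i, limb in enumerate(limbs))


def neg_inv_word(p):
    """-p^-1 mod 2^W, the per-word Montgomery constant (p must be odd)."""
    return (-pow(p, -1, 1 << W)) & MASK


def mont_mul_cios(a, b, p, m0):
    """Coarsely Integrated Operand Scanning on little-endian limb lists."""
    n = len(p)
    t = [0] * (n + 2)
    for i in range(n):
        # Multiplication step: t += a * b[i]
        carry = 0
        for j in range(n):
            s = t[j] + a[j] * b[i] + carry
            t[j], carry = s & MASK, s >> W
        s = t[n] + carry
        t[n], t[n + 1] = s & MASK, s >> W
        # Reduction step: add m * p so the low word cancels, then shift by one word.
        m = (t[0] * m0) & MASK
        carry = (t[0] + m * p[0]) >> W
        for j in range(1, n):
            s = t[j] + m * p[j] + carry
            t[j - 1], carry = s & MASK, s >> W
        s = t[n] + carry
        t[n - 1] = s & MASK
        t[n] = t[n + 1] + (s >> W)
    # Final conditional subtraction (a branch here; masked in constant-time code).
    r = from_limbs(t[: n + 1])
    modulus = from_limbs(p)
    if r >= modulus:
        r -= modulus
    return to_limbs(r, n)
```

Interleaving the multiplication and reduction passes per word of `b` keeps the working buffer at n + 2 words instead of 2n.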
- Montgomery Squaring
- Dedicated squaring functions
- Fused multiply + reduce
- int128
- loop unrolling
- x86: Full Assembly implementation
- x86: MULX, ADCX, ADOX instructions
- no-carry optimization for CIOS (Coarsely Integrated Operand Scanning)
- Addition chains
- unreduced squarings/multiplications in addition chains
- Exponentiation
- variable-time exponentiation
- fixed window optimization (sliding windows are not constant-time)
- NAF recoding
- windowed-NAF recoding
- SIMD vectorized select in window algorithm
- Montgomery Multiplication with no final subtraction:
- Bos and Montgomery, https://eprint.iacr.org/2017/1057.pdf
- Colin D Walter, https://colinandmargaret.co.uk/Research/CDW_ELL_99.pdf
- Hachez and Quisquater, https://link.springer.com/content/pdf/10.1007%2F3-540-44499-8_23.pdf
- Gueron, https://eprint.iacr.org/2011/239.pdf
- Pippenger multi-exponentiation (variable-time)
- parallelized Pippenger
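The fixed-window item can be sketched as below. The point of a fixed window is that the sequence of squarings and multiplications depends only on the exponent's bit length, not its value, and the table lookup scans every entry. The function name and parameters are hypothetical, and Python integers are not constant-time, so this only mirrors the access pattern.

```python
def pow_fixed_window(base, exp, mod, exp_bits, w=4):
    """Fixed-window exponentiation with a full-table scan for each lookup."""
    # Precompute base^0 .. base^(2^w - 1).
    table = [1 % mod]
    for _ in range((1 << w) - 1):
        table.append(table[-1] * base % mod)

    acc = 1 % mod
    num_windows = (exp_bits + w - 1) // w
    for i in reversed(range(num_windows)):
        for _ in range(w):
            acc = acc * acc % mod
        digit = (exp >> (i * w)) & ((1 << w) - 1)
        # Select table[digit] by masking every entry instead of indexing.
        selected = 0
        for j, entry in enumerate(table):
            mask = -(j == digit)  # -1 (all ones) on match, 0 otherwise
            selected |= entry & mask
        # Always multiply, even when the digit is 0 (table[0] == 1).
        acc = acc * selected % mod
    return acc
```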
- Inversion (constant-time baseline, Little-Fermat inversion via a^(p-2))
- Constant-time binary GCD algorithm by Möller, algorithm 5 in https://link.springer.com/content/pdf/10.1007%2F978-3-642-40588-4_10.pdf
- Addition-chain for a^(p-2)
- Constant-time binary GCD algorithm by Bernstein-Yang, https://eprint.iacr.org/2019/266
- Constant-time binary GCD algorithm by Pornin, https://eprint.iacr.org/2020/972
- Constant-time binary GCD algorithm by BY with half-delta optimization by libsecp256k1, formally verified, https://eprint.iacr.org/2021/549
- Simultaneous inversion
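Simultaneous inversion (often called Montgomery's trick) trades n inversions for one inversion plus about 3(n - 1) multiplications. A minimal sketch, using the Little-Fermat baseline a^(p-2) for the single inversion; `fermat_inv` and `batch_inv` are hypothetical names, and all inputs are assumed to be nonzero modulo a prime p.

```python
def fermat_inv(a, p):
    """Inverse of a modulo prime p via Fermat's little theorem."""
    return pow(a, p - 2, p)


def batch_inv(xs, p):
    """Invert every element of xs modulo p with a single field inversion."""
    n = len(xs)
    prefix = [1] * n
    acc = 1
    for i, x in enumerate(xs):
        prefix[i] = acc  # product of xs[0 .. i-1]
        acc = acc * x % p
    inv = fermat_inv(acc, p)  # inverse of the product of all elements
    out = [0] * n
    for i in reversed(range(n)):
        out[i] = inv * prefix[i] % p  # strips everything except 1 / xs[i]
        inv = inv * xs[i] % p  # now the inverse of the product of xs[0 .. i-1]
    return out
```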
- Square Root (constant-time)
- baseline sqrt via Little-Fermat for p ≡ 3 (mod 4)
- baseline sqrt via Little-Fermat for p ≡ 5 (mod 8)
- baseline sqrt via Little-Fermat for p ≡ 9 (mod 16)
- baseline sqrt via Tonelli-Shanks for any prime.
- sqrt via addition-chain
- Fused sqrt + testIfSquare (Euler Criterion or Legendre symbol or Kronecker symbol)
- Fused sqrt + 1/sqrt
- Fused sqrt + 1/sqrt + testIfSquare
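For the p ≡ 3 (mod 4) case, the Little-Fermat baseline reduces to a single exponentiation, and squaring the candidate doubles as the square test, which is one simple way to fuse the two. A minimal sketch with a hypothetical function name:

```python
def sqrt_3mod4(a, p):
    """Return (candidate, is_square) for a modulo a prime p with p ≡ 3 (mod 4)."""
    assert p % 4 == 3
    # If a is a square, a^((p+1)/4) is one of its square roots.
    r = pow(a, (p + 1) // 4, p)
    # Verifying r^2 == a replaces a separate Euler-criterion exponentiation.
    is_square = (r * r - a) % p == 0
    return r, is_square
```

When `a` is not a square the returned candidate is a root of `-a` instead, so callers must check the flag before using it.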
Extension Fields
- Lazy reduction via double-precision base fields
- Sparse multiplication
- Fp2
- complex multiplication
- complex squaring
- sqrt via the constant-time complex method (Adj et al)
- sqrt using addition chain
- fused complex method sqrt by rotating in complex plane
- Cubic extension fields
- Toom-Cook polynomial multiplication (Chung-Hasan)
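The Fp2 complex multiplication and squaring items can be sketched as follows, assuming Fp2 = Fp[i]/(i² + 1), which requires p ≡ 3 (mod 4). Elements are pairs (a0, a1) meaning a0 + a1·i, and the function names are hypothetical. Each coordinate is reduced once at the end, which loosely mirrors lazy reduction; Python's big integers play the role of the double-precision intermediate.

```python
def fp2_mul(a, b, p):
    """Karatsuba-style complex multiplication: 3 base-field multiplications."""
    a0, a1 = a
    b0, b1 = b
    v0 = a0 * b0
    v1 = a1 * b1
    c0 = (v0 - v1) % p  # real part: a0*b0 - a1*b1
    c1 = ((a0 + a1) * (b0 + b1) - v0 - v1) % p  # imaginary part: a0*b1 + a1*b0
    return (c0, c1)


def fp2_sqr(a, p):
    """Complex squaring: 2 base-field multiplications."""
    a0, a1 = a
    c0 = (a0 + a1) * (a0 - a1) % p  # a0^2 - a1^2
    c1 = 2 * a0 * a1 % p
    return (c0, c1)
```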
Elliptic curve
- Weierstrass curves:
- Affine coordinates
- Homogeneous projective coordinates
- Projective complete formulae
- Mixed addition
- Jacobian projective coordinates
- Jacobian complete formulae
- Mixed addition
- Conjugate Mixed Addition
- Composites Double-Add 2P+Q, tripling, quadrupling, quintupling, octupling
- Scalar multiplication
- fixed window optimization
- constant-time NAF recoding
- constant-time windowed-NAF recoding
- SIMD vectorized select in window algorithm
- constant-time endomorphism acceleration
- using NAF recoding
- using GLV-SAC recoding
- constant-time windowed-endomorphism acceleration
- using wNAF recoding
- using windowed GLV-SAC recoding
- SIMD vectorized select in window algorithm
- Fixed-base scalar mul
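To make the NAF recoding items concrete, here is the textbook recoding as a minimal sketch. It is the variable-time version, not the constant-time one the list refers to, and `naf` is a hypothetical name. The non-adjacent form uses digits in {-1, 0, 1} with no two adjacent nonzero digits, which lowers the number of point additions because point negation is nearly free on elliptic curves.

```python
def naf(k):
    """Non-adjacent form of k >= 0, least significant digit first."""
    digits = []
    while k > 0:
        if k & 1:
            # +1 if k ≡ 1 (mod 4), -1 if k ≡ 3 (mod 4): forces the next digit to 0.
            d = 2 - (k & 3)
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits
```

For example, 7 = 8 - 1 recodes to two nonzero digits instead of the three in its binary form.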
- Multi-scalar-mul
- Strauss multi-scalar-mul
- Bos-Coster multi-scalar-mul
- Pippenger multi-scalar-mul (variable-time)
- parallelized Pippenger
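The bucket method behind Pippenger multi-scalar-mul can be sketched over a generic additive group. Here `add` and `zero` are hypothetical stand-ins for elliptic curve point addition and the point at infinity, and `c` is the window width in bits. The sketch is variable-time, as noted in the list, and is not Constantine's implementation.

```python
def pippenger_msm(scalars, points, add, zero, scalar_bits, c=4):
    """Compute sum(scalars[i] * points[i]) with the bucket method."""
    num_windows = (scalar_bits + c - 1) // c
    result = zero
    for w in reversed(range(num_windows)):
        # Shift the running result left by one window (c doublings).
        for _ in range(c):
            result = add(result, result)
        # Sort points into buckets by their c-bit digit in this window.
        buckets = [zero] * (1 << c)
        for s, pt in zip(scalars, points):
            digit = (s >> (w * c)) & ((1 << c) - 1)
            if digit:
                buckets[digit] = add(buckets[digit], pt)
        # Running-sum trick: computes sum(d * buckets[d]) using additions only.
        running = zero
        window_sum = zero
        for d in range((1 << c) - 1, 0, -1):
            running = add(running, buckets[d])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result
```

The cost per window is one addition per input point plus about 2^(c+1) bucket additions, independent of the digit values, which is why larger inputs favor wider windows.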
Pairings
- Frobenius maps
- Sparse Frobenius coefficients
- Coalesced Frobenius in towered Fields
- Coalesced Frobenius powers
- Line functions
- Homogeneous projective coordinates
- D-Twist
- Fused line add + elliptic curve add
- Fused line double + elliptic curve double
- M-Twist
- Fused line add + elliptic curve add
- Fused line double + elliptic curve double
- 6-way sparse multiplication line * Gₜ element
- Jacobian projective coordinates
- D-Twist
- Fused line add + elliptic curve add
- Fused line double + elliptic curve double
- M-Twist
- Fused line add + elliptic curve add
- Fused line double + elliptic curve double
- 6-way sparse multiplication line * Gₜ element
- Affine coordinates
- 7-way sparse multiplication line * Gₜ element
- Pseudo-8 sparse multiplication line * Gₜ element
- Miller Loop
- NAF recoding
- Quadruple-and-add and Octuple-and-add
- addition chains
- Final exponentiation
- Cyclotomic squaring
- Karabina's compressed cyclotomic squarings
- Addition-chain for exponentiation by curve parameter
- BN curves: Fuentes-Castañeda
- BN curves: Duquesne, Ghammam
- BLS curves: Ghammam, Fouotsa
- BLS curves: Hayashida, Hayasaka, Teruya
- Multi-pairing
- Line accumulation
- Parallel Multi-Pairing
Hash-to-curve
- Clear cofactor
- BLS G1: Wahby-Boneh
- BLS G2: Scott et al
- BLS G2: Fuentes-Castañeda
- BLS G2: Budroni et al, endomorphism accelerated
- BN G2: Fuentes-Castañeda
- BW6-761 G1
- BW6-761 G2
- Subgroup check
- BLS G1: Bowe, endomorphism accelerated
- BLS G2: Bowe, endomorphism accelerated
- BLS G1: Scott, endomorphism accelerated
- BLS G2: Scott, endomorphism accelerated