Commit Graph

15 Commits

Author SHA1 Message Date
Mamy Ratsimbazafy 82819b1b10
Square Root & Inversion addition chains - 20% perf increase (#132)
* Addition chain for sqrt BLS12-381: 20% perf improvement

* sqrt addchain for BN254_Snarks - 20% perf improvement as well

* Fix operation count [skip ci]

* BLS12-377 sqrt - 10% perf improvement

* sqrt addition chain for BW6-761 - 6% speedup

* BN254_Nogami inversion addchain

* sqrt addchain for BN254_Nogami

* Inversion addchain for BLS12-377

* inversion ddition chain for BW6-761
2021-01-23 20:55:40 +01:00
Mamy Ratsimbazafy 638cb71e16
Fr: Finite Field parametrized by the curve order (#115)
* Introduce Fr type: finite field over curve order. Need workaround for https://github.com/nim-lang/Nim/issues/16774

* Split curve properties into core and derived

* Attach field properties to an instantiated field instead of the curve enum

* Workaround https://github.com/nim-lang/Nim/issues/14021, yet another "working with types in macros" is difficult https://github.com/nim-lang/RFCs/issues/44

* Implement finite field over prime order of a curve subgroup

* skip OpenSSL tests on windows
2021-01-22 00:09:52 +01:00
Mamy André-Ratsimbazafy e89429e822
SHA256 Hash function 2020-12-15 19:18:36 +01:00
Mamy Ratsimbazafy 986245b5c1
Jacobian coordinates (#95)
* Add projective-> affine bench

* Add conditional copy and div2 benches

* Fp4 benchmarks

* Constant-time Jacobian addition

* Jacobian doubling

* Use a simpler Add+Dbl complete formula

* Update tests

* Fix conditional negate

* Rollaback complete addition, we were only handling curve coef a == 0
2020-10-02 00:01:09 +02:00
Mamy Ratsimbazafy 28e83e7b49
Faster inversion with addition chains (#80) 2020-09-04 19:04:32 +02:00
Mamy Ratsimbazafy 85d365359d
Endomorphism G2 (#79)
* Clear cofactor in BN254 G2 testgen and frobenius

* Implement G2 endomorphism acceleration in Sage

* Somewhat working accelerated scalar mul G2 (2.2x) faster
- OK for BN254_Snarks
- Some test failing for BLS12-381

* Fix negative miniscalars by adding an extra bit of encoding

* Cleanup accel params

* Small recoding optimizations
2020-09-03 23:10:48 +02:00
Mamy Ratsimbazafy d41c653c8a
Double-width tower extension part 1 (#72)
* Implement double-width field multiplication for double-width towering

* Fp2 mul acceleration via double-width lazy reduction (pure Nim)

* Inline assembly for basic add and sub

* Use 2 registers instead of 12+ for ASM conditional copy

* Prepare assembly for extended multiprecision multiplication support

* Add assembly for mul

* initial implementation of assembly reduction

* stash current progress of assembly reduction

* Fix clobbering issue, only P256 comparison remain buggy

* Fix asm montgomery reduction for NIST P256 as well

* MULX/ADCX/ADOX multi-precision multiplication

* MULX/ADCX/ADOX reduction v1

* Add (deactivated) assembly for double-width substraction + rework benches

* Add bench to nimble and deactivate double-width for now. slower than classic

* Fix x86-32 running out of registers for mul

* Clang needs to be at v9 to support flag output constraints (Xcode 11.4.2 / OSX Catalina)

* 32-bit doesn't have enough registers for ASM mul

* Fix again Travis Clang 9 issues

* LLVM 9 is not whitelisted in travis

* deactivated assembler with travis clang

* syntax error

* another

* ...

* missing space, yeah ...
2020-08-20 10:21:39 +02:00
Mamy Ratsimbazafy d97bc9b61c
Assembly backend (#69)
* Proof-of-Concept Assembly code generator

* Tag inline per procedure so we can easily track the tradeoff on tower fields

* Implement Assembly for modular addition (but very curious off-by-one)

* Fix off-by one for moduli with non msb set

* Stash (super fast) alternative but still off by carry

* Fix GCC optimizing ASM away

* Save 1 register to allow compiling for BLS12-381 (in the GMP test)

* The compiler cannot find enough registers if the ASM file is not compiled with -O3

* Add modsub

* Add field negation

* Implement no-carry Assembly optimized field multiplication

* Expose UseX86ASM to the EC benchmark

* omit frame pointer to save registers instead of hardcoding -O3. Also ensure early clobber constraints for Clang

* Prepare for assembly fallback

* Implement fallback for CPU that don't support ADX and BMI2

* Add CPU runtime detection

* Update README closes #66

* Remove commented out code
2020-07-24 22:02:30 +02:00
Mamy André-Ratsimbazafy d22d981e9e
Implement fused sqrt invsqrt on Fp: Accelerate sqrt on Fp2 by 20% (hashToG2 and property-based testing bottleneck, 4 times slower than inversion and 87 times slower than Fp2 multiplication) 2020-06-17 22:44:52 +02:00
Mamy Ratsimbazafy 3d1b1fab98
Fix benchmark on ARM (#31) 2020-06-04 22:09:30 +02:00
Mamy André-Ratsimbazafy 7ae0f51000
benchmarking skips cycle counting for ARM 2020-04-15 21:24:18 +02:00
Mamy André-Ratsimbazafy e0c1e0b1c8
Add EC bench on G1 + Add throughput to benches 2020-04-15 19:38:02 +02:00
Mamy André-Ratsimbazafy aff44f4d8e
Implement constant-time `div2` on finite and extension fields 2020-04-15 02:12:45 +02:00
Mamy Ratsimbazafy c04721a04e
Refactor: Higher-Kinded Tower of Extension Fields (#25)
* Mention that the inverse of 0 is 0 (TODO tests)

* Introduce "Higher-Kinded tower extensions"

* rename isCOmplexExtension -> fromComplexExtension

* update benchmarks with the new tower scheme

* Try to recover some speed on mul/squaring for an optimal tower (but this was not it)
2020-04-14 02:05:42 +02:00
Mamy André-Ratsimbazafy 9e78cd5d6d
Benchmark template for 𝔽p, 𝔽p2, 𝔽p6 2020-03-21 02:31:31 +01:00