Commit Graph

42 Commits

Author SHA1 Message Date
Mamy Ratsimbazafy bf32c2d408
Parallel for (#222)
* introduce reserve threads to minimize latency and maximize throughput when awaiting a future

* introduce a ceilDiv proc

* threadpool: implement parallel-for loops

* 10x perf improvement by not waking reserveBackoff on syncAll

* bench overhead: new reserve system might introduce too much wakeup latency, 2x slower, for fine-grained parallelism

* add parallelForStrided

* Threadpool: Implement parallel reductions

* refactor parallel loop codegen: introduce descriptor, parsing and codegen stages

* parallel strided, test transpose bench

* tight loop is faster when backoff is not inline

* no POSIX stuff on windows, larger types for histogram bench

* fix tests

* max RSS overflow?

* missed an undefined var

* exit histogram on 32-bit

* forgot to return early dor 32-bit
2023-02-24 09:47:36 +01:00
Mamy Ratsimbazafy e5612f5705
Multi-Scalar-Multiplication / Linear combination (#220)
* unoptimized msm

* MSM: reorder loops

* add a signed windowed recoding technique

* improve wNAF table access

* use batchAffine

* revamp EC tests

* MSM signed digit support

* refactor MSM: recode signed ahead of time

* missing test vector

* refactor allocs and Alloca sideeffect

* add an endomorphism threshold

* Add Jacobian extended coordinates

* refactor recodings, prepare for parallelizable on-the-fly signed recoding

* recoding changes, introduce proper NAF for pairings

* more pairings refactoring, introduce miller accumulator for EVM

* some optim to the addchain miller loop

* start optimizing multi-pairing

* finish multi-miller loop refactoring

* minor tuning

* MSM: signed encoding suitable for parallelism (no precompute)

* cleanup signed window encoding

* add prefetching

* add metering

* properly init result to infinity

* comment on prefetching

* introduce vartime inversion for batch additions

* fix JacExt infinity conversion

* add batchAffine for MSM, though slower than JacExtended at the moment

* add a batch affine scheduler for MSM

* Add Multi-Scalar-Multiplication endomorphism acceleration

* some tuning

* signed integer fixes + 32-bit + tuning

* Some more tuning

* common msm bench + don't use affine for c < 9

* nit
2023-02-16 12:45:05 +01:00
Mamy Ratsimbazafy 7c5421ffdc
move staticFor to the inner repo, not helpers/ for unblocking nimble install (#216) 2023-02-07 13:11:44 +01:00
Mamy Ratsimbazafy 2931913b67
Add a threadpool (#213)
* Implement a threadpool

* int and SomeUnsignedInt ...

* Type conversion for windows SynchronizationBarrier

* Use the latest MacOS 11, Big Sur API (jan 2021) for MacOS futexes, Github action offers MacOS 12 and can test them

* bench need posix timer not available on windows and darwin futex

* Windows: nimble exec empty line is an error, Mac: use defined(osx) instead of defined(macos)

* file rename

* okay, that's the last one hopefully

* deactivate stealHalf for now
2023-01-24 02:32:28 +01:00
Mamy Ratsimbazafy 4052a07611
chore: cleanup TODOs, unused constants 2023-01-12 01:27:23 +01:00
Mamy Ratsimbazafy 928f515582
Batch additions (#207)
* Batch elliptic curve addition

* accelerate chained muls

* jac mixed add handle doubling. jac additions handle aliasing when adding infinity

* properly skip sanitizer on BLS signature test

* properly skip sanitizer² on BLS signature test
2022-10-29 22:43:40 +02:00
Mamy Ratsimbazafy 93654d580e
pararun: Ignore error #259, sha256: add back a paper 2022-09-19 09:11:16 +02:00
Mamy Ratsimbazafy d515bebdba
pararun: MacOS, weird error 259 when accumulating pipes or processes 2022-09-19 03:14:44 +02:00
Mamy Ratsimbazafy 351a3f6bd2
Sha256 refactor (#206)
* sha256: separate message scheduling and state updates to help implement specific use-cases like #205; also implement SSSE3 acceleration (2006, Intel Core 2 Duo)

* sha256: simplify update flow, store less metadata in context

* sha256: Fix reworked update function

* Implement x86 hardware SHA acceleration

* typo
2022-09-19 02:02:57 +02:00
Mamy Ratsimbazafy 495d5fa9fd
don't run afoul of pipe limits 2022-09-19 01:47:18 +02:00
Mamy Ratsimbazafy 7c7290115f
nimble: fix bench_poly1305, improve reporting in pararun 2022-09-19 00:41:37 +02:00
Mamy Ratsimbazafy cc47c27cca
pararun: don't swallow failures 2022-09-18 23:37:35 +02:00
Mamy Ratsimbazafy d4e202ead5
Don't use array[^1], it can throw and cannot be locally turn off 2022-09-17 18:52:52 +02:00
Mamy Ratsimbazafy 962e7ccf49
CI: enable GMP tests on Windows and Linux 32-bit and fix caching (#204)
* Try to compile with GMP on windows and 32-bit linux

* remove leftover msys shell

* Don't use GMP Mersenne Twister, bad randomness and untested Nim wrapper

* properly cache nim

* fix path after cache

* run pacman in msys2 env

* rework msys2 ... again

* shell compat for file clearing

* shell compat try-again for file clearing

* force bash for clearing parallel builds on windows

* Use nimscript directly (why didn't it work last time?)

* Avoid IO redirection to support any shell

* Avoid IO redirection v2 to support any shell

* add debug data

* add debug again

* Introduce pararun, a parallel test runner to remove need of GNU parallel

* pararun: style
2022-09-15 09:33:34 +02:00
Mamy Ratsimbazafy 094445482b
Eip2333 (#202)
* HMAC-SHA256

* EIP2333

* activate EIP2333 tests and faster random test case generation
2022-08-16 12:07:57 +02:00
Mamy Ratsimbazafy 26954f905a
Constant time (#185)
* Implement fully constant-time division closes #2 closes #9

* constant-time hex parsing

* prevent cache timing attacks in toHex() conversion (which is only for test/debug purposes anyway)
2022-02-28 09:23:26 +01:00
Mamy Ratsimbazafy ffacf61e8a
Don't dump all in "backend" (#184)
* backend -> math

* towers -> extension fields

* move ISA and compiler specific code out of math/

* fix export
2022-02-27 01:49:08 +01:00
Mamy Ratsimbazafy fe500a6a79
Productionize: move protocols top-level vs backend (#179)
* Productionize: move protocols top-level vs backend

* fix path

* import fix

* the last one

* benches as well
2022-02-21 01:04:53 +01:00
Mamy Ratsimbazafy 14af7e8724
Low-level refactoring (#175)
* Add specific fromMont conversion routine. Rename montyResidue to getMont

* missed test file

* Add x86_64 ASM for fromMont

* Add x86_64 MULX/ADCX/ADOX for fromMont

* rework Montgomery Multiplication with prefetch/latency hiding techniques

* Fix ADX autodetection, closes #174. Rollback faster mul_mont attempt, no improvement and debug pain.

* finalSub in fromMont & adx_bmi -> adx

* Some {.noInit.} to avoid Nim zeroMem (which should be optimized away but who knows)

* Uniformize name 'op+domain': mulmod - mulmont

* Fix asm codegen bug "0x0000555555565930 <+896>:   sbb    0x20(%r8),%r8" with Clang in final substraction

* Prepare for skipping final substraction

* Don't forget to copy the result when we skip the final substraction

* Seems like we need to stash the idea of skipping the final substraction for now, needs bounds analysis https://eprint.iacr.org/2017/1057.pdf

* fix condition for ASM 32-bit

* optim modular addition when sparebit is available
2022-02-14 00:16:55 +01:00
Mamy Ratsimbazafy 53f9708c2b
Initial support for Twisted Edwards curves (#167)
* Point decoding: optimized sqrt for p ≡ 5 (mod 8) (Curve25519)

* Implement fused sqrt(u/v) for twisted edwards point deserialization

* Introduce twisted edwards affine

* Allow declaration of curve field elements (and fight against recursive dependencies

* Twisted edwards group law + tests

* Add support for jubjub and bandersnatch #162

* test twisted edwards scalar mul
2021-12-29 01:54:17 +01:00
Mamy Ratsimbazafy 5806cc4638
Double-Precision towering (#155)
* consistent naming for dbl-width

* Isolate double-width Fp2 mul

* Implement double-width complex multiplication

* Lay out Fp4 double-width mul

* Off by p in square Fp4 as well :/

* less copies and stack space in addition chains

* Address https://github.com/mratsim/constantine/issues/154 partly

* Fix #154, faster Fp4 square: less non-residue, no Mul, only square (bit more ops total)

* Fix typo

* better assembly scheduling for add/sub

* Double-width -> Double-precision

* Unred -> Unr

* double-precision modular addition

* Replace canUseNoCarryMontyMul and canUseNoCarryMontySquare by getSpareBits

* Complete the double-precision implementation

* Use double-precision path for Fp4 squaring and mul

* remove mixin annotations

* Lazy reduction in Fp4 prod

* Fix assembly for sum2xMod

* Assembly for double-precision negation

* reduce white spaces in pairing benchmarks

* ADX implies BMI2
2021-02-09 22:57:45 +01:00
Mamy Ratsimbazafy e23f990280
Tower drop concepts (#153)
* Fix affine instantiation

* drop concept from the codebase

* Remove alignment requirement, this cases problem in sequences on 32-bit for t_fp12_anti_regression

* slight sparse optim
2021-02-07 14:03:56 +01:00
Mamy André-Ratsimbazafy 5710a961a1
Rename ECP_ShortW_Proj -> ECP_ShortW_Prj 2021-02-06 16:29:53 +01:00
Mamy Ratsimbazafy c312210878
Rework towering (#148)
* naive removal of out-of-place mul by non residue

* Use {.inline.} in a consistent manner across the codebase

* Handle aliasing for quadratic multiplication

* reorg optimization

* Handle aliasing for quadratic squaring

* handle aliasing in mul_sparse_complex_by_0y

* Rework multiplication by nonresidue, assume tower and twist use same non-residue

* continue rework

* continue on non-residues

* Remove "NonResidue *" calls

* handle aliasing in Chung-Hasan SQR2

* Handla aliasing in Chung-Hasan SQR3

* Use one less temporary in Chung Hasan sqr2

* handle aliasing in cubic extensions

* merge extension tower in the same file to reduce duplicate proc and allow better inlining

* handle aliasing in cubic inversion

* drop out-of-place proc from BigInt and finite fields as well

* less copies in line_projective

* remove a copy in fp12 by lines
2021-02-06 16:28:38 +01:00
Mamy Ratsimbazafy 95e23339b2
Decimal conversion (#139)
* Add constant-time fromDecimal conversion. Add warnings on intended purposes of hex/decimals

* introduce setuint + cosmetic fixes Wordbitsize -> Wordbitwidth in comments

* Add decimal conversion (non-constant-time)

* fix comments [skip ci]
2021-01-29 20:42:36 +01:00
Mamy Ratsimbazafy 638cb71e16
Fr: Finite Field parametrized by the curve order (#115)
* Introduce Fr type: finite field over curve order. Need workaround for https://github.com/nim-lang/Nim/issues/16774

* Split curve properties into core and derived

* Attach field properties to an instantiated field instead of the curve enum

* Workaround https://github.com/nim-lang/Nim/issues/14021, yet another "working with types in macros" is difficult https://github.com/nim-lang/RFCs/issues/44

* Implement finite field over prime order of a curve subgroup

* skip OpenSSL tests on windows
2021-01-22 00:09:52 +01:00
Mamy André-Ratsimbazafy e89429e822
SHA256 Hash function 2020-12-15 19:18:36 +01:00
Mamy Ratsimbazafy 71bb4c799a
BW6-761 part 1 (#100)
* Add Fp, Fp2, Fp6 support for BW6-761

* Add G1 for BW6-761

* Prepare to support G2 twists on the same field as G1

* Remove a useless dependent type for lines

* Implement G2 for BW6-761

* Fix Line leftover
2020-10-09 07:51:47 +02:00
Mamy Ratsimbazafy 986245b5c1
Jacobian coordinates (#95)
* Add projective-> affine bench

* Add conditional copy and div2 benches

* Fp4 benchmarks

* Constant-time Jacobian addition

* Jacobian doubling

* Use a simpler Add+Dbl complete formula

* Update tests

* Fix conditional negate

* Rollaback complete addition, we were only handling curve coef a == 0
2020-10-02 00:01:09 +02:00
Mamy André-Ratsimbazafy 0effd66dbd
SWei -> SHortW, weierstrass -> shortweierstrass 2020-09-27 23:02:48 +02:00
Mamy Ratsimbazafy d84edcd217
Naive pairings + Naive cofactor clearing (#82)
* Pairing - initial commit
- line functions
- sparse Fp12 functions

* Small fixes:
- Line parametrized by twist for generic algorithm
- Add a conjugate operator for quadratic extensions
- Have frobenius use it
- Create an Affine coordinate type for elliptic curve

* Implement (failing) pairing test

* Stash pairing debug session, temp switch Fp12 over Fp4

* Proper naive pairing on BLS12-381

* Frobenius map

* Implement naive pairing for BN curves

* Add pairing tests to CI + reduce time spent on lower-level tests

* Test without assembler in Github Actions + less base layers test iterations
2020-09-21 23:24:00 +02:00
Mamy Ratsimbazafy e491f3b91d
[WIP] Skewed RNGs that trigger corner cases (#59)
* Add a RNG skewed to high hamming weights

* Add libsecp256k1 skewed RNG that found a CVE in OpenSSL

* Add initial skewed RNGs tests to finite fields

* Add Fp towers skewed tests

* Add ellptic curve skewed tests
2020-06-20 18:55:27 +02:00
Mamy Ratsimbazafy 2613356281
Endomorphism acceleration for Scalar Multiplication (#44)
* Add MultiScalar recoding from "Efficient and Secure Algorithms for GLV-Based Scalar Multiplication" by Faz et al

* precompute cube root of unity - Add VM precomputation of Fp - workaround upstream bug https://github.com/nim-lang/Nim/issues/14585

* Add the φ-accelerated lookup table builder

* Add a dedicated bithacks file

* cosmetic import consistency

* Build the φ precompute table with n-1 EC additions instead of 2^(n-1) additions

* remove binary

* Add the GLV precomputations to the sage scripts

* You can't avoid it, bigint multiplication is needed at one point

* Add bigint multiplication discarding some low words

* Implement the lattice decomposition in sage

* Proper decomposition for BN254

* Prepare the code for a new scalar mul

* We compile, and now debugging hunt

* More helpers to debug GLV scalar Mul

* Fix conditional negation

* Endomorphism accelerated scalar mul working for BN254 curve

* Implement endomorphism acceleration for BLS12-381 (needed cofactor clearing of the point)

* fix nimble test script after bench rename
2020-06-14 15:39:06 +02:00
Mamy Ratsimbazafy 82ceca6e3b
Scalar mul tests (#28)
* Add sage script for BN254

* Implement (failing) scalar multiplication tests

* Add a first test against sagemath

* Finish the tests against SAGE for BN254

* Add significant test coverage of scalar multiplication with reference checks for BN254_Snarks and BLS12_381
2020-06-04 20:37:29 +02:00
Mamy André-Ratsimbazafy 7ae0f51000
benchmarking skips cycle counting for ARM 2020-04-15 21:24:18 +02:00
Mamy André-Ratsimbazafy 8a9cb9287c Highlight that bools and words are "Secret" in the codebase 2020-04-15 00:04:44 +02:00
Mamy André-Ratsimbazafy 75557d88d8 Generalize the tower extensions tests 1000+ lines saved 2020-04-15 00:04:44 +02:00
Mamy André-Ratsimbazafy 1559bda56c Use our prng through most of the test suite 2020-04-15 00:04:44 +02:00
Mamy André-Ratsimbazafy 0115d3fd8e Rename the test PRNG to unsafe and prepare random number generation for integer ranges to not depend on the stdlib and have a single unified seed. 2020-04-15 00:04:44 +02:00
Mamy Ratsimbazafy 2f839cb1bf
Initial support for Elliptic Curve (#24)
* Elliptic curve and Twisted curve templates - initial commit

* Support EC Add on G2 (Sextic Twisted curve for BN and BLS12 families)

* Refactor the config parser to prepare for elliptic coefficient support

* Add elliptic curve parameter for BN254 (Snarks), BLS12-381 and Zexe curve BLS12-377

* Add accessors to curve parameters

* Allow computing the right-hand-side of of Weierstrass equation "y² = x³ + a x + b"

* Randomized test infrastructure for elliptic curves

* Start a testing suite on ellptic curve addition (failing)

* detail projective addition

* Fix EC addition test (forgot initializing Z=1 and that there ar emultiple infinity points)

* Test with random Z coordinate + add elliptic curve test to test suite

* fix reference to the (deactivated) addchain inversion for BN curves [skip ci]

* .nims file leftover [skip ci]
2020-04-13 19:25:59 +02:00
Mamy André-Ratsimbazafy 2d5b173a39
Less magics, les macros, faster compile-times (or not, Fp6 starts to get really slow, like 5s) + some cleanups in curve families + test 𝔽p6 on 32-bit 2020-03-22 12:28:53 +01:00
Mamy André-Ratsimbazafy bde619155b
30% faster constant-time inversion 2020-03-20 23:03:52 +01:00