449 Commits

Author SHA1 Message Date
Mamy Ratsimbazafy
c3b76cd420
32-bit fixes (#288)
* fix the new div2n1n_vartime on 32-bit - regression from #286

* remove unnecessary defensive programming

* reactivate 32-bit CI to check on #244

* 32-bit: centralize OS, ISA and env variable config

* enable assembly on x86 32-bit
2023-10-22 03:54:09 +02:00
Mamy Ratsimbazafy
07f96ec259
Move metering report and tracer primitive to inner lib (#289)
* move inner metering to inner lib

* remove duplicate getTicks from benchmarks folder
2023-10-22 03:53:56 +02:00
Mamy Ratsimbazafy
67fbd8c699
Nvidia JIT fixes (#290)
* lib org change, need serialization/io_limbs import

* fix unused variable and useless conversion warnings

* Update LLVM bindings to LLVM-16
2023-10-22 01:15:46 +02:00
Mamy Ratsimbazafy
4ccd8aaab8
EVM modexp: solve DOS vectors (#286)
* stash prep for Barrett reduction

* benches lost in rebase

* fix vartime reduction

* some improvement and fixes on reduce_vartime

* Fuse reductions when converting to Montgomery + use window=1 in powMont for small exponents. ~2.7x to 3.3x accel

* modexp: Introduce a no-reduction path for small base+exponent compared to modulus. Fix DOS

* optim for padded exponents

* remove commented out code [skip ci]

* Missing noInline for allocStackArray
2023-10-19 01:20:52 +02:00
Mamy Ratsimbazafy
34baa74bc0
Allow installation / import through nimble (#281)
* try to make nimble work

* try to make nimble work v2
2023-10-16 15:21:10 +02:00
Mamy Ratsimbazafy
4dd0a02f1a
BLS12-381 serialization: fix edge case 2023-10-10 21:49:06 +02:00
Mamy Ratsimbazafy
6489053da9
Fix another even modulus pow uninitialized mem (#280) 2023-10-10 07:57:03 +02:00
Mamy Ratsimbazafy
977b6eef42
nit: test ordering 2023-10-06 18:25:14 +02:00
Advaita Saha
c97036d1df
MapToScalarField() added for Banderwagon points (#278)
* feat: MapToScalarField added for Banderwagon points

* fix: syntax + NRVO

* fix: comments added for function & spec files linked

* feat: batchAffine + batchInversion

* feat: batchMapToScalarField + tests

* fix: comments and formatting

* fix: changed static to openArray

---------

Co-authored-by: Mamy Ratsimbazafy <mamy_github@numforge.co>
2023-10-06 10:03:42 +02:00
Mamy Ratsimbazafy
0f9b9e9606
Parallel Ethereum protocols (BLS signature and KZG) (#279)
* BLS sig: parallel batch verification

* BLS: speedup parallel batch verify with Miller loops on local threads

* shutdown bench

* nit: import style

* implement parallel KZG

* Parallel KZG commitments

* add benchmarks of KZG

* rename protocol file

* small optim: reorder await

* fix rebase

* Faster parallel BLS verification

* fix commitment status replacing previous error in verify_blob_kzg_proof_batch_parallel

* 2x faster parallel EC sum for less than 8192 points
2023-10-06 09:58:20 +02:00
Advaita Saha
f9258531f9
feat: add banderwagon (#271)
* banderwagon curve declaration added

* equality for banderwagon implemented

* subgroup check added

* map_to_field added

* feat: banderwagon serialization

* fix: imported codecs_status_codes into bls_signature

* fix: spec links added in comments

* fix: typo in curve declaration

* fix: banderwagon subgroup check shifted to subgroups file + map_to_field removed

* feat: new equality re-exported

* fix: codecs_status_codes imported

* fix: equality check removed from banderwagon.nim to twistedEdwards implementation

* Update constantine/math/elliptic/ec_twistededwards_affine.nim

Co-authored-by: Mamy Ratsimbazafy <mamy_github@numforge.co>

* Update constantine/math/elliptic/ec_twistededwards_projective.nim

Co-authored-by: Mamy Ratsimbazafy <mamy_github@numforge.co>

* adding and doubling tests with minor fixes

* feat: banderwagon & bandersnatch generators added

* fix: doubling point error for twisted edwards projective

* fix: negation of x co-ordinate in spec

* fix: negation of x in serialization

* fix: negation in deserialization

* feat: banderwagon tests

* fix: comments added for tests and serialization

* Update suggestion constantine/math/config/precompute.nim

---------

Co-authored-by: Mamy Ratsimbazafy <mamy_github@numforge.co>
2023-09-23 16:59:52 +02:00
Mamy Ratsimbazafy
7b64f85a29
KZG followup - Batch verification (#272)
* KZG: add batch verification

* workaround Clang and empty {.goto.} branches

* Apply suggestions from code review
2023-09-17 11:05:09 +02:00
Mamy Ratsimbazafy
153b37b77f
Ethereum KZG / EIP-4844 / Proto-danksharding followup (#270)
* Pass all verify_kzg_proof test cases

* pass blob_to_commitment tests

* move tests

* KZG: WIP on compute_proof

* eip4844: Pass all compute_kzg_proof tests

* pass compute_blob_kzg_proof tests

* pass all verify_blob_kzg_proof tests

* CI needs yaml

* fix memory leaks and add effect tags

* CI: lock yaml version to pre-Nim 2.0
2023-09-15 08:21:04 +02:00
Mamy Ratsimbazafy
d51699248d
Ethereum KZG: big endian test vectors (#269)
* Ethereum KZG: big endian test vectors

* KZG: endianness changes
2023-09-09 14:17:47 +02:00
Mamy Ratsimbazafy
121334be79
#255: revive AT&T syntax, unfortunately cannot be combined with LTO for Clang 2023-09-09 11:27:06 +02:00
Mamy Ratsimbazafy
3ed57d3690
add modexp/modmul benches vs GMP 2023-09-09 10:09:47 +02:00
Mamy Ratsimbazafy
15757557b4
modexp: 2.5x accel on small exponent (#268)
* add metering to modexp

* modexp: accel exponent = 1

* modexp: improve runtime Montgomery constants compute. 2.49x faster on DOS vectors
2023-09-09 09:21:05 +02:00
Mamy Ratsimbazafy
f3a5f352b8
fuzz failure 5-3: Nim inclusive stops :/ (#267) 2023-09-09 09:20:01 +02:00
Mamy Ratsimbazafy
1ad8499ae5
fix fuzz 5 reloaded: modexp - endianness issue for exponent MSB (#266)
* fix fuzz 5 reloaded: endianness issue for exponent MSB

* refactoring typo in test vs gmp
2023-09-06 20:01:35 +02:00
Mamy Ratsimbazafy
b645d68e1a
update bench for modexp (#265) 2023-09-06 17:18:07 +02:00
Mamy Ratsimbazafy
c85ffb069a
fix fuzz 18: modexp - handling of infinitely right-padded inputs leading to buffer overflow or stack overflow (#264) 2023-09-06 15:00:29 +02:00
Mamy Ratsimbazafy
4e0ca43af1
Use vartime impl to accelerate the BN254 EVM precompiles 2023-09-05 01:02:01 +02:00
Mamy Ratsimbazafy
b9c911ba37
Accelerate FFT - endomorphism + wNAF vartime scalar mul (#258)
* accel FFT by 30+% with vartime endomorphism support

* silly error fix

* endomorphism + wNAF, closes #253, FFT 20% speedup

* vartime EC addition for all repr

* implement vartime EC add

* finishing touches, rename to fft_vartime
2023-09-04 10:19:14 +02:00
Advaita Saha
4981c383bb
fix: support for ECP_TwEdwards in toHex() (#261) 2023-08-31 12:07:31 +02:00
Mamy Ratsimbazafy
ad04e6ea57
Expose OS-provided cryptographically secure RNG (#257)
* Expose OS-provided cryptographically secure RNG

* small fixes

* some more csprngs fixes
2023-08-27 20:50:09 +02:00
Mamy Ratsimbazafy
8b43b55345
FFT + Trusted setup fixes (#254)
* FFT fixes

* trusted setup fixes
2023-08-27 20:49:55 +02:00
Mamy Ratsimbazafy
f57d071f11
Ethereum KZG polynomial commitments / EIP-4844 (part 1) (#239)
* common error model for serialization of BLS signatures and KZG objects

* [KZG] add Ethereum's test vectors [skip ci]

* dump progress on KZG

* Stash: trusted setup generator

* implement cache optimized bit-reversal-permutation

* Add generator for the Ethereum test trusted setups

* implement naive deserialization for the trusted setup interchange format

* implement verify_kzg_proof

* Add test skeleton of verify KZG proof

* rebase import fixes
2023-08-13 15:08:04 +02:00
Mamy Ratsimbazafy
47b4f48dfb
fix overflow when truncating in submod2k, fix Guido fuzzing failure 8 (#251) 2023-07-11 09:06:46 +02:00
Mamy Ratsimbazafy
cb038bb515
fix bigint mul non-compilation after #231 2023-07-09 18:57:12 +02:00
Mamy Ratsimbazafy
d69c7bf8e9
Fuzz Fix - Hash-To-Curve - Isogeny EC add non-fully-reduced input (#250)
* H2C: fix fuzz failure 2, non-fully reduced in isogeny EC addition

* faster hashToG2 by using sparsity
2023-07-03 06:57:22 +02:00
Mamy Ratsimbazafy
b7687ddc4a
Accelerate eth_evm_modexp by 25x by dividing input size by 8 (#249)
* Accelerate eth_evm_modexp by 25x by dividing input size by 8 (scales quadratically)

* instant exponentiation by power of 2 depending on trailing zeroes

* improve bench report

* rename

* rewrite the pow2k even/trailingZero accel

* eth_evm_modexp: remove leftover TimeEffect
2023-07-03 01:45:36 +02:00
Mamy Ratsimbazafy
d0f4ad8cda
Fix fuzz #1 failure: incorrect reduction of BigInt (#246) 2023-07-02 17:15:02 +02:00
Mamy Ratsimbazafy
72f36530ba
Fix Fuzz 5: off-by-1 in even modexp (#247) 2023-07-02 17:14:50 +02:00
Mamy Ratsimbazafy
151f284da6
Add C API for BN254 snarks 2023-06-08 22:13:31 +02:00
Mamy Ratsimbazafy
0eba593951
Pasta / Halo2 MSM bench (#243)
* Pasta bench

* cleanup env variables

* [MSM]: generate benchmark coef-points pairs in parallel

* try to fix Windows CI

* add diagnostic info

* fix old test for new codecs/io primitives

* Ensure the projective point at infinity is not all zeros, but (0, 1, 0)
2023-06-04 17:41:54 +02:00
Mamy Ratsimbazafy
1325d249ce
deactivate 32-bit CI, package management woes, see #244 2023-06-02 09:01:00 +02:00
Mamy Ratsimbazafy
b1ef2682d6
Modular exponentiation (arbitrary output) and EIP-198 (#242)
* implement arbitrary precision modular exponentiation (prerequisite EIP-198)

* [modexp] implement exponentiation modulo 2ᵏ

* add inversion (mod 2ᵏ)

* [modexp] High-level wrapper for powmod with odd modulus

* [modexp] faster exponentiation (mod 2ᵏ) for even case and Euler's totient function odd case

* [modexp] implement general fast modular exponentiation

* Fix modular reduction with 64-bit modulus + fuzz powmod vs GMP

* add benchmark

* add EIP-198 support

* fixups following self review

* fix test paths
2023-06-01 23:38:41 +02:00
Mamy Ratsimbazafy
d996ccd5d8
Path reorgs (#240)
* move tests

* move threadpool to root path

* fix hints and warnings, print nim versions for tests for debugging the new strange issue in CI

* print nim version

* mixup on branches

* mixup on branches reloaded
2023-05-29 20:14:30 +02:00
Mamy Ratsimbazafy
1c5341fd7e
Perf quick wins - 10% Fp12 mul (#235)
* improve FP12_mul perf by 10%

* update README [skip ci]
2023-04-28 11:31:17 +02:00
Mamy Ratsimbazafy
33c3a2e8c4
[Research] x86 code generator (#234)
* rename compilers -> intrinsics, math_gpu -> math_codegen

* stash x86 codegen in research
2023-04-27 21:52:51 +02:00
Mamy Ratsimbazafy
c6d9a213f2
Rework assembly to be compatible with LTO (#231)
* rework assembler register/mem and constraint declarations

* Introduce constraint UnmutatedPointerToWriteMem

* Create individual memory cell operands

* [Assembly] fully support indirect memory addressing

* fix calling convention for exported procs

* Prepare for switch to intel syntax to avoid clang constant propagation asm symbol name interfering OR pointer+offset addressing

* use modifiers to prevent bad string mixin of assembler to linker of propagated consts

* Assembly: switch to intel syntax

* with working memory operand - now works with LTO on both GCC and clang and constant folding

* use memory operand in more places

* remove some inline now that we have lto

* cleanup compiler config and benches

* tracer shouldn't force dependencies when unused

* fix cc on linux

* nimble fixes

* update README [skip CI]

* update MacOS CI with Homebrew Clang

* oops nimble bindings disappeared

* more nimble fixes

* fix sha256 exported symbol

* improve constraints on modular addition

* Add extra constraint to force reloading of pointer in reg inputs

* Fix LLVM gold linker running out of registers

* workaround MinGW64 GCC 12.2 bad codegen in t_pairing_cyclotomic_subgroup with LTO
2023-04-26 06:58:31 +02:00
Mamy Ratsimbazafy
9a7137466e
C API for Ethereum BLS signatures (#228)
* [testsuite] Rework parallel test runner to buffer beyond 65536 chars and properly wait for process exit

* [testsuite] improve error reporting

* rework openArray[byte/char] for BLS signature C API

* Prepare for optimized library and bindings

* properly link to constantine

* Compiler fixes, global sanitizers, GCC bug with --opt:size

* workaround/fix #229: don't inline field reduction in Fp2

* fix clang running out of registers with LTO

* [C API] missed length parameters for ctt_eth_bls_fast_aggregate_verify

* double-precision asm is too large for inlining, try to fix Linux and MacOS woes at https://github.com/mratsim/constantine/pull/228#issuecomment-1512773460

* Use FORTIFY_SOURCE for testing

* Fix #230 - gcc miscompiles Fp6 mul with LTO

* disable LTO for now, PR is too long
2023-04-18 22:02:23 +02:00
Mamy Ratsimbazafy
93dac2503c
MSM tuning for high core count (#227)
* tune for high core count

* reentrancy: allow nesting of parallel functions by introducing precise scoped barriers

* increase collision queue depth
2023-04-14 20:02:59 +02:00
Mamy Ratsimbazafy
6c48975aee
Parallel Multi-Scalar-Multiplication (#226)
* try parallel reduction in batch add, but alas it's slower than custom chunking. Except maybe on arch with performance/efficiency cores

* initial impl of parallel MSM - scaling to debug, threads not woken fast enough

* improve comment [skip ci]

* skip top window when c divides the number of bits

* for some reason parallel-for loops scale on 5+ threads while spawn only on 2x threads. Thread wakeup issue?

* Add counters and timers to audit threadpool bottlenecks

* metrics and profiling fixes, (slower) latency hiding, activate tests

* fix thief thread trying to wake another before canceling its own sleep

* easier to sort metrics and parallel endomorphism application

* selective endomorphism acceleration

* some tuning

* spawn can handle compile-time literals, static and type parameters. Also introduce spawnAwaitable to await void procs

* improve MSM overview [skip ci]

* bench cleanup
2023-04-10 23:30:14 +02:00
Mamy Ratsimbazafy
4dc2610557
Bindings "filesystem" (#225)
* bindings structure

* missed some renaming

* add back the headers

* path fixes

* need to sleep at night

* windows path mystery is unfathomable
2023-03-01 12:59:06 +01:00
Mamy Ratsimbazafy
1cb6c3d9e1
[Threadpool] Backoff revamp (#224)
* Threadpool: eventcount didn't put threads to actual sleep :/

* rework task awaiter sleep to prevent use-after-free race condition after task completion

* Need memory fence for StoreLoad synchronization ordering

* update design doc

* set memory order in sleep of eventcount

* cleanup debug logs

* comment cleanup [skip ci]
2023-02-25 17:11:33 +01:00
Mamy Ratsimbazafy
1dfbb8bd4f
[Threadpool] Remove reserve threads (#223)
* remove reserve threads

* recover last perf diff: 1. don't import primitives, cpu features detection globals are noticeable, 2. noinit + conditional zeroMem are unnecessary when sync is inline 3. inline 'newSpawn' and don't init the loop part

* avoid syscalls when possible if thread is awake but idle

* renaming eventLoop

* remove unused code: steal-half

* renaming

* no need for 0-init sync, T can be large in cryptography
2023-02-24 17:36:04 +01:00
Mamy Ratsimbazafy
bf32c2d408
Parallel for (#222)
* introduce reserve threads to minimize latency and maximize throughput when awaiting a future

* introduce a ceilDiv proc

* threadpool: implement parallel-for loops

* 10x perf improvement by not waking reserveBackoff on syncAll

* bench overhead: new reserve system might introduce too much wakeup latency, 2x slower, for fine-grained parallelism

* add parallelForStrided

* Threadpool: Implement parallel reductions

* refactor parallel loop codegen: introduce descriptor, parsing and codegen stages

* parallel strided, test transpose bench

* tight loop is faster when backoff is not inline

* no POSIX stuff on windows, larger types for histogram bench

* fix tests

* max RSS overflow?

* missed an undefined var

* exit histogram on 32-bit

* forgot to return early for 32-bit
2023-02-24 09:47:36 +01:00
Mamy Ratsimbazafy
8993789ddf
fix #221 2023-02-16 13:54:21 +01:00
Mamy Ratsimbazafy
e5612f5705
Multi-Scalar-Multiplication / Linear combination (#220)
* unoptimized msm

* MSM: reorder loops

* add a signed windowed recoding technique

* improve wNAF table access

* use batchAffine

* revamp EC tests

* MSM signed digit support

* refactor MSM: recode signed ahead of time

* missing test vector

* refactor allocs and Alloca sideeffect

* add an endomorphism threshold

* Add Jacobian extended coordinates

* refactor recodings, prepare for parallelizable on-the-fly signed recoding

* recoding changes, introduce proper NAF for pairings

* more pairings refactoring, introduce miller accumulator for EVM

* some optim to the addchain miller loop

* start optimizing multi-pairing

* finish multi-miller loop refactoring

* minor tuning

* MSM: signed encoding suitable for parallelism (no precompute)

* cleanup signed window encoding

* add prefetching

* add metering

* properly init result to infinity

* comment on prefetching

* introduce vartime inversion for batch additions

* fix JacExt infinity conversion

* add batchAffine for MSM, though slower than JacExtended at the moment

* add a batch affine scheduler for MSM

* Add Multi-Scalar-Multiplication endomorphism acceleration

* some tuning

* signed integer fixes + 32-bit + tuning

* Some more tuning

* common msm bench + don't use affine for c < 9

* nit
2023-02-16 12:45:05 +01:00