* first jab at Rust bindings
* stash C library and header generation
* Create a single big library with multiple headers
* remove ctt_pure: people will not call a crypto proc twice with unchanged parameters, and it adds noise when reading the header
* fix MacOS and Windows builds
* fix cross-lang ThinLTO, require LLD
* Remove NimMain need, cleanup CPU features and detect them on library load
* fix the new div2n1n_vartime on 32-bit - regression from #286
* remove unnecessary defensive programming
* reactivate 32-bit CI to check on #244
* 32-bit: centralize OS, ISA and env variable config
* enable assembly on x86 32-bit
* stash prep for Barrett reduction
* benches lost in rebase
* fix vartime reduction
* some improvement and fixes on reduce_vartime
* Fuse reductions when converting to Montgomery + use window=1 in powMont for small exponents. ~2.7x to 3.3x accel
* modexp: Introduce a no-reduction path for small base+exponent compared to modulus. Fix DoS
* optim for padded exponents
* remove commented out code [skip ci]
* Missing noInline for allocStackArray
* Pass all verify_kzg_proof test cases
* pass blob_to_commitment tests
* move tests
* KZG: WIP on compute_proof
* eip4844: Pass all compute_kzg_proof tests
* pass compute_blob_kzg_proof tests
* pass all verify_blob_kzg_proof tests
* CI needs yaml
* fix memory leaks and add effect tags
* CI: lock yaml version too pre Nim 2.0
* common error model for serialization of BLS signatures and KZG objects
* [KZG] add Ethereum's test vectors [skip ci]
* dump progress on KZG
* Stash: trusted setup generator
* implement cache optimized bit-reversal-permutation
* Add generator for the Ethereum test trusted setups
* implement naive deserialization for the trusted setup interchange format
* implement verify_kzg_proof
* Add test skeleton of verify KZG proof
* rebase import fixes
* Pasta bench
* cleanup env variables
* [MSM]: generate benchmark coef-points pairs in parallel
* try to fix Windows CI
* add diagnostic info
* fix old test for new codecs/io primitives
* Ensure the projective point at infinity is not all zeros, but (0, 1, 0)
* move tests
* move threadpool to root path
* fix hints and warnings, print nim versions for tests for debugging the new strange issue in CI
* print nim version
* mixup on branches
* mixup on branches reloaded
* rework assembler register/mem and constraint declarations
* Introduce constraint UnmutatedPointerToWriteMem
* Create individual memory cell operands
* [Assembly] fully support indirect memory addressing
* fix calling convention for exported procs
* Prepare for switch to intel syntax to avoid clang constant propagation asm symbol name interfering OR pointer+offset addressing
* use modifiers to prevent bad string mixin of assembler to linker of propagated consts
* Assembly: switch to Intel syntax
* with working memory operand - now works with LTO on both GCC and clang and constant folding
* use memory operand in more places
* remove some inline now that we have lto
* cleanup compiler config and benches
* tracer shouldn't force dependencies when unused
* fix cc on linux
* nimble fixes
* update README [skip CI]
* update MacOS CI with Homebrew Clang
* oops nimble bindings disappeared
* more nimble fixes
* fix sha256 exported symbol
* improve constraints on modular addition
* Add extra constraint to force reloading of pointer in reg inputs
* Fix LLVM gold linker running out of registers
* workaround MinGW64 GCC 12.2 bad codegen in t_pairing_cyclotomic_subgroup with LTO
* [testsuite] Rework parallel test runner to buffer beyond 65536 chars and properly wait for process exit
* [testsuite] improve error reporting
* rework openArray[byte/char] for BLS signature C API
* Prepare for optimized library and bindings
* properly link to constantine
* Compiler fixes, global sanitizers, GCC bug with --opt:size
* workaround/fix #229: don't inline field reduction in Fp2
* fix clang running out of registers with LTO
* [C API] missed length parameters for ctt_eth_bls_fast_aggregate_verify
* double-precision asm is too large for inlining, try to fix Linux and MacOS woes at https://github.com/mratsim/constantine/pull/228#issuecomment-1512773460
* Use FORTIFY_SOURCE for testing
* Fix #230 - gcc miscompiles Fp6 mul with LTO
* disable LTO for now, PR is too long
* try parallel reduction in batch add, but alas it's slower than custom chunking. Except maybe on arch with performance/efficiency cores
* initial impl of parallel MSM - scaling to debug, threads not woken fast enough
* improve comment [skip ci]
* skip top window when c divides the number of bits
* for some reason parallel-for loops scale on 5+ threads while spawn only on 2x threads. Thread wakeup issue?
* Add counters and timers to audit threadpool bottlenecks
* metrics and profiling fixes, (slower) latency hiding, activate tests
* fix thief thread trying to wake another before canceling its own sleep
* easier to sort metrics and parallel endomorphism application
* selective endomorphism acceleration
* some tuning
* spawn can handle compile-time literals, static and type parameters. Also introduce spawnAwaitable to await void procs
* improve MSM overview [skip ci]
* bench cleanup
* Threadpool: eventcount didn't put threads to actual sleep :/
* rework task awaiter sleep to prevent use-after-free race condition after task completion
* Need memory fence for StoreLoad synchronization ordering
* update design doc
* set memory order in sleep of eventcount
* cleanup debug logs
* comment cleanup [skip ci]
* remove reserve threads
* recover last perf diff: 1. don't import primitives, cpu features detection globals are noticeable, 2. noinit + conditional zeroMem are unnecessary when sync is inline 3. inline 'newSpawn' and don't init the loop part
* avoid syscalls if possible when thread is awake but idle
* renaming eventLoop
* remove unused code: steal-half
* renaming
* no need for 0-init sync, T can be large in cryptography