* try parallel reduction in batch add, but alas it's slower than custom chunking, except maybe on architectures with performance/efficiency cores
* initial impl of parallel MSM - scaling to debug, threads not woken fast enough
* improve comment [skip ci]
* skip top window when c divides the number of bits
* for some reason parallel-for loops scale on 5+ threads while spawn only on 2x threads. Thread wakeup issue?
* Add counters and timers to audit threadpool bottlenecks
* metrics and profiling fixes, (slower) latency hiding, activate tests
* fix thief thread trying to wake another before canceling its own sleep
* easier to sort metrics and parallel endomorphism application
* selective endomorphism acceleration
* some tuning
* spawn can handle compile-time literals, static and type parameters. Also introduce spawnAwaitable to await void procs
* improve MSM overview [skip ci]
* bench cleanup
* introduce reserve threads to minimize latency and maximize throughput when awaiting a future
* introduce a ceilDiv proc
* threadpool: implement parallel-for loops
* 10x perf improvement by not waking reserveBackoff on syncAll
* bench overhead: new reserve system might introduce too much wakeup latency, 2x slower, for fine-grained parallelism
* add parallelForStrided
* Threadpool: Implement parallel reductions
* refactor parallel loop codegen: introduce descriptor, parsing and codegen stages
* parallel strided, test transpose bench
* tight loop is faster when backoff is not inline
* no POSIX stuff on windows, larger types for histogram bench
* fix tests
* max RSS overflow?
* missed an undefined var
* exit histogram on 32-bit
* forgot to return early for 32-bit
* unoptimized msm
* MSM: reorder loops
* add a signed windowed recoding technique
* improve wNAF table access
* use batchAffine
* revamp EC tests
* MSM signed digit support
* refactor MSM: recode signed ahead of time
* missing test vector
* refactor allocs and Alloca sideeffect
* add an endomorphism threshold
* Add Jacobian extended coordinates
* refactor recodings, prepare for parallelizable on-the-fly signed recoding
* recoding changes, introduce proper NAF for pairings
* more pairings refactoring, introduce miller accumulator for EVM
* some optim to the addchain miller loop
* start optimizing multi-pairing
* finish multi-miller loop refactoring
* minor tuning
* MSM: signed encoding suitable for parallelism (no precompute)
* cleanup signed window encoding
* add prefetching
* add metering
* properly init result to infinity
* comment on prefetching
* introduce vartime inversion for batch additions
* fix JacExt infinity conversion
* add batchAffine for MSM, though slower than JacExtended at the moment
* add a batch affine scheduler for MSM
* Add Multi-Scalar-Multiplication endomorphism acceleration
* some tuning
* signed integer fixes + 32-bit + tuning
* Some more tuning
* common msm bench + don't use affine for c < 9
* nit
* create a codecs.nim file for hex/base64 and other encoding conversions
* improve maintenance/readability of hex conversion
* add skeleton of constant-time base64 decoding
* use raw casts
* use raw casts only for same size types
* [Threadpool] Fix syncAll releasing while a thread was attempting to steal + force no exception in tasks
* fix unguarded access on macOS barriers
* parallel batchadd
* moved import
* Try to compile with GMP on windows and 32-bit linux
* remove leftover msys shell
* Don't use GMP Mersenne Twister, bad randomness and untested Nim wrapper
* properly cache nim
* fix path after cache
* run pacman in msys2 env
* rework msys2 ... again
* shell compat for file clearing
* shell compat try-again for file clearing
* force bash for clearing parallel builds on windows
* Use nimscript directly (why didn't it work last time?)
* Avoid IO redirection to support any shell
* Avoid IO redirection v2 to support any shell
* add debug data
* add debug again
* Introduce pararun, a parallel test runner to remove need of GNU parallel
* pararun: style
* First draft at bindings generation
* finite field bindings PoC
* support openarray, export NimMain
* PoC extension fields and elliptic curve bindings
* Pasta
* expose more bindings, remove nimZeroMem, remove tracer when unused, codegen name_mangling`gensym issue
* workaround bad C gensym codegen with {.inline.} pragma in non-dirty template nested in generic proc instantiated by template
* Skeleton of hash to curve for BLS12-381 G1
* Remove isodegree parameter
* Fix polynomial evaluation of hashToG1
* Optimize hash_to_curve and add bench for hash to G1
* slight optim of jacobian isomap + v7 test vectors