* introduce reserve threads to minimize latency and maximize throughput when awaiting a future
* introduce a ceilDiv proc
* threadpool: implement parallel-for loops
* 10x perf improvement by not waking reserveBackoff on syncAll
* bench overhead: new reserve system might introduce too much wakeup latency, 2x slower, for fine-grained parallelism
* add parallelForStrided
* Threadpool: Implement parallel reductions
* refactor parallel loop codegen: introduce descriptor, parsing and codegen stages
* parallel strided, test transpose bench
* tight loop is faster when backoff is not inline
* no POSIX stuff on windows, larger types for histogram bench
* fix tests
* max RSS overflow?
* missed an undefined var
* exit histogram on 32-bit
* forgot to return early dor 32-bit
* Try to compile with GMP on windows and 32-bit linux
* remove leftover msys shell
* Don't use GMP Mersenne Twister, bad randomness and untested Nim wrapper
* properly cache nim
* fix path after cache
* run pacman in msys2 env
* rework msys2 ... again
* shell compat for file clearing
* shell compat try-again for file clearing
* force bash for clearing parallel builds on windows
* Use nimscript directly (why didn't it work last time?)
* Avoid IO redirection to support any shell
* Avoid IO redirection v2 to support any shell
* add debug data
* add debug again
* Introduce pararun, a parallel test runner to remove need of GNU parallel
* pararun: style