constantine/tests/gpu
Mamy Ratsimbazafy bf32c2d408
Parallel for (#222)
* introduce reserve threads to minimize latency and maximize throughput when awaiting a future

* introduce a ceilDiv proc

* threadpool: implement parallel-for loops

* 10x perf improvement by not waking reserveBackoff on syncAll

* bench overhead: new reserve system might introduce too much wakeup latency, 2x slower, for fine-grained parallelism

* add parallelForStrided

* Threadpool: Implement parallel reductions

* refactor parallel loop codegen: introduce descriptor, parsing and codegen stages

* parallel strided, test transpose bench

* tight loop is faster when backoff is not inline

* no POSIX stuff on windows, larger types for histogram bench

* fix tests

* max RSS overflow?

* missed an undefined var

* exit histogram on 32-bit

* forgot to return early for 32-bit
2023-02-24 09:47:36 +01:00
hello_world_llvm.nim [Backend] Add support for Nvidia GPUs (#210) 2023-01-12 01:01:57 +01:00
hello_world_nvidia.nim chore: remove stew/byteutils dependencies and unneeded imports 2023-01-12 20:25:57 +01:00
t_nvidia_fp.nim Parallel for (#222) 2023-02-24 09:47:36 +01:00