Mamy Ratsimbazafy 2931913b67
Add a threadpool (#213)
2023-01-24 02:32:28 +01:00


Random permutations

Work-stealing is more efficient when the thread we steal from is randomized. If all threads steal in the same order, we increase contention on the task queues of the first victims in that order.

The quality of the randomness does not matter beyond spreading out potential contention, e.g. starting at a random thread i, then trying i+1, and so on up to i+n-1 (mod n), is good enough.
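As a rough illustration of that "good enough" ordering, here is a minimal Python sketch. The function name `victims_linear` and its parameters are hypothetical and not taken from the threadpool itself; it only shows that a single random starting offset already spreads thieves across different first victims.

```python
import random

def victims_linear(my_id, num_threads, rng):
    """Yield every other thread ID exactly once, starting at a random offset.

    Hypothetical helper for illustration only.
    """
    start = rng.randrange(num_threads)  # the only RNG call for the whole scan
    for k in range(num_threads):
        victim = (start + k) % num_threads
        if victim != my_id:  # a thread never steals from itself
            yield victim

# Usage: thread 2 out of 5 visits threads 0, 1, 3 and 4 in a rotated order.
order = list(victims_linear(2, 5, random.Random()))
```

Two thieves that draw different offsets begin with different victims, which is all the scheme needs from its randomness.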

Hence for efficiency, so that a thread can go to sleep faster, we want to reduce calls to the RNG, because:

  • Getting a random value can itself be expensive, especially if we use a CSPRNG (not a requirement).
  • A CSPRNG can be starved of entropy: with small tasks, threads might make millions of calls.
  • If we want unbiased thread ID generation in a range, rejection sampling is costly (not a requirement).
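To make the cost of the last point concrete, the sketch below shows textbook rejection sampling over 32-bit random words. The function `unbiased_below` and the `next_u32` callback are hypothetical names used for illustration; the point is only that an unbiased draw may consume more than one RNG call.

```python
def unbiased_below(n, next_u32):
    """Return (value, rng_calls) with value uniform in [0, n).

    `next_u32` is assumed to return a uniform 32-bit unsigned integer.
    """
    # Largest multiple of n that fits in 32 bits. Words at or above it are
    # rejected, otherwise small residues would be slightly more likely.
    limit = (1 << 32) - ((1 << 32) % n)
    calls = 0
    while True:
        r = next_u32()
        calls += 1
        if r < limit:
            return r % n, calls
```

The loop has no fixed bound on the number of RNG calls, which is what makes it unattractive on a path that runs before every steal attempt.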

Instead of using Fisher-Yates, which

  • generates the victim set eagerly, which is inefficient if the first steal attempts are successful
  • needs an RNG call when sampling a victim
  • memory usage: numThreads entries per thread, so numThreads² entries in total, stored as uint8 (255 threads max) or uint32

or a sparse set, which

  • needs 1 RNG call when sampling a victim
  • memory usage: 2*numThreads entries per thread, so 2*numThreads² entries in total, stored as uint8 (255 threads max) or uint32

we can use Linear Congruential Generators, a recurrence relation of the form Xₙ₊₁ = aXₙ + c (mod m). If we respect the Hull-Dobell theorem requirements, we can generate pseudo-random permutations of [0, m) with fixed memory usage, whatever the number of potential victims: just 4 registers for a, x, c and m.
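The Hull-Dobell theorem states that the recurrence has full period m when c and m are coprime, a-1 is divisible by every prime factor of m, and a-1 is divisible by 4 whenever m is. The Python sketch below is one possible way to use this, not necessarily what the threadpool does. It assumes m is rounded up to the next power of two, so that the conditions reduce to "c is odd and a ≡ 1 (mod 4)", and it skips values that are not valid victims. The names `next_pow2` and `lcg_victims` are hypothetical.

```python
import random

def next_pow2(n):
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()

def lcg_victims(my_id, num_threads, rng):
    """Lazily yield every other thread ID exactly once, in LCG order.

    Assumption: m is a power of two, so Hull-Dobell holds when
    c is odd and a % 4 == 1.
    """
    m = next_pow2(num_threads)
    mask = m - 1
    bits = rng.getrandbits(64)               # one RNG call per steal round
    a = ((bits & 0xFFFF) << 2) | 1           # a = 4k + 1
    c = (((bits >> 16) & 0xFFFF) << 1) | 1   # c is odd
    x = (bits >> 32) & mask                  # random starting point in [0, m)
    for _ in range(m):                       # full period: each value appears once
        if x < num_threads and x != my_id:   # skip padding values and ourselves
            yield x
        x = (a * x + c) & mask               # X(n+1) = a*X(n) + c (mod m)

# Usage: the permutation is generated on demand, so a thief can stop at the
# first successful steal without having materialized the rest.
first_victim = next(lcg_victims(0, 6, random.Random()), None)
```

Only a, c, x and the mask are kept between steps, whatever the number of threads, and nothing is computed for victims that are never tried.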

References:

And if we want cryptographic strength: