* Expose Pippenger multiplication for combining multiple sigs of same msg
In many use cases there are multiple signatures of the same message;
e.g., Ethereum attestations often share the signed `AttestationData`.
`blst` now exposes Pippenger multiplication to accelerate this case:
a single multi-scalar multiplication is much faster than multiplying
each signature / pubkey by its scalar individually.
Further speedups are possible with parallel tiling, see the Rust
binding code for the `npoints >= 32` case:
- https://github.com/supranational/blst/blob/v0.3.13/bindings/rust/src/pippenger.rs
Likewise, multiple pubkeys / signatures may be loaded simultaneously
using the new `blst` APIs.
We do neither of these additional optimizations, as our architecture
does not readily support them. Pippenger multiplication alone already
offers a significant speedup until further optimizations become a priority;
a sketch of the underlying check follows the benchmark table below.
```
------------------------------------------------------------------------------------------------------------------------------------
BLS verif of 6 msgs by 6 pubkeys 117.232 ops/s 8530098 ns/op 20471994 cycles
BLS verif of 6 sigs of same msg by 6 pubkeys (with blinding) 553.186 ops/s 1807711 ns/op 4338371 cycles
BLS verif of 6 sigs of same msg by 6 pubkeys 724.279 ops/s 1380683 ns/op 3313617 cycles
------------------------------------------------------------------------------------------------------------------------------------
BLS verif of 60 msgs by 60 pubkeys 11.131 ops/s 89839743 ns/op 215615251 cycles
BLS verif of 60 sigs of same msg by 60 pubkeys (with blinding) 238.059 ops/s 4200634 ns/op 10081380 cycles
BLS verif of 60 sigs of same msg by 60 pubkeys 680.634 ops/s 1469219 ns/op 3526031 cycles
------------------------------------------------------------------------------------------------------------------------------------
BLS verif of 180 msgs by 180 pubkeys 3.887 ops/s 257298895 ns/op 617517127 cycles
BLS verif of 180 sigs of same msg by 180 pubkeys (with blinding) 166.340 ops/s 6011785 ns/op 14428186 cycles
BLS verif of 180 sigs of same msg by 180 pubkeys 536.938 ops/s 1862413 ns/op 4469689 cycles
------------------------------------------------------------------------------------------------------------------------------------
```
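For context (not a description of the exact code path), the blinded batch check that these multi-scalar multiplications feed into, in the minimal-pubkey-size scheme used by Ethereum (pubkeys in G1, signatures in G2), is: given signatures σ_i on the same message m from pubkeys pk_i and fresh random blinding scalars r_i, the verifier checks
```
e\Bigl(\sum_{i=1}^{n} [r_i]\,\mathrm{pk}_i,\ H(m)\Bigr) \;=\; e\Bigl(g_1,\ \sum_{i=1}^{n} [r_i]\,\sigma_i\Bigr)
```
Both sums are n-point multi-scalar multiplications, exactly the operation Pippenger's bucket method accelerates, and the whole batch then costs only two pairings; the rows without blinding effectively set all r_i = 1, reducing the sums to plain point additions.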
* Suppress `const` warning for Windows build
* Different approach for dealing with [-Wincompatible-pointer-types]
* Extend documentation
EIP-4844 requires BLST via `nim-kzg4844`; MIRACL Core is not supported.
Furthermore, BLST now has a fallback for generic CPU architectures.
Therefore, remove support for the MIRACL Core backend.
Follow-up to #66, where we switched from Milagro to MIRACL Core: the main
module had not yet been renamed, so we do that now. This should not break
clients, as the module is not usually imported directly.
* avoid blocking batchVerifyParallel
The current version of `batchVerifyParallel` calls `syncAll`, which blocks
until all executing tasks have completed.
This PR changes it to sync on a `Flowvar` instead, allowing
`batchVerifyParallel` itself to be called as a task.
Requires https://github.com/status-im/nim-taskpools/pull/33
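As a rough illustration of the change (not the actual `batchVerifyParallel` code), a minimal sketch assuming the usual nim-taskpools API (`Taskpool.new`, `spawn`, `sync`, `syncAll`); `verifyChunk` is a hypothetical stand-in for verifying one slice of a batch:
```nim
# Compile with --threads:on (older Nim) so taskpools is available.
import taskpools

proc verifyChunk(start, stop: int): bool =
  ## Hypothetical stand-in for verifying one slice of a signature batch.
  stop > start

proc main() =
  var tp = Taskpool.new(numThreads = 4)

  # Instead of `tp.syncAll()`, which waits for *every* task in the pool,
  # sync only the Flowvars this call depends on. That is what lets the
  # enclosing procedure itself be spawned as a task.
  let f1 = tp.spawn verifyChunk(0, 10)
  let f2 = tp.spawn verifyChunk(10, 20)
  let ok = sync(f1) and sync(f2)
  echo "batch ok: ", ok

  tp.shutdown()

main()
```
Syncing a specific Flowvar does not require draining the whole pool, which is what allows `batchVerifyParallel` itself to be spawned as a task.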
* autoselect too
---------
Co-authored-by: zah <zahary@gmail.com>
* Use taskpools instead of OpenMP
* actually parallelize the partial pairings
* update benches
* Actually make processing parallel
* Import taskpools only with --threads:on
* renaming
* Fix thread partitioning: split on numBatches and not numThreads
* missed input len in chunking
* slight optim to reduce number of tasks created
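An illustrative sketch of partitioning on the number of batches rather than the number of threads, with a minimum chunk size acting as the parallelism cutoff; the helper name and parameters below are hypothetical, not the library's API:
```nim
proc chunkRanges(numBatches, numThreads, minChunkSize: int): seq[(int, int)] =
  ## Hypothetical helper: split `numBatches` items into chunks whose count
  ## follows the workload (with a minimum chunk size as cutoff), instead of
  ## always creating exactly `numThreads` slices. Small workloads then spawn
  ## few tasks, so per-task overhead does not dominate the work.
  if numBatches == 0:
    return
  let
    maxChunks = max(1, numBatches div minChunkSize)
    numChunks = min(numThreads, maxChunks)
    chunkSize = (numBatches + numChunks - 1) div numChunks  # ceiling division
  var start = 0
  while start < numBatches:
    result.add((start, min(start + chunkSize, numBatches)))
    start += chunkSize

when isMainModule:
  # 7 batches on 8 threads with a cutoff of 2 -> 3 tasks, not 8 tiny ones
  echo chunkRanges(numBatches = 7, numThreads = 8, minChunkSize = 2)
  # prints: @[(0, 3), (3, 6), (6, 7)]
```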
* Add test for batch signature forgery
* update nimble requirements
* Fix benchmark
* don't test multithreading on Windows 32-bit: no SynchronizationBarrier in MinGW
* add comment about elliptic affine coordinates negation [skip CI]
* Parallel API update
* Update blscurve/bls_batch_verifier.nim
Co-authored-by: Jacek Sieka <jacek@status.im>
* initial multithreaded BLS batch verification support
* Add benchmarks, fix misuse of omp_get_num_threads, init, ptr
* Change parallel algo: fix tests, better scalability, parallelism cutoff
* remove leftover debugging attachGC
* No need to tune, the bench was doing only one iter
* Rebase indentation bug
* Fix stacktraces in threaded calls
* Fix Nim stacktraces crashing OpenMP
* Properly use the thread separation tag at context init
* Workaround linker static inline vec_zero visibility
* Address review comments:
- specialize to 32-byte digests like SHA256
- init of BatchBLSVerifier
- mention that `incl` might not return true if, in the future, we do subgroup checks there instead of at deserialization
- as also done in the previous PR, mention that `openarray[PublicKeys].len == 0` is a spec violation
* use openmp in nimble test for benchmark
* Update blscurve/blst/blst_min_pubkey_sig_core.nim
Co-authored-by: Jacek Sieka <jacek@status.im>
* Address review comments
* update tests as well
Co-authored-by: Jacek Sieka <jacek@status.im>
* revive Miracl primitives benchmark
* Revive BLST benchmarks
* Bench hash-to-curve
* Add benchmark of BLS sign, verify and fastAggregateVerify
* Bench all + add benchmarks to CI
* don't bench on 32-bit: inline ASM issue with low-level calls (but high-level calls are fine)
* Actually it's the SHA256 tests on 32-bit that cause the ASM issue, due to inlined headers
* don't bench at all on 32-bit for now
* fix: don't test SHA256 on PowerPC
* Update BLST
* Remove aliasing workaround (to be confirmed in NBC CI)
* Use the new portable SHA256 instead of the Milagro fallback on non-SSE3 CPUs (ref da9ae49dab)
* Expose SHA256 from BLST
* add SHA256 to test
* On Azure MacOS it seems like the compiler aliases array[32, byte] at init :?
* reactivate commented out generators
* initial benchmark of scalar multiplication
* Warning in 32-bit mode + Add elliptic curve addition
* Hash to curve bench
* put benches behind `when isMainModule`
* Add pairing bench (2 impl) and bench_all
* mention that cycle measurements are approximate
* Fallback without cycle count for ARM (and MIPS, Sparc, ...)
* Add a nimble task