constantine

Commit Graph

Author	SHA1	Message	Date
Mamy Ratsimbazafy	33c3a2e8c4	[Research] x86 code generator (#234 ) * rename compilers -> intrinsics, math_gpu -> math_codegen * stash x86 codegen in research	2023-04-27 21:52:51 +02:00
Mamy Ratsimbazafy	c6d9a213f2	Rework assembly to be compatible with LTO (#231 ) * rework assembler register/mem and constraint declarations * Introduce constraint UnmutatedPointerToWriteMem * Create invidual memory cell operands * [Assembly] fully support indirect memory addressing * fix calling convention for exported procs * Prepare for switch to intel syntax to avoid clang constant propagation asm symbol name interfering OR pointer+offset addressing * use modifiers to prevent bad string mixin fo assembler to linker of propagated consts * Assembly: switch to intel syntax * with working memory operand - now works with LTO on both GCC and clang and constant folding * use memory operand in more places * remove some inline now that we have lto * cleanup compiler config and benches * tracer shouldn't force dependencies when unused * fix cc on linux * nimble fixes * update README [skip CI] * update MacOS CI with Homebrew Clang * oops nimble bindings disappeared * more nimble fixes * fix sha256 exported symbol * improve constraints on modular addition * Add extra constraint to force reloading of pointer in reg inputs * Fix LLVM gold linker running out of registers * workaround MinGW64 GCC 12.2 bad codegen in t_pairing_cyclotomic_subgroup with LTO	2023-04-26 06:58:31 +02:00
Mamy Ratsimbazafy	9a7137466e	C API for Ethereum BLS signatures (#228 ) * [testsuite] Rework parallel test runner to buffer beyond 65536 chars and properly wait for process exit * [testsuite] improve error reporting * rework openArray[byte/char] for BLS signature C API * Prepare for optimized library and bindings * properly link to constantine * Compiler fixes, global sanitizers, GCC bug with --opt:size * workaround/fix #229: don't inline field reduction in Fp2 * fix clang running out of registers with LTO * [C API] missed length parameters for ctt_eth_bls_fast_aggregate_verify * double-precision asm is too large for inlining, try to fix Linux and MacOS woes at https://github.com/mratsim/constantine/pull/228#issuecomment-1512773460 * Use FORTIFY_SOURCE for testing * Fix #230 - gcc miscompiles Fp6 mul with LTO * disable LTO for now, PR is too long	2023-04-18 22:02:23 +02:00
Mamy Ratsimbazafy	93dac2503c	MSM tuning for high core count (#227 ) * tune for high core count * reentrancy: allow nesting of parallel functions by introducing precise scoped barriers * increase collision queue depth	2023-04-14 20:02:59 +02:00
Mamy Ratsimbazafy	6c48975aee	Parallel Multi-Scalar-Multiplication (#226 ) * try parallel reduction in batch add, but alas it's slower than custom chunking. Except maybe on arch with performance/efficiency cores * initial impl of parallel MSM - scaling to debug, threads not woken fast enough * improve comment [skip ci] * skip top window when c divides the number of bits * for some reason parallel-for loops scale on 5+ threads while spawn only on 2x threads. Thread wakeup issue? * Add counters and timers to audit threadpool bottlenecks * metrics and profiling fixes, (slower) latency hiding, activate tests * fix thief thread trying to wake another before canceling its own sleep * easier to sort metrics and parallel endomorphism application * selective endomorphism acceleration * some tuning * spawn can handle compile-time literals, static and type parameters. Also introduce spawnAwaitable to await void procs * improve MSM overview [skip ci] * bench cleanup	2023-04-10 23:30:14 +02:00
Mamy Ratsimbazafy	4dc2610557	Bindings "filesystem" (#225 ) * bindings structure * missed some renaming * add back the headers * path fixes * need to sleep at night * windows path mystery is unfathomable	2023-03-01 12:59:06 +01:00
Mamy Ratsimbazafy	1cb6c3d9e1	[Threadpool] Backoff revamp (#224 ) * Threadpool: eventcount didn't put threads to actual sleep :/ * rework task awaiter sleep to prevent use-after-free race condition after task completion * Need memory fence for StoreLoad synchronization ordering * update design doc * set memory order in sleep of eventcount * cleanup debug logs * comment cleanup [skip ci]	2023-02-25 17:11:33 +01:00
Mamy Ratsimbazafy	1dfbb8bd4f	[Threadpool] Remove reserve threads (#223 ) * remove reserve threads * recover last perf diff: 1. don't import primitives, cpu features detection globals are noticeable, 2. noinit + conditional zeroMem are unnecessary when sync is inline 3. inline 'newSpawn' and don't init the loop part * avoid syscalls if possible if thred is awake but idle * renaming eventLoop * remove unused code: steal-half * renaming * no need for 0-init sync, T can be large in cryptography	2023-02-24 17:36:04 +01:00
Mamy Ratsimbazafy	bf32c2d408	Parallel for (#222 ) * introduce reserve threads to minimize latency and maximize throughput when awaiting a future * introduce a ceilDiv proc * threadpool: implement parallel-for loops * 10x perf improvement by not waking reserveBackoff on syncAll * bench overhead: new reserve system might introduce too much wakeup latency, 2x slower, for fine-grained parallelism * add parallelForStrided * Threadpool: Implement parallel reductions * refactor parallel loop codegen: introduce descriptor, parsing and codegen stages * parallel strided, test transpose bench * tight loop is faster when backoff is not inline * no POSIX stuff on windows, larger types for histogram bench * fix tests * max RSS overflow? * missed an undefined var * exit histogram on 32-bit * forgot to return early dor 32-bit	2023-02-24 09:47:36 +01:00
Mamy Ratsimbazafy	8993789ddf	fix #221	2023-02-16 13:54:21 +01:00
Mamy Ratsimbazafy	e5612f5705	Multi-Scalar-Multiplication / Linear combination (#220 ) * unoptimized msm * MSM: reorder loops * add a signed windowed recoding technique * improve wNAF table access * use batchAffine * revamp EC tests * MSM signed digit support * refactor MSM: recode signed ahead of time * missing test vector * refactor allocs and Alloca sideeffect * add an endomorphism threshold * Add Jacobian extended coordinates * refactor recodings, prepare for parallelizable on-the-fly signed recoding * recoding changes, introduce proper NAF for pairings * more pairings refactoring, introduce miller accumulator for EVM * some optim to the addchain miller loop * start optimizing multi-pairing * finish multi-miller loop refactoring * minor tuning * MSM: signed encoding suitable for parallelism (no precompute) * cleanup signed window encoding * add prefetching * add metering * properly init result to infinity * comment on prefetching * introduce vartime inversion for batch additions * fix JacExt infinity conversion * add batchAffine for MSM, though slower than JacExtended at the moment * add a batch affine scheduler for MSM * Add Multi-Scalar-Multiplication endomorphism acceleration * some tuning * signed integer fixes + 32-bit + tuning * Some more tuning * common msm bench + don't use affine for c < 9 * nit	2023-02-16 12:45:05 +01:00
Mamy Ratsimbazafy	082cd1deb9	MSB-to-LSB minimum Hamming Weight Recoding (#219 ) * signed recoding * use recoding	2023-02-07 16:27:53 +01:00
Mamy Ratsimbazafy	7c5421ffdc	move staticFor to the inner repo, not helpers/ for unblocking nimble install (#216 )	2023-02-07 13:11:44 +01:00
Mamy Ratsimbazafy	a11fca9c60	panics:on (#218 )	2023-02-07 13:11:15 +01:00
Mamy Ratsimbazafy	cbb454fff1	Codecs (#217 ) * create a codecs.nim file for hex/base64 and other encoding conversions * improve maintenance/readability of hex conversion * add skeleton of constant-time base64 decoding * use raw casts * use raw casts only for same size types	2023-02-07 13:10:17 +01:00
Mamy Ratsimbazafy	95114bf707	move research sanity check to research/ [skip ci]	2023-01-30 20:57:12 +01:00
Mamy Ratsimbazafy	495ef4497b	Parallel batchadd (#215 ) * [Threadpool] Fix syncAll releasing while a thread was attempting to steal + force no exception in tasks * fix unguarded access on MacOS barriers * parallel batchadd * moved import	2023-01-29 01:06:37 +01:00
Mamy Ratsimbazafy	a385acf2b8	Fix isZeroMask in SignedSecretWord	2023-01-29 01:05:54 +01:00
Mamy Ratsimbazafy	915f89fdd6	remove static/constant constraint on Montgomery	2023-01-28 18:25:30 +01:00
Mamy Ratsimbazafy	ff8c26c1fe	BLS Aggregate and Batch verify (#214 ) * pairing -> pairings, and use alloca arrays instead of static arrays * aggregate and batched BLS signature * DLL generation broken by path changes	2023-01-27 00:42:12 +01:00
Mamy Ratsimbazafy	7c01affe24	speedup test suite, focus on "integration" tests	2023-01-25 05:47:57 +01:00
Mamy Ratsimbazafy	2931913b67	Add a threadpool (#213 ) * Implement a threadpool * int and SomeUnsignedInt ... * Type conversion for windows SynchronizationBarrier * Use the latest MacOS 11, Big Sur API (jan 2021) for MacOS futexes, Github action offers MacOS 12 and can test them * bench need posix timer not available on windows and darwin futex * Windows: nimble exec empty line is an error, Mac: use defined(osx) instead of defined(macos) * file rename * okay, that's the last one hopefully * deactivate stealHalf for now	2023-01-24 02:32:28 +01:00
Mamy Ratsimbazafy	188f3e710c	add fast_aggregate_verify	2023-01-23 01:54:40 +01:00
Mamy Ratsimbazafy	4be89d309f	chore: remove stew/byteutils dependencies and unneeded imports	2023-01-12 20:25:57 +01:00
Mamy Ratsimbazafy	4052a07611	chore: cleanup TODOs, unused constants	2023-01-12 01:27:23 +01:00
Mamy Ratsimbazafy	1f4bb174a3	[Backend] Add support for Nvidia GPUs (#210 ) * Add PoC of JIT exec on Nvidia GPUs [skip ci] * Split GPU bindings into low-level (ABI) and high-level [skip ci] * small typedef reorg [skip ci] * refine LLVM IR/Nvidia GPU hello worlds * [Nvidia GPU] PoC implementation of field addition [skip ci] * prod-ready field addition + tests on Nvidia GPUs via LLVM codegen	2023-01-12 01:01:57 +01:00
Mamy Ratsimbazafy	c0b30a08be	style: casing of WordBitWidth/WordBitwidth	2023-01-11 19:31:23 +01:00
Mamy Ratsimbazafy	53a5729442	Remove sanity checks 'when isMainModule' superceded by comprehensive tests	2023-01-10 00:23:07 +01:00
Mamy Ratsimbazafy	928f515582	Batch additions (#207 ) * Batch elliptic curve addition * accelerate chained muls * jac mixed add handle doubling. jac additions handle aliasing when adding infinity * properly skip sanitizer on BLS signature test * properly skip sanitizer² on BLS signature test	2022-10-29 22:43:40 +02:00
Mamy Ratsimbazafy	93654d580e	pararun: Ignore error #259 , sha256: add back a paper	2022-09-19 09:11:16 +02:00
Mamy Ratsimbazafy	d515bebdba	pararun: MacOS, weird error 259 when accumulating pipes or processes	2022-09-19 03:14:44 +02:00
Mamy Ratsimbazafy	351a3f6bd2	Sha256 refactor (#206 ) * sha256: separate message scheduling and state updates to help implement specific use-cases like #205; also implement SSSE3 acceleration (2006, Intel Core 2 Duo) * sha256: simplify update flow, store less metadata in context * sha256: Fix reworked update function * Implement x86 hardware SHA acceleration * typo	2022-09-19 02:02:57 +02:00
Mamy Ratsimbazafy	b901dd5878	CI: pure C can link to GMP, but Nim cannot LoadLib GMP, not found	2022-09-19 02:02:04 +02:00
Mamy Ratsimbazafy	495d5fa9fd	don't run afoul of pipe limits	2022-09-19 01:47:18 +02:00
Mamy Ratsimbazafy	fb594c5938	OpenSSL upstream: no more SHA256 public function :/, skip in Windows CI	2022-09-19 01:22:01 +02:00
Mamy Ratsimbazafy	7c7290115f	nimble: fix bench_poly1305, improve reporting in pararun	2022-09-19 00:41:37 +02:00
Mamy Ratsimbazafy	cc47c27cca	pararun: don't swallow failures	2022-09-18 23:37:35 +02:00
Mamy Ratsimbazafy	2f6144fb7a	add missing benches compilation to test suite	2022-09-18 15:26:42 +02:00
Mamy Ratsimbazafy	d4e202ead5	Don't use array[^1], it can throw and cannot be locally turn off	2022-09-17 18:52:52 +02:00
Mamy Ratsimbazafy	df048112c3	Example+Test C API vs GMP (#203 ) * Example+Test C API vs GMP * Create build directory for bindings test * --nimMainPrefix is 1.6 only * Add libdl for dynamic loading * absolute paths * add static link test * Fix man main, rename Nimmain to init_NimMain * Deal with MacOS annoying linker w.r.t. static libraries * use .exe extension to satisfy windows (?) * annoying GCC which doesn't create paths * Try skipping DLL test on windows * windows extensions ... * no lib prefix on windows	2022-09-15 17:11:57 +02:00
Mamy Ratsimbazafy	962e7ccf49	CI: enable GMP tests on Windows and Linux 32-bit and fix caching (#204 ) * Try to compile with GMP on windows and 32-bit linux * remove leftover msys shell * Don't use GMP Mersenne Twister, bad randomness and untested Nim wrapper * properly cache nim * fix path after cache * run pacman in msys2 env * rework msys2 ... again * shell compat for file clearing * shell compat try-again for file clearing * force bash for clearing parallel builds on windows * Use nimscript directly (why didn't it work last time?) * Avoid IO redirection to support any shell * Avoid IO redirection v2 to support any shell * add debug data * add debug again * Introduce pararun, a parallel test runner to remove need of GNU parallel * pararun: style	2022-09-15 09:33:34 +02:00
Mamy Ratsimbazafy	094445482b	Eip2333 (#202 ) * HMAC-SHA256 * EIP2333 * activate EIP2333 tests and faster random test case generation	2022-08-16 12:07:57 +02:00
Mamy Ratsimbazafy	9770b3108c	Fp12 over fp6 (#201 ) * introduce sumprod for direct fp6_mul * change curves -> constants * forgotten constants * Full pairing using Fp2->Fp6->Fp12 towering	2022-08-14 09:48:10 +02:00
Mamy Ratsimbazafy	37354e9ca8	faster isSquare: faster hash_to_curve (BN254) and point deserialization (BLS12-377) closes #199	2022-08-07 20:50:24 +02:00
Mamy Ratsimbazafy	74a23244d2	bench isSquare	2022-08-07 19:50:28 +02:00
Mamy Ratsimbazafy	f35257d947	camelCase in C -> snake_case	2022-08-06 22:11:03 +02:00
Mamy Ratsimbazafy	a17fb3b4c1	Fix compiler hints and warnings (unused import/variables, ...)	2022-08-06 19:55:35 +02:00
Mamy Ratsimbazafy	99c9730793	Self-contained bindings generation (#196 ) * First draft at bindings generation * finite field bindings PoC * support openarray, export NimMain * PoC extension fields and elliptic curve bindings * Pasta * expose more bindings, remove nimZeroMem, remove tracer when unused, codegen name_mangling`gensym issue * workaround bad C gensym codegen with {.inline.} pragma in non-dirty template nested in generic proc instantiated by template	2022-08-06 19:05:54 +02:00
Mamy Ratsimbazafy	7d29cb947a	Prepare for bindings generation	2022-07-16 13:34:27 +02:00
Mamy Ratsimbazafy	e29e529f18	Add multipairing for BN curves (#194 )	2022-05-08 19:01:23 +02:00


410 Commits