* use LTO in release builds
This significantly (40%) speeds up block replay and hashing - for example replaying first 1000
blocks, without/with LTO:
```
[arnetheduck@tempus ncli]$ ../env.sh nim c -d:release ncli_db
[arnetheduck@tempus ncli]$ ./ncli_db bench --db:db --network:medalla --slots:1000
Loaded 215006 blocks, head slot 307400
All time are ms
Average, StdDev, Min, Max, Samples, Test
Validation is turned off meaning that no BLS operations are performed
25468.481, 0.000, 25468.481, 25468.481, 1, Initialize DB
0.297, 0.516, 0.053, 13.645, 721, Load block from database
26.458, 0.000, 26.458, 26.458, 1, Load state from database
20.737, 8.288, 11.096, 199.325, 690, Apply block
333.069, 62.798, 45.225, 429.452, 31, Apply epoch block
0.000, 0.000, 0.000, 0.000, 0, Database block store
```
```
[arnetheduck@tempus ncli]$ ../env.sh nim c -d:release --passc:-flto --passl:-flto --stacktrace:off ncli_db
[arnetheduck@tempus ncli]$ ./ncli_db bench --db:db --network:medalla --slots:1000
Loaded 215006 blocks, head slot 307400
All time are ms
Average, StdDev, Min, Max, Samples, Test
Validation is turned off meaning that no BLS operations are performed
23903.006, 0.000, 23903.006, 23903.006, 1, Initialize DB
0.253, 0.122, 0.047, 0.731, 721, Load block from database
24.455, 0.000, 24.455, 24.455, 1, Load state from database
18.734, 7.062, 10.346, 167.397, 690, Apply block
194.869, 33.175, 29.311, 226.981, 31, Apply epoch block
```
Epoch processing is heavy on both arithmetics and hash caching, both of which get a
significant boost here.
This makes sense: nim creates lots of small functions spread out over many C files. A much
worse solution is to try to annotate code with `inline` - it copies functions to multiple
C files but still doesn't do intermodule optimizations significantly limiting the
compilers' ability to reason about the code, causing bloat and misrepresenting the usefulness
of a function to the call frequency analysis that drives actual (C-compiler) inlining and many
other optimizations.
In particular, many nim functions are part of `system` or the `C` backend - stack tracing,
memory allocation etc - nim's inlining system is pretty incomplete in that it does not deal
with these and many other cases.
* windows workaround
* skip LTO on windows for now
* Clean up PR + bump nimbus-build-system
* pcre on 32-bit + Improve env variable handling + cache mingw
* Add badge + fix setting env variable
* Auto cancel if commit becomes outdated
* fix shell for deriving env variable
* Add more cancellation points
* Add finalization tests to Github Action
* Fix case
* change cancel actions + fixes for windows and finalization
* have to use matrix variable for cache path/key
* ARCH_OVERRIDE=LATFORM issue rebuild cache
* Update scripts - deactivate workflows with identified issues + reactivate caching
* workaround mac getopt
* Disable all aAVX512f extensions (Error: invalid register for .seh_savexmm in Cygwin)
* Fix cross compile of libminiupmp #1723
* Cache fetch-dlls to avoid being a drag on nim-lang.org
* Fix windows downloading DLLs twice and set CFLAGS env variable for Linux32
* fix silly yaml mistake
* .
* reactivate win32 after https://github.com/status-im/nim-beacon-chain/pull/1726
* Comment out minimal tests for now
* Bump BLSCurve
* Use unified aggregation API
* use new blscurve with unified aggregate API
* bump
* fix toRaw
* replace state_sim combine with AggregateSignature
* Fix 32-bit
* Fix 32-bit for real and test deactivating ccache for fno-tree-lopp-vectorize flag
* change compilation switches to narrow down Linux issue
* Use -fno-tree-vectorize to disable both tree-loop-vectorize and tree-slp-vectorize
* blscurve now disables both Loop and SLP vectorization
* Add tests for the miracl/milagro fallback
* Travis has max log size of 4MB
* Test with Miracl in the finalization test
* fix state_sim log level
* Coment out the slow fallback tests