nimbus-eth1

Commit Graph

Author	SHA1	Message	Date
Siddarth Kumar	72d08030d9	fix: check for mismatching ranges in benchmark csv (#2914)	2024-12-06 13:01:33 +01:00
Jacek Sieka	f034af422a	Pre-allocate vids for branches (#2882) Each branch node may have up to 16 sub-items - currently, these are given a VertexID when they are first needed, leading to a mostly-random order of vertexid for each subitem. Here, we pre-allocate all 16 vertex ids such that when a branch subitem is filled, it already has a vertexid waiting for it. This brings several important benefits: * subitems are sorted and "close" in their id sequencing - this means that when rocksdb stores them, they are likely to end up in the same data block thus improving read efficiency * because the ids are consecutive, we can store just the starting id and a bitmap representing which subitems are in use - this reduces disk space usage for branches allowing more of them to fit into a single disk read, further improving disk read and caching performance - disk usage at block 18M is down from 84 to 78gb! * the in-memory footprint of VertexRef is reduced, allowing more instances to fit into caches and less memory to be used overall. Because of the increased locality of reference, it turns out that we no longer need to iterate over the entire database to efficiently generate the hash key database because the normal computation is now faster - this significantly benefits "live" chain processing as well where each dirtied key must be accompanied by a read of all branch subitems next to it - most of the performance benefit in this branch comes from this locality-of-reference improvement. On a sample resync, there's already ~20% improvement with later blocks seeing increasing benefit (because the trie is deeper in later blocks leading to more benefit from branch read perf improvements) ``` blocks: 18729664, baseline: 190h43m49s, contender: 153h59m0s Time (total): -36h44m48s, -19.27% ``` Note: clients need to be resynced as the PR changes the on-disk format R.I.P. little bloom filter - your life in the repo was short but valuable	2024-12-04 11:42:04 +01:00
Jacek Sieka	01ca415721	Store keys together with node data (#2849) Currently, computed hash keys are stored in a separate column family with respect to the MPT data they're generated from - this has several disadvantages: * A lot of space is wasted because the lookup key (`RootedVertexID`) is repeated in both tables - this is 30% of the `AriKey` content! * rocksdb must maintain in-memory bloom filters and LRU caches for said keys, doubling its "minimal efficient cache size" * An extra disk traversal must be made to check for existence of cached hash key * Doubles the amount of files on disk due to each column family being its own set of files Here, the two CFs are joined such that both key and data are stored in `AriVtx`. This means: * we save ~30% disk space on repeated lookup keys * we save ~2gb of memory overhead that can be used to cache data instead of indices * we can skip storing hash keys for MPT leaf nodes - these are trivial to compute and waste a lot of space - previously they had to be present in the `AriKey` CF to avoid having to look in two tables on the happy path. * There is a small increase in write amplification because when a hash value is updated for a branch node, we must write both key and branch data - previously we would write only the key * There's a small shift in CPU usage - instead of performing lookups in the database, hashes for leaf nodes are (re)-computed on the fly * We can return to slightly smaller on-disk SST files since there are fewer of them, which should reduce disk traffic a bit Internally, there are also other advantages: * when clearing keys, we no longer have to store a zero hash in memory - instead, we deduce staleness of the cached key from the presence of an updated VertexRef - this saves ~1gb of mem overhead during import * hash key cache becomes dedicated to branch keys since leaf keys are no longer stored in memory, reducing churn * key computation is a lot faster thanks to the skipped second disk traversal - a key computation for mainnet can be completed in 11 hours instead of ~2 days (!) thanks to better cache usage and less read amplification - with additional improvements to the on-disk format, we can probably get rid of the initial full traversal method of seeding the key cache on first start after import All in all, this PR reduces the size of a mainnet database from 160gb to 110gb and the peak memory footprint during import by ~1-2gb.	2024-11-20 09:56:27 +01:00
Jacek Sieka	b4b4d16729	speed up key computation (#2642) * batch database key writes during `computeKey` calls * log progress when there are many keys to update * avoid evicting the vertex cache when traversing the trie for key computation purposes * avoid storing trivial leaf hashes that can be loaded directly from the vertex	2024-09-20 07:43:53 +02:00
tersec	838c9854e7	increase Python dependencies to address urllib3 vuln and certifi root cert (#2605)	2024-09-10 06:36:28 +00:00
Jacek Sieka	9826557184	fix make_states script dir	2024-08-19 10:16:32 +02:00
Jacek Sieka	8723a79225	add era dir to make_states	2024-08-12 14:49:32 +02:00
Jacek Sieka	3d3831dde8	Small cleanups (#2435) * avoid costly hike memory allocations for operations that don't need to re-traverse it * avoid unnecessary state checks (which might trigger unwanted state root computations) * disable optimize-for-hits due to the MPT no longer being complete at all times	2024-07-01 14:07:39 +02:00
Jacek Sieka	55ebd70d1e	stats: interpolate, remove some broken stats	2024-06-29 06:36:35 +02:00
tersec	2aaab1cb4a	fix Dependabot alerts (#2375)	2024-06-17 15:30:43 +02:00
Jacek Sieka	189a20bbae	Avoid recomputing hashes when persisting data (#2350)	2024-06-14 07:10:00 +02:00
Jacek Sieka	eb041abba7	avoid unnecessary memory allocations and lookups (#2334) * use `withValue` instead of `hasKey` + `[]` * avoid `@` et al * parse database data inside `onData` instead of making seq then parsing	2024-06-11 11:38:58 +02:00
web3-developer	db8c5b90bd	Cleanup stateless and block witness code. (#2295) * Cleanup unneeded stateless and block witness code. Keeping MultiKeys which is used in the eth_getProofsByBlockNumber RPC endpoint which is needed for the Fluffy state network bridge. * Rename generateWitness flag to collectWitnessData to better describe what the flag does. We only collect the keys of the touched accounts and storage slots but no block witness generation is supported for now. * Move remaining stateless code into nimbus directory. * Add vmstate parameter to ChainRef to fix test. * Exclude *.in from check copyright year --------- Co-authored-by: jangko <jangko128@gmail.com>	2024-06-08 15:05:00 +07:00
Jacek Sieka	32c51b14a4	keccak: improve perf a little (#2321) * avoid `burnMem` * avoid zeroing buffers * work around `when nimvm` issue	2024-06-07 16:48:27 +00:00
Jacek Sieka	ce80ed79a5	fix display in early blocks and time avg	2024-06-07 12:29:34 +02:00
Jacek Sieka	c5b3081828	eth: bump (#2308) * eth: bump Speed up basic operations like hashing and creating RLPs - up to 25% improvement in certain block ranges! ``` 876729c.csv /data/nimbus_stats/stats-20240605_2204-ed4f6221.csv stats-20240605_2000-c876729c.csv vs stats-20240605_2204-ed4f6221.csv bps_x bps_y tps_x tps_y bpsd tpsd timed block_number (500001, 888889] 1,017.72 996.07 1,784.96 1742.438676 -2.72% -2.72% 3.31% (888889, 1277778] 528.00 536.30 2,159.79 2198.781046 1.69% 1.69% -1.44% (1277778, 1666667] 324.29 317.78 2,064.48 2008.106377 -2.82% -2.82% 3.33% (1666667, 2055556] 253.87 258.74 1,840.94 1872.935273 1.67% 1.67% -1.39% (2055556, 2444445] 175.79 178.66 1,340.61 1363.248939 0.93% 0.93% -0.74% (2444445, 2833334] 137.27 159.74 958.75 1113.323757 14.24% 14.24% -10.69% (2833334, 3222223] 170.48 228.63 1,272.70 1704.047195 34.41% 34.41% -25.17% (3222223, 3611112] 127.49 125.48 1,572.39 1548.835791 -1.19% -1.19% 1.47% (3611112, 4000001] 37.25 40.42 1,100.65 1184.740493 9.58% 9.58% -7.04% blocks: 3501696, baseline: 11h59m40s, contender: 11h21m38s bpsd (mean): 6.18% tpsd (mean): 6.18% Time (sum): -38m1s, -4.26% bpsd = blocks per sec diff (+), tpsd = txs per sec diff, timed = time to process diff (-) + = more is better, - = less is better ``` * ignore gitignore	2024-06-06 23:39:09 +00:00
Jacek Sieka	e9f2608cd0	no cr for scripts	2024-06-06 14:38:58 +02:00
Jacek Sieka	0c6c84f2ce	Script for comparing csv outputs from block import	2024-06-06 14:33:49 +02:00
Jacek Sieka	a375720c16	import: read from era files (#2254) This PR extends the `nimbus import` command to also allow reading from era files - this command allows creating or topping up an existing database with data coming from era files instead of network sync. * add `--era1-dir` and `--max-blocks` options to command line * make `persistBlocks` report basic stats like transactions and gas * improve error reporting in several APIs * allow importing multiple RLP files in one go * clean up logging options to match nimbus-eth2 * make sure database is closed properly on shutdown	2024-05-31 09:13:56 +02:00
Jordan Hrycaj	de0388919f	Unified mode for undumping gzip-ed or era1-ed encoded block dumps (#2198) ackn: Built on Daniel's work	2024-05-20 13:59:18 +00:00
Kim De Mey	d3a706c229	Replace status-im/portal-spec-tests with ethereum fork version (#2097) - The fluffy test vector repo got forked (well, copied rather) to become the official one under ethereum github org, so we change to that repo now and archive ours. - Our repo also stored accumulator / historical_roots, replace that with a new repo which is only for network configs. - Several changes needed to be made due to test vectors that got updated + some of them got changed to / are yaml format instead of json.	2024-03-22 11:28:44 +01:00
jangko	b0000eed8b	Add check copyright year linter to CI	2023-11-01 10:41:20 +07:00
jangko	2f6b4de3e9	ci: fix nightly build	2023-02-23 18:34:04 +07:00
Ștefan Talpalaru	51bc1cf87f	dist: precompiled binaries and Docker images (#1015) * dist: precompiled binaries and Docker images The builds are reproducible, the binaries are portable and statically link librocksdb. This took some patching. Upstream PR: https://github.com/facebook/rocksdb/pull/9752 32-bit ARM is missing as a target because two different GCC versions fail with an ICE when trying to cross-compile RocksDB. Using Clang instead is too much trouble for a platform that nobody should be using anyway. (Clang doesn't come with its own target headers and libraries, can't be easily convinced to use the ones from GCC, so it needs an fs image from a 32-bit ARM distro - at which point I stopped caring). * CI: disable reproducibility test	2022-03-27 13:21:15 +02:00

24 Commits
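
The "starting id plus bitmap" idea from commit f034af422a (Pre-allocate vids for branches) can be illustrated with a minimal sketch. This is not the nimbus-eth1 implementation or its on-disk format: the repository is written in Nim, and the class name `Branch`, its methods, and the fixed 8-byte id / 2-byte bitmap encoding are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

BRANCH_WIDTH = 16  # a branch node has up to 16 sub-items


@dataclass
class Branch:
    """Hypothetical branch record: 16 consecutive ids reserved up front."""
    start_vid: int  # first of the 16 pre-allocated vertex ids
    used: int = 0   # 16-bit bitmap; bit n set means sub-item n is in use

    def set_child(self, nibble: int) -> int:
        """Mark sub-item `nibble` as used and return its pre-allocated id."""
        if not 0 <= nibble < BRANCH_WIDTH:
            raise ValueError("nibble must be in 0..15")
        self.used |= 1 << nibble
        return self.start_vid + nibble

    def child_vid(self, nibble: int) -> Optional[int]:
        """Return the id of sub-item `nibble`, or None if it is unused."""
        if self.used & (1 << nibble):
            return self.start_vid + nibble
        return None

    def encode(self) -> bytes:
        """Illustrative encoding (assumed): 8-byte start id + 2-byte bitmap."""
        return self.start_vid.to_bytes(8, "big") + self.used.to_bytes(2, "big")

    @classmethod
    def decode(cls, data: bytes) -> "Branch":
        """Inverse of `encode`."""
        return cls(int.from_bytes(data[:8], "big"),
                   int.from_bytes(data[8:10], "big"))


if __name__ == "__main__":
    branch = Branch(start_vid=100)
    print(branch.set_child(3), branch.child_vid(4), len(branch.encode()))
```

In this sketch a branch costs 10 bytes however many sub-items are filled, whereas storing one explicit 8-byte id per sub-item would cost up to 128 bytes; sibling ids are also adjacent, which is the locality property the commit message describes.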