nimbus-eth2

Commit Graph

Author	SHA1	Message	Date
Jacek Sieka	63a3f2b1ad	Tighten chunk decoding limits (#4264 ) * cap maximum number of chunks to download from peer (fixes #1620) * drop support for requesting blocks via v1 / phase0 protocol * tighten bounds checking of fixed-size messages	2022-10-27 18:51:43 +02:00
Etan Kissling	8936212f93	descore on empty response for range w known block (#4050 ) The sync protocol does not distinguish between: - All requested slots are empty - Peer does not have data available about requested range Therefore, we treat EOF for `beacon_blocks_by_range` and for `beacon_blocks_by_range` as valid responses, as if the entire epoch really contained no single block for any slot. Once a followup response provides new blocks, we detect that some blocks were missing and rewind. During backfill, we also request the known-to-exist `backfill.slot`, so we can actually detect whether an epoch really does not have blocks or whether a response is incomplete (`PeerScoreNoBlocks`).	2022-09-03 23:12:58 +02:00
Miran	dfd4afc9f2	compatibility with Nim 1.4+ (#3888 )	2022-07-29 10:53:42 +00:00
Eugene Kabanov	1b6651dfc3	Fix /eth/v1/node/syncing (#3720 ) * Fix REST `/eth/v1/node/syncing` call to return values even if SyncManager is not running. * Use syncManager.inProgress as is_syncing indicator.	2022-06-14 22:26:23 +02:00
Etan Kissling	7b04a94d43	fix #3674 (Sync progress >100% on checkpoint sync) (#3736 ) Corrects an off-by-1 in the reported sync percentage computation. New logic is based on `SyncQueue.total` and `SyncQueue.progress` with `pivot` instead of `sq.startSlot`.	2022-06-13 20:00:36 +03:00
Etan Kissling	15967c4076	keep track of latest blocks for optimistic sync (#3715 ) When launched with `--light-client-enable` the latest blocks are fetched and optimistic candidate blocks are passed to a callback (log for now). This helps accelerate syncing in the future (optimistic sync).	2022-06-10 14:16:37 +00:00
Jacek Sieka	7ec1521c52	use unsigned literals (#3717 ) in the hopes of avoiding potential for conversion bugs on i386	2022-06-08 11:09:33 +00:00
Jacek Sieka	b35584632b	sync: remove `step` from sync client implementation (#3678 ) * sync: remove `step` from sync client implementation Deprecated in the spec: https://github.com/ethereum/consensus-specs/pull/2856 - future PR:s will deprecate server support as well.	2022-06-06 16:56:59 +03:00
Eugene Kabanov	50f9596108	Eliminate rpc_types.nim usage. (#3692 )	2022-06-02 09:39:08 +00:00
Etan Kissling	01efa93cf6	add light client (standalone) (#3653 ) Introduces a new library for syncing using libp2p based light client sync protocol, and adds a new `nimbus_light_client` executable that uses this library for syncing. The new executable emits log messages when new beacon block headers are received, and is integrated into testing.	2022-05-31 12:45:37 +02:00
Eugene Kabanov	5592c7c674	NoMonitor and removed clock check for SyncManager. (#3420 ) * Add `NoMonitor` flag to stop SyncManager from monitoring sync situation. * Remove `toleranceValue` and `PeerScoreHeadTooNew`. Co-authored-by: Etan Kissling <etan@status.im>	2022-04-14 15:17:44 +02:00
Jacek Sieka	f70ff38b53	enable `styleCheck:usages` (#3573 ) Some upstream repos still need fixes, but this gets us close enough that style hints can be enabled by default. In general, "canonical" spellings are preferred even if they violate nep-1 - this applies in particular to spec-related stuff like `genesis_validators_root` which appears throughout the codebase.	2022-04-08 16:22:49 +00:00
Jacek Sieka	4207b127f9	era: load blocks and states (#3394 ) * era: load blocks and states Era files contain finalized history and can be thought of as an alternative source for block and state data that allows clients to avoid syncing this information from the P2P network - the P2P network is then used to "top up" the client with the most recent data. They can be freely shared in the community via whatever means (http, torrent, etc) and serve as a permanent cold store of consensus data (and, after the merge, execution data) for history buffs and bean counters alike. This PR gently introduces support for loading blocks and states in two cases: block requests from rest/p2p and frontfilling when doing checkpoint sync. The era files are used as a secondary source if the information is not found in the database - compared to the database, there are a few key differences: * the database stores the block indexed by block root while the era file indexes by slot - the former is used only in rest, while the latter is used both by p2p and rest. * when loading blocks from era files, the root is no longer trivially available - if it is needed, it must either be computed (slow) or cached (messy) - the good news is that for p2p requests, it is not needed * in era files, "framed" snappy encoding is used while in the database we store unframed snappy - for p2p requests, the latter requires recompression while the former could avoid it * front-filling is the process of using era files to replace backfilling - in theory this front-filling could happen from any block and front-fills with gaps could also be entertained, but our backfilling algorithm cannot take advantage of this because there's no (simple) way to tell it to "skip" a range.
* front-filling, as implemented, is a bit slow (10s to load mainnet): we load the full BeaconState for every era to grab the roots of the blocks - it would be better to partially load the state - as such, it would also be good to be able to partially decompress snappy blobs * lookups from REST via root are served by first looking up a block summary in the database, then using the slot to load the block data from the era file - however, there needs to be an option to create the summary table from era files to fully support historical queries To test this, `ncli_db` has an era file exporter: the files it creates should be placed in an `era` folder next to `db` in the data directory. What's interesting in particular about this setup is that `db` remains as the source of truth for security purposes - it stores the latest synced head root which in turn determines where a node "starts" its consensus participation - the era directory however can be freely shared between nodes / people without any (significant) security implications, assuming the era files are consistent / not broken. 
There's lots of future improvements to be had: * we can drop the in-memory `BlockRef` index almost entirely - at this point, resident memory usage of Nimbus should drop to a cool 500-600 mb * we could serve era files via REST trivially: this would drop backfill times to whatever time it takes to download the files - unlike the current implementation that downloads block by block, downloading an era at a time almost entirely cuts out request overhead * we can "reasonably" recreate detailed state history from almost any point in time, turning an O(slot) process into O(1) effectively - we'll still need caches and indices to do this with sufficient efficiency for the rest api, but at least it cuts the whole process down to minutes instead of hours, for arbitrary points in time * CI: ignore failures with Nim-1.6 (temporary) * test fixes Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>	2022-03-23 09:58:17 +01:00
Etan Kissling	3ffab01b07	Refactor and optimize sync logs. (#3451 ) * Refactor and optimize logs. * Introduce shortLog(SyncRequest). * Address review comment. * make sync queue logs more consistent Adds a few minor logging improvements: - Fixes a typo (`was happened` -> `has happened`) - Avoids passing `reset_slot` argument to log statement multiple times - Uses same `rewind_to_slot` label when logging in both sync directions - Consistent rewind point logging Co-authored-by: cheatfate <eugene.kabanov@status.im>	2022-03-03 09:05:33 +01:00
Etan Kissling	6849536742	fix `firstSlot` computation for backfill sync When initializing backfill sync, the implementation intends to start at the first unknown slot (`1` before tail). However, an incorrect variable is passed, and backfill sync actually starts at the tail slot instead. This patch corrects this by passing the intended variable. The problem was introduced with the original backfill implementation at #3263.	2022-02-14 18:53:38 +02:00
Etan Kissling	d1f97e209a	remove unused `sleepTime` from `SyncManager` (#3384 ) The `SyncManager` has a leftover optional `sleepTime` parameter in its constructor that used to configure the sync loop polling rate. This parameter was replaced with a constant in #1602 and is no longer functional. This patch removes the `sleepTime` leftovers.	2022-02-14 12:05:01 +01:00
Etan Kissling	a28900c348	fix slot number display during sync (#3383 ) #3304 introduced a regression to the sync status string displayed in the status bar; during the main forward sync, the current slot is no longer reported and always displays as `0`. This patch corrects the computation to accurately report the current slot once more.	2022-02-14 12:04:04 +01:00
Etan Kissling	15fc7534cf	remove unused `maxStatusAge` from `SyncManager` (#3382 ) The `SyncManager` has a leftover optional `maxStatusAge` parameter in its constructor that used to configure the libp2p `Status` polling rate. This parameter was replaced with a constant in #1827 and is no longer functional. This patch removes the `maxStatusAge` leftovers.	2022-02-13 16:17:13 +01:00
Jacek Sieka	1760f4d7a7	move wallet/deposit commands to separate files (#3372 ) These commands have little to do with the "normal" beacon node operation - ergo, they deserve to live in their own module. * clean up imports/exports	2022-02-11 21:40:49 +01:00
Jacek Sieka	c7abc97545	harden and speed up block sync (#3358 ) * harden and speed up block sync The `GetBlockBy` server implementation currently reads SSZ bytes from database, deserializes them into a Nim object then serializes them right back to SSZ - here, we eliminate the deser/ser steps and send the bytes straight to the network. Unfortunately, the snappy recoding must still be done because of differences in framing. Also, the quota system makes one giant request for quota right before sending all blocks - this means that a 1024 block request will be "paused" for a long time, then all blocks will be sent at once causing a spike in database reads which potentially will see the reading client time out before any block is sent. Finally, on the reading side we make several copies of blocks as they travel through various queues - this was not noticeable before but becomes a problem in two cases: bellatrix blocks are up to 10mb (instead of .. 30-40kb) and when backfilling, we process a lot more of them a lot faster. fix status comparisons for nodes syncing from genesis (#3327 was a bit too hard) * don't hit database at all for post-altair slots in GetBlock v1 requests	2022-02-07 19:20:10 +02:00
Jacek Sieka	f70aceef37	Harden handling of unviable forks (#3312 ) * Harden handling of unviable forks In our current handling of unviable forks, we allow peers to send us blocks that come from a different fork - this is not necessarily an error as it can happen naturally, but it does open up the client to a case where the same unviable fork keeps getting requested - rather than allowing this to happen, we'll now give these peers a small negative score - if it keeps happening, we'll disconnect them. * keep track of unviable forks in quarantine, to avoid filling it with known junk * collect peer scores in single module * descore peers when they send unviable blocks during sync * don't give score for duplicate blocks * increase quarantine size to a level that allows finality to happen under optimal conditions - this helps avoid downloading the same blocks over and over in case of an unviable fork * increase initial score for new peers to make room for one more failure before disconnection * log and score invalid/unviable blocks in requestmanager too * avoid ChainDAG dependency in quarantine * reject gossip blocks with unviable parent * continue processing unviable sync blocks in order to build unviable dag * docs * Update beacon_chain/consensus_object_pools/block_pools_types.nim * add unviable queue test	2022-01-26 13:20:08 +01:00
Eugene Kabanov	0ea6dfa517	Fix current slot value and finishing progress for backfilling. (#3304 )	2022-01-21 10:35:54 +01:00
Jacek Sieka	570379d3d9	Backfiller (#3263 ) Backfilling is the process of downloading historical blocks via P2P that are required to fulfill `GetBlocksByRange` duties - this happens during both trusted node and finalized checkpoint syncs. In particular, backfilling happens after syncing to head, such that attestation work can start as soon as possible. * Fix SyncQueue initialization procedure. Remove usage of `awaitne`. Add cancellation support. Remove unneeded `sleepAsync()` if peer's head is older than needed. Add `direction` field to all logs. Fix syncmanager wedge issue. Add proper resource cleaning procedure on backward sync finish. Co-authored-by: cheatfate <eugene.kabanov@status.im>	2022-01-20 08:25:45 +01:00
tersec	9c0c9c98ce	complete switch to beacon_chain/specs/datatypes/bellatrix (#3295 )	2022-01-18 13:36:52 +00:00
Jacek Sieka	68247f81b3	Trusted node sync (#3209 ) * Trusted node sync Trusted node sync, aka checkpoint sync, allows syncing the chain from a trusted node instead of relying on a full sync from genesis. Features include: * sync from any slot, including the latest finalized slot * backfill blocks either from the REST api (default) or p2p (#3263) Future improvements: * top up blocks between head in database and some other node - this makes for an efficient backup tool * recreate historical state to enable historical queries * fixes * load genesis from network metadata * check checkpoint block root against state * fix invalid block root in rest json decoding * odds and ends * retry looking for epoch-boundary checkpoint blocks	2022-01-17 10:27:08 +01:00
Jacek Sieka	d57c2dc4e5	use tail block as sync pivot (#3276 ) When syncing, we show how much of the sync has completed - with checkpoint sync, the syncing does not always go from slot 0 to head, but rather can start in the middle. To show a consistent `%` between restarts, we introduce the concept of a pivot point, such that if I sync 10% of the chain, then restart the client, it picks up at 10% (instead of counting from 0). What it looks like: ``` INF ... sync="01d12h41m (15.96%) 13.5158slots/s (QDDQDDQQDP:339018)" ... ```	2022-01-13 10:37:53 +01:00
tersec	6ef3834f4a	fix type-conversions-to-self, unexport from nimbus_beacon_node, and rm unused vars/procs (#3211 )	2021-12-20 12:21:17 +01:00
Jacek Sieka	118840d241	SyncManager cleanups for backfill support (#3189 ) * SyncManager cleanups for backfill support Cleanups, fixes and simplifications, in anticipation of backfill support for the `SyncManager`: * reformat sync progress indicator to show time left and % done more prominently: * old: `sync="sPssPsssss:2:2.4229:00h57m (2706898)"` * new: `sync="14d12h31m (0.52%) 1.1378slots/s (wQQQQQDDQQ:1287520)"` * reset average speed when going out of sync * pass all block errors to sync manager, including duplicate/unviable * penalize peers for reporting a head block that is outside of our expected wall clock time (they're likely on a different network or trying to disrupt sync) * remove `SyncFailureKind` (unused) * remove `inRange` (unused) * add `Q` for sync queue requests that are in the `SyncQueue` but not yet in the `BlockProcessor` queue * update last slot in `SyncQueue` after getting peer status * fix race condition between `wakeupWaiters` and `resetWait`, where workers would not be correctly reset if block verification returned a completed future without event loop * log syncmanager direction * Fix ordering issue. Some of the requests size of which are not equal to `chunkSize` could be processed in wrong order which could lead to sync process freezes. Co-authored-by: cheatfate <eugene.kabanov@status.im>	2021-12-16 15:57:16 +01:00
Eugene Kabanov	b05734f610	Backward sync support for SyncManager. (#3131 ) * Unbundle SyncQueue from sync_manager.nim. Unbundle Peer scores constants to peer_scores.nim. Add Forward/Backward enum. * Further improvements and tests. * Adopt getRewindPoint() and fix MissingParent handler. * Remove unused procedures. Refactor `result` usage. Fix resetWait(). * Add all the tests and fix the issue with rewind point. * Fix get() issue. * Fix flaky tests. * test fixes Co-authored-by: Jacek Sieka <jacek@status.im>	2021-12-08 22:15:29 +01:00
Jacek Sieka	233d756518	Logging and startup improvements (#3038 ) * Logging and startup improvements Color support for released binaries! * startup scripts no longer log to file by default - this only affects source builds - released binaries don't support file logging * add --log-stdout option to control logging to stdout (colors, json) * detect tty:s vs redirected logs and log accordingly * add option to disable log colors at runtime * simplify several "common" logs, showing the most important information earlier and more clearly * remove line numbers / file information / tid - these take up space and are of little use to end users * still enabled in debug builds and tools * remove `testnet_servers_image` compile-time option * server images, released binaries and compile-from-source now offer the same behaviour and features * fixes https://github.com/status-im/nimbus-eth2/issues/2326 * fixes https://github.com/status-im/nimbus-eth2/issues/1794 * remove instantaneous block speed from sync message, keeping only average before: ``` INF 2021-10-28 16:45:59.000+02:00 Slot start topics="beacnde" tid=386429 file=nimbus_beacon_node.nim:884 lastSlot=2384027 wallSlot=2384028 delay=461us84ns peers=0 head=75a10ee5:3348 headEpoch=104 finalized=cd6804ba:3264 finalizedEpoch=102 sync="wwwwwwwwww:0:0.0000:0.0000:00h00m (3348)" INF 2021-10-28 16:45:59.046+02:00 Slot end topics="beacnde" tid=386429 file=nimbus_beacon_node.nim:821 slot=2384028 nextSlot=2384029 head=75a10ee5:3348 headEpoch=104 finalizedHead=cd6804ba:3264 finalizedEpoch=102 nextAttestationSlot=-1 nextProposalSlot=-1 nextActionWait=n/a ``` after: ``` INF 2021-10-28 22:43:23.033+02:00 Slot start topics="beacnde" slot=2385815 epoch=74556 sync="DDPDDPUDDD:10:5.2258:01h19m (2361088)" peers=37 head=eacd2dae:2361096 finalized=73782:a4751487 delay=33ms687us715ns INF 2021-10-28 22:43:23.291+02:00 Slot end topics="beacnde" slot=2385815 nextActionWait=n/a nextAttestationSlot=-1 nextProposalSlot=-1 head=eacd2dae:2361096 ``` * fix
comment * documentation updates * mention `--log-file` may be deprecated in the future * update various docs	2021-11-02 18:06:36 +01:00
Jacek Sieka	4f7a8cf79d	register vc duties with subnet tracker (#2949 ) * register vc duties with subnet tracker * fix activation logging during startup * cache slot signature to avoid duplicate signature work * schedule aggregation duties one slot at a time to avoid CPU spike at each epoch * lower aggregation subnet pre-subscription time to 4 slots (lowers bandwidth and CPU usage) * update stability subnets in ENR on startup * log gossip state * perform gossip subscriptions just before the next slot starts * document stuff * add random include * don't overwrite subscription state when not subscribed * log target gossip state * updating gossip status once is enough * add test * remove syncQueueLen - this one is not updated at the end of the sync and may cause gossip to disconnect itself completely - use a simple head distance instead * fix gossip disconnection - if in hysteresis, node.gossipState will be set to disabled even though we don't disable topic subscriptions * fix extra duty registration call	2021-10-18 11:11:44 +02:00
tersec	9c0d9b546a	successfull -> successful (#2842 )	2021-09-01 18:08:24 +02:00
Jacek Sieka	01596c45dd	cleanups and fixes (#2827 ) * import cleanup * fix json-rpc exception handlers * avoid unnecessary presto client import * introduce ForkedBeaconBlock, some altair logging * url fixes	2021-08-27 11:00:06 +02:00
tersec	6e46445da2	switch result = foo to expression return; unexport rest of logtrace symbols (#2788 )	2021-08-17 09:51:39 +00:00
Jacek Sieka	7a622e8505	rework spec imports (#2779 ) The spec imports are a mess to work with, so this branch cleans them up a bit to ensure that we avoid generic sandwiches and that importing stuff generally becomes easier. * reexport crypto/digest/presets because these are part of the public symbol set of the rest of the spec types * don't export `merge` types from `base` - this causes circular deps * fix circular deps in `ssz/spec_types` - this is the first step in disentangling ssz from spec * be explicit about phase0 vs altair - longer term, `altair` will become the "natural" type set, then merge and so on, so no point in giving `phase0` special preferential treatment	2021-08-12 13:08:20 +00:00
Jacek Sieka	9697b73e71	forkedbeaconstate_helpers -> forks (#2772 ) Simpler module name for stuff that covers forks * check that runtime config matches database state * also include some assorted altair cleanups * use "standard" genesis fork in local testnet to work around missing runtime config support	2021-08-10 22:46:35 +02:00
Jacek Sieka	3d7bee8502	REST API client, JSON-RPC cleanups (#2756 ) This refactoring puts the JSON-RPC and REST APIs on more equal footing by renaming and moving things around, creating a separation between client and server, and documenting what they are - the aim is to have a simple-to-use base to start from when developing API clients, as well as make it easier to navigate the code when looking for the legacy JSON-RPC interface vs the new REST API. * move REST client, serialization and supporting types to spec/eth2_apis * REST stuff now starts with `rest_`, JSON-RPC stuff starts with `rpc_`, more or less * simplify imports such that there's a simple module to import for both server and client * map REST type and proc names to yaml spec more closely - in particular, reuse operation and type names in `rest_types` to make comparisons against spec more easy * cleaner separation between client and server modules - modules common between server and client such as `rest_types` and serialization move to the spec folder - this allows the client to be built with less knowledge about server internals	2021-08-03 17:17:11 +02:00
Jacek Sieka	2d6a661ac6	Syncv2 (#2723 ) * bump libp2p * altair sync v2 Use V2 sync requests after the altair fork has happened, according to the wall clock * Fix the behavior of the v1 req/resp calls after Altair Co-authored-by: Zahary Karadjov <zahary@gmail.com>	2021-07-15 21:01:07 +02:00
Jacek Sieka	7f52ffb8d9	clean up block processing (#2610 ) * gossip_to_consensus -> block_processor (it's processing only blocks, but not only from gossip) * measure queue and validation time for blocks * measure assignment and state loading times for updateStateData * avoid some unnecessary block copies in block sync * warn that database is corrupt if we hit tail without a state	2021-05-28 19:34:00 +03:00
Eugene Kabanov	5b5ea2e813	Fix integer overflow issue in sync_manager. (#2564 ) * Make Refactor rewind point assignment more concrete. * Fix overflow issue in getRewindPoint(). Add tests.	2021-05-18 12:25:14 +02:00
cheatfate	9de65fa293	Fixing issues after bump.	2021-04-09 21:42:13 +03:00
cheatfate	c4d891f583	Fix sync_manager.nim to return proper status. Bump REST API dependencies.	2021-04-09 21:42:13 +03:00
Jacek Sieka	2695cfa864	EH cleanup (#2455 ) almost 100% raises in nimbus-eth2 now! * fix some rare exception-related crashes in json-rpc	2021-03-26 07:52:01 +01:00
Mamy Ratsimbazafy	c47d636cb3	Split Eth2Processor in prep for batching (#2396 ) * Split Eth2Processor in gossip and consensus part and materialize the shared block queue * Update initialization in test_sync_manager	2021-03-11 11:10:57 +01:00
Mamy Ratsimbazafy	d47f53cd9d	Reorg (5/5) (#2377 ) * Reorg things left into networking and gossip_processing * time -> beacon_clock * fix builds	2021-03-05 14:12:00 +01:00
Mamy Ratsimbazafy	5d7f9c3a04	Consensus object pools [reorg 4/5] (#2374 ) * Add documentation * make test doesn't try to build the beacon node :/	2021-03-04 10:13:44 +01:00
tersec	4278e80657	document two uint64 -> int64 conversions (#2375 ) * document two uint64 -> int64 conversions * fix minimal preset slot time & calculation	2021-03-04 10:13:23 +01:00
Mamy Ratsimbazafy	3276dfc683	Consolidate modules by areas [part 1] (#2365 ) * Move sync in subfolder * move validator related thingies in validators * fix binary builds * update bounds comment [skip ci]	2021-03-02 11:27:45 +01:00

48 Commits
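
Commit d57c2dc4e5 (#3276) describes reporting sync progress relative to a pivot point so that the percentage stays consistent across restarts. The following is a minimal Python sketch of that idea, not the Nimbus implementation: the function name `sync_progress_percent` and its parameters are hypothetical, and clamping to the 0 to 100 range is an assumption made for illustration.

```python
def sync_progress_percent(pivot_slot: int, current_slot: int, target_slot: int) -> float:
    """Percentage of the range pivot_slot..target_slot that has been synced.

    Hypothetical helper: the pivot is a fixed starting slot (for example the
    tail block stored in the database), so the result does not depend on
    where the current session happened to resume.
    """
    total = target_slot - pivot_slot
    if total <= 0:
        # Nothing left to sync relative to the pivot.
        return 100.0
    # Clamp so that a head slightly outside the range never reports <0% or >100%.
    done = min(max(current_slot - pivot_slot, 0), total)
    return 100.0 * done / total


# A node that checkpoint-synced from slot 1000 towards slot 2000 and restarts
# at slot 1100 still reports 10%, because progress is measured from the pivot
# rather than from the slot at which this session started.
print(sync_progress_percent(1000, 1100, 2000))
```

Measuring from a stored pivot rather than from the session's first slot is what keeps the figure stable between restarts in this sketch.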