nimbus-eth2

Commit Graph

Author	SHA1	Message	Date
Eugene Kabanov	174292b7e4	Sync gaps fix (#4090)	2022-09-19 12:37:42 +03:00
Etan Kissling	3ba016d75f	consistent peer scoring for missing non-finalized parent (#3381) When the sync queue processes results for a blocks by range request, and the requested range contained some slots that are already finalized, `BlockError.MissingParent` currently leads to `PeerScoreBadBlocks` even when the error occurs on a non-finalized slot in the requested range. This patch changes the scoring in that case to `PeerScoreMissingBlocks` for consistency with range requests solely covering non-finalized slots, and, likewise, rewinds the sync queue to the next `rewindSlot`.	2022-09-16 21:45:53 +02:00
Jacek Sieka	b35584632b	sync: remove `step` from sync client implementation (#3678) * sync: remove `step` from sync client implementation Deprecated in the spec: https://github.com/ethereum/consensus-specs/pull/2856 - future PR:s will deprecate server support as well.	2022-06-06 16:56:59 +03:00
Etan Kissling	8cfb630aa9	never request blocks before `safeSlot` in sync (#3512) Follows up on https://github.com/status-im/nimbus-eth2/pull/3461 which ensured that repeated `beaconBlocksByRange` requests get shrinked to account for potential out-of-band advancements to `safeSlot`, with similar logic for the initial request.	2022-05-10 13:46:14 +02:00
Etan Kissling	6d1d31dd01	avoid re-requesting finalized blocks during sync (#3461) When a `beaconBlocksByRange` response advances the `safeSlot`, but later has errors, the sync queue keeps repeating that same request until it is fulfilled without errors. Data up through `safeSlot` is considered to be immutable, i.e., finalized, so re-requesting that data is not useful. By advancing the sync progress in that scenario, those redundant query portions can be avoided. Note, the finalized block _itself_ is always requested, even in the initial request. This behaviour is kept same.	2022-03-15 18:56:56 +01:00
Etan Kissling	3ffab01b07	Refactor and optimize sync logs. (#3451) * Refactor and optimize logs. * Introduce shortLog(SyncRequest). * Address review comment. * make sync queue logs more consistent Adds a few minor logging improvements: - Fixes a typo (`was happened` -> `has happened`) - Avoids passing `reset_slot` argument to log statement multiple times - Uses same `rewind_to_slot` label when logging in both sync directions - Consistent rewind point logging Co-authored-by: cheatfate <eugene.kabanov@status.im>	2022-03-03 09:05:33 +01:00
Jacek Sieka	c7abc97545	harden and speed up block sync (#3358) * harden and speed up block sync The `GetBlockBy` server implementation currently reads SSZ bytes from database, deserializes them into a Nim object then serializes them right back to SSZ - here, we eliminate the deser/ser steps and send the bytes straight to the network. Unfortunately, the snappy recoding must still be done because of differences in framing. Also, the quota system makes one giant request for quota right before sending all blocks - this means that a 1024 block request will be "paused" for a long time, then all blocks will be sent at once causing a spike in database reads which potentially will see the reading client time out before any block is sent. Finally, on the reading side we make several copies of blocks as they travel through various queues - this was not noticeable before but becomes a problem in two cases: bellatrix blocks are up to 10mb (instead of .. 30-40kb) and when backfilling, we process a lot more of them a lot faster. fix status comparisons for nodes syncing from genesis (#3327 was a bit too hard) * don't hit database at all for post-altair slots in GetBlock v1 requests	2022-02-07 19:20:10 +02:00
Jacek Sieka	f70aceef37	Harden handling of unviable forks (#3312) * Harden handling of unviable forks In our current handling of unviable forks, we allow peers to send us blocks that come from a different fork - this is not necessarily an error as it can happen naturally, but it does open up the client to a case where the same unviable fork keeps getting requested - rather than allowing this to happen, we'll now give these peers a small negative score - if it keeps happening, we'll disconnect them. * keep track of unviable forks in quarantine, to avoid filling it with known junk * collect peer scores in single module * descore peers when they send unviable blocks during sync * don't give score for duplicate blocks * increase quarantine size to a level that allows finality to happen under optimal conditions - this helps avoid downloading the same blocks over and over in case of an unviable fork * increase initial score for new peers to make room for one more failure before disconnection * log and score invalid/unviable blocks in requestmanager too * avoid ChainDAG dependency in quarantine * reject gossip blocks with unviable parent * continue processing unviable sync blocks in order to build unviable dag * docs * Update beacon_chain/consensus_object_pools/block_pools_types.nim * add unviable queue test	2022-01-26 13:20:08 +01:00
Jacek Sieka	805e85e1ff	time: spring cleaning (#3262) Time in the beacon chain is expressed relative to the genesis time - this PR creates a `beacon_time` module that collects helpers and utilities for dealing the time units - the new module does not deal with actual wall time (that's remains in `beacon_clock`). Collecting the time related stuff in one place makes it easier to find, avoids some circular imports and allows more easily identifying the code actually needs wall time to operate. * move genesis-time-related functionality into `spec/beacon_time` * avoid using `chronos.Duration` for time differences - it does not support negative values (such as when something happens earlier than it should) * saturate conversions between `FAR_FUTURE_XXX`, so as to avoid overflows * fix delay reporting in validator client so it uses the expected deadline of the slot, not "closest wall slot" * simplify looping over the slots of an epoch * `compute_start_slot_at_epoch` -> `start_slot` * `compute_epoch_at_slot` -> `epoch` A follow-up PR will (likely) introduce saturating arithmetic for the time units - this is merely code moves, renames and fixing of small bugs.	2022-01-11 11:01:54 +01:00
Jacek Sieka	118840d241	SyncManager cleanups for backfill support (#3189) * SyncManager cleanups for backfill support Cleanups, fixes and simplifications, in anticipation of backfill support for the `SyncManager`: * reformat sync progress indicator to show time left and % done more prominently: * old: `sync="sPssPsssss:2:2.4229:00h57m (2706898)"` * new: `sync="14d12h31m (0.52%) 1.1378slots/s (wQQQQQDDQQ:1287520)"` * reset average speed when going out of sync * pass all block errors to sync manager, including duplicate/unviable * penalize peers for reporting a head block that is outside of our expected wall clock time (they're likely on a different network or trying to disrupt sync) * remove `SyncFailureKind` (unused) * remove `inRange` (unused) * add `Q` for sync queue requests that are in the `SyncQueue` but not yet in the `BlockProcessor` queue * update last slot in `SyncQueue` after getting peer status * fix race condition between `wakeupWaiters` and `resetWait`, where workers would not be correctly reset if block verification returned a completed future without event loop * log syncmanager direction * Fix ordering issue. Some of the requests size of which are not equal to `chunkSize` could be processed in wrong order which could lead to sync process freezes. Co-authored-by: cheatfate <eugene.kabanov@status.im>	2021-12-16 15:57:16 +01:00
Eugene Kabanov	b05734f610	Backward sync support for SyncManager. (#3131) * Unbundle SyncQueue from sync_manager.nim. Unbundle Peer scores constants to peer_scores.nim. Add Forward/Backward enum. * Further improvements and tests. * Adopt getRewindPoint() and fix MissingParent handler. * Remove unused procedures. Refactor `result` usage. Fix resetWait(). * Add all the tests and fix the issue with rewind point. * Fix get() issue. * Fix flaky tests. * test fixes Co-authored-by: Jacek Sieka <jacek@status.im>	2021-12-08 22:15:29 +01:00
Jacek Sieka	1a8b7469e3	move quarantine outside of chaindag (#3124) * move quarantine outside of chaindag The quarantine has been part of the ChainDAG for the longest time, but this design has a few issues: * the function in which blocks are verified and added to the dag becomes reentrant and therefore difficult to reason about - we're currently using a stateful flag to work around it * quarantined blocks bypass the processing queue leading to a processing stampede * the quarantine flow is unsuitable for orphaned attestations - these should also should be quarantined eventually Instead of processing the quarantine inside ChainDAG, this PR moves re-queueing to `block_processor` which already is responsible for dealing with follow-up work when a block is added to the dag This sets the stage for keeping attestations in the quarantine as well. Also: * make `BlockError` `{.pure.}` * avoid use of `ValidationResult` in block clearance (that's for gossip)	2021-12-06 10:49:01 +01:00
Jacek Sieka	c40cc6cec1	clean up fork enum and field names * single naming strategy * simplify some fork code * simplify forked block production	2021-10-19 11:06:38 +03:00
tersec	a060985abc	unexport various parts of tests/ and remove unused code (#2794)	2021-08-18 13:58:43 +00:00
Jacek Sieka	9697b73e71	forkedbeaconstate_helpers -> forks (#2772) Simpler module name for stuff that covers forks * check that runtime config matches database state * also include some assorted altair cleanups * use "standard" genesis fork in local testnet to work around missing runtime config support	2021-08-10 22:46:35 +02:00
Jacek Sieka	2d6a661ac6	Syncv2 (#2723) * bump libp2p * altair sync v2 Use V2 sync requests after the altair fork has happened, according to the wall clock * Fix the behavior of the v1 req/resp calls after Altair Co-authored-by: Zahary Karadjov <zahary@gmail.com>	2021-07-15 21:01:07 +02:00
Jacek Sieka	7f52ffb8d9	clean up block processing (#2610) * gossip_to_consensus -> block_processor (it's processing only blocks, but not only from gossip) * measure queue and validation time for blocks * measure assignment and state loading times for updateStateData * avoid some unnecessary block copies in block sync * warn that database is corrupt if we hit tail without a state	2021-05-28 19:34:00 +03:00
Eugene Kabanov	5b5ea2e813	Fix integer overflow issue in sync_manager. (#2564) * Make Refactor rewind point assignment more concrete. * Fix overflow issue in getRewindPoint(). Add tests.	2021-05-18 12:25:14 +02:00
Jacek Sieka	ce49da6c0a	Introduce unittest2 and junit reports (#2522) * Introduce unittest2 and junit reports * fix XML path * don't combine multiple CI runs * fixup * public combined report also Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>	2021-04-28 18:41:02 +02:00
Dustin Brody	97504fdb9d	ncli_db pruneDatabase checkpointing; remove onSlotEnd lookaheadTime	2021-03-12 23:15:46 +02:00
Mamy Ratsimbazafy	c47d636cb3	Split Eth2Processor in prep for batching (#2396) * Split Eth2Processor in gossip and consensus part and materialize the shared block queue * Update initialization in test_sync_manager	2021-03-11 11:10:57 +01:00
Mamy Ratsimbazafy	d47f53cd9d	Reorg (5/5) (#2377) * Reorg things left into networking and gossip_processing * time -> beacon_clock * fix builds	2021-03-05 14:12:00 +01:00
Mamy Ratsimbazafy	3276dfc683	Consolidate modules by areas [part 1] (#2365) * Move sync in subfolder * move validator related thingies in validators * fix binary builds * update bounds comment [skip ci]	2021-03-02 11:27:45 +01:00
Jacek Sieka	22998fdfd4	avoid double deserialization When blocks and attestations arrive, they are SSZ-decoded twice: once for validation and once for processing. This branch enqueues the decoded block directly for processing, avoiding the second, slow deserialization. * move processing of blocks and attestations to queue * ...and out from beacon_node * split attestation processing into attestations and aggregates * also updates metrics * clean up logging to better follow the lifetime of gossip: arrival, validation and processing * drop attestations and aggregates if there are too many * try to prioritise blocks and aggregates before single-validator attestations	2020-08-21 11:46:25 +03:00
Eugene Kabanov	55fcece0b2	SyncManager fix to process blocks one by one. (#1464) * Allow sync manager process blocks one by one. * Log storeBlock() and updateHead() duration. * Calculate duration only for blocks added without any error. * Fix float compilation error. * Fix duration. * Fix SyncQueue tests.	2020-08-10 09:15:50 +02:00
Eugene Kabanov	1fc9413c48	Fix #1153. (#1160) Add ability for SyncQueue to recover from unexpected MissingParent.	2020-06-11 16:20:53 +02:00
cheatfate	12e28a1fa9	Add proper concurrent connections. Add SeenTable to avoid continuous attempts to dead peers. Refactor onSecond. Block backward sync while forward sync is working. SyncManager now checks responses according corresponding requests + tests. SyncManager now watching for not progressing local_head_slot and resets SyncQueue.	2020-06-03 12:53:57 +03:00
Eugene Kabanov	21131e629b	Sync freeze fixes. (#1072) * Add ability to reset state of sync manager. Fix bug when sync got stuck on `zero-point` reset. Fix bug when sync got stuck when some of the workers waiting for failing one. * Remove debugging comments and imports. * Remove not used pendingLock.	2020-05-28 07:02:28 +02:00
Eugene Kabanov	ea95021073	Fix sync issues. (#1035) * Fix sync issues. * Add documentation about zero-point. Add more comments about syncing loops. Change to 4 blocks per request.	2020-05-19 14:08:50 +02:00
Jacek Sieka	ed74770451	spec: regulate exceptions (#913) * spec: regulate exceptions * a few more simple raises	2020-04-22 07:53:02 +02:00
Eugene Kabanov	3d42da90a8	Syncing. (#909)	2020-04-20 16:59:18 +02:00
Ștefan Talpalaru	b7a32a17ba	bump submodules and remove failing syncManagerGroupRecoveryTest	2020-04-14 18:21:56 +02:00
Zahary Karadjov	22876da593	Fix gcsafety issues in the test suite	2020-03-24 22:14:40 +02:00
Zed	6ba7b4b117	Generate markdown test reports	2020-03-13 14:38:59 +00:00
Ștefan Talpalaru	1caafba79c	Merge pull request #803 from status-im/misc/bump-libp2p bumping libp2p to latest master	2020-03-12 01:57:11 +01:00
cheatfate	399e91fe5b	Fix failing test.	2020-03-12 02:02:33 +02:00
Dustin Brody	0d3de00714	remove unused imports	2020-03-11 10:50:55 +00:00
cheatfate	98dc701473	Add PeerPool.addPeer async version and tests.	2020-01-29 15:28:41 +00:00
cheatfate	8b229d68ad	Add testutil and timedTest.	2020-01-29 15:28:41 +00:00
cheatfate	db20fc1172	Fix SyncQueue push(data) bug. Rename lastSlot to HeadSlot. Add failure test.	2020-01-29 15:28:41 +00:00
cheatfate	73dc72583f	Initial commit.	2020-01-29 15:28:41 +00:00

41 Commits