Commit Graph

78 Commits

Author SHA1 Message Date
Jacek Sieka f70aceef37
Harden handling of unviable forks (#3312)
* Harden handling of unviable forks

In our current handling of unviable forks, we allow peers to send us
blocks that come from a different fork - this is not necessarily an
error as it can happen naturally, but it does open up the client to a
case where the same unviable fork keeps getting requested - rather than
allowing this to happen, we'll now give these peers a small negative
score - if it keeps happening, we'll disconnect them.

* keep track of unviable forks in quarantine, to avoid filling it with
known junk
* collect peer scores in single module
* descore peers when they send unviable blocks during sync
* don't give score for duplicate blocks
* increase quarantine size to a level that allows finality to happen
under optimal conditions - this helps avoid downloading the same blocks
over and over in case of an unviable fork
* increase initial score for new peers to make room for one more failure
before disconnection
* log and score invalid/unviable blocks in requestmanager too
* avoid ChainDAG dependency in quarantine
* reject gossip blocks with unviable parent
* continue processing unviable sync blocks in order to build unviable
dag

* docs

* Update beacon_chain/consensus_object_pools/block_pools_types.nim

* add unviable queue test
2022-01-26 13:20:08 +01:00
Eugene Kabanov 0ea6dfa517
Fix current slot value and finishing progress for backfilling. (#3304) 2022-01-21 10:35:54 +01:00
Jacek Sieka 570379d3d9
Backfiller (#3263)
Backfilling is the process of downloading historical blocks via P2P that
are required to fulfill `GetBlocksByRange` duties - this happens during
both trusted node and finalized checkpoint syncs.

In particular, backfilling happens after syncing to head, such that
attestation work can start as soon as possible.

* Fix SyncQueue initialization procedure.
Remove usage of `awaitne`.
Add cancellation support.
Remove unneeded `sleepAsync()` if peer's head is older than needed.
Add `direction` field to all logs.
Fix syncmanager wedge issue.
Add proper resource cleaning procedure on backward sync finish.

Co-authored-by: cheatfate <eugene.kabanov@status.im>
2022-01-20 08:25:45 +01:00
tersec 9c0c9c98ce
complete switch to beacon_chain/specs/datatypes/bellatrix (#3295) 2022-01-18 13:36:52 +00:00
Jacek Sieka 68247f81b3
Trusted node sync (#3209)
* Trusted node sync

Trusted node sync, aka checkpoint sync, allows syncing tyhe chain from a
trusted node instead of relying on a full sync from genesis.

Features include:

* sync from any slot, including the latest finalized slot
* backfill blocks either from the REST api (default) or p2p (#3263)

Future improvements:

* top up blocks between head in database and some other node - this
makes for an efficient backup tool
* recreate historical state to enable historical queries

* fixes

* load genesis from network metadata
* check checkpoint block root against state
* fix invalid block root in rest json decoding
* odds and ends

* retry looking for epoch-boundary checkpoint blocks
2022-01-17 10:27:08 +01:00
Jacek Sieka d57c2dc4e5
use tail block as sync pivot (#3276)
When syncing, we show how much of the sync has completed - with
checkpoint sync, the syncing does not always go from slot 0 to head, but
rather can start in the middle.

To show a consistent `%` between restarts, we introduce the concept of a
pivot point, such that if I sync 10% of the chain, then restart the
client, it picks up at 10% (instead of counting from 0).

What it looks like:
```
INF ... sync="01d12h41m (15.96%) 13.5158slots/s (QDDQDDQQDP:339018)" ...
```
2022-01-13 10:37:53 +01:00
tersec 6ef3834f4a
fix type-conversions-to-self, unexport from nimbus_beacon_node, and rm unused vars/procs (#3211) 2021-12-20 12:21:17 +01:00
Jacek Sieka 118840d241
SyncManager cleanups for backfill support (#3189)
* SyncManager cleanups for backfill support

Cleanups, fixes and simplifications, in anticipation of backfill support
for the `SyncManager`:

* reformat sync progress indicator to show time left and % done more
prominently:
  * old: `sync="sPssPsssss:2:2.4229:00h57m (2706898)"`
  * new: `sync="14d12h31m (0.52%) 1.1378slots/s (wQQQQQDDQQ:1287520)"`
* reset average speed when going out of sync
* pass all block errors to sync manager, including duplicate/unviable
* penalize peers for reporting a head block that is outside of our
expected wall clock time (they're likely on a different network or
trying to disrupt sync)
* remove `SyncFailureKind` (unused)
* remove `inRange` (unused)
* add `Q` for sync queue requests that are in the `SyncQueue` but not
yet in the `BlockProcessor` queue
* update last slot in `SyncQueue` after getting peer status
* fix race condition between `wakeupWaiters` and `resetWait`, where
workers would not be correctly reset if block verification returned a
completed future without event loop
* log syncmanager direction

* Fix ordering issue.
Some of the requests size of which are not equal to `chunkSize` could be processed in wrong order which could lead to sync process freezes.

Co-authored-by: cheatfate <eugene.kabanov@status.im>
2021-12-16 15:57:16 +01:00
Eugene Kabanov b05734f610
Backward sync support for SyncManager. (#3131)
* Unbundle SyncQueue from sync_manager.nim.
Unbundle Peer scores constants to peer_scores.nim.
Add Forward/Backward enum.

* Further improvements and tests.

* Adopt getRewindPoint() and fix MissingParent handler.

* Remove unused procedures.
Refactor `result` usage.
Fix resetWait().

* Add all the tests and fix the issue with rewind point.

* Fix get() issue.

* Fix flaky tests.

* test fixes

Co-authored-by: Jacek Sieka <jacek@status.im>
2021-12-08 22:15:29 +01:00
Jacek Sieka 233d756518
Logging and startup improvements (#3038)
* Logging and startup improvements

Color support for released binaries!

* startup scripts no longer log to file by default - this only affects
source builds - released binaries don't support file logging
* add --log-stdout option to control logging to stdout (colors, json)
* detect tty:s vs redirected logs and log accordingly
* add option to disable log colors at runtime
* simplify several "common" logs, showing the most important information
earlier and more clearly
* remove line numbers / file information / tid - these take up space and
are of little use to end users
  * still enabled in debug builds and tools
* remove `testnet_servers_image` compile-time option
* server images, released binaries and compile-from-source now offer
the same behaviour and features
* fixes https://github.com/status-im/nimbus-eth2/issues/2326
* fixes https://github.com/status-im/nimbus-eth2/issues/1794
* remove instanteneous block speed from sync message, keeping only
average

before:

```
INF 2021-10-28 16:45:59.000+02:00 Slot start                                 topics="beacnde" tid=386429 file=nimbus_beacon_node.nim:884 lastSlot=2384027 wallSlot=2384028 delay=461us84ns peers=0 head=75a10ee5:3348 headEpoch=104 finalized=cd6804ba:3264 finalizedEpoch=102 sync="wwwwwwwwww:0:0.0000:0.0000:00h00m (3348)"
INF 2021-10-28 16:45:59.046+02:00 Slot end                                   topics="beacnde" tid=386429 file=nimbus_beacon_node.nim:821 slot=2384028 nextSlot=2384029 head=75a10ee5:3348 headEpoch=104 finalizedHead=cd6804ba:3264 finalizedEpoch=102 nextAttestationSlot=-1 nextProposalSlot=-1 nextActionWait=n/a
```

after:

```
INF 2021-10-28 22:43:23.033+02:00 Slot start                                 topics="beacnde" slot=2385815 epoch=74556 sync="DDPDDPUDDD:10:5.2258:01h19m (2361088)" peers=37 head=eacd2dae:2361096 finalized=73782:a4751487 delay=33ms687us715ns
INF 2021-10-28 22:43:23.291+02:00 Slot end                                   topics="beacnde" slot=2385815 nextActionWait=n/a nextAttestationSlot=-1 nextProposalSlot=-1 head=eacd2dae:2361096
```

* fix comment

* documentation updates

* mention `--log-file` may be deprecated in the future
* update various docs
2021-11-02 18:06:36 +01:00
Jacek Sieka 4f7a8cf79d
register vc duties with subnet tracker (#2949)
* register vc duties with subnet tracker
* fix activation logging during startup
* cache slot signature to avoid duplicate signature work
* schedule aggregation duties one slot at a time to avoid CPU spike at
each epoch
* lower aggregation subnet pre-subscription time to 4 slots (lowers
bandwidth and CPU usage)
* update stability subnets in ENR on startup
* log gossip state
* perform gossip subscriptions just before the next slot starts
* document stuff
* add random include
* don't overwrite subscription state when not subscribed
* log target gossip state
* updating gossip status once is enough
* add test
* remove syncQueueLen - this one is not updated at the end of the sync
and may cause gossip to disconnect itself completely - use a simple head
distance instead
* fix gossip disconnection - if in hysteresis, node.gossipState will be
set to disabled even though we don't disable topic subscriptions
* fix extra duty registration call
2021-10-18 11:11:44 +02:00
tersec 9c0d9b546a
successfull -> successful (#2842) 2021-09-01 18:08:24 +02:00
Jacek Sieka 01596c45dd
cleanups and fixes (#2827)
* import cleanup
* fix json-rpc exception handlers
* avoid unnecessary presto client import
* introduce ForkedBeaconBlock, some altair logging
* url fixes
2021-08-27 11:00:06 +02:00
tersec 6e46445da2
switch result = foo to expression return; unexport rest of logtrace symbols (#2788) 2021-08-17 09:51:39 +00:00
Jacek Sieka 7a622e8505
rework spec imports (#2779)
The spec imports are a mess to work with, so this branch cleans them up
a bit to ensure that we avoid generic sandwitches and that importing
stuff generally becomes easier.

* reexport crypto/digest/presets because these are part of the public
symbol set of the rest of the spec types
* don't export `merge` types from `base` - this causes circular deps
* fix circular deps in `ssz/spec_types` - this is the first step in
disentangling ssz from spec
* be explicit about phase0 vs altair - longer term, `altair` will become
the "natural" type set, then merge and so on, so no point in giving
`phase0` special preferential treatment
2021-08-12 13:08:20 +00:00
Jacek Sieka 9697b73e71
forkedbeaconstate_helpers -> forks (#2772)
Simpler module name for stuff that covers forks

* check that runtime config matches database state
* also include some assorted altair cleanups
* use "standard" genesis fork in local testnet to work around missing
runtime config support
2021-08-10 22:46:35 +02:00
Jacek Sieka 3d7bee8502
REST API client, JSON-RPC cleanups (#2756)
This refactoring puts the JSON-RPC and REST APIs on more equal footing
by renaming and moving things around, creating a separation between
client and server, and documenting what they are - the aim is to have a
simple-to-use base to start from when developing API clients, as well as
make it easier to navigate the code when looking for the legacy JSON-RPC
interface vs the new REST API.

* move REST client, serialization and supporting types to spec/eth2_apis
* REST stuff now starts with `rest_`, JSON-RPC stuff starts with `rpc_`,
more or less
* simplify imports such that there's a simple module to import for both
server and client
* map REST type and proc names to yaml spec more closely - in
particular, reuse operation and type names in `rest_types` to make
comparisons against spec more easy
* cleaner separation between client and server modules - modules common
between server and client such as `rest_types` and serialization move to
the spec folder - this allows the client to be built with less knowledge
about server internals
2021-08-03 17:17:11 +02:00
Jacek Sieka 2d6a661ac6
Syncv2 (#2723)
* bump libp2p

* altair sync v2

Use V2 sync requests after the altair fork has happened, according to
the wall clock

* Fix the behavior of the v1 req/resp calls after Altair

Co-authored-by: Zahary Karadjov <zahary@gmail.com>
2021-07-15 21:01:07 +02:00
Jacek Sieka 7f52ffb8d9
clean up block processing (#2610)
* gossip_to_consensus -> block_processor (it's processing only blocks,
but not only from gossip)
* measure queue and validation time for blocks
* measure assignment and state loading times for updateStateData
* avoid some unnecessary block copies in block sync
* warn that database is corrupt if we hit tail without a state
2021-05-28 19:34:00 +03:00
Eugene Kabanov 5b5ea2e813
Fix integer overflow issue in sync_manager. (#2564)
* Make Refactor rewind point assignment more concrete.

* Fix overflow issue in getRewindPoint().
Add tests.
2021-05-18 12:25:14 +02:00
cheatfate 9de65fa293 Fixing issues after bump. 2021-04-09 21:42:13 +03:00
cheatfate c4d891f583 Fix sync_manager.nim to return proper status.
Bump REST API dependencies.
2021-04-09 21:42:13 +03:00
Jacek Sieka 2695cfa864
EH cleanup (#2455)
almost 100% raises in nimbus-eth2 now!

* fix some rare exception-related crashes in json-rpc
2021-03-26 07:52:01 +01:00
Mamy Ratsimbazafy c47d636cb3
Split Eth2Processor in prep for batching (#2396)
* Split Eth2Processor in gossip and consensus part and materialize the shared block queue

* Update initialization in test_sync_manager
2021-03-11 11:10:57 +01:00
Mamy Ratsimbazafy d47f53cd9d
Reorg (5/5) (#2377)
* Reorg things left into networking and gossip_processing

* time -> beacon_clock

* fix builds
2021-03-05 14:12:00 +01:00
Mamy Ratsimbazafy 5d7f9c3a04
Consensus object pools [reorg 4/5] (#2374)
* Add documentation

* make test doesn't try to build the beacon node :/
2021-03-04 10:13:44 +01:00
tersec 4278e80657
document two uint64 -> int64 conversions (#2375)
* document two uint64 -> int64 conversions

* fix minimal preset slot time & calculation
2021-03-04 10:13:23 +01:00
Mamy Ratsimbazafy 3276dfc683
Consolidate modules by areas [part 1] (#2365)
* Move sync in subfolder

* move validator related thingies in validators

* fix binary builds

* update bounds comment [skip ci]
2021-03-02 11:27:45 +01:00