2387 Commits

Author SHA1 Message Date
Etan Kissling d1f97e209a
remove unused `sleepTime` from `SyncManager` (#3384)
The `SyncManager` has a leftover optional `sleepTime` parameter in
its constructor that used to configure the sync loop polling rate.
This parameter was replaced with a constant in #1602 and is no longer
functional. This patch removes the `sleepTime` leftovers.
2022-02-14 12:05:01 +01:00
Etan Kissling a28900c348
fix slot number display during sync (#3383)
#3304 introduced a regression to the sync status string displayed in the
status bar; during the main forward sync, the current slot is no longer
reported and always displays as `0`. This patch corrects the computation
to accurately report the current slot once more.
2022-02-14 12:04:04 +01:00
tersec 873a8ec1e6
use isZeroMemory for Eth2Digest comparisons (#3386)
* use isZeroMemory for Eth2Digest comparisons

* use Eth2Digest.isZero abstraction
2022-02-14 05:26:19 +00:00
Eugene Kabanov 1a0bcf0b02
Fix #3267 (#3367)
* Initial commit.

* One more fix.

* Trying to debug the finalization issue.

* Add debug logs to understand signature issue.

* Restore hash_tree_root calculation.

* Remove all the debugging helpers.

* Add `slot` check.

* Address review comment.
2022-02-13 16:21:55 +01:00
Etan Kissling 15fc7534cf
remove unused `maxStatusAge` from `SyncManager` (#3382)
The `SyncManager` has a leftover optional `maxStatusAge` parameter in
its constructor that used to configure the libp2p `Status` polling rate.
This parameter was replaced with a constant in #1827 and is no longer
functional. This patch removes the `maxStatusAge` leftovers.
2022-02-13 16:17:13 +01:00
Jacek Sieka 1f89b7f7b9
speed up trusted node backfill (#3371)
With these changes, we can backfill about 400-500 slots/sec, which means
a full backfill of mainnet takes about 2-3h.

However, the CPU is not saturated on either the server or the client,
meaning that somewhere, there's an artificial inefficiency in the
communication - 16 parallel downloads *should* saturate the CPU.

One plausible cause would be "too many async event loop iterations" per
block request, which would introduce multiple "sleep-like" delays along
the way.

I can push the speed up to 800 slots/sec by increasing parallel
downloads even further, but going after the root cause of the slowness
would be better.

* avoid some unnecessary block copies
* double parallel requests
2022-02-12 12:09:59 +01:00
Jacek Sieka 40fe8f5336 fix missing backfill when restarting node
When node is restarted before backfill has started but after some blocks
have finalized with forward sync, we would not start the backfill.

* also clean up one last `SomeSome`
2022-02-11 23:08:50 +02:00
Jacek Sieka 1760f4d7a7
move wallet/deposit commands to separate files (#3372)
These commands have little to do with the "normal" beacon node operation
- ergo, they deserve to live in their own module.

* clean up imports/exports
2022-02-11 21:40:49 +01:00
Eugene Kabanov b4eb150b9a
Revert restAccept workaround. (#3369)
Bump fixed version of nim-presto.
2022-02-11 12:01:45 +01:00
Ștefan Talpalaru 70b38e37e6
Nim GC metrics for the main thread (#3108)
* Nim GC metrics for the main thread
2022-02-08 20:19:21 +01:00
Eugene Kabanov 40c77e5928
Remote KeyManager API and a number of fixes/tests for KeyManager API (#3360)
* Initial commit.

* Fix current test suite.

* Fix keymanager api test.

* Fix wss_sim.

* Add more keystore_management tests.

* Recover deleted isEmptyDir().

* Add `HttpHostUri` distinct type.
Move keymanager calls away from rest_beacon_calls to rest_keymanager_calls.
Add REST serialization of RemoteKeystore and Keystore object.
Add tests for Remote Keystore management API.
Add tests for Keystore management API (Add keystore).
Fix serialization issues.

* Fix test to use HttpHostUri instead of Uri.

* Add links to specification in comments.

* Remove debugging echoes.
2022-02-07 22:36:09 +02:00
Jacek Sieka c7abc97545
harden and speed up block sync (#3358)
* harden and speed up block sync

The `GetBlockBy*` server implementation currently reads SSZ bytes from
database, deserializes them into a Nim object then serializes them right
back to SSZ - here, we eliminate the deser/ser steps and send the bytes
straight to the network. Unfortunately, the snappy recoding must still
be done because of differences in framing.

Also, the quota system makes one giant request for quota right before
sending all blocks - this means that a 1024 block request will be
"paused" for a long time, then all blocks will be sent at once causing a
spike in database reads which potentially will see the reading client
time out before any block is sent.

Finally, on the reading side we make several copies of blocks as they
travel through various queues - this was not noticeable before but
becomes a problem in two cases: bellatrix blocks are up to 10mb (instead
of 30-40kb) and when backfilling, we process a lot more of them a lot
faster.

* fix status comparisons for nodes syncing from genesis (#3327 was a bit
too hard)
* don't hit database at all for post-altair slots in GetBlock v1
requests
2022-02-07 19:20:10 +02:00
tersec bf3ef987e4
deactivate doppelganger protection during genesis (#3362)
* deactivate Doppelganger Protection during genesis

* also don't actually flag supposed-doppelgangers (because they're before broadcastStartEpoch) on GENESIS_SLOT start
2022-02-07 07:12:36 +02:00
Jacek Sieka 6f10e651ff
rest: fix ssz preference string (#3357) 2022-02-04 15:26:27 +02:00
tersec e0fb5d95a6
remove --subscribe-all{att,sync}nets (#3359) 2022-02-04 12:34:03 +00:00
tersec 02349b4181
update to engine API alpha.6 (#3351) 2022-02-04 12:12:19 +00:00
tersec d358299875
fork choice proposer boosting support (#3349)
* fork choice proposer boosting support

* detect nodeDelta underflow/overflow
2022-02-04 12:59:40 +01:00
Jacek Sieka a50e21e229
fix doppelganger detection logging
* update action tracker on dependent-root-changing reorg (instead of
epoch change)
* don't try to log duties while syncing - we're not tracking actions yet
* fix slot used for doppelganger loss detection
2022-02-04 12:25:32 +01:00
Jacek Sieka 49282e9477
val_mon: register locally produced aggregates (#3352)
These use a separate flow, and were previously only registered from the
network

* don't log successes in totals mode (TMI)
* remove `attestation-sent` event which is unused
2022-02-04 08:33:20 +01:00
Zahary Karadjov 215caa21ae Eth1 monitor fixes
* Fix a resource leak introduced in https://github.com/status-im/nimbus-eth2/pull/3279

* Don't restart the Eth1 syncing progress from scratch in case of
  monitor failures during Eth2 syncing.

* Switch to the primary operator as soon as it is back online.

* Log the web3 credentials in fewer places

Other changes:

The 'web3 test' command has been enhanced to obtain and print more
data regarding the selected provider.
2022-02-03 14:01:55 +02:00
tersec 8e6a920bf4
rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH (#3350)
* rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH

* fix REST test rules
2022-02-02 14:06:55 +01:00
Jacek Sieka ff4f2a6b6c
better log on finalized slot failure 2022-02-01 21:23:18 +01:00
Tanguy bcd7b4598c
Tune peering (#3348)
- Request metadata_v2 (altair) by default instead of the v1
- Change the metadata pinger to a 3 failure-then-kick, instead of being time based
- Update kicker scorer to take into account topics which we're not subscribed to, to be sure that we will be able to publish correctly
- Add some metrics to give "fanout" health (in the same spirit as mesh health)
2022-02-01 18:20:55 +01:00
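The "3 failure-then-kick" rule from #3348 above can be sketched as a simple counter. This is a minimal illustration, not the Nimbus implementation: the class name, the `MAX_FAILURES` constant, and the choice to reset the counter on a successful ping are all assumptions made for the example.

```python
MAX_FAILURES = 3  # assumed threshold, taken from the "3 failure-then-kick" wording


class MetadataPinger:
    """Hypothetical per-peer tracker for failed metadata requests."""

    def __init__(self) -> None:
        self.failures = 0

    def record(self, succeeded: bool) -> bool:
        """Record one ping result; return True when the peer should be kicked."""
        if succeeded:
            # Assumption: a successful ping clears earlier failures.
            self.failures = 0
            return False
        self.failures += 1
        return self.failures >= MAX_FAILURES


pinger = MetadataPinger()
results = [pinger.record(ok) for ok in (False, False, True, False, False, False)]
```

Counting failures instead of measuring elapsed time means a slow but responsive peer is not kicked just because a timer expired.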
tersec 0c814f49ee
rename sync_{committee_,}aggregate and execute_payload -> notify_new_payload (#3347) 2022-02-01 07:31:53 +00:00
EmilIvanichkovv 336403d18b Refactor `handleValidatorExitCommand`
Make `validator exit command` work both with `JSON-RPC` and `REST` APIs
Fix problem with specifying rest-url using `localhost`
Change back exit error messages in `state_transition_block`
2022-02-01 01:24:05 +02:00
Jacek Sieka 3df9ffca9f val-mon: remove redundant `_total` suffix from counters
It turns out nim-metrics adds this suffix on its own - it also turns out
some of the names are non-conventional and need follow-up.
2022-01-31 18:51:24 +02:00
tersec c9aa1bee01
spec URL updates (#3342) 2022-01-31 09:56:59 +00:00
Jacek Sieka ad327a8769
Fix counters in validator monitor totals mode (#3332)
The current counters set gauges etc to the value of the _last_ validator
to be processed - as the name of the feature implies, we should be using
sums instead.

* fix missing beacon state metrics on startup, pre-first-head-selection
* fix epoch metrics not being updated on cross-epoch reorg
2022-01-31 08:36:29 +01:00
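The totals-mode bug described in #3332 above comes down to overwriting a gauge versus summing into it. The sketch below uses made-up per-validator numbers and plain Python functions; it does not use nim-metrics or any Nimbus code.

```python
# Hypothetical per-validator counts, for illustration only.
attestation_hits = {"validator_a": 3, "validator_b": 5, "validator_c": 2}


def gauge_last_value(per_validator: dict) -> int:
    """Buggy pattern: the gauge is overwritten for every validator."""
    gauge = 0
    for hits in per_validator.values():
        gauge = hits  # only the last processed validator survives
    return gauge


def gauge_sum(per_validator: dict) -> int:
    """Totals mode: the gauge aggregates across all validators."""
    return sum(per_validator.values())


last_value = gauge_last_value(attestation_hits)
total = gauge_sum(attestation_hits)
```

With these inputs the overwritten gauge reports only the count of the last validator, while the summed gauge reports the combined count.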
Jacek Sieka d583e8e4ac
Store finalized block roots in database (3s startup) (#3320)
* Store finalized block roots in database (3s startup)

When the chain has finalized a checkpoint, the history from that point
onwards becomes linear - this is exploited in `.era` files to allow
constant-time by-slot lookups.

In the database, we can do the same by storing finalized block roots in
a simple sparse table indexed by slot, bringing the two representations
closer to each other in terms of conceptual layout and performance.

Doing so has a number of interesting effects:

* mainnet startup time is improved 3-5x (3s on my laptop)
* the _first_ startup might take slightly longer as the new index is
being built - ~10s on the same laptop
* we no longer rely on the beacon block summaries to load the full dag -
this is a lot faster because we no longer have to look up each block by
parent root
* a collateral benefit is that we no longer need to load the full
summaries table into memory - we get the RSS benefits of #3164 without
the CPU hit.

Other random stuff:

* simplify forky block generics
* fix withManyWrites multiple evaluation
* fix validator key cache not being updated properly in chaindag
read-only mode
* drop pre-altair summaries from `kvstore`
* recreate missing summaries from altair+ blocks as well (in case
database has lost some to an involuntary restart)
* print database startup timings in chaindag load log
* avoid allocating superfluous state at startup
* use a recursive sql query to load the summaries of the unfinalized
blocks
2022-01-30 18:51:04 +02:00
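The slot-indexed table of finalized block roots from #3320 above can be pictured with a small in-memory stand-in. This is a sketch under stated assumptions: the class and method names are hypothetical, a Python dict stands in for the database table, and the walk-back helper is an illustration of how empty slots might be handled rather than something the commit describes.

```python
from typing import Dict, Optional


class FinalizedRootTable:
    """Hypothetical slot -> block root table for the finalized (linear) chain."""

    def __init__(self) -> None:
        # Sparse: slots without a block simply have no entry.
        self._roots: Dict[int, bytes] = {}

    def put(self, slot: int, root: bytes) -> None:
        self._roots[slot] = root

    def root_at(self, slot: int) -> Optional[bytes]:
        # Single keyed lookup; no need to follow parent roots block by block.
        return self._roots.get(slot)

    def root_at_or_before(self, slot: int) -> Optional[bytes]:
        # Walk back over empty slots to the closest earlier block, if any.
        while slot >= 0:
            root = self._roots.get(slot)
            if root is not None:
                return root
            slot -= 1
        return None


table = FinalizedRootTable()
table.put(0, b"genesis")
table.put(1, b"block-1")
table.put(3, b"block-3")  # slot 2 had no block
```

Because finalized history is linear, a slot identifies at most one block, which is what makes the slot a usable key.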
Emil 0051af430b Put `application/json` as a higher preference than `application/octet-stream` 2022-01-30 18:50:14 +02:00
tersec 29e2169585
phase 0 & altair beacon chain and altair validator spec URL updates (#3339) 2022-01-29 13:53:31 +00:00
tersec 89ffa8a1a7
spec URL & copyright year update (#3338) 2022-01-29 01:05:39 +00:00
tersec 60bf5b8bf4
use v1.1.9 test vectors (#3337) 2022-01-28 22:47:48 +00:00
tersec 95fee10328
clean up hashed rollback proc declarations (#3333)
* clean up hashed rollback proc declarations

* use generic hashed rollback proc type
2022-01-28 14:24:37 +00:00
cheatfate 1287a20b13 Use HTTP status codes instead of status in body. 2022-01-28 15:36:27 +02:00
Jacek Sieka e264276b36
keep unviables in quarantine (#3331)
they remain unviable even after a reorg
2022-01-28 11:59:55 +01:00
Zahary Karadjov 49b7daa39d [ncli_db] bugfix: take into account finalization delay in reward calc post Altair
This fixes a problem affecting Prater's epoch 64444.
2022-01-28 12:03:23 +02:00
tersec dcb671617c
add/support TERMINAL_BLOCK_HASH_ACTIVATION_EPOCH (#3303) 2022-01-27 19:52:08 +00:00
Jacek Sieka 84b6ad871d harden status message handling
Additional sanity checking of the status message exchanged during a
fresh connection:

* check that head and finalized make sense, slot-wise
* verify that finalized root lies on the canonical chain, when possible
* re-check these things for every status message during sync
2022-01-27 18:46:47 +02:00
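A slot-wise sanity check of the kind listed in the status-handling commit above might look like the following. The function name, its parameters, and the two specific conditions are assumptions chosen for illustration; the commit does not spell out the exact checks. `SLOTS_PER_EPOCH = 32` is the mainnet preset value.

```python
SLOTS_PER_EPOCH = 32  # mainnet preset


def status_is_plausible(head_slot: int, finalized_epoch: int, wall_slot: int) -> bool:
    """Hypothetical check that a peer's claimed head and finalized epoch agree."""
    finalized_slot = finalized_epoch * SLOTS_PER_EPOCH
    if head_slot > wall_slot:
        # Assumed rule: a head in the future cannot be valid.
        return False
    if finalized_slot > head_slot:
        # The finalized checkpoint cannot be ahead of the head.
        return False
    return True


ok = status_is_plausible(head_slot=3200, finalized_epoch=98, wall_slot=3205)
finalized_ahead = status_is_plausible(head_slot=3200, finalized_epoch=101, wall_slot=3205)
head_in_future = status_is_plausible(head_slot=4000, finalized_epoch=98, wall_slot=3205)
```

Verifying that the finalized root lies on the canonical chain, the second bullet of the commit, needs access to local chain data and is left out of this sketch.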
Eugene Kabanov aa27baacf5
Fix 408 Timeout error returned by REST server. (#3301)
* Disable REST server timeouts.
* Add options to CLI to tune REST server parameters.
2022-01-27 18:41:05 +02:00
tersec 7c51da037f
add block gossip validation condition (#3325) 2022-01-26 17:22:06 +00:00
tersec 2b4a960270
rename On{Merge,Bellatrix}BlockAdded and Rollback{Merge,Bellatrix}HashedProc (#3321) 2022-01-26 13:21:29 +01:00
Jacek Sieka f70aceef37
Harden handling of unviable forks (#3312)
* Harden handling of unviable forks

In our current handling of unviable forks, we allow peers to send us
blocks that come from a different fork - this is not necessarily an
error as it can happen naturally, but it does open up the client to a
case where the same unviable fork keeps getting requested - rather than
allowing this to happen, we'll now give these peers a small negative
score - if it keeps happening, we'll disconnect them.

* keep track of unviable forks in quarantine, to avoid filling it with
known junk
* collect peer scores in single module
* descore peers when they send unviable blocks during sync
* don't give score for duplicate blocks
* increase quarantine size to a level that allows finality to happen
under optimal conditions - this helps avoid downloading the same blocks
over and over in case of an unviable fork
* increase initial score for new peers to make room for one more failure
before disconnection
* log and score invalid/unviable blocks in requestmanager too
* avoid ChainDAG dependency in quarantine
* reject gossip blocks with unviable parent
* continue processing unviable sync blocks in order to build unviable
dag

* docs

* Update beacon_chain/consensus_object_pools/block_pools_types.nim

* add unviable queue test
2022-01-26 13:20:08 +01:00
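The scoring behaviour described in #3312 above (a small negative score per unviable block, disconnection if it keeps happening) can be sketched as follows. All names and numeric values are hypothetical; the commit does not state the actual penalty or threshold.

```python
from dataclasses import dataclass

# Hypothetical values, chosen only so the example is easy to follow.
UNVIABLE_PENALTY = -10
DISCONNECT_THRESHOLD = -30


@dataclass
class Peer:
    score: int = 0
    connected: bool = True


def on_unviable_block(peer: Peer) -> None:
    """Apply a small penalty; disconnect once the score falls to the threshold."""
    peer.score += UNVIABLE_PENALTY
    if peer.score <= DISCONNECT_THRESHOLD:
        peer.connected = False


peer = Peer()
on_unviable_block(peer)
on_unviable_block(peer)
still_connected_after_two = peer.connected
on_unviable_block(peer)
```

A single unviable block is tolerated because it can happen naturally; only repeated offences push the peer below the threshold. Raising the initial score, as one of the bullets mentions, would correspond to starting `score` above zero here.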
tersec bd0a3a9b10
rearrange MEV code (#3319) 2022-01-25 19:43:28 +00:00
Emil efbd939108 Make `handleValidatorExitCommand` work with `REST API` 2022-01-25 14:00:29 +02:00
Jacek Sieka d076e1a11b
ncli_db: import states and blocks from era file (#3313) 2022-01-25 09:28:26 +01:00
tersec 00a347457a
dynamic sync committee subscriptions (#3308)
* dynamic sync committee subscriptions

* fast-path trivial case rather than rely on RNG with probability 1 outcome

Co-authored-by: zah <zahary@gmail.com>

* use func instead of template; avoid calling async function unnecessarily

* avoid unnecessary sync committee topic computation; use correct epoch lookahead; enforce exception/effect tracking

* don't over-optimistically update ENR syncnets; non-looping version of nearSyncCommitteePeriod

* allow separately setting --allow-all-{sub,att,sync}nets

* remove unnecessary async

Co-authored-by: zah <zahary@gmail.com>
2022-01-24 20:40:59 +00:00
tersec 062275461c
add flashbots (milestone 1) consensus beacon block types (#3314)
* add flashbots (milestone 1) consensus beacon block types

* remove MEV types from main bellatrix spec module
2022-01-24 20:15:22 +00:00
tersec 351c2fd48a
rename mergeData to bellatrixData and mergeFork to bellatrixFork (#3315) 2022-01-24 16:23:13 +00:00
Jacek Sieka 61342c2449
limit by-root requests to non-finalized blocks (#3293)
* limit by-root requests to non-finalized blocks

Presently, we keep a mapping from block root to `BlockRef` in memory -
this has simplified reasoning about the dag, but is not sustainable with
the chain growing.

We can distinguish between two cases where by-root access is useful:

* unfinalized blocks - this is where the beacon chain is operating
generally, by validating incoming data as interesting for future fork
choice decisions - bounded by the length of the unfinalized period
* finalized blocks - historical access in the REST API etc - no bounds,
really

In this PR, we limit the by-root block index to the first use case:
finalized chain data can more efficiently be addressed by slot number.

Future work includes:

* limiting the `BlockRef` horizon in general - each instance is 40
bytes+overhead which adds up - this needs further refactoring to deal
with the tail vs state problem
* persisting the finalized slot-to-hash index - this one also keeps
growing unbounded (albeit slowly)

Anyway, this PR easily shaves ~128mb of memory usage at the time of
writing.

* No longer honor `BeaconBlocksByRoot` requests outside of the
non-finalized period - previously, Nimbus would generously return any
block through this libp2p request - per the spec, finalized blocks
should be fetched via `BeaconBlocksByRange` instead.
* return `Opt[BlockRef]` instead of `nil` when blocks can't be found -
this becomes a lot more common now and thus deserves more attention
* `dag.blocks` -> `dag.forkBlocks` - this index only carries unfinalized
blocks from now on - `finalizedBlocks` covers the other `BlockRef`
instances
* in backfill, verify that the last backfilled block leads back to
genesis, or panic
* add backfill timings to log
* fix missing check that `BlockRef` block can be fetched with
`getForkedBlock` reliably
* shortcut doppelganger check when feature is not enabled
* in REST/JSON-RPC, fetch blocks without involving `BlockRef`

* fix dag.blocks ref
2022-01-21 13:33:16 +02:00
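The split introduced in #3293 above, by-root lookups for unfinalized blocks and by-slot addressing for finalized ones, can be illustrated with two dictionaries. This is a rough sketch, not the Nimbus data structures: the class, its methods, and the `canonical_roots` parameter are hypothetical, and `Optional` stands in for the `Opt[BlockRef]` return type mentioned in the commit.

```python
from typing import Dict, Optional, Set


class BlockIndex:
    """Hypothetical index: by-root for unfinalized blocks, by-slot for finalized."""

    def __init__(self) -> None:
        self.fork_blocks: Dict[bytes, int] = {}      # root -> slot, unfinalized only
        self.finalized_roots: Dict[int, bytes] = {}  # slot -> root

    def add_block(self, root: bytes, slot: int) -> None:
        self.fork_blocks[root] = slot

    def finalize_up_to(self, finalized_slot: int, canonical_roots: Set[bytes]) -> None:
        # Move canonical blocks to the by-slot index; drop everything else
        # at or below the finalized slot from the by-root index.
        for root, slot in list(self.fork_blocks.items()):
            if slot <= finalized_slot:
                if root in canonical_roots:
                    self.finalized_roots[slot] = root
                del self.fork_blocks[root]

    def slot_by_root(self, root: bytes) -> Optional[int]:
        # Returns None for finalized or unknown roots instead of raising.
        return self.fork_blocks.get(root)


index = BlockIndex()
index.add_block(b"a", 1)
index.add_block(b"b", 2)
index.add_block(b"c", 3)
index.finalize_up_to(2, canonical_roots={b"a", b"b"})
```

After finalization, a by-root query for a finalized block returns `None`, so callers have to handle the missing case explicitly, which is the point of returning an optional value.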