Commit Graph

4046 Commits

Author SHA1 Message Date
Jacek Sieka 1f89b7f7b9
speed up trusted node backfill (#3371)
With these changes, we can backfill about 400-500 slots/sec, which means
a full backfill of mainnet takes about 2-3h.

However, the CPU is not saturated - neither in server nor in client
meaning that somewhere, there's an artificial inefficiency in the
communication - 16 parallel downloads *should* saturate the CPU.

One plausible cause would be "too many async event loop iterations" per
block request, which would introduce multiple "sleep-like" delays along
the way.

I can push the speed up to 800 slots/sec by increasing parallel
downloads even further, but going after the root cause of the slowness
would be better.

* avoid some unnecessary block copies
* double parallel requests
2022-02-12 12:09:59 +01:00
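The backfill estimate in #3371 can be checked with a little arithmetic. The sketch below is an illustration, not code from the repository: the function names are hypothetical, and it assumes the mainnet genesis timestamp (1606824023) and the 12-second slot time.

```python
from datetime import datetime, timezone

# Assumed constants: mainnet beacon chain genesis (Unix seconds) and slot time.
MAINNET_GENESIS_TIME = 1606824023
SECONDS_PER_SLOT = 12


def slots_since_genesis(when: datetime) -> int:
    """Number of whole slots between mainnet genesis and `when`."""
    return (int(when.timestamp()) - MAINNET_GENESIS_TIME) // SECONDS_PER_SLOT


def backfill_hours(slots: int, slots_per_sec: float) -> float:
    """Hours needed to backfill `slots` at a steady `slots_per_sec`."""
    return slots / slots_per_sec / 3600


# Slots that existed around the time of the commit (hypothetical cut-off date).
commit_day = datetime(2022, 2, 12, tzinfo=timezone.utc)
total_slots = slots_since_genesis(commit_day)
for rate in (400, 500, 800):
    print(f"{rate} slots/sec -> {backfill_hours(total_slots, rate):.2f} h")
```

With a little over three million slots on mainnet at that date, 400-500 slots/sec works out to roughly two hours, which is in line with the figure quoted in the commit message.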
Jacek Sieka 40fe8f5336 fix missing backfill when restarting node
When node is restarted before backfill has started but after some blocks
have finalized with forward sync, we would not start the backfill.

* also clean up one last `SomeSome`
2022-02-11 23:08:50 +02:00
Jacek Sieka 1760f4d7a7
move wallet/deposit commands to separate files (#3372)
These commands have little to do with the "normal" beacon node operation
- ergo, they deserve to live in their own module.

* clean up imports/exports
2022-02-11 21:40:49 +01:00
Dustin Brody 3daa52ab87
update docs for geth/kiln 2022-02-11 20:06:06 +00:00
tersec d02daf8cbd
bump nim-web3 to fix kiln interop (#3373) 2022-02-11 18:38:44 +00:00
tersec fc0ce57b68
kiln merge test vectors for EL (#3377) 2022-02-11 18:17:37 +00:00
Eugene Kabanov b4eb150b9a
Revert restAccept workaround. (#3369)
Bump fixed version of nim-presto.
2022-02-11 12:01:45 +01:00
Jeremy Schlatter 47b1870100
update prysm export docs (#3365)
This command was [recently renamed](https://github.com/prysmaticlabs/prysm/pull/9873).
2022-02-09 11:25:12 +01:00
Mamy Ratsimbazafy 97a1735e4a
Bump BLST (security fix on currently unused primitive) (#3364) 2022-02-09 03:08:47 +01:00
Ștefan Talpalaru 70b38e37e6
Nim GC metrics for the main thread (#3108)
* Nim GC metrics for the main thread
2022-02-08 20:19:21 +01:00
yslcrypto 71d98a03b3 add warning to REST API page 2022-02-08 18:30:59 +01:00
Ștefan Talpalaru e03a653bbe
update AllTests-mainnet.md (#3363) 2022-02-08 15:39:15 +01:00
Eugene Kabanov 40c77e5928
Remote KeyManager API and a number of fixes/tests for KeyManager API (#3360)
* Initial commit.

* Fix current test suite.

* Fix keymanager api test.

* Fix wss_sim.

* Add more keystore_management tests.

* Recover deleted isEmptyDir().

* Add `HttpHostUri` distinct type.
Move keymanager calls away from rest_beacon_calls to rest_keymanager_calls.
Add REST serialization of RemoteKeystore and Keystore object.
Add tests for Remote Keystore management API.
Add tests for Keystore management API (Add keystore).
Fix serialization issues.

* Fix test to use HttpHostUri instead of Uri.

* Add links to specification in comments.

* Remove debugging echoes.
2022-02-07 22:36:09 +02:00
Jacek Sieka c7abc97545
harden and speed up block sync (#3358)
* harden and speed up block sync

The `GetBlockBy*` server implementation currently reads SSZ bytes from
database, deserializes them into a Nim object then serializes them right
back to SSZ - here, we eliminate the deser/ser steps and send the bytes
straight to the network. Unfortunately, the snappy recoding must still
be done because of differences in framing.

Also, the quota system makes one giant request for quota right before
sending all blocks - this means that a 1024 block request will be
"paused" for a long time, then all blocks will be sent at once causing a
spike in database reads which potentially will see the reading client
time out before any block is sent.

Finally, on the reading side we make several copies of blocks as they
travel through various queues - this was not noticeable before but
becomes a problem in two cases: bellatrix blocks are up to 10mb (instead
of .. 30-40kb) and when backfilling, we process a lot more of them a lot
faster.

* fix status comparisons for nodes syncing from genesis (#3327 was a bit
too hard)
* don't hit database at all for post-altair slots in GetBlock v1
requests
2022-02-07 19:20:10 +02:00
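The quota problem described in #3358 can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Nimbus implementation: the commit message only describes the problem, so requesting quota per block is an assumed alternative, and `send_times`, the token counts and the refill rate are all hypothetical.

```python
def send_times(n_blocks, tokens, refill_per_sec, per_block):
    """Simulated time (seconds) at which each block of a request is sent.

    `tokens` is the quota available when the request arrives and one token
    pays for one block. All names and numbers here are illustrative.
    """
    if not per_block:
        # One giant quota request up front: nothing is sent until the whole
        # request is paid for, then all blocks go out at once.
        wait = max(0.0, (n_blocks - tokens) / refill_per_sec)
        return [wait] * n_blocks

    now = 0.0
    times = []
    for _ in range(n_blocks):
        # Pay for a single block, waiting only if the bucket is empty.
        wait = max(0.0, (1 - tokens) / refill_per_sec)
        now += wait
        tokens += wait * refill_per_sec  # quota refilled while waiting
        tokens -= 1
        times.append(now)
    return times


bulk = send_times(1024, tokens=128, refill_per_sec=64, per_block=False)
paced = send_times(1024, tokens=128, refill_per_sec=64, per_block=True)
print("first block:", bulk[0], "vs", paced[0])
print("last block: ", bulk[-1], "vs", paced[-1])
```

In this model both strategies finish at the same time, but only the bulk request keeps the client waiting with no data at all, which is where the timeout risk mentioned in the commit comes from.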
tersec bf3ef987e4
deactivate doppelganger protection during genesis (#3362)
* deactivate Doppelganger Protection during genesis

* also don't actually flag supposed-doppelgangers (because they're before broadcastStartEpoch) on GENESIS_SLOT start
2022-02-07 07:12:36 +02:00
Ștefan Talpalaru 70579f2fb1
Jenkins: macOS ARM64 CI job (#3128) 2022-02-04 14:43:40 +01:00
Jacek Sieka 6f10e651ff
rest: fix ssz preference string (#3357) 2022-02-04 15:26:27 +02:00
tersec e0fb5d95a6
remove --subscribe-all{att,sync}nets (#3359) 2022-02-04 12:34:03 +00:00
tersec 02349b4181
update to engine API alpha.6 (#3351) 2022-02-04 12:12:19 +00:00
tersec d358299875
fork choice proposer boosting support (#3349)
* fork choice proposer boosting support

* detect nodeDelta underflow/overflow
2022-02-04 12:59:40 +01:00
Jacek Sieka a50e21e229
fix doppelganger detection logging
* update action tracker on dependent-root-changing reorg (instead of
epoch change)
* don't try to log duties while syncing - we're not tracking actions yet
* fix slot used for doppelganger loss detection
2022-02-04 12:25:32 +01:00
Jacek Sieka 49282e9477
val_mon: register locally produced aggregates (#3352)
These use a separate flow, and were previously only registered from the
network

* don't log successes in totals mode (TMI)
* remove `attestation-sent` event which is unused
2022-02-04 08:33:20 +01:00
tersec 9c18765b3b
remove ncli_db pruneDatabase (#3356) 2022-02-03 20:03:01 +01:00
tersec de0d473ea1
docs: don't rely on ncli_db pruneDatabase for reducing storage usage (#3355)
* don't rely on ncli_db pruneDatabase for reducing storage usage

* remove "Running out of storage" section altogether

* Update docs/the_nimbus_book/src/troubleshooting.md

Co-authored-by: sacha <sacha@status.im>
2022-02-03 12:35:34 +00:00
Zahary Karadjov 215caa21ae Eth1 monitor fixes
* Fix a resource leak introduced in https://github.com/status-im/nimbus-eth2/pull/3279

* Don't restart the Eth1 syncing progress from scratch in case of
  monitor failures during Eth2 syncing.

* Switch to the primary operator as soon as it is back online.

* Log the web3 credentials in fewer places

Other changes:

The 'web3 test' command has been enhanced to obtain and print more
data regarding the selected provider.
2022-02-03 14:01:55 +02:00
tersec 702d9e8c55
for x86 macOS, require >= Nehalem (#3353) 2022-02-02 15:24:41 +00:00
tersec 8e6a920bf4
rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH (#3350)
* rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH

* fix REST test rules
2022-02-02 14:06:55 +01:00
Jacek Sieka ff4f2a6b6c
better log on finalized slot failure 2022-02-01 21:23:18 +01:00
Tanguy bcd7b4598c
Tune peering (#3348)
- Request metadata_v2 (altair) by default instead of the v1
- Change the metadata pinger to a 3 failure-then-kick, instead of being time based
- Update kicker scorer to take into account topics which we're not subscribed to, to be sure that we will be able to publish correctly
- Add some metrics to give "fanout" health (in the same spirit of mesh health)
2022-02-01 18:20:55 +01:00
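The "3 failure-then-kick" rule from #3348 can be sketched as a simple counter. This is a hypothetical illustration: the class name is invented, and it assumes the failures must be consecutive (a successful ping resets the count), which the commit message does not state.

```python
class MetadataPingTracker:
    """Counts failed metadata pings for one peer (illustrative only)."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, ok: bool) -> bool:
        """Record one ping result; return True if the peer should be kicked."""
        if ok:
            # Assumption: a successful ping clears earlier failures.
            self.failures = 0
            return False
        self.failures += 1
        return self.failures >= self.max_failures


tracker = MetadataPingTracker()
results = [tracker.record(ok) for ok in (False, False, True, False, False, False)]
print(results)
```

Unlike a time-based rule, the decision here depends only on the number of failed attempts, so a slow but responsive peer is not penalised.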
Zachinquarantine f5de887df7
Delete Pyrmont docs (#3340)
* Delete pyrmont.md

* Update log-rotate.md

* Update pi-guide.md
2022-02-01 12:05:20 +01:00
sacha 7d731322b2
Book: trusted sync (edits and clarifications) (#3329)
* edits

* more edits and clarifications

* edit

* add clarification on --trusted-node-url

* address feedback

* remove repetition
2022-02-01 12:02:04 +01:00
Zahary Karadjov ac16eb4691 Streamline the validator reward analysis
Notable improvements:

* A separate aggregation pass is no longer required.

* The user can opt to produce only aggregated data
  (resulting in a much smaller data set).

* A large portion of the number crunching in Jupyter is now done in C
  through the rich DataFrames API.

* Added support for comparisons against the "median" validator
  performance in the network.
2022-02-01 11:30:14 +02:00
tersec 0c814f49ee
rename sync_{committee_,}aggregate and execute_payload -> notify_new_payload (#3347) 2022-02-01 07:31:53 +00:00
EmilIvanichkovv 336403d18b Refactor `handleValidatorExitCommand`
Make `validator exit command` work both with `JSON-RPC` and `REST` APIs
Fix problem with specifying rest-url using `localhost`
Change back exit error messages in `state_transition_block`
2022-02-01 01:24:05 +02:00
Jacek Sieka 3df9ffca9f val-mon: remove redundant `_total` suffix from counters
It turns out nim-metrics adds this suffix on its own - it also turns out
some of the names are non-conventional and need follow-up.
2022-01-31 18:51:24 +02:00
tersec c9aa1bee01
spec URL updates (#3342) 2022-01-31 09:56:59 +00:00
Jacek Sieka ad327a8769
Fix counters in validator monitor totals mode (#3332)
The current counters set gauges etc to the value of the _last_ validator
to be processed - as the name of the feature implies, we should be using
sums instead.

* fix missing beacon state metrics on startup, pre-first-head-selection
* fix epoch metrics not being updated on cross-epoch reorg
2022-01-31 08:36:29 +01:00
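The totals-mode bug fixed in #3332 comes down to assigning where one should accumulate. The sketch below is a simplified illustration with hypothetical function names and made-up values; it does not use the nim-metrics API.

```python
def gauge_last_value(per_validator_values):
    """Buggy behaviour: each validator overwrites the gauge."""
    gauge = 0
    for value in per_validator_values:
        gauge = value  # only the last validator processed survives
    return gauge


def gauge_total(per_validator_values):
    """Intended behaviour in totals mode: the gauge holds the sum."""
    gauge = 0
    for value in per_validator_values:
        gauge += value
    return gauge


balances = [32, 31, 33]  # illustrative per-validator values
print(gauge_last_value(balances), gauge_total(balances))
```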
Jacek Sieka d583e8e4ac
Store finalized block roots in database (3s startup) (#3320)
* Store finalized block roots in database (3s startup)

When the chain has finalized a checkpoint, the history from that point
onwards becomes linear - this is exploited in `.era` files to allow
constant-time by-slot lookups.

In the database, we can do the same by storing finalized block roots in
a simple sparse table indexed by slot, bringing the two representations
closer to each other in terms of conceptual layout and performance.

Doing so has a number of interesting effects:

* mainnet startup time is improved 3-5x (3s on my laptop)
* the _first_ startup might take slightly longer as the new index is
being built - ~10s on the same laptop
* we no longer rely on the beacon block summaries to load the full dag -
this is a lot faster because we no longer have to look up each block by
parent root
* a collateral benefit is that we no longer need to load the full
summaries table into memory - we get the RSS benefits of #3164 without
the CPU hit.

Other random stuff:

* simplify forky block generics
* fix withManyWrites multiple evaluation
* fix validator key cache not being updated properly in chaindag
read-only mode
* drop pre-altair summaries from `kvstore`
* recreate missing summaries from altair+ blocks as well (in case
database has lost some to an involuntary restart)
* print database startup timings in chaindag load log
* avoid allocating superfluous state at startup
* use a recursive sql query to load the summaries of the unfinalized
blocks
2022-01-30 18:51:04 +02:00
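The two storage ideas in #3320, a slot-indexed table for finalized roots and a recursive query for unfinalized summaries, can be sketched with SQLite. The schema, table names and helper functions below are assumptions made for illustration and are not the actual Nimbus database layout.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    # Finalized history is linear, so the slot alone identifies a block.
    # The table is sparse: empty slots simply have no row.
    db.execute(
        "CREATE TABLE finalized_blocks (slot INTEGER PRIMARY KEY, root BLOB NOT NULL)"
    )
    # Unfinalized blocks may fork, so they are keyed by root and linked
    # to their parent.
    db.execute(
        "CREATE TABLE summaries ("
        "root BLOB PRIMARY KEY, slot INTEGER NOT NULL, parent_root BLOB NOT NULL)"
    )
    return db


def finalized_root(db, slot):
    """By-slot lookup using the primary key; None means an empty slot."""
    row = db.execute(
        "SELECT root FROM finalized_blocks WHERE slot = ?", (slot,)
    ).fetchone()
    return row[0] if row else None


def unfinalized_chain(db, head_root):
    """Walk parent links from `head_root` until they leave `summaries`."""
    return db.execute(
        """
        WITH RECURSIVE chain(root, slot, parent_root) AS (
            SELECT root, slot, parent_root FROM summaries WHERE root = ?
            UNION ALL
            SELECT s.root, s.slot, s.parent_root
            FROM summaries s JOIN chain c ON s.root = c.parent_root
        )
        SELECT root, slot FROM chain ORDER BY slot
        """,
        (head_root,),
    ).fetchall()


db = open_db()
db.executemany(
    "INSERT INTO finalized_blocks VALUES (?, ?)",
    [(0, b"a"), (1, b"b"), (3, b"c")],  # slot 2 is empty
)
db.executemany(
    "INSERT INTO summaries VALUES (?, ?, ?)",
    [(b"d", 4, b"c"), (b"e", 5, b"d")],
)
print(finalized_root(db, 1), finalized_root(db, 2))
print(unfinalized_chain(db, b"e"))
```

The recursion stops on its own once a parent root is no longer present in `summaries`, that is, when the walk reaches the finalized part of the chain.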
Emil 0051af430b Put `application/json` as a higher preference than `application/octet-stream` 2022-01-30 18:50:14 +02:00
tersec 29e2169585
phase 0 & altair beacon chain and altair validator spec URL updates (#3339) 2022-01-29 13:53:31 +00:00
tersec 89ffa8a1a7
spec URL & copyright year update (#3338) 2022-01-29 01:05:39 +00:00
tersec 60bf5b8bf4
use v1.1.9 test vectors (#3337) 2022-01-28 22:47:48 +00:00
tersec 95fee10328
clean up hashed rollback proc declarations (#3333)
* clean up hashed rollback proc declarations

* use generic hashed rollback proc type
2022-01-28 14:24:37 +00:00
cheatfate 1287a20b13 Use HTTP status codes instead of status in body. 2022-01-28 15:36:27 +02:00
Jacek Sieka e264276b36
keep unviables in quarantine (#3331)
they remain unviable even after a reorg
2022-01-28 11:59:55 +01:00
Zahary Karadjov 49b7daa39d [ncli_db] bugfix: take into account finalization delay in reward calc post Altair
This fixes a problem affecting Prater's epoch 64444.
2022-01-28 12:03:23 +02:00
tersec dcb671617c
add/support TERMINAL_BLOCK_HASH_ACTIVATION_EPOCH (#3303) 2022-01-27 19:52:08 +00:00
Jacek Sieka 84b6ad871d harden status message handling
Additional sanity checking of the status message exchanged during a
fresh connection:

* check that head and finalized make sense, slot-wise
* verify that finalized root lies on the canonical chain, when possible
* re-check these things for every status message during sync
2022-01-27 18:46:47 +02:00
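The slot-wise sanity checks listed in the commit above can be sketched as follows. This is an illustration under assumptions: the `Status` fields, the function name and the one-slot clock tolerance are hypothetical, and the canonical-chain check on the finalized root is left out because it needs access to the local chain.

```python
from dataclasses import dataclass

SLOTS_PER_EPOCH = 32


@dataclass
class Status:
    """Subset of a peer's status message (field names are illustrative)."""
    finalized_epoch: int
    head_slot: int


def status_is_plausible(status: Status, wall_slot: int, tolerance_slots: int = 1) -> bool:
    """Check that head and finalized make sense relative to each other and to our clock."""
    finalized_slot = status.finalized_epoch * SLOTS_PER_EPOCH
    if status.head_slot < finalized_slot:
        return False  # head cannot be older than the finalized checkpoint
    if status.head_slot > wall_slot + tolerance_slots:
        return False  # head cannot be from the future (tolerance is an assumption)
    return True


print(status_is_plausible(Status(finalized_epoch=10, head_slot=400), wall_slot=400))
print(status_is_plausible(Status(finalized_epoch=10, head_slot=100), wall_slot=400))
```

Since the commit re-checks these conditions for every status message during sync, a function like this would be called on each update rather than only at connection time.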
Eugene Kabanov aa27baacf5
Fix 408 Timeout error returned by REST server. (#3301)
* Disable REST server timeouts.
* Add options to CLI to tune REST server parameters.
2022-01-27 18:41:05 +02:00
Ștefan Talpalaru d5a2c75963
restapi.sh: cleanup on exit (#3328)
also rename a confusing option/var combo
2022-01-27 13:03:38 +01:00