Introduce (optional) pruning of historical data - a pruned node will
continue to answer queries for historical data up to
`MIN_EPOCHS_FOR_BLOCK_REQUESTS` epochs, or roughly 5 months, capping
typical database usage at around 60-70gb.
To enable pruning, add `--history=prune` to the command line - on the
first start, old data will be cleared (which may take a while) - after
that, data is pruned continuously.
When pruning an existing database, the database will not shrink -
instead, the freed space is recycled as the node continues to run - to
free up space, perform a trusted node sync with a fresh database.
When switching on archive mode in a pruned node, history is retained
from that point onwards.
History pruning is scheduled to be enabled by default in a future
release.
In this PR, `minimal` mode from #4419 is not implemented meaning
retention periods for states and blocks are always the same - depending
on user demand, a future PR may implement `minimal` as well.
Since the sync committee duties are no longer updated on every slot
and previously the sync committee aggregators selection proofs were
generated during the duties update, this now resulted in the client
using stale selection proofs (they must be generated at each slot).
The fix consists of moving the selection proof generation logic in
a different function which is properly executed on each slot.
Other changes:
* The logtrace tool has been enhanced with a framework for adding
new simpler log aggregation and analysis algorithms.
The default CI testnet simulation will now ensure that the blocks
in the network have reasonable sync committee participation.
To further tighten Nimbus against spam, this PR introduces a global
quota for block requests (shared between peers) as well as a general
per-peer request limit that applies to all libp2p requests.
* apply request quota before decoding message
* for high-bandwidth requests (blocks), apply a shared global quota
which helps manage bandwidth for high-peer setups
* add metrics
When connecting to hosts on shared IP/Port using TLS, SNI must be sent
to allow the remote server to provide the correct TLS certificate.
Bump the `nim-json-rpc` and `nim-websock` dependencies to send SNI ext.
`news` has a few open issues that are not present in `nim-websock`:
1. There is a 1 second delay between each MB of sent data.
2. Cancelling an ongoing `send` makes the entire WebSocket unusable.
3. Control packets do not have priority over ongoing message frames.
Using `news`, there are quite a few of these messages in Geth:
```
Previously seen beacon client is offline. Please ensure it is
operational to follow the chain!
```
It may take quite some time to reconnect when this happens.
Using `nim-websock`, this message still occurs because `eth1_monitor`
reconnects the EL connection when no new blocks occurred for 5 minutes,
but reconnecting is quick and the message is rarer.
When calling `newPayload` on a >1MB payload (can happen post-merge),
`news` splits up that payload into 1MB chunks. The chunks are each sent
individually, though, with `await` in-between. This means that when we
send concurrent `forkChoiceUpdated` calls, that those may end up getting
in-between the `newPayload` chunks, leading to invalid data being sent.
The EL then returns an error message with a `null` `id` entry (as it
could not read the request `id` due to the mangling) and disconnects.
A PR has been submitted to fix this in `news`, and merged into `status`
branch early as this fix is critical for reliable post-merge operation:
https://github.com/Tormund/news/pull/22
* Re-enabled requireAllFields after a fix in nim-json-serialization
The problem was that `Option[T]` fields were not treated as optional
when requireAllFields is set to true. This is now fixed in NJS.
* Add makefile targets for recreating the Jenkins simulation runs
* Fix a discrepancy with the REST spec
Whether new blocks/attestations/etc are produced internally or received
via REST, their journey through the node is the same - to ensure that
they get the same treatment (logging, metrics, processing), this PR
moves the routing to a dedicated module and fixes several small
differences that existed before.
* `xxxValidator` -> `processMessageName` - the processor also was adding
messages to pools, so we want the name to reflect that action
* add missing "sent" metrics for some messages
* document ignore policy better - already-seen messages are not actaully
rebroadcast by libp2p
* skip redundant signature checks for internal validators consistently
This updates `nim-ssz-serialization` to
`3db6cc0f282708aca6c290914488edd832971d61`.
Notable changes:
- Use `uint64` for `GeneralizedIndex`
- Add support for building merkle multiproofs
* Initial commit
* Make `events` API spec compliant.
* Add `Eth-Consensus-Version` in responses.
* Bump chronos to get redirect with headers working.
* Add `is_optimistic` field and handling to syncing RestSyncInfo.
* SSZ `[]` -> `mitem`
* `[]` -> `item`
immutable access via mutable instance cannot rely on template
overloading, and `[]` cannot be a `func` because of special seq handling
in compiler.
Other fixes:
* Fix bit rot in the `make prater-dev-deposit` target.
* Correct content-type in the responses of the Nimbus signing node
* Invalid JSON payload was being sent in the web3signer requests
This PR makes the necessary adjustments to deal with the revamped snappy
API.
In practical terms for nimbus-eth2, there are performance increases to
gossip processing, database reading and writing as well as era file
processing. Exporting `.era` files for example, a snappy-heavy
operation, almost halves in total processing time:
Pre:
```
Average, StdDev, Min, Max, Samples, Test
39.088, 8.735, 23.619, 53.301, 50, tState
237.079, 46.692, 165.620, 355.481, 49, tBlocks
```
Post:
```
All time are ms
Average, StdDev, Min, Max, Samples, Test
25.350, 5.303, 15.351, 41.856, 50, tState
141.238, 24.164, 99.990, 199.329, 49, tBlocks
```
Some upstream repos still need fixes, but this gets us close enough that
style hints can be enabled by default.
In general, "canonical" spellings are preferred even if they violate
nep-1 - this applies in particular to spec-related stuff like
`genesis_validators_root` which appears throughout the codebase.
rocksdb was never actually used in nimbus-eth2 and existed only to satisfy nim-eth dependencies for test running - these have since moved to nimbus-eth1.
* bump nim-eth
`.era` files and Req/Resp protocols use framed formats - aligning the
database with these makes for less recompression work overall as gossip
is sent only once while req/resp repeats (potentially) - this also
allows efficient pruning-to-era where snappy-recompression is the major
cycle thief.
Changes in nim-eth relevant to nimbus-eth2:
- Style fixes according to --styleCheck:usages (#452)
- Add discoveryv5 session metrics (#454)
- Don’t use exceptions for enr get call (#453)
- Add DiscoveryConfig to tune routing table ip limits and bitPerHops
- More --styleCheck fixes for discoveryv5 and eth/common (#473)
The added options work in opt-in fashion. If they are not specified,
the server will respond to all requests as if the CORS specification
doesn't exist. This will result in errors in CORS-enabled clients.
Please note that future versions may support more than one allowed
origin. The option names will stay the same, but the user will be
able to repeat them on the command line (similar to other options
such as --web3-url).
To be documented in the guide in a separate PR.
* Fix a resource leak introduced in https://github.com/status-im/nimbus-eth2/pull/3279
* Don't restart the Eth1 syncing proggress from scratch in case of
monitor failures during Eth2 syncing.
* Switch to the primary operator as soon as it is back online.
* Log the web3 credentials in fewer places
Other changes:
The 'web3 test' command has been enhanced to obtain and print more
data regarding the selected provider.
* Store finalized block roots in database (3s startup)
When the chain has finalized a checkpoint, the history from that point
onwards becomes linear - this is exploited in `.era` files to allow
constant-time by-slot lookups.
In the database, we can do the same by storing finalized block roots in
a simple sparse table indexed by slot, bringing the two representations
closer to each other in terms of conceptual layout and performance.
Doing so has a number of interesting effects:
* mainnet startup time is improved 3-5x (3s on my laptop)
* the _first_ startup might take slightly longer as the new index is
being built - ~10s on the same laptop
* we no longer rely on the beacon block summaries to load the full dag -
this is a lot faster because we no longer have to look up each block by
parent root
* a collateral benefit is that we no longer need to load the full
summaries table into memory - we get the RSS benefits of #3164 without
the CPU hit.
Other random stuff:
* simplify forky block generics
* fix withManyWrites multiple evaluation
* fix validator key cache not being updated properly in chaindag
read-only mode
* drop pre-altair summaries from `kvstore`
* recreate missing summaries from altair+ blocks as well (in case
database has lost some to an involuntary restart)
* print database startup timings in chaindag load log
* avoid allocating superfluos state at startup
* use a recursive sql query to load the summaries of the unfinalized
blocks
* initial support for minification and new interchange tests. Removal of v1 and v1 migration.
* Synthetic attestations: SQLite3 requires one statement/query per prepared statement
* Fix DB import interrupted if no attestation was found
* Skip test relying on undocumented test behavior (https://github.com/eth-clients/slashing-protection-interchange-tests/pull/12#issuecomment-1011158701)
* Skip test relying on unclear minification behavior:
creating an invalid minified attestation with source > target or setting target = max(source, target)
* remove DB v1 and update submodule
* Apply suggestions from code review
Co-authored-by: Jacek Sieka <jacek@status.im>
Co-authored-by: Jacek Sieka <jacek@status.im>
* REST cleanups
* reject out-of-range committee requests
* print all hex values as lower-case
* allow requesting state information by head state root
* turn `DomainType` into array (follow spec)
* `uint_to_bytesXX` -> `uint_to_bytes` (follow spec)
* fix wrong dependent root in `/eth/v1/validator/duties/proposer/`
* update documentation - `--subscribe-all-subnets` is no longer needed
when using the REST interface with validator clients
* more fixes
* common helpers for dependent block
* remove test rules obsoleted by more strict epoch tests
* fix trailing commas
* Update docs/the_nimbus_book/src/rest-api.md
* Update docs/the_nimbus_book/src/rest-api.md
Co-authored-by: sacha <sacha@status.im>
* 3x speedup in snappy compression
oh, the wonders of `copyMem` in `endians2` - speeds up all kinds of
operations like database stores, sending gossip etc.
* endian usage fixes
A novel optimisation for attestation and sync committee message
validation: when batching, we look for signatures of the same message
and aggregate these before batch-validating: this results in up to 60%
fewer signature verifications on a busy server, leading to a significant
reduction in CPU usage.
* increase batch size slightly which helps finding more aggregates
* add metrics for batch verification efficiency
* use simple `blsVerify` when there is only one signature to verify in
the batch, avoiding the RNG
This updates `nim-ssz-serialization` to
`3cd8d2d6b80bde0ce7f25609cb5cb9fc37852fe2`.
Notable changes:
- Serialization of object variant (case object) to/from SSZ Union.
- int -> int64 fix in hashTreeRootCached
This updates `nim-confutils` to
`6a56d01381f434d5fbcc61b6e497b9409155bcbc`.
Notable changes:
- feature: separator text when displaying help
- feature: multiple lines long description
- feature: add ignore property in addition to hidden
- add compile time check to detect duplicate abbr and duplicate name
This updates `nim-eth` to `5655bd035cfd7319c6f2e60b7fdefef65a057939`.
Notable changes:
- db: Allow Sqlite keystores to be used in read-only mode
- net: avoid allocation in hash(ValidIpAddress)
- net: Remove hashData usage on objects
- p2p: reject WHOAREYOU packets with non-empty message
- p2p: Adjust logging when node is not reachable but enrAutoUpdate is on
- p2p: Allow a node to self resolve
- p2p: Fix logDistance for BE arch and remove toBytes for NodeId
- p2p: Export discovery routing table and its buckets nodes
- ssz: remove outdated and incorrect SSZ code
- utp: Various updates and fixes
* Add some indicators to help fixing issue.
* Bump presto to help debugging.
* Fix compilation problems in presto.
* Fix SIGSEGV.
* Bump latest changes in chronos and presto.
Fix rare cases in validator_client.
* Use proper commits for chronos and presto.
* Logging and startup improvements
Color support for released binaries!
* startup scripts no longer log to file by default - this only affects
source builds - released binaries don't support file logging
* add --log-stdout option to control logging to stdout (colors, json)
* detect tty:s vs redirected logs and log accordingly
* add option to disable log colors at runtime
* simplify several "common" logs, showing the most important information
earlier and more clearly
* remove line numbers / file information / tid - these take up space and
are of little use to end users
* still enabled in debug builds and tools
* remove `testnet_servers_image` compile-time option
* server images, released binaries and compile-from-source now offer
the same behaviour and features
* fixes https://github.com/status-im/nimbus-eth2/issues/2326
* fixes https://github.com/status-im/nimbus-eth2/issues/1794
* remove instanteneous block speed from sync message, keeping only
average
before:
```
INF 2021-10-28 16:45:59.000+02:00 Slot start topics="beacnde" tid=386429 file=nimbus_beacon_node.nim:884 lastSlot=2384027 wallSlot=2384028 delay=461us84ns peers=0 head=75a10ee5:3348 headEpoch=104 finalized=cd6804ba:3264 finalizedEpoch=102 sync="wwwwwwwwww:0:0.0000:0.0000:00h00m (3348)"
INF 2021-10-28 16:45:59.046+02:00 Slot end topics="beacnde" tid=386429 file=nimbus_beacon_node.nim:821 slot=2384028 nextSlot=2384029 head=75a10ee5:3348 headEpoch=104 finalizedHead=cd6804ba:3264 finalizedEpoch=102 nextAttestationSlot=-1 nextProposalSlot=-1 nextActionWait=n/a
```
after:
```
INF 2021-10-28 22:43:23.033+02:00 Slot start topics="beacnde" slot=2385815 epoch=74556 sync="DDPDDPUDDD:10:5.2258:01h19m (2361088)" peers=37 head=eacd2dae:2361096 finalized=73782:a4751487 delay=33ms687us715ns
INF 2021-10-28 22:43:23.291+02:00 Slot end topics="beacnde" slot=2385815 nextActionWait=n/a nextAttestationSlot=-1 nextProposalSlot=-1 head=eacd2dae:2361096
```
* fix comment
* documentation updates
* mention `--log-file` may be deprecated in the future
* update various docs
* Add `child_id` field.
* Fix json-rpc api call and bump chronos.
* Bump chronos master and fix compilation warnings.
* One more bump of `chronos`.
* add random import
* export rest_utils a bit more
Co-authored-by: Jacek Sieka <jacek@status.im>
* auto-bump nim-libp2p
* Remove peer info for other peers
Not definitive, just to test the libp2p's unstable branch
* finish up Remove peer info for other peers
* getKey -> getPublicKey
* bump libp2p
* libp2p bump
Co-authored-by: = <Menduist@users.noreply.github.com>
Co-authored-by: Tanguy <tanguy@status.im>
This still doesn't work properly because it leads to a change in
the "eth" ENR field of the client. Since the client uses the value
of this field to look for other nodes on the network, the change
effectively prevents us from finding peers.
* Placing callbacks into strategic places.
* Initial events call implementation.
* Post rebase fixes.
* Change addSyncContribution() implementation.
* Add `attestation-sent` event.
Remove gcsafe, raises from callbacks implementations.
Move `attestation-received` fire at the end of attestation processing.
* Address review comments.
* Add parallel attestation verification
* Update tests, batchVerify doesn't use the threadpool with only single core (nim-blscurve update)
* bump nim-blscurve
* Debug info for failing eth2 test vectors
* remove submodule eth2-testnets
* verbose debugging of make failure on Windows (libbacktrace?)
* Remove CI debug mode
* initialization convention
* Fix new altair tests
* add random tests and rename "Official" to "Ethereum Foundation"
* checkDir = true covers dirExists(...)
* invalidate CI EF fixtures cache
* more correct cache invalidation
* reorganize ssz dependencies
This PR continues the work in
https://github.com/status-im/nimbus-eth2/pull/2646,
https://github.com/status-im/nimbus-eth2/pull/2779 as well as past
issues with serialization and type, to disentangle SSZ from eth2 and at
the same time simplify imports and exports with a structured approach.
The principal idea here is that when a library wants to introduce SSZ
support, they do so via 3 files:
* `ssz_codecs` which imports and reexports `codecs` - this covers the
basic byte conversions and ensures no overloads get lost
* `xxx_merkleization` imports and exports `merkleization` to specialize
and get access to `hash_tree_root` and friends
* `xxx_ssz_serialization` imports and exports `ssz_serialization` to
specialize ssz for a specific library
Those that need to interact with SSZ always import the `xxx_` versions
of the modules and never `ssz` itself so as to keep imports simple and
safe.
This is similar to how the REST / JSON-RPC serializers are structured in
that someone wanting to serialize spec types to REST-JSON will import
`eth2_rest_serialization` and nothing else.
* split up ssz into a core library that is independendent of eth2 types
* rename `bytes_reader` to `codec` to highlight that it contains coding
and decoding of bytes and native ssz types
* remove tricky List init overload that causes compile issues
* get rid of top-level ssz import
* reenable merkleization tests
* move some "standard" json serializers to spec
* remove `ValidatorIndex` serialization for now
* remove test_ssz_merkleization
* add tests for over/underlong byte sequences
* fix broken seq[byte] test - seq[byte] is not an SSZ type
There are a few things this PR doesn't solve:
* like #2646 this PR is weak on how to handle root and other
dontSerialize fields that "sometimes" should be computed - the same
problem appears in REST / JSON-RPC etc
* Fix a build problem on macOS
* Another way to fix the macOS builds
Co-authored-by: Zahary Karadjov <zahary@gmail.com>
* bump libp2p
* altair sync v2
Use V2 sync requests after the altair fork has happened, according to
the wall clock
* Fix the behavior of the v1 req/resp calls after Altair
Co-authored-by: Zahary Karadjov <zahary@gmail.com>
* Initial commit.
* Exporting getConfig().
* Add beacon node checking procedures.
* Post rebase fixes.
* Use runSlotLoop() from nimbus_beacon_node.
Fallback implementation.
Fixes for ETH2 REST serialization.
* Add beacon_clock.durationToNextSlot().
Move type declarations from beacon_rest_api to json_rest_serialization.
Fix seq[ValidatorIndex] serialization.
Refactor ValidatorPool and add some utility procedures.
Create separate version of validator_client.
* Post-rebase fixes.
Remove CookedPubKey from validator_pool.nim.
* Now we should be able to produce attestations and aggregate and proofs.
But its not working yet.
* Debugging attestation sending.
* Add durationToNextAttestation.
Optimize some debug logs.
Fix aggregation_bits encoding.
Bump chronos/presto.
* Its alive.
* Fixes for launch_local_testnet script.
Bump chronos.
* Switch client API to not use `/api` prefix.
* Post-rebase adjustments.
* Fix endpoint for publishBlock().
* Add CONFIG_NAME.
Add more checks to ensure that beacon_node is compatible.
* Add beacon committee subscription support to validator_client.
* Fix stacktrace should be an array of strings.
Fix committee subscriptions should not be `data` keyed.
* Log duration to next block proposal.
* Fix beacon_node_status import.
* Use jsonMsgResponse() instead of jsonError().
* Fix graffityBytes usage.
Remove unnecessary `await`.
Adjust creation of SignedBlock instance.
Remove legacy files.
* Rework durationToNextSlot() and durationToNextEpoch() to use `fromNow`.
* Fix race condition for block proposal and attestations for same slot.
Fix local_testnet script to properly kill tasks on Windows.
Bump chronos and nim-http-tools, to allow connections to infura.io (basic auth).
* Catch services errors.
Improve performance of local_testnet.sh script on Windows.
Fix race condition when attestation producing.
* Post-rebase fixes.
* Bump chronos and presto.
* Calculate block publishing delay.
Fix pkill in one more place.
* Add error handling and timeouts to firstSuccess() template.
Add onceToAll() template.
Add checkNodes() procedure.
Refactor firstSuccess() template.
Add error checking to api.nim calls.
* Deprecated usage onceToAll() for better stability.
Address comment and send attestations asap.
* Avoid unnecessary loop when calculating minimal duration.
* Implement split preset/config support
This is the initial bulk refactor to introduce runtime config values in
a number of places, somewhat replacing the existing mechanism of loading
network metadata.
It still needs more work, this is the initial refactor that introduces
runtime configuration in some of the places that need it.
The PR changes the way presets and constants work, to match the spec. In
particular, a "preset" now refers to the compile-time configuration
while a "cfg" or "RuntimeConfig" is the dynamic part.
A single binary can support either mainnet or minimal, but not both.
Support for other presets has been removed completely (can be readded,
in case there's need).
There's a number of outstanding tasks:
* `SECONDS_PER_SLOT` still needs fixing
* loading custom runtime configs needs redoing
* checking constants against YAML file
* yeerongpilly support
`build/nimbus_beacon_node --network=yeerongpilly --discv5:no --log-level=DEBUG`
* load fork epoch from config
* fix fork digest sent in status
* nicer error string for request failures
* fix tools
* one more
* fixup
* fixup
* fixup
* use "standard" network definition folder in local testnet
Files are loaded from their standard locations, including genesis etc,
to conform to the format used in the `eth2-networks` repo.
* fix launch scripts, allow unknown config values
* fix base config of rest test
* cleanups
* bundle mainnet config using common loader
* fix spec links and names
* only include supported preset in binary
* drop yeerongpilly, add altair-devnet-0, support boot_enr.yaml
* update to Altair as of v1.1.0-alpha.7
* introduce Altair types into attestation pool
* avoid allocating/copying pubkeys excessively in get_next_sync_committee()
* write uncompressed validator keys to database
Loading 150k+ validator keys on startup in compressed format takes a lot
of time - better store them in uncompressed format which makes behaviour
just after startup faster / more predictable.
* refactor cached validator key access
* fix isomorphic cast to work with non-var instances
* remove cooked pubkey cache - directly use database cache in chaindag
as well (one less cache to keep in sync)
* bump blscurve, introduce loadValid for known-to-be-valid keys
The V1 table structure shows great improvements in performance, but if
there's an old `kvstore` without rowid:s, these benefits are nullified:
reorgs during writes and deletes remain expensive (even if the
degradation is reduced somewhat).
This PR creates the tables in a new file instead, and uses the old file
as a read-only store - this has several interesting properties:
* the old database is left completely untouched - this guarantees that
downgrades work smooth (they'll only need to resync their missing
portions)
* starting sync after this PR means only a v1 database is created
* v0 databases stick around - no migration is performed (for now)
Future PR:s can introduce migration of the data from one database to
another - a simply copy will take hours which is downtime we want to
avoid - at that point, it might make sense to migrate straight to era
files instead.