With these changes, we can backfill about 400-500 slots/sec, which means
a full backfill of mainnet takes about 2-3h.
However, the CPU is not saturated - neither in server nor in client
meaning that somewhere, there's an artificial inefficiency in the
communication - 16 parallel downloads *should* saturate the CPU.
One plasible cause would be "too many async event loop iterations" per
block request, which would introduce multiple "sleep-like" delays along
the way.
I can push the speed up to 800 slots/sec by increasing parallel
downloads even further, but going after the root cause of the slowness
would be better.
* avoid some unnecessary block copies
* double parallel requests
* limit by-root requests to non-finalized blocks
Presently, we keep a mapping from block root to `BlockRef` in memory -
this has simplified reasoning about the dag, but is not sustainable with
the chain growing.
We can distinguish between two cases where by-root access is useful:
* unfinalized blocks - this is where the beacon chain is operating
generally, by validating incoming data as interesting for future fork
choice decisions - bounded by the length of the unfinalized period
* finalized blocks - historical access in the REST API etc - no bounds,
really
In this PR, we limit the by-root block index to the first use case:
finalized chain data can more efficiently be addressed by slot number.
Future work includes:
* limiting the `BlockRef` horizon in general - each instance is 40
bytes+overhead which adds up - this needs further refactoring to deal
with the tail vs state problem
* persisting the finalized slot-to-hash index - this one also keeps
growing unbounded (albeit slowly)
Anyway, this PR easily shaves ~128mb of memory usage at the time of
writing.
* No longer honor `BeaconBlocksByRoot` requests outside of the
non-finalized period - previously, Nimbus would generously return any
block through this libp2p request - per the spec, finalized blocks
should be fetched via `BeaconBlocksByRange` instead.
* return `Opt[BlockRef]` instead of `nil` when blocks can't be found -
this becomes a lot more common now and thus deserves more attention
* `dag.blocks` -> `dag.forkBlocks` - this index only carries unfinalized
blocks from now - `finalizedBlocks` covers the other `BlockRef`
instances
* in backfill, verify that the last backfilled block leads back to
genesis, or panic
* add backfill timings to log
* fix missing check that `BlockRef` block can be fetched with
`getForkedBlock` reliably
* shortcut doppelganger check when feature is not enabled
* in REST/JSON-RPC, fetch blocks without involving `BlockRef`
* fix dag.blocks ref
* Trusted node sync
Trusted node sync, aka checkpoint sync, allows syncing tyhe chain from a
trusted node instead of relying on a full sync from genesis.
Features include:
* sync from any slot, including the latest finalized slot
* backfill blocks either from the REST api (default) or p2p (#3263)
Future improvements:
* top up blocks between head in database and some other node - this
makes for an efficient backup tool
* recreate historical state to enable historical queries
* fixes
* load genesis from network metadata
* check checkpoint block root against state
* fix invalid block root in rest json decoding
* odds and ends
* retry looking for epoch-boundary checkpoint blocks