Commit Graph

274 Commits

Author SHA1 Message Date
zah b1ac9c9fe4
Fix a potential segfault and various potential stalls (#4003)
* Fixes a segfault during block production when the Keymanager API
  is disabled. The Keymanager is now disabled on half of the local
  testnet nodes to catch such problems in the future.

* Fixes multiple potential stalls from REST requests being done
  without a timeout. From practice, we know that such requests
  can hang forever if not cancelled with a timeout. At best,
  this would be a resource leak, at worst, it may lead to a
  full stall of the client and missed validator duties.

* Changes some Options usages to Opt (for easier use of valueOr)
2022-08-19 21:51:30 +00:00
zah d64c17ffc3
Minor post-merge cleanups (#3945)
https://github.com/status-im/nimbus-eth2/pull/3944

The use of nested `awaitWithRetries` calls would have
resulted in an unexpected number of retries (3x3).
We now use regular `await` in outer layer to avoid the problem.

https://github.com/status-im/nimbus-eth2/pull/3943

The new code has an invariant that the `headMerkleizer` field in
the `Eth1Chain` is always kept in sync with the blocks stored in
the chain.

This invariant is now enforced better by doing the necessary merkleizer updates
in the `Eth1Chain.addBlock` function, in the `Eth1Chain.init` function and in the
`Eth1Chain.reset` function.
2022-08-10 12:31:10 +00:00
zah dc50abbc90
Implement a missing ingnore rule for sync committee contributions (#3941) 2022-08-09 12:52:11 +03:00
Etan Kissling 2a2bcea70d
group justified and finalized `Checkpoint` (#3841)
The justified and finalized `Checkpoint` are frequently passed around
together. This introduces a new `FinalityCheckpoint` data structure that
combines them into one.

Due to the large usage of this structure in fork choice, also took this
opportunity to update fork choice tests to the latest v1.2.0-rc.1 spec.
Many additional tests enabled, some need more work, e.g. EL mock blocks.
Also implemented `discard_equivocations` which was skipped in #3661,
and improved code reuse across fork choice logic while at it.
2022-07-06 13:33:02 +03:00
Jacek Sieka c145916414
cleanups (#3819)
* avoid circular panda imports
* move deposit merkleization helpers to spec/
* normalize validator signature helpers to spec names / params
* remove redundant functions for remote signing
2022-06-29 18:53:59 +02:00
tersec 8eb5d5de09
use ZERO_HASH for default(Eth2Digest)/Eth2Digest() in func calls (#3770) 2022-06-18 04:57:37 +00:00
Etan Kissling c808f17a37
update to latest light client libp2p protocol (#3623)
Incorporates the latest changes to the light client sync protocol based
on Devconnect AMS feedback. Note that this breaks compatibility with the
previous prototype, due to changes to data structures and endpoints.
See https://github.com/ethereum/consensus-specs/pull/2802
2022-05-23 14:02:54 +02:00
zah a2ba34f686
Implement all sync committee duties in the validator client (#3583)
Other changes:

* logtrace can now verify sync committee messages and contributions
* Many unnecessary use of pairs() have been removed for consistency
* Map 40x BN response codes to BeaconNodeStatus.Incompatible in the VC
2022-05-10 10:03:40 +00:00
tersec 9e738a92b4
stylecheck fixes (#3595) 2022-04-15 12:46:56 +00:00
tersec 61ba308e13
stylecheck fixes (#3593) 2022-04-14 17:39:37 +02:00
tersec 28ba2d5544
stylecheck fixes (#3592) 2022-04-14 13:47:14 +03:00
Jacek Sieka f70ff38b53
enable `styleCheck:usages` (#3573)
Some upstream repos still need fixes, but this gets us close enough that
style hints can be enabled by default.

In general, "canonical" spellings are preferred even if they violate
nep-1 - this applies in particular to spec-related stuff like
`genesis_validators_root` which appears throughout the codebase.
2022-04-08 16:22:49 +00:00
Jacek Sieka 05ffe7b2bf
Prune `BlockRef` on finalization (#3513)
Up til now, the block dag has been using `BlockRef`, a structure adapted
for a full DAG, to represent all of chain history. This is a correct and
simple design, but does not exploit the linearity of the chain once
parts of it finalize.

By pruning the in-memory `BlockRef` structure at finalization, we save,
at the time of writing, a cool ~250mb (or 25%:ish) chunk of memory
landing us at a steady state of ~750mb normal memory usage for a
validating node.

Above all though, we prevent memory usage from growing proportionally
with the length of the chain, something that would not be sustainable
over time -  instead, the steady state memory usage is roughly
determined by the validator set size which grows much more slowly. With
these changes, the core should remain sustainable memory-wise post-merge
all the way to withdrawals (when the validator set is expected to grow).

In-memory indices are still used for the "hot" unfinalized portion of
the chain - this ensure that consensus performance remains unchanged.

What changes is that for historical access, we use a db-based linear
slot index which is cache-and-disk-friendly, keeping the cost for
accessing historical data at a similar level as before, achieving the
savings at no percievable cost to functionality or performance.

A nice collateral benefit is the almost-instant startup since we no
longer load any large indicies at dag init.

The cost of this functionality instead can be found in the complexity of
having to deal with two ways of traversing the chain - by `BlockRef` and
by slot.

* use `BlockId` instead of `BlockRef` where finalized / historical data
may be required
* simplify clearance pre-advancement
* remove dag.finalizedBlocks (~50:ish mb)
* remove `getBlockAtSlot` - use `getBlockIdAtSlot` instead
* `parent` and `atSlot` for `BlockId` now require a `ChainDAGRef`
instance, unlike `BlockRef` traversal
* prune `BlockRef` parents on finality (~200:ish mb)
* speed up ChainDAG init by not loading finalized history index
* mess up light client server error handling - this need revisiting :)
2022-03-17 17:42:56 +00:00
Jacek Sieka c64bf045f3
remove StateData (#3507)
One more step on the journey to reduce `BlockRef` usage across the
codebase - this one gets rid of `StateData` whose job was to keep track
of which block was last assigned to a state - these duties have now been
taken over by `latest_block_root`, a fairly recent addition that
computes this block root from state data (at a small cost that should be
insignificant)

99% mechanical change.
2022-03-16 08:20:40 +01:00
tersec 79761c78a4
proc -> func, mainly in spec/state transition and adjecent modules (#3405) 2022-02-17 11:53:55 +00:00
Eugene Kabanov 40c77e5928
Remote KeyManager API and number of fixes/tests for KeyManager API (#3360)
* Initial commit.

* Fix current test suite.

* Fix keymanager api test.

* Fix wss_sim.

* Add more keystore_management tests.

* Recover deleted isEmptyDir().

* Add `HttpHostUri` distinct type.
Move keymanager calls away from rest_beacon_calls to rest_keymanager_calls.
Add REST serialization of RemoteKeystore and Keystore object.
Add tests for Remote Keystore management API.
Add tests for Keystore management API (Add keystore).
Fix serialzation issues.

* Fix test to use HttpHostUri instead of Uri.

* Add links to specification in comments.

* Remove debugging echoes.
2022-02-07 22:36:09 +02:00
tersec 8e6a920bf4
rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH (#3350)
* rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH

* fix REST test rules
2022-02-02 14:06:55 +01:00
tersec 00a347457a
dynamic sync committee subscriptions (#3308)
* dynamic sync committee subscriptions

* fast-path trivial case rather than rely on RNG with probability 1 outcome

Co-authored-by: zah <zahary@gmail.com>

* use func instead of template; avoid calling async function unnecessarily

* avoid unnecessary sync committee topic computation; use correct epoch lookahead; enforce exception/effect tracking

* don't over-optimistically update ENR syncnets; non-looping version of nearSyncCommitteePeriod

* allow separately setting --allow-all-{sub,att,sync}nets

* remove unnecessary async

Co-authored-by: zah <zahary@gmail.com>
2022-01-24 20:40:59 +00:00
tersec 351c2fd48a
rename mergeData to bellatrixData and mergeFork to bellatrixFork (#3315) 2022-01-24 16:23:13 +00:00
Jacek Sieka c7e92bfd84
wss_sim: state transition simulator (#3309)
It's sometimes useful to simulate what happens when a chain runs from a
given state with a given set of private keys - `wss_sim` allows running
such a simulation.

One use of such a tool is to simulate a weak subjectivity attack,
creating alternative histories of the same chain:
https://notes.status.im/nimbus-insecura-network#
2022-01-22 10:25:30 +01:00
tersec 9c0c9c98ce
complete switch to beacon_chain/specs/datatypes/bellatrix (#3295) 2022-01-18 13:36:52 +00:00
Jacek Sieka e9486f5e5b
state_sim: clean up attestation production (#3274)
* use same naming as everywhere
* avoid iterator bug that leads to state copy
2022-01-12 21:42:03 +01:00
Jacek Sieka 805e85e1ff
time: spring cleaning (#3262)
Time in the beacon chain is expressed relative to the genesis time -
this PR creates a `beacon_time` module that collects helpers and
utilities for dealing the time units - the new module does not deal with
actual wall time (that's remains in `beacon_clock`).

Collecting the time related stuff in one place makes it easier to find,
avoids some circular imports and allows more easily identifying the code
actually needs wall time to operate.

* move genesis-time-related functionality into `spec/beacon_time`
* avoid using `chronos.Duration` for time differences - it does not
support negative values (such as when something happens earlier than it
should)
* saturate conversions between `FAR_FUTURE_XXX`, so as to avoid
overflows
* fix delay reporting in validator client so it uses the expected
deadline of the slot, not "closest wall slot"
* simplify looping over the slots of an epoch
* `compute_start_slot_at_epoch` -> `start_slot`
* `compute_epoch_at_slot` -> `epoch`

A follow-up PR will (likely) introduce saturating arithmetic for the
time units - this is merely code moves, renames and fixing of small
bugs.
2022-01-11 11:01:54 +01:00
Jacek Sieka 20e700fae4
Harden CommitteeIndex, SubnetId, SyncSubcommitteeIndex (#3259)
* Harden CommitteeIndex, SubnetId, SyncSubcommitteeIndex

Harden the use of `CommitteeIndex` et al to prevent future issues by
using a distinct type, then validating before use in several cases -
datatypes in spec are kept simple though so that invalid data still can
be read.

* fix invalid epoch used in REST
`/eth/v1/beacon/states/{state_id}/committees` committee length (could
return invalid data)
* normalize some variable names
* normalize committee index loops
* fix `RestAttesterDuty` to use `uint64` for `validator_committee_index`
* validate `CommitteeIndex` on ingress in REST API
* update rest rules with stricter parsing
* better REST serializers
* save lots of memory by not using `zip` ...at least a few bytes!
2022-01-09 01:28:49 +02:00
Jacek Sieka 0a4728a241
Handle access to historical data for which there is no state (#3217)
With checkpoint sync in particular, and state pruning in the future,
loading states or state-dependent data may fail. This PR adjusts the
code to allow this to be handled gracefully.

In particular, the new availability assumption is that states are always
available for the finalized checkpoint and newer, but may fail for
anything older.

The `tail` remains the point where state loading de-facto fails, meaning
that between the tail and the finalized checkpoint, we can still get
historical data (but code should be prepared to handle this as an
error).

However, to harden the code against long replays, several operations
which are assumed to work only with non-final data (such as gossip
verification and validator duties) now limit their search horizon to
post-finalized data.

* harden several state-dependent operations by logging an error instead
of introducing a panic when state loading fails
* `withState` -> `withUpdatedState` to differentiate from the other
`withState`
* `updateStateData` can now fail if no state is found in database - it
is also hardened against excessively long replays
* `getEpochRef` can now fail when replay fails
* reject blocks with invalid target root - they would be ignored
previously
* fix recursion bug in `isProposed`
2022-01-05 19:38:04 +01:00
tersec 66c9b7fbce
shift block_sim fork epochs; allow VC to work with non-multiple-of-3 SECONDS_PER_SLOT (#3244) 2022-01-05 13:41:39 +00:00
tersec b81c06edab
rename Beacon{Block,State}Fork.Merge to Bellatrix; update copyright years (#3240) 2022-01-04 09:45:38 +00:00
tersec 1a6a56bdb1
use BeaconTime instead of Slot in fork choice (#3138)
* use v1.1.6 test vectors; use BeaconTime instead of Slot in fork choice

* tick through every slot at least once

* use div INTERVALS_PER_SLOT and use precomputed constants of them

* use correct (even if numerically equal) constant
2021-12-21 18:56:08 +00:00
Jacek Sieka c270ec21e4
Validator monitoring (#2925)
Validator monitoring based on and mostly compatible with the
implementation in Lighthouse - tracks additional logs and metrics for
specified validators so as to stay on top on performance.

The implementation works more or less the following way:
* Validator pubkeys are singled out for monitoring - these can be
running on the node or not
* For every action that the validator takes, we record steps in the
process such as messages being seen on the network or published in the
API
* When the dust settles at the end of an epoch, we report the
information from one epoch before that, which coincides with the
balances being updated - this is a tradeoff between being correct
(waiting for finalization) and providing relevant information in a
timely manner)
2021-12-20 20:20:31 +01:00
Jacek Sieka 03005f48e1
Backfill support for ChainDAG (#3171)
In the ChainDAG, 3 block pointers are kept: genesis, tail and head. This
PR adds one more block pointer: the backfill block which represents the
block that has been backfilled so far.

When doing a checkpoint sync, a random block is given as starting point
- this is the tail block, and we require that the tail block has a
corresponding state.

When backfilling, we end up with blocks without corresponding states,
hence we cannot use `tail` as a backfill pointer - there is no state.

Nonetheless, we need to keep track of where we are in the backfill
process between restarts, such that we can answer GetBeaconBlocksByRange
requests.

This PR adds the basic support for backfill handling - it needs to be
integrated with backfill sync, and the REST API needs to be adjusted to
take advantage of the new backfilled blocks when responding to certain
requests.

Future work will also enable moving the tail in either direction:
* pruning means moving the tail forward in time and removing states
* backwards means recreating past states from genesis, such that
intermediate states are recreated step by step all the way to the tail -
at that point, tail, genesis and backfill will match up.
* backfilling is done when backfill != genesis - later, this will be the
WSS checkpoint instead
2021-12-13 14:36:06 +01:00
Jacek Sieka dfbd50b4d6
avoid SyncCommitteMsgPool copy (#3185)
introduced by batch verification, when verifiers were made async
2021-12-11 16:39:24 +01:00
Jacek Sieka 069bccd51b
batch-verify sync messages for a small perf boost (#3151)
* batch-verify sync messages for a small perf boost

Generally reuses the same structure as attestation and aggregate
verification

* normalize `signatures` and `signature_batch` to use the same pattern
of verification
* normalize parameter names, order etc for signature stuff in general
* avoid calling `blsSign` directly - instead, go through `signatures`
consistently
2021-12-09 14:56:54 +02:00
Jacek Sieka 1a8b7469e3
move quarantine outside of chaindag (#3124)
* move quarantine outside of chaindag

The quarantine has been part of the ChainDAG for the longest time, but
this design has a few issues:

* the function in which blocks are verified and added to the dag becomes
reentrant and therefore difficult to reason about - we're currently
using a stateful flag to work around it
* quarantined blocks bypass the processing queue leading to a processing
stampede
* the quarantine flow is unsuitable for orphaned attestations - these
should also should be quarantined eventually

Instead of processing the quarantine inside ChainDAG, this PR moves
re-queueing to `block_processor` which already is responsible for
dealing with follow-up work when a block is added to the dag

This sets the stage for keeping attestations in the quarantine as well.

Also:

* make `BlockError` `{.pure.}`
* avoid use of `ValidationResult` in block clearance (that's for gossip)
2021-12-06 10:49:01 +01:00
Jacek Sieka a223d62b07
Cleanups (#3123)
Renames and cleanups split out from the validator monitoring branch, so
as to reduce conflict area vs other PR:s

* add constants for expected message timing
* name validators after the messages they validate, mostly, to make
grepping easier
* unify field naming of EpochInfo across forks to make cross-fork code
easier
2021-11-25 13:20:36 +01:00
Jacek Sieka 9c2f43ed0e
Speed up altair block processing 2x (#3115)
* Speed up altair block processing >2x

Like #3089, this PR drastially speeds up historical REST queries and
other long state replays.

* cache sync committee validator indices
* use ~80mb less memory for validator pubkey mappings
* batch-verify sync aggregate signature (fixes #2985)
* document sync committee hack with head block vs sync message block
* add batch signature verification failure tests

Before:

```
../env.sh nim c -d:release -r ncli_db --db:mainnet_0/db bench --start-slot:-1000
All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
Validation is turned off meaning that no BLS operations are performed
    5830.675,        0.000,     5830.675,     5830.675,            1, Initialize DB
       0.481,        1.878,        0.215,       59.167,          981, Load block from database
    8422.566,        0.000,     8422.566,     8422.566,            1, Load state from database
       6.996,        1.678,        0.042,       14.385,          969, Advance slot, non-epoch
      93.217,        8.318,       84.192,      122.209,           32, Advance slot, epoch
      20.513,       23.665,       11.510,      201.561,          981, Apply block, no slot processing
       0.000,        0.000,        0.000,        0.000,            0, Database load
       0.000,        0.000,        0.000,        0.000,            0, Database store
```

After:

```
    7081.422,        0.000,     7081.422,     7081.422,            1, Initialize DB
       0.553,        2.122,        0.175,       66.692,          981, Load block from database
    5439.446,        0.000,     5439.446,     5439.446,            1, Load state from database
       6.829,        1.575,        0.043,       12.156,          969, Advance slot, non-epoch
      94.716,        2.749,       88.395,      100.026,           32, Advance slot, epoch
      11.636,       23.766,        4.889,      205.250,          981, Apply block, no slot processing
       0.000,        0.000,        0.000,        0.000,            0, Database load
       0.000,        0.000,        0.000,        0.000,            0, Database store
```

* add comment
2021-11-24 13:43:50 +01:00
Jacek Sieka f19a497eec
ncli_db: add putState, putBlock (#3096)
* ncli_db: add putState, putBlock

These tools allow modifying an existing nimbus database for the purpose
of recovery or reorg, moving the head, tail and genesis to arbitrary
points.

* remove potentially expensive `putState` in `BeaconStateDB`
* introduce `latest_block_root` which computes the root of the latest
applied block from the `latest_block_header` field (instead of passing
it in separately)
* avoid some unnecessary BeaconState copies during init
* discover https://github.com/nim-lang/Nim/issues/19094
* prefer `HashedBeaconState` in a few places to avoid recomputing state
root
* fetch latest block root from state when creating blocks
* harden `get_beacon_proposer_index` against invalid slots and document
* move random spec function tests to `test_spec.nim`
* avoid unnecessary state root computation before block proposal
2021-11-18 13:02:43 +01:00
Jacek Sieka ec650c7fd7
Support starting from altair (#3054)
* Support starting from altair

* hide `finalized-checkpoint-` - they are incomplete and usage may cause
crashes
* remove genesis detection code (broken, obsolete)
* enable starting ChainDAG from altair checkpoints - this is a
prerequisite for checkpoint sync (TODO: backfill)
* tighten checkpoint state conditions
* show error when starting from checkpoint with existing database (not
supported)
* print rest-compatible JSON in ncli/state_sim
* altair/merge support in ncli
* more altair/merge support in ncli_db
* pre-load header to speed up loading
* fix forked block decoding
2021-11-10 13:39:08 +02:00
tersec 95b0ecc5a2
only invalidate {current,previous}_epoch_participation flag cache once (#3063) 2021-11-09 02:44:02 +00:00
Jacek Sieka ea0a191723
Better REST/RPC error messages (#3046)
* Better REST/RPC error messages
* homogenise block logging (root first)
* homegenise message verification pipeline (verify in
`gossip_verification`, act in `eth2_processor`)
* use `subcommitteeIdx` consistently
* log each sent contribution
* fix block_sim
* fix block topic
* don't recalc root on gossip block validation
* move position loop into sync pool
2021-11-05 17:39:47 +02:00
Jacek Sieka a086cf01ac
altair fork handling cleanups (#3050)
* fix stack overflow crash in REST/debug/getStateV2
* introduce `ForkyXxx` for generic type matching of `Xxx` across
branches (SomeHashedBeaconState -> ForkyHashedBeaconState et al) -
`Some` is already used for other types of type classes
* consolidate function naming in BeaconChainDB, use some generics
* import `forks.nim` from other spec modules and move `Forked*` helpers
around to resolve circular imports
* remove `ForkedBeaconState`, use `ForkedHashedBeaconState` throughout
(less data shuffling between the types)
* fix several cases of states being stored on stack in tests, causing
random failures on some platforms
* remove reading json support from ncli - this should be ported to the
rest json reading instead (doesn't currently work because stack sizes)
2021-11-05 08:34:34 +01:00
Jacek Sieka 421bf936ff
odds and ends (#3015)
* `allSyncCommittees` => `allSyncSubcommittees`
* simplify `_snappy` topic generation (avoid pointless string copies)
* simplify gossip id generator (avoid pointless string copies)
* avoid redundant syncnet ENR updates
* simplify topic validation (allow only validated topics)
2021-10-21 15:09:19 +02:00
Jacek Sieka 9cf32c3748 clean up sync subcommittee handling
* `SyncCommitteeIndex` -> `SyncSubcommitteeIndex`
* `syncCommitteePeriod` -> `sync_committee_period` (spec spelling)
* tighten period comparisons
* fix assert when validating committee message with non-altair state in
REST api
2021-10-20 22:59:13 +03:00
Jacek Sieka df3fc9525f
import cleanup (#2997)
* import cleanup

...and remove some unused types

* add random imports

* more imports
2021-10-19 16:09:26 +02:00
Jacek Sieka c40cc6cec1 clean up fork enum and field names
* single naming strategy
* simplify some fork code
* simplify forked block production
2021-10-19 11:06:38 +03:00
Etan Kissling 2bbffbde10
abort compile when fork epoch is forgotten (#2939)
There are a few locations in the code that compare the current epoch to
the various FORK_EPOCH constants and branch off into fork-specific code.
When a new fork is introduced, it is sometimes forgotten to update all
of those branch locations. This patch introduces a compile-time check
that ensures that all branches need to be covered exhaustively. This is
done by replacing if-elif structures with case expressions.
2021-10-04 08:31:21 +00:00
Etan Kissling f8e9b1ff9d
remove privkey from mock withdrawal credentials (#2936)
In tests, the private key was put into the validator deposit's withdraw
credentials so that it can be recovered later. This leads to problems
when creating the validators through other means that do not put the key
there. In general, mock private keys only depend on the validator index,
though, and because it is clear what the index of a validator is, it is
not actually needed to put the key into the credentials.
2021-10-01 13:35:16 +02:00
Etan Kissling b217150f1d
use forked `getAttestationsForBlock` everywhere (#2937)
There are a number of locations in the code that get attestations on a
forked beacon state. For attestation pools test, a convenience wrapper
was available to reduce clutter. This patch integrates that wrapper into
the core component so that it can also take advantage of the wrapper.
2021-10-01 01:29:32 +00:00
Etan Kissling 2e9fa87f8b
use `SyncAggregate.init()` everywhere (#2932)
The initialization of a `SyncAggregate` to its default value is not very
intuitive. There is an `init` function in `sync_committee_msg_pool` that
provides a convenience wrapper. This patch exports that initializer so
that the rest of the code base can also take advantage of it.
2021-09-30 13:56:07 +00:00
tersec 6b3bf7eb7b
merge hardfork database support (#2911)
* merge hardfork database support

* working block_sim

* recreate state transition changes
2021-09-30 01:07:24 +00:00
Etan Kissling e243ba2c0b
revise `makeBeaconBlock` overloads (#2879)
The phase0 and altair overloads of `makeBeaconBlock` slightly differ in
their signatures which makes using them unnecessarily verbose.
- A placeholder `sync_aggregate` argument similar to `executionPayload`
  is added to the phase0 overload to match the altair signature.
- A wrapper operating on `ForkedHashedBeaconState` is introduced.
2021-09-29 12:10:44 +00:00
Etan Kissling ddbbbae3c8
allow adding Altair test blocks (#2872)
testblockutil currently fails to add test blocks to Altair state because
it assumes phase0 blocks in various places. This patch corrects this
limitation by using forked types internally. Note that existing clients
so far solely operate on phase0 blocks and should eventually be updated.
2021-09-17 10:55:04 +00:00
Mamy Ratsimbazafy d1cb5b7220
Parallel attestation verification (#2718)
* Add parallel attestation verification

* Update tests, batchVerify doesn't use the threadpool with only single core (nim-blscurve update)

* bump nim-blscurve

* Debug info for failing eth2 test vectors

* remove submodule eth2-testnets

* verbose debugging of make failure on Windows (libbacktrace?)

* Remove CI debug mode

* initialization convention

* Fix new altair tests
2021-09-17 03:13:52 +03:00
Etan Kissling b6334e0970
update research VCS ignores (#2849)
In edd9826464, VCS ignores were introduced
to ignore the output from building and running the research tools from
their directory. Over time, new research tools got introduced, but the
ignore list did not get updated. Updating now for current set of tools.
2021-09-07 23:09:49 +00:00
Zahary Karadjov 7d1efa443d Restore the sync committee pool pruning and add tests 2021-08-30 11:06:45 +03:00
tersec 2d8a796a93
altair-capable beacon block creation (#2834)
* altair-capable beacon block creation

* update block_sim to use sync committees and the new block production interface
2021-08-29 14:50:21 +00:00
tersec 43a976f89b
proc -> func in ncli/, research/, and test/ (#2818) 2021-08-25 14:51:52 +00:00
Jacek Sieka a7a65bce42
disentangle eth2 types from the ssz library (#2785)
* reorganize ssz dependencies

This PR continues the work in
https://github.com/status-im/nimbus-eth2/pull/2646,
https://github.com/status-im/nimbus-eth2/pull/2779 as well as past
issues with serialization and type, to disentangle SSZ from eth2 and at
the same time simplify imports and exports with a structured approach.

The principal idea here is that when a library wants to introduce SSZ
support, they do so via 3 files:

* `ssz_codecs` which imports and reexports `codecs` - this covers the
basic byte conversions and ensures no overloads get lost
* `xxx_merkleization` imports and exports `merkleization` to specialize
and get access to `hash_tree_root` and friends
* `xxx_ssz_serialization` imports and exports `ssz_serialization` to
specialize ssz for a specific library

Those that need to interact with SSZ always import the `xxx_` versions
of the modules and never `ssz` itself so as to keep imports simple and
safe.

This is similar to how the REST / JSON-RPC serializers are structured in
that someone wanting to serialize spec types to REST-JSON will import
`eth2_rest_serialization` and nothing else.

* split up ssz into a core library that is independendent of eth2 types
* rename `bytes_reader` to `codec` to highlight that it contains coding
and decoding of bytes and native ssz types
* remove tricky List init overload that causes compile issues
* get rid of top-level ssz import
* reenable merkleization tests
* move some "standard" json serializers to spec
* remove `ValidatorIndex` serialization for now
* remove test_ssz_merkleization
* add tests for over/underlong byte sequences
* fix broken seq[byte] test - seq[byte] is not an SSZ type

There are a few things this PR doesn't solve:

* like #2646 this PR is weak on how to handle root and other
dontSerialize fields that "sometimes" should be computed - the same
problem appears in REST / JSON-RPC etc

* Fix a build problem on macOS

* Another way to fix the macOS builds

Co-authored-by: Zahary Karadjov <zahary@gmail.com>
2021-08-18 20:57:58 +02:00
tersec 6e46445da2
switch result = foo to expression return; unexport rest of logtrace symbols (#2788) 2021-08-17 09:51:39 +00:00
Jacek Sieka 7a622e8505
rework spec imports (#2779)
The spec imports are a mess to work with, so this branch cleans them up
a bit to ensure that we avoid generic sandwitches and that importing
stuff generally becomes easier.

* reexport crypto/digest/presets because these are part of the public
symbol set of the rest of the spec types
* don't export `merge` types from `base` - this causes circular deps
* fix circular deps in `ssz/spec_types` - this is the first step in
disentangling ssz from spec
* be explicit about phase0 vs altair - longer term, `altair` will become
the "natural" type set, then merge and so on, so no point in giving
`phase0` special preferential treatment
2021-08-12 13:08:20 +00:00
Jacek Sieka 9697b73e71
forkedbeaconstate_helpers -> forks (#2772)
Simpler module name for stuff that covers forks

* check that runtime config matches database state
* also include some assorted altair cleanups
* use "standard" genesis fork in local testnet to work around missing
runtime config support
2021-08-10 22:46:35 +02:00
Jacek Sieka 3f9c1fdf4e
More RuntimeConfig cleanup (#2716)
* remove from BeaconChainDB (doesn't depend on runtime config)
* eth2-testnets -> eth2-networks
* use `cfg` name throughout
2021-07-13 16:27:10 +02:00
Jacek Sieka 23eea197f6
Implement split preset/config support (#2710)
* Implement split preset/config support

This is the initial bulk refactor to introduce runtime config values in
a number of places, somewhat replacing the existing mechanism of loading
network metadata.

It still needs more work, this is the initial refactor that introduces
runtime configuration in some of the places that need it.

The PR changes the way presets and constants work, to match the spec. In
particular, a "preset" now refers to the compile-time configuration
while a "cfg" or "RuntimeConfig" is the dynamic part.

A single binary can support either mainnet or minimal, but not both.
Support for other presets has been removed completely (can be readded,
in case there's need).

There's a number of outstanding tasks:

* `SECONDS_PER_SLOT` still needs fixing
* loading custom runtime configs needs redoing
* checking constants against YAML file

* yeerongpilly support

`build/nimbus_beacon_node --network=yeerongpilly --discv5:no --log-level=DEBUG`

* load fork epoch from config

* fix fork digest sent in status
* nicer error string for request failures
* fix tools

* one more

* fixup

* fixup

* fixup

* use "standard" network definition folder in local testnet

Files are loaded from their standard locations, including genesis etc,
to conform to the format used in the `eth2-networks` repo.

* fix launch scripts, allow unknown config values

* fix base config of rest test

* cleanups

* bundle mainnet config using common loader
* fix spec links and names
* only include supported preset in binary

* drop yeerongpilly, add altair-devnet-0, support boot_enr.yaml
2021-07-12 15:01:38 +02:00
tersec ae1abf24af
add Altair support to block quarantine/clearance and block_sim (#2662)
* add Altair support to the block quarantine

* switch some spec/datatypes imports to spec/datatypes/base

* add Altair support to block_clearance

* allow runtime configuration of Altair transition slot

* enable Altair in block_sim, including in CI
2021-06-23 14:43:18 +00:00
tersec b1d5609171
remove false OnBlockAdded dependency on phase0 HashedBeaconState (#2661)
* remove false OnBlockAdded dependency on phase.HashedBeaconState

* introduce altair data types into block_clearance; update some alpha.6 spec refs to alpha.7; add get_active_validator_indices_len ForkedHashedBeaconState wrapper

* switch many modules from using datatypes (with phase0 states/blocks) to datatypes/base (fork-independent); update spec refs from alpha.6 to alpha.7 and remove rm'd G2_POINT_AT_INFINITY

* switch more modules from using datatypes (with phase0 states/blocks) to datatypes/base (fork-independent); update spec refs from alpha.6 to alpha.7

* remove unnecessary phase0-only wrapper of get_attesting_indices(); allow signatures_batch to process either fork; remove O(n^2) nested loop in process_inactivity_updates(); add altair support to getAttestationsforTestBlock()

* add Altair versions of asSigVerified(), asTrusted(), and makeBeaconBlock()

* fix spec URL to be Altair for Altair makeBeaconBlock()
2021-06-21 08:35:24 +00:00
tersec 146fa48454
use ForkedHashedBeaconState in StateData (#2634)
* use ForkedHashedBeaconState in StateData

* fix FAR_FUTURE_EPOCH -> slot overflow; almost always use assign()

* avoid stack allocation in maybeUpgradeStateToAltair()

* create and use dispatch functions for check_attester_slashing(), check_proposer_slashing(), and check_voluntary_exit()

* use getStateRoot() instead of various state.data.hbsPhase0.root

* remove withStateVars.hashedState(), which doesn't work as a design anymore

* introduce spec/datatypes/altair into beacon_chain_db

* fix inefficient codegen for getStateField(largeStateField)

* state_transition_slots() doesn't either need/use blocks or runtime presets

* combine process_slots(HBS)/state_transition_slots(HBS) which differ only in last-slot htr optimization

* getStateField(StateData, ...) was replaced by getStateField(ForkedHashedBeaconState, ...)

* fix rollback

* switch some state_transition(), process_slots, makeTestBlocks(), etc to use ForkedHashedBeaconState

* remove state_transition(phase0.HashedBeaconState)

* remove process_slots(phase0.HashedBeaconState)

* remove state_transition_block(phase0.HashedBeaconState)

* remove unused callWithBS(); separate case expression from if statement

* switch back from nested-ref-object construction to (ref Foo)(Bar())
2021-06-11 20:51:46 +03:00
Jacek Sieka abe0d7b4ae singe validator key cache
Instead of keeping a validator key list per EpochRef, this PR introduces
a single shared validator key list in ChainDAG, and cleans up some other
ChainDAG and key-related issues.

The PR does not introduce the validator key list in the state transition
- this is because we batch-check all signatures before entering the spec
code, thus the spec code never hits the cache.

A future refactor should _probably_ remove the threadvar altogether.

There's a few other small fixes in here that make the flow easier to
read:

* fix `var ChainDAGRef` -> `ChainDAGRef`
* fix `var QuarantineRef` -> `QuarantineRef`
* consistent `dag` variable name
* avoid using threadvar pubkey cache in most cases
* better error messages in batch signature checking
2021-06-01 20:43:44 +03:00
tersec 0b0bfd1de0
use StateData in place of BeaconState outside state transition code (#2551)
* use StateData in place of BeaconState outside state transition code

* propagate more StateData usage

* remove withStateVars().state

* wrap get_beacon_committee(BeaconState, ...) as gbc(StateData, ...)

* switch makeAttestation() to use StateData

* use StateData wrapper/dispatcher for get_committee_count_per_slot()

* convert AttestationCache.init(), weak subjectivity functions, and updateValidatorMetrics()

* add get_shuffled_active_validator_indices(StateData) and get_block_root_at_slot(StateData)

* switch makeAttestationData() to StateData

* sync AllTests-mainnet.md after rebase
2021-05-21 09:23:28 +00:00
tersec d8bb91d9a9
partially integrate eth1 merge changes (#2548)
* partially integrate eth1 merge changes

* use hexToSeqByte() and validate execution engine opaque transaction length

* remove incorrect REST serialization code
2021-05-20 10:44:13 +00:00
tersec e0f4d28116
rename initialize_beacon_state to initialize_beacon_state_from_eth1 (#2536) 2021-05-04 12:19:11 +02:00
Dustin Brody 7f42d38219 rename initialize_beacon_state{_from_eth1,}; suppress warnings when doppelganger detection disabled 2021-04-28 00:12:41 +03:00
Jacek Sieka 7dba1b37dd
remove attestation/aggregate queue (#2519)
With the introduction of batching and lazy attestation aggregation, it
no longer makes sense to enqueue attestations between the signature
check and adding them to the attestation pool - this only takes up
valuable CPU without any real benefit.

* add successfully validated attestations to attestion pool directly
* avoid copying participant list around for single-vote attestations,
pass single validator index instead
* release decompressed gossip memory earlier, specially during async
message validation
* use cooked signatures in a few more places to avoid reloads and errors
* remove some Defect-raising versions of signature-loading
* release decompressed data memory before validating message
2021-04-26 22:39:44 +02:00
tersec 99fccaee6e
more abstraction over BeaconState (#2509)
* more abstraction over BeaconState

* use HashedBeaconState copy of htr
2021-04-16 08:49:37 +00:00
Jacek Sieka 4ed2e34a9e Revamp attestation pool
This is a revamp of the attestation pool that cleans up several aspects
of attestation processing as the network grows larger and block space
becomes more precious.

The aim is to better exploit the divide between attestation subnets and
aggregations by keeping the two kinds separate until it's time to either
produce a block or aggregate. This means we're no longer eagerly
combining single-vote attestations, but rather wait until the last
moment, and then try to add singles to all aggregates, including those
coming from the network.

Importantly, the branch improves on poor aggregate quality and poor
attestation packing in cases where block space is running out.

A basic greed scoring mechanism is used to select attestations for
blocks - attestations are added based on how much many new votes they
bring to the table.

* Collect single-vote attestations separately and store these until it's
time to make aggregates
* Create aggregates based on single-vote attestations
* Select _best_ aggregate rather than _first_ aggregate when on
aggregation duty
* Top up all aggregates with singles when it's time make the attestation
cut, thus improving the chances of grabbing the best aggregates out
there
* Improve aggregation test coverage
* Improve bitseq operations
* Simplify aggregate signature creation
* Make attestation cache temporary instead of storing it in attestation
pool - most of the time, blocks are not being produced, no need to keep
the data around
* Remove redundant aggregate storage that was used only for RPC
* Use tables to avoid some linear seeks when looking up attestation data
* Fix long cleanup on large slot jumps
* Avoid some pointers
* Speed up iterating all attestations for a slot (fixes #2490)
2021-04-13 20:24:02 +03:00
tersec 79bb0d5379
only deserialize attestation and aggregation gossiped signatures once (#2472)
* only deserialize attestation and aggregation gossiped signatures once

* re-indent some aggregate checks into block scope

* spelling

* remove debugging assertion

* put part of gossip validation back into block context

* attestation pool test signature loading isn't so unsafe, and exportRaw isn't free

* remove more development doAsserts; don't exportRaw in loops
2021-04-09 14:59:24 +02:00
Jacek Sieka beceb060c4
Write state diffs to separate table (and experimentally, files instead of db) (#2460) 2021-04-06 21:56:45 +03:00
tersec b059cb42c5
increase block proposal speed with many validators (#2423)
* increase block proposal speed with many validators

* document CookedSig rationale
2021-03-17 13:35:59 +00:00
Jacek Sieka 3cb31e66b4
set upper bound on EpochRef cache (#2403)
* set upper bound on EpochRef cache

* max 32 EpochRef instances
* less memory waste in BlockRef by removing EpochRef seq that is mostly
unused (~20mb)
* less memory waste in dag block lookup by not keeping an extra copy of
digest (~70mb)
* fix `==` and `$` for Eth2Digest
* remove `ChainDAG.tmpState` (~50mb?)

all in all, this branch cuts mainnet memory usage by ~160-180mb and puts
limits on EpochRef cache usage - where normally it hovered around 950mb
before, it's now sitting at 600-700mb on my machine.

* docs
2021-03-17 11:17:15 +01:00
Mamy Ratsimbazafy 8e28a05cea
Move pruning out of latency critical path (#2384)
* Deferred DAG and fork choice pruning

* fixup

* Address https://github.com/status-im/nimbus-eth2/pull/2384/files#r589448448, rely only on onSLotEnd for state pruning

* no need to store needPruning in the data structure

* lastPrunePoint is updated in pruning proc

* Split eager and LazyPruning

* enforce pruning in updateHead
2021-03-09 15:36:17 +01:00
Mamy Ratsimbazafy 5d7f9c3a04
Consensus object pools [reorg 4/5] (#2374)
* Add documentation

* make test doesn't try to build the beacon node :/
2021-03-04 10:13:44 +01:00
Mamy Ratsimbazafy 2f17ac7b64
Move SSZ, deposit_contracts & eth1_monitor [reorg files 3/5] (#2371)
* move deposit_contract

* Move SSZ

* fix ssz import in tests

* move also eth1_monitor

* forgot to delete the original

* fix comma [skip ci]

* Fix "make" & tools imports

* Fix import

* Fix import again

* rename deposit_contract -> eth1

* Revert ssz move to subfolder

* path fixes [skip ci]
2021-03-03 07:23:05 +01:00
Mamy Ratsimbazafy 3276dfc683
Consolidate modules by areas [part 1] (#2365)
* Move sync in subfolder

* move validator related thingies in validators

* fix binary builds

* update bounds comment [skip ci]
2021-03-02 11:27:45 +01:00
tersec 5cab17dc1a
database state storage benchmarking via ncli_db (#2312)
* database state storage benchmarking via ncli_db

* more cleanups from immutable validator state branch

* unexport some eth2_network constants and remove unused variables/templates

* make two PeerScore constants public
2021-02-15 17:40:00 +01:00
tersec 8d25663681
remove several IntSet usages in lieu of seq[ValidatorIndex] (#2288)
* remove several IntSet usages in lieu of seq[ValidatorIndex]

* convert smaller types to larger types

* larger type, again
2021-02-08 08:27:30 +01:00
tersec 1bdbf099cc
use IntSet rather than HashSet[ValidatorIndex] (#2267)
* use IntSet rather than HashSet[ValidatorIndex]

* add bounds check before uint64 -> int conversion

* use intsets in block transitions

* remove superfluous Nim issue explanation/reference
2021-01-26 12:52:00 +01:00
Mamy Ratsimbazafy 70a03658e3
Block validation flow v2 + Batch (serial) sig verification (#2250)
* bump nim-blscurve

* Outline the block validation flow

* introduce the SigVerified types, pass the tests

* Split clearance/quarantine to prepare for batch crypto verif

* Add a batch signature collector

* Make clearance use SigVerified block and split verification between crypto and state transition

* Always use signedBeaconBlock for the onBlockAdded callback

* RANDAO signing_root is the epoch instead of the full block

* Support skipping BLS for testing

* Fix compilation of the validator client

* Try to fix strange errors MacOS and Jenkins (Clang, unknown type name br_hmac_drbg_context in stdlib_assertions.nim.c)

* address https://github.com/status-im/nimbus-eth2/pull/2250#discussion_r561819858

* address https://github.com/status-im/nimbus-eth2/pull/2250#discussion_r561828025

* onBlockAdded callback should use TrustedSignedBeaconBlock https://github.com/status-im/nimbus-eth2/pull/2250#discussion_r561837261

* address https://github.com/status-im/nimbus-eth2/pull/2250#discussion_r561828946

* Use the application RNG: https://github.com/status-im/nimbus-eth2/pull/2250#discussion_r561815336

* Improve codegen of conversion zero-cost)

* Quick fixes with loadWithCache after #2259 (TODO: graceful error since pubkey validations is now done first in signatures_batch)

* Graceful handle rogue pubkeys and signatures now that those are lazy-loaded
2021-01-25 20:45:48 +02:00
Zahary Karadjov 338428cbd7 Add Eth1 deposits simulation to block_sim 2021-01-04 13:22:00 +02:00
Zahary Karadjov b022dc4d1f Use O(n) algorithm in initialize_beacon_state_from_eth1; Avoid unnecessary merkle proofs generation 2020-11-15 21:40:40 +02:00
Zahary Karadjov e9b9cd75ee Rename binaries; Mimic the original repo layout in the distribution 2020-11-09 11:38:52 +02:00
Jacek Sieka 95f5f76180
Datatype cleanup (#1953)
* clear up spec todo

* test fix

* remove unnecessary toSszType

* type

* one more
2020-11-04 21:52:47 +00:00
Jacek Sieka df43b8aa8b
save some more states after all (#1887)
Don't save states when replaying history, but do save states when
applying new blocks (!)
2020-10-18 15:47:39 +00:00
Zahary Karadjov 5f6bdc6709 Store all deposit-derived data in memory 2020-10-15 20:15:51 +03:00
Zahary Karadjov e6320e5881 Address #1584 Don't keep all deposits in memory (persist them to disk) 2020-10-15 20:15:51 +03:00
tersec 513ba72b9a
add v1.0.0-rc.0 support behind compile-time flag (#1852)
* add v1.0.0-rc.0 support behind compile-time flag

* keep runtime presets consistent
2020-10-13 17:21:25 +00:00
tersec f08f44b9a2
in exit pool, bundle receive messages into beaconblocks (#1812)
* in exit pool, filter out already-packaged messages; bundle remaining messages into beaconblocks

* filter messages at block construction time

* allow adding up to intended capacity of buffers, beyond per-block limits

* document rationale/design for filtering mechanism
2020-10-07 16:57:21 +00:00
Zahary Karadjov aed291128a Add support for starting from weak subjectivity checkpoints
Also removes the `genesis.ssz` file stored in the data folder.
The `medalla-fast-sync` target has been adapted to use the new features.
2020-10-07 09:32:03 +03:00
tersec 6d8130dc49
close block_sim database; remove code duplication in exit_pool (#1656) 2020-09-16 09:16:23 +02:00
Jacek Sieka c76305f824
fix some todo (#1645)
* remove some superfluous gcsafes
* remove getTailState (unused)
* don't store old epochrefs in blocks
* document attestation pool a bit
* remove `pcs =` cruft from log
2020-09-14 14:50:03 +00:00
tersec ab255662df
bound block quarantine size (#1564)
* bound block quarantine size

* add additional logging for block quarantining

* re-add quarantine.add() call

* remove pre-finalization blocks; add logging for full quarantine

* clear quarantine on chain reorganization

* update block_sim and tests

* update test_attestation_pool
2020-08-31 11:00:38 +02:00
Jacek Sieka fa1621db46
implement clock disparity for attestation validation (#1568)
This implements disparity, resolving a part of
https://github.com/status-im/nim-beacon-chain/issues/1367

* make BeaconTime a duration for fractional seconds
* factor out attestation/aggregate validation
* simplify recording of queued attestations
* simplify attestation signature check
* fix blocks_received metric
* add some trivial validation tests
* remove unresolved attestation table - attestations for unknown blocks
are dropped instead (cannot verify their signature)
2020-08-27 09:34:12 +02:00
Jacek Sieka 46c94a18ba rework epoch cache referencing
* collect all epochrefs in specific blocks to make them easier to find
and to avoid lots of small seqs
* reuse validator key databases more aggressively by comparing keys
* make state cache available from within `withState`
* make epochRef available from within onBlockAdded callback
* integrate getEpochInfo into block resolution and epoch ref logic such
that epochrefs are created when blocks are added to pool or lazily when
needed by a getEpochRef
* fill state cache better from EpochRef, speeding up replay and
validation
* store epochRef in specific blocks to make them easier to find and
reuse
* fix database corruption when state is saved while replaying quarantine
* replay slots fully from block pool before processing state
* compare bls values more smartly
* store epoch state without block applied in database - it's recommended
to resync the node!

this branch will drastically speed up processing in times of long
non-finality, as well as cut memory usage by 10x during the recent
medalla madness.
2020-08-19 10:09:06 +03:00