* Add metrics related to devp2p peer connections
* Avoid reconnecting to peers that just failed connection
- Add SeenTable to avoid reconnecting to peers immediately after
a failed connect. Depending on the failure, the amount of time is
different. This is similar to what is done in nimbus-eth2.
- Attempt to rework rlpxConnect at the same time, in order to
make sure that errors are properly handled. The current structure
is far from ideal, but it is hopefully a small step in the right
direction. To many oddities in there right now to really rework
rlpxConnect properply.
* Fix rlpx thunk fuzzer
The old code, which did it in sequence, made it take a very long
time to accumulate peers in the peer pool, particularly because
occasionally a connection-attempt would hang for a couple of minutes
before timing out.
* persist ENRs (parsed) from the discovery5 protocol
* filter fork_digest from eth2 data
* Update eth/p2p/discoveryv5/dcli.nim
Co-authored-by: Kim De Mey <kim.demey@gmail.com>
* apply reviewer's suggestions
Co-authored-by: Kim De Mey <kim.demey@gmail.com>
- Fix two possible RangeErrors, due to negative seq allocation
in decodeAuthMessageEIP8 and in decodeAckMessageEIP8
- Set the minimum auth message size to AuthMessageEIP8Length,
in case there are clients that no longer add padding
- Add tests for invalid length cases
RLP Enum deserialization would currently not check if "hole values"
were attempted to be converted to the enum type. Now use
checkedEnumAssign and fail with RlpTypeMismatch on invalid values.
There is at least one occurance of an enum with holes in rlpx p2p:
DisconnectionReason. For this enum the issue could occur.
Also:
- Added enum RLP tests and rlpx p2p disconnect message tests to
test the DisconnectionReason with enum hole value.
- Fixed worse custom DisconnectionReason decoding occurance
in rlpx in waitSingleMsg proc where this issue could occur.
- Unify the debug log and info log into one. Removing the
redundant information
- Log the custom ENR fields more pretty
- Make the JSON format logging more pretty for several types
Nim devel brach(1.7.1) introduce gc=orc as default mode.
Because the p2p protocol using unsafe pointer operations
for it's ProtocolInfo and using global variables scattered
around, the orc mistakenly(or maybe correctly) crash the protocol.
Seems there are two occasions possible where we try to ping the
own local node which causes an assert
- One via the bounding
- One via a received ping, which is strange in the first place but
notice in a stack trace
* Move NetworkId type to common eth_types
NetworkId is after all a common type and this way it avoid an
application that requires it to also import all devp2p related
network types.
RLP related calls are moved to eth_types_rlp, this means that p2p
protocols that send NetworkId over the wire need to import this
too. Or just common in general.
* Remove # in front of multiline comment end bracket
These seem to be interpreted wrongly by the GitHub code browser.
This is specifically useful for when the failure is caused by an
"Operation not permitted", as then we can see for which specific
port(s) a firewall might be blocking outgoing traffic.
`eth_types` is being imported from many projects and ends up causing
long build times due to its extensive import lists - this PR starts
cleaning some of that up by moving the chain DB and RLP to their own
modules.
this PR also moves `keccakHash` to its own module and uses it in many
places.
* Fix stability issues
why:
Handling malformed messages typically raises `RangeError` exceptions
when de-serialising RLP, or decoding message data. This is an
(incomplete) attempt to weed out some out it driven by real live
tests.
remark:
Employing the new `snap` protocol there might be different views on what
the messages really contain (currently specs are more a hint.)
* Update RLP exception handling
* Undo effect-less patch
why:
problem occurred somewhere above the try/catch handler
* Using `checkedEnumAssign()` for RLP enum
* Provide dedicated `DisconnectionReason` enum type RLP reader
why:
Without this reader, the program communicating via RLPX will crash when
receiving out of bound reason codes disconnect message.
Out of bound value assignments to an enum causes a `RangeError`defect
and consequently the program to terminate. This `RangeError` is avoided
here and a `MalformedRlpError` catchable error raised.
* Using default exception type in bespoke `read(DisconnectionReason)`
why:
This should not differ from the default enum parser. The particular
message is different and more targeted, here.
Note: The default RLP parser was not used because `
`array[1,DisconnectionReason]` is currently not properly handled and
should give a siliar error message as a `DisconnectionReason` error.
* De-clutter, custom read() was not needed
Co-authored-by: jordan <jordan@curd.mjh-it.com>
* Handle the decodeAuthMessage error case separatly and log to trace
Garbage data on the TCP port (e.g. from port scanners) would
cause lots of error log messages, so log this to trace and get rid
of a (little) bit of exception usage in the process.
* Remove usage of result var in rlpxAccept and rlpxConnect
* Discv4: Add ENRRequest & ENRResponse msgs to avoid fails on these
Fix#499
These messages are not implemented yet however, but just ignored.
* disc: updateExternalIp()
New public proc that can be used to inform the discovery subsystem about
a changed external IP (as reported by UPnP/NAT-PMP in some other module).
why:
Rlp errors throw exceptions which cause the dispatcher loop to
terminate the current session immediately.
details:
The DisconnectionReasonList message requires a single entry list.
Observed and now accepted deviations are:
Geth: single byte number
bor(a Geth fork): blobbed single entry list containing a number
why:
This is a legacy feature and its usage should peter out over time.
details:
Use -d:chunked_rlpx_enabled for enabling chunked RLPx message handling.
why:
For some reason, Nethermind insists on sending chunked messages to
the syncing peer. Unfortunately, for the test networks the Nethermind
modes are the importent ones as they speak eth/65 as well while others
like Geth only support eth/66 which is not implemented here, yet.
The ENR code used to be solely exception based, and these
exceptions where a left-over of that. They are useless as later
calls use Result anyhow.
Additionally, they cause quite the performance loss because they
are used in the "common path" for the toTypedRecord call, e.g.
when reading the fields of ip6, tcp6 and udp6.
Currently only setting `--styleCheck:hint` as there are some
dependency fixes required and the compiler seems to trip over the
findnode MessageKind, findnode Message field and the findNode
proc. Also over protocol.Protocol usage.
Closes [nimbus-eth1#767](https://github.com/status-im/nimbus-eth1/issues/767).
Crashes occur when certain invalid RLPx messages are received from a peer.
Specifically, `msgId` out of range. Because any peer can easily trigger this
crash, we'd consider it a DOS vulnerability if Nimbus-eth1 was in general use.
We noticed when syncing to Goerli, there were some rare crashes with this
exception. It turned out one peer with custom code, perhaps malfunctioning,
was sending these messages if we were unlucky enough to connect to it.
`invokeThunk` is called from `dispatchMessages` and checks the range of
`msgId`. It correctly recognise that it's out of range, raises and exception
and produces a message. Job done.
Except the code in `dispatchMessage` treats all that as a warning instead of
error, and continues to process the message. A bit lower down, `msgId` is used
again without a range check.
The trivial fix is to check array bounds before access.
--
ps. Here's the stack trace ("reraised" sections hidden):
```
WRN 2021-11-08 21:29:33.238+00:00 Error while handling RLPx message topics="rlpx" tid=2003472 file=rlpx.nim:607 peer=Node[<IP>:45456] msg=45 err="RLPx message with an invalid id 45 on a connection supporting eth,snap"
/home/jamie/Status/nimbus-eth1/nimbus/p2p/chain/chain_desc.nim(437) main
/home/jamie/Status/nimbus-eth1/nimbus/p2p/chain/chain_desc.nim(430) NimMain
/home/jamie/Status/nimbus-eth1/nimbus/nimbus.nim(258) process
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncloop.nim(279) poll
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(74) colonanonymous
/home/jamie/Status/nimbus-eth1/vendor/nim-eth/eth/p2p/rlpx.nim(1218) rlpxAccept
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(101) postHelloSteps
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(74) colonanonymous
/home/jamie/Status/nimbus-eth1/vendor/nim-eth/eth/p2p/rlpx.nim(985) postHelloSteps
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(101) dispatchMessages
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(77) colonanonymous
/home/jamie/Status/nimbus-eth1/vendor/nim-eth/eth/p2p/rlpx.nim(614) dispatchMessages
/home/jamie/Status/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/chcks.nim(23) raiseIndexError2
/home/jamie/Status/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/fatal.nim(49) sysFatal
[[reraised from: ... ]]
[[reraised from: ... ]]
[[reraised from: ... ]]
[[reraised from: ... ]]
Error: unhandled exception: index 45 not in 0 .. 40 [IndexError]
```
Signed-off-by: Jamie Lokier <jamie@shareable.org>
Separate the logging when the node is not reachable and
enrAutoUpdate is on or off to avoid confusion whether or not the
node might still become reachable.
queryRandom was currently only async for the `enrField` version.
However the basic queryRandom is also exported and thus gets
changed so it can be properly used as async proc.
Also added exports for the modules of which objects are used in
the discovery public API.
toBytes for NodeId wasn't selected by compiler byt if it does
get selected, it will fail on the test cases due to the
countdown that is done in logDistance.
Set to toBytesBE properly now and do countup, that should make
it correct also for BE architecture.
Removed toBytes to avoid confusion and avoid this one being
selected ever. The only place toBytes for NodeId was used is in
sessions.nim makeKey func and there also the stint one
(thus native endianness) was selected in Nim 1.2.x.
Native endianness is fine there as it is only an internal
representation.
* Don't fail ENR decoding when value is an RLP list
* Store RLP raw list in the ENR field pair instead
* Add ENR kList FieldKind so lists can be treated differently
Treated differently now when printing out the ENR, mentioning
that it is a raw RLP list
* Allow for tcp/udp ports to always be configured
- Allow for an ENR to be build with tcp and udp ports also when
no IP address is provided
- In the address set-up always provide best efforttcp and udp ports
also when configuration of external ip (and/or ports) fails.
* Modify nodes verification
* Move nodes verification to separate module
By moving verification to separate module it can be re-used
in different contexts not only in discoveryv5.
Don't treat 3 characters as special in `cmp`. `cmp` for `ProtocolInfo` was
wrong because it ignored all characters after the first 3.
In the wild we have seen protocol names longer than 3 characters. `snap`,
`hive`, `istanbul`, `bzzeth`, `bzz-stream`, `bzz-retrieve`, `dbix`, `opera`,
`pchain`, `pchain_child_0`, `sero`, `smilobft`, `spock`.
There was never a 3 character limit in the [specification]
(https://github.com/ethereum/devp2p/blob/master/rlpx.md).
It always said "short ASCII name", until recently on 2021-02-25 it was changed
to an 8 characters limit.
Also `pi.nameStr` can be removed. Nothing uses it, and it has the same actual
effect as just copying the string `pi.name`.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
For a long time this caused invalid RLP parsing of `NewBlock` messages in the
`eth` protocol.
The `rlpInline` pragma was accepted but had no effect. We could implemented
it, but it doesn't seem worth doing, with tests etc, as there's only one user
which has been fixed another way.
With `NewBlock`, whenever a peer sent us `NewBlock`, we'd get an RLP decoding
error, and disconnected the peer thinking it was the peer's error.
These messages are sent often by good peers, so whenever we connected to a
really good peer, we'd end up disconnecting within a minute due to this. This
went unnoticed for years, as we stayed connected to old peers which have no new
blocks, and we weren't looking at peer quality, disconnect reasons or real-time
blockchain updates anyway.
Signed-off-by: Jamie Lokier <jamie@shareable.org>