* Handle the decodeAuthMessage error case separatly and log to trace
Garbage data on the TCP port (e.g. from port scanners) would
cause lots of error log messages, so log this to trace and get rid
of a (little) bit of exception usage in the process.
* Remove usage of result var in rlpxAccept and rlpxConnect
* Discv4: Add ENRRequest & ENRResponse msgs to avoid fails on these
Fix#499
These messages are not implemented yet however, but just ignored.
* disc: updateExternalIp()
New public proc that can be used to inform the discovery subsystem about
a changed external IP (as reported by UPnP/NAT-PMP in some other module).
why:
Rlp errors throw exceptions which cause the dispatcher loop to
terminate the current session immediately.
details:
The DisconnectionReasonList message requires a single entry list.
Observed and now accepted deviations are:
Geth: single byte number
bor(a Geth fork): blobbed single entry list containing a number
why:
This is a legacy feature and its usage should peter out over time.
details:
Use -d:chunked_rlpx_enabled for enabling chunked RLPx message handling.
why:
For some reason, Nethermind insists on sending chunked messages to
the syncing peer. Unfortunately, for the test networks the Nethermind
modes are the importent ones as they speak eth/65 as well while others
like Geth only support eth/66 which is not implemented here, yet.
- add eventLoop to control all incoming events
- change semantic of write to asynchronously block only when send buffer is full, and not when bytes do not fit into send window
- change handling of receive buffer, to start dropping packets if the reorder buffer and receive buffer are full. Old behaviour was to async block unless there is space which could lead to resource exhaustion attacks
The ENR code used to be solely exception based, and these
exceptions where a left-over of that. They are useless as later
calls use Result anyhow.
Additionally, they cause quite the performance loss because they
are used in the "common path" for the toTypedRecord call, e.g.
when reading the fields of ip6, tcp6 and udp6.
Currently only setting `--styleCheck:hint` as there are some
dependency fixes required and the compiler seems to trip over the
findnode MessageKind, findnode Message field and the findNode
proc. Also over protocol.Protocol usage.
This removes the outdated copy of the SSZ code. It became incorrect
over time (e.g., empty SSZ list elements), and is no longer in use by
GitHub projects: https://github.com/search?q=extension%3Anim+eth%2Fssz
The canonical SSZ implementation resides at `nim-ssz-serialization`.
Compared to `nim-eth`, these changes were made meanwhile:
- `bitseqs` was extended with JSON serialization support,
and with the new functions `isZero` and `countOnes`.
- `bytes_reader` was renamed to `codec`, extended with a few additional
SSZ type conversions as well as support for `SingleMemberUnion`.
- The simplified merkle tree implementation in `merkle_tree.nim`
was removed. It was not used by other projects.
- `merkleization` was extended with support for `HashArray`, `HashList`
and `SingleMemberUnion`. The `isValidProof` functionality has been
moved to `nimbus-eth2` and replaced with the EF defined function
`is_valid_merkle_branch`. The test was also moved to `nimbus-eth2`.
There are no other GitHub projects using `isValidProof`:
https://github.com/search?q=extension%3Anim+isValidProof
Furthermore, a definition for `GeneralizedIndex` was added.
- `ssz_serialization` was moved one directory up, and improved with
bug fixes and `HashArray`, `HashList` and `SingleMemberUnion` support.
- `types` was extended with JSON serialization and new type support for
`Uint128`, `Uint256`, `HashArray`, `HashList` and `SingleMemberUnion`.
There is also a new `getBit` function for `BitList`.
* Add few missing top level raises Defect in uTP
- Add top level {.push raises: [Defect].}
- remove some local raises, including some unneeded
CatchableErrors.
- Don't export messageHandler (avoiding annoying naming collisions)
- export utp_router as those connection callbacks are in the API
* Add some missing copyright clauses
* Some ident and max line length cleanup
* Rename utp_discv5_protocol.nim to be more consistent
* Add separate datastructure to keep track of window
* Asynchronously block write until until new space in snd buffer
* Introduce write loop
* Properly handle write cancellation
* Proper handling of sending fin packet
* Reset remote window after configured amount of time
Closes [nimbus-eth1#767](https://github.com/status-im/nimbus-eth1/issues/767).
Crashes occur when certain invalid RLPx messages are received from a peer.
Specifically, `msgId` out of range. Because any peer can easily trigger this
crash, we'd consider it a DOS vulnerability if Nimbus-eth1 was in general use.
We noticed when syncing to Goerli, there were some rare crashes with this
exception. It turned out one peer with custom code, perhaps malfunctioning,
was sending these messages if we were unlucky enough to connect to it.
`invokeThunk` is called from `dispatchMessages` and checks the range of
`msgId`. It correctly recognise that it's out of range, raises and exception
and produces a message. Job done.
Except the code in `dispatchMessage` treats all that as a warning instead of
error, and continues to process the message. A bit lower down, `msgId` is used
again without a range check.
The trivial fix is to check array bounds before access.
--
ps. Here's the stack trace ("reraised" sections hidden):
```
WRN 2021-11-08 21:29:33.238+00:00 Error while handling RLPx message topics="rlpx" tid=2003472 file=rlpx.nim:607 peer=Node[<IP>:45456] msg=45 err="RLPx message with an invalid id 45 on a connection supporting eth,snap"
/home/jamie/Status/nimbus-eth1/nimbus/p2p/chain/chain_desc.nim(437) main
/home/jamie/Status/nimbus-eth1/nimbus/p2p/chain/chain_desc.nim(430) NimMain
/home/jamie/Status/nimbus-eth1/nimbus/nimbus.nim(258) process
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncloop.nim(279) poll
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(74) colonanonymous
/home/jamie/Status/nimbus-eth1/vendor/nim-eth/eth/p2p/rlpx.nim(1218) rlpxAccept
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(101) postHelloSteps
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(74) colonanonymous
/home/jamie/Status/nimbus-eth1/vendor/nim-eth/eth/p2p/rlpx.nim(985) postHelloSteps
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(101) dispatchMessages
/home/jamie/Status/nimbus-eth1/vendor/nim-chronos/chronos/asyncmacro2.nim(77) colonanonymous
/home/jamie/Status/nimbus-eth1/vendor/nim-eth/eth/p2p/rlpx.nim(614) dispatchMessages
/home/jamie/Status/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/chcks.nim(23) raiseIndexError2
/home/jamie/Status/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system/fatal.nim(49) sysFatal
[[reraised from: ... ]]
[[reraised from: ... ]]
[[reraised from: ... ]]
[[reraised from: ... ]]
Error: unhandled exception: index 45 not in 0 .. 40 [IndexError]
```
Signed-off-by: Jamie Lokier <jamie@shareable.org>
Separate the logging when the node is not reachable and
enrAutoUpdate is on or off to avoid confusion whether or not the
node might still become reachable.
* avoid allocation in `hash(ValidIpAddress)`
While casually browsing the profiler output, to my great surprise I
found that an allocating string conversion function (gasp!) in the hash
function for ip addresses - this PR carefully excises this evil
construct from the codebase.
* bump nim version
queryRandom was currently only async for the `enrField` version.
However the basic queryRandom is also exported and thus gets
changed so it can be properly used as async proc.
Also added exports for the modules of which objects are used in
the discovery public API.
toBytes for NodeId wasn't selected by compiler byt if it does
get selected, it will fail on the test cases due to the
countdown that is done in logDistance.
Set to toBytesBE properly now and do countup, that should make
it correct also for BE architecture.
Removed toBytes to avoid confusion and avoid this one being
selected ever. The only place toBytes for NodeId was used is in
sessions.nim makeKey func and there also the stint one
(thus native endianness) was selected in Nim 1.2.x.
Native endianness is fine there as it is only an internal
representation.
* Refactor tests and move socket to separate file
* Move sockets handling to separate class
* Abstract over underlying transport
* Fix bug with receiving duplicated SYN packet
* Fix race condition in connect
* Modify outbuffer
Each element of outbuffer keeps encoded packet ,number
of transmissions of givern packet and information if
given packet needs to be re-send.
* Add initial handling of timeouts
* Add tests for syn re-sends
Currently, `ElemType` can only be called on `List` types when they are
first assigned to a variable called `T`. The template has been adjusted
so that different variable names may be used. The template can now also
be applied to in-line computed types, e.g., `ElemType(typeof(x))`.
There was a cyrillic T in some big-endian specific code that broke the
compilation on such platforms. This replaces that T with an ASCII T to
fix the build.
* Don't fail ENR decoding when value is an RLP list
* Store RLP raw list in the ENR field pair instead
* Add ENR kList FieldKind so lists can be treated differently
Treated differently now when printing out the ENR, mentioning
that it is a raw RLP list
* Add initial handling of acks
Add implemetaion of circular buffer based on reference implementation
Add way to test number of packet in flight
Add acking of initial syn packet
* Allow for tcp/udp ports to always be configured
- Allow for an ENR to be build with tcp and udp ports also when
no IP address is provided
- In the address set-up always provide best efforttcp and udp ports
also when configuration of external ip (and/or ports) fails.
* Add initial impl of utp over udp
* Add more comments
* Add licenses and push declarations
* Add tests to nimble task
* Pr comments
Use better random generator
Raise assert error in case of buffer io exception
* Modify nodes verification
* Move nodes verification to separate module
By moving verification to separate module it can be re-used
in different contexts not only in discoveryv5.
The new sync code wants to start without waiting. We can `discard` the async
result but there is no need for a background task polling and running a timer
for no clear benefit.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
Don't treat 3 characters as special in `cmp`. `cmp` for `ProtocolInfo` was
wrong because it ignored all characters after the first 3.
In the wild we have seen protocol names longer than 3 characters. `snap`,
`hive`, `istanbul`, `bzzeth`, `bzz-stream`, `bzz-retrieve`, `dbix`, `opera`,
`pchain`, `pchain_child_0`, `sero`, `smilobft`, `spock`.
There was never a 3 character limit in the [specification]
(https://github.com/ethereum/devp2p/blob/master/rlpx.md).
It always said "short ASCII name", until recently on 2021-02-25 it was changed
to an 8 characters limit.
Also `pi.nameStr` can be removed. Nothing uses it, and it has the same actual
effect as just copying the string `pi.name`.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
For a long time this caused invalid RLP parsing of `NewBlock` messages in the
`eth` protocol.
The `rlpInline` pragma was accepted but had no effect. We could implemented
it, but it doesn't seem worth doing, with tests etc, as there's only one user
which has been fixed another way.
With `NewBlock`, whenever a peer sent us `NewBlock`, we'd get an RLP decoding
error, and disconnected the peer thinking it was the peer's error.
These messages are sent often by good peers, so whenever we connected to a
really good peer, we'd end up disconnecting within a minute due to this. This
went unnoticed for years, as we stayed connected to old peers which have no new
blocks, and we weren't looking at peer quality, disconnect reasons or real-time
blockchain updates anyway.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
1. Generalises the special cases for serialising RLP `seq[Transaction]`.
Previously it only used the special case inside `BlockBody` and `EthBlock`.
Now it uses it for all `seq[Transaction]` regardless of what objects they
are parts of, or no object at all. `openArray[Transaction]` is also
included, as this was found to be necessary to match in some places.
2. Bug fix parsing `Transaction`: Always read the first byte to get the
transaction type instead of parsing an RLP `int`. This way invalid or
adversarial input gives a correct error (i.e. invalid type code).
When it was read with `rlp.read(int)`, those inputs gave many crazy
messages (e.g. "too large to fit in memory"). In the specification it's a
byte. (Technically the input is not RLP and we shouldn't be using the RLP
parser anyway to parse standalone transaction objects).
3. Bug fix parsing `Transaction`: If a typed transaction is detected in
`seq[Transaction]`, the previous code removed the RLP (blob) wrapper, then
passed the contents to `read(Transaction)`. That meant a blob-wrapped
legacy transaction would be accepted. This is incorrect. The new code
passes the contents to the typed transaction decoder, which correctly
rejects a wrapped legacy transaction as having invalid type.
Change 1 has a large, practical effect on `eth/65` syncing with peers.
Serialisation of `eth` message types `Transactions` and `PooledTransactions`
have been broken since the introduction of typed transactions (EIP-2718), as
used in Berlin/London forks. (The special case for `seq[Transaction]` inside
`BlockBody` only fixed message type `BlockBodies`.)
Due to this, whenever a peer sent us a `Transactions` message, we had an RLP
decoding error processing it, and disconnected the peer thinking it was the
peer's error.
These messages are sent often by good peers, so whenever we connected to a
really good peer, we'd end up disconnecting from it within a few tens of
seconds due to this.
This didn't get noticed before updating to `eth/65`, because with old protocols
we tend to only connect to old peers, which may be out of date themselves and
have no typed transactions. Also, we didn't really investigate occasional
disconnects before, we assumed they're just part of P2P life.
The root cause is the RLP serialisation of individual `Transaction` is meant to
be subtly different from arrays/sequences of `Transaction` objects in network
messages. RFC-2976 covers this but it's quite subtle:
- Individual transactions are encoded and stored as either `RLP([fields..])`
for legacy transactions, or `Type || RLP([fields..])`. Both of these
encodings are byte sequences. The part after `Type` doesn't have to be
RLP in theory, but all types so far use RLP. EIP-2718 covers this.
- In arrays (sequences), transactions are encoded as either `RLP([fields..])`
for legacy transactions, or `RLP(Type || RLP([fields..]))` for all typed
transactions to date. Spot the extra `RLP(..)` blob encoding, to make it
valid RLP inside a larger RLP. EIP-2976 covers this, "Typed Transactions
over Gossip", although it's not very clear about the blob encoding.
In practice the extra `RLP(..)` applies to all arrays/sequences of transactions
that are to be RLP-encoded as a list. In principle, it should be all
aggregates (object fields etc.), but it's enough for us to enable it for all
arrays/sequences, as this is what's used in the protocol and EIP-2976.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
Storing large blobs in a "WITHOUT ROWID" table turns out to be extremely
slow when the tree must be rebalanced.
* Split out keystore capability into separate interface, making each
keystore a separate instance
* Disable "WITHOUT ROWID" optimization by default
* Implement prefix lookup that allows iterating over all values with a
certain prefix in their key
* Fix raw Exceptions in hexary caused by forward declarations
* Fix raw Exceptions in trie/db caused by forward declarations
* And now we can remove those db Proc CatchableError raises
`les_protocol.nim` failed to build, due to very silly Nim bugs
nim-lang/Nim#8792 and nim-lang/Nim#17102.
import
../../[rlp, keys], ../../common/eth_types,
../[rlpx, kademlia, blockchain_utils], ../private/p2p_types,
The silly part is `../` has to be quoted if it's before a group of files, but
not before a single file. Most places in PR #344 / 7624153 use the workaround
`".."/` but it was missed in `les_protocol.nim`:
nimbus-eth1/vendor/nim-eth/eth/p2p/rlpx_protocols/les_protocol.nim(14, 3)
Error: cannot open file: ../../[rlp,keys]
Signed-off-by: Jamie Lokier <jamie@shareable.org>
* Add build_dcli target and add it to CI
* Fix local imports for dcli
* And use local imports for all other files too
* Use local imports in tests and rlpx protocols
Fixesstatus-im/nim-eth#341, status-im/nimbus-eth1#489.
When using discv4 (Kademlia) to find peers, there is a crash after a few
minutes. It occurs for most of us on Eth1 mainnet, and everyone on Ropsten.
The cause is `findNodes` being called twice in succession to the same peer,
within about 5 seconds of each other. ("About" 5 seconds, because Chronos does
not guarantee to run the timeout branch at a particular time, due to queuing
and clock reading delays.)
Then `findNodes` sends a duplicate message to the peer and calls
`waitNeighbours` to listen for the reply. There's already a `waitNeighbours`
callback in a shared table, so that function hits an assert failure.
Ignoring the assert would be wrong as it would break timeout logic, and sending
`FindNodes` twice in rapid succession also makes us a bad peer.
As a simple workaround, just skip `findNodes` in this state and return a fake
empty `Neighbours` reply. This is a bit of a hack as `findNodes` should not be
called like this; there's a logic error at a higher level. But it works.
Tested for about 4 days constant operation on Ropsten. The crash which occured
every few minutes no longer occurs, and discv4 keeps working.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
* Add search for best route and refactor setupNat to setupAddress
* Update setupAddress and make enr ports in discovery optional
* Add specific error log when no route is found
* Use bindIP if it is public
* Adjust some log levels
This query proc is similar to the original (faulty) lookup proc.
But as we don't need to look for specific targets, it can be
used still as it gives a quicker and broader search resulting
in more nodes.
* Add ip limits to routing table and routing table buckets
* Fix order of ip limit check and duplicate check for replacement
* Fix ip limit for node with updated ip in ENR
* Fix bug where address wouldn't update on ENR update
and update some comments
* Reuse some add/remove code in routing table
* Fix seen bug on ENR update in routing table
* Rework addNode to make sure to do address check always
and adjust some logs.
* More documentation on the ip limits in routing table [skip ci]
If the database is locked for reading, as it is when step returns ROW,
writes cannot checkpoint the wal leading to ever-increasing wal sizes
and a long delay at shutdown.
By resetting the transaction early, writes become more independent of
reads, memory is released earlier and wal can be checkpointed.
* use bearssl rng throughout
* seeder can fail
* imports and exports
* modules, sigh
* one more try
* move var
* even fewer thread vars
* remove out-of-date genrated files