EIP 2124: Fork identifier for chain compatibility checks (#2124)

* EIP 2124: Zero RTT netsplit on chain mismatch

* Assign number to EIP 2124

* Address review comments on EIP 2124.

* Rename EIP 2124 to better reflect content

* Fix cardinality

* Fix email address in 2124

* Update fork identifier EIP based on review discussions.

* Fixup EIP name for the fork identifier.

* Remove stale mention of ENR from forkid EIP.

* Add RLP encoding test cases for forkid EIP.

* Add Felix too to the forkid EIP author list.

* Get rid of mentioning the MTU stuff in the forkid EIP.
Péter Szilágyi 2019-08-08 17:52:35 +03:00 committed by Alex Beregszaszi
parent 30fb1b70d5
commit a7c13f469e

EIPS/eip-2124.md Normal file

@ -0,0 +1,294 @@
---
eip: 2124
title: Fork identifier for chain compatibility checks
author: Péter Szilágyi <peterke@gmail.com>, Felix Lange <fjl@ethereum.org>
discussions-to: https://github.com/ethereum/EIPs/issues/2125
status: Draft
type: Standards Track
category: Networking
created: 2019-05-03
---
## Simple Summary
Currently nodes in the Ethereum network try to find each other by establishing random connections to remote machines "looking" like an Ethereum node (public networks, private networks, test networks, etc), hoping that they found a useful peer (same genesis, same forks). This wastes time and resources, especially for smaller networks.
To avoid this overhead, Ethereum needs a mechanism that can precisely identify whether a node will be useful, as early as possible. Such a mechanism requires a way to summarize chain configurations, as well as a way to disseminate said summaries in the network.
This proposal focuses only on the definition of said summary - a generally useful *fork identifier* - and its validation rules, allowing it to be embedded into arbitrary network protocols (e.g. [discovery ENRs](https://eips.ethereum.org/EIPS/eip-778) or `eth/6x` handshakes).
## Abstract
There are many public and private Ethereum networks, but the discovery protocol doesn't differentiate between them. The only way to check if a peer is good or bad (same chain or not) is to establish a TCP/IP connection, wrap it with RLPx cryptography, then execute an `eth` handshake. This is an extreme cost to bear if it turns out that the remote peer is on a different network, and the handshake is not even precise enough to differentiate Ethereum and Ethereum Classic. This cost is magnified for small networks, where much more trial and error is needed to find good nodes.
Even if the peer **is** on the same chain, during non-controversial consensus upgrades, not everybody updates their nodes in time (developer nodes, leftovers, etc). These stale nodes put a meaningless burden on the peer-to-peer network, since they just latch on to good nodes, but don't accept upgraded blocks. This causes valuable peer slots and bandwidth to be lost until the stale nodes finally update. This is a serious issue for test networks, where leftovers can linger for months.
This EIP proposes a new identity scheme to both precisely and concisely summarize the chain's current status (genesis and all applied forks). The conciseness is particularly important to make the identity useful across datagram protocols too. The EIP solves a number of issues:
* If two nodes are on different networks, they should never even consider connecting.
* If a hard fork passes, upgraded nodes should reject non-upgraded ones, but **NOT** before.
* If two chains share the same genesis, but not forks (ETH / ETC), they should reject each other.
This EIP does not attempt to solve the clean separation of 3-way-forks! If at the same future block number, the network splits into three (non-fork, fork-A and fork-B), separating the forkers from each other will need case-by-case special handling. Not handling this keeps the proposal pragmatic, simple and also avoids making it too easy to fork off mainnet.
To keep the scope limited, this EIP only defines the identity scheme and validation rules. The same scheme and algorithm can be embedded into various networking protocols, allowing both the `eth/6x` handshake to be more precise (Ethereum vs. Ethereum Classic) and the discovery to be more useful (rejecting surely incompatible peers without ever connecting).
## Motivation
Peer-to-peer networking is messy and hard due to firewalls and network address translation (NAT). Generally only a small fraction of nodes have publicly routed addresses and P2P networks rely mainly on these for forwarding data for everyone else. The best way to maximize the utility of the public nodes is to ensure their resources aren't wasted on tasks that are worthless to the network.
By aggressively cutting off incompatible nodes from each other we can extract a lot more value from the public nodes, making the entire P2P network much more robust and reliable. Supporting this network partitioning at a discovery layer can further enhance performance as we avoid the costly crypto and latency/bandwidth hit associated with establishing a stream connection in the first place.
## Specification
Each node maintains the following values:
- **`FORK_HASH`**: IEEE CRC32 checksum (`[4]byte`) of the genesis hash and fork block numbers that already passed.
  - The fork block numbers are fed into the CRC32 checksum in ascending order.
  - If multiple forks are applied at the same block, the block number is checksummed only once.
  - Block numbers are regarded as `uint64` integers, encoded in big endian format when checksumming.
  - If a chain is configured to start with a non-Frontier ruleset already in its genesis, that is NOT considered a fork.
- **`FORK_NEXT`**: Block number (`uint64`) of the next upcoming fork, or `0` if no next fork is known.
E.g. `FORK_HASH` for mainnet would be:
- forkhash₀ = `0xfc64ec04` (Genesis) = `CRC32(<genesis-hash>)`
- forkhash₁ = `0x97c2c34c` (Homestead) = `CRC32(<genesis-hash> || uint64(1150000))`
- forkhash₂ = `0x91d1f948` (DAO fork) = `CRC32(<genesis-hash> || uint64(1150000) || uint64(1920000))`
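As a rough illustration of the two definitions above, the following Go sketch derives `FORK_HASH` and `FORK_NEXT` for a given head block. The names `forkID`, `normalize`, `mustHash`, `mainnetGenesis` and `mainnetForks` are hypothetical helpers, not part of this proposal. The mainnet genesis hash is not stated in this document and is an assumption here; the fork block numbers are taken from the mainnet test cases. If the assumed genesis hash is right, the printed values should line up with the forkhash examples above.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"hash/crc32"
	"sort"
)

// ID holds the two values each node maintains.
type ID struct {
	Hash uint32 // FORK_HASH
	Next uint64 // FORK_NEXT, 0 if no upcoming fork is known
}

// mustHash decodes a 32 byte hex string (hypothetical helper).
func mustHash(s string) [32]byte {
	var h [32]byte
	b, err := hex.DecodeString(s)
	if err != nil || len(b) != len(h) {
		panic("invalid 32 byte hash")
	}
	copy(h[:], b)
	return h
}

// Assumption: mainnet genesis hash, not stated in this document.
var mainnetGenesis = mustHash("d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3")

// Mainnet fork blocks; Constantinople and Petersburg share block 7280000.
var mainnetForks = []uint64{1150000, 1920000, 2463000, 2675000, 4370000, 7280000, 7280000}

// normalize sorts fork blocks ascending, drops duplicates and drops block 0
// (a ruleset already active in genesis is not a fork).
func normalize(forks []uint64) []uint64 {
	sorted := append([]uint64(nil), forks...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	out := sorted[:0]
	for _, f := range sorted {
		if f == 0 || (len(out) > 0 && out[len(out)-1] == f) {
			continue
		}
		out = append(out, f)
	}
	return out
}

// forkID computes FORK_HASH and FORK_NEXT for a node whose head is at `head`.
func forkID(genesis [32]byte, forks []uint64, head uint64) ID {
	hash := crc32.ChecksumIEEE(genesis[:])
	for _, fork := range normalize(forks) {
		if fork > head {
			// First fork that has not passed yet: advertise it as FORK_NEXT.
			return ID{Hash: hash, Next: fork}
		}
		// Fork already passed: fold its big endian block number into the checksum.
		var blob [8]byte
		binary.BigEndian.PutUint64(blob[:], fork)
		hash = crc32.Update(hash, crc32.IEEETable, blob[:])
	}
	return ID{Hash: hash, Next: 0}
}

func main() {
	for _, head := range []uint64{0, 1150000, 1920000, 7987396} {
		id := forkID(mainnetGenesis, mainnetForks, head)
		fmt.Printf("head=%d FORK_HASH=0x%08x FORK_NEXT=%d\n", head, id.Hash, id.Next)
	}
}
```

Keeping the checksum as a running `uint32` means a node never has to rebuild the concatenated input; it only folds in one more block number each time a fork passes.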
The *fork identifier* is defined as `RLP([FORK_HASH, FORK_NEXT])`. This `forkid` is cross validated (**NOT** naively compared) to assess a remote chain's compatibility. Irrespective of fork state, both parties must come to the same conclusion to avoid indefinite reconnect attempts from one side.
#### Validation rules
1) If local and remote `FORK_HASH` match, connect.
   - The two nodes are in the same fork state currently. They might know of differing future forks, but that's not relevant until the fork triggers (might be postponed, nodes might be updated to match).
2) If the remote `FORK_HASH` is a subset of the local past forks and the remote `FORK_NEXT` matches with the locally following fork block number, connect.
   - Remote node is currently syncing. It might eventually diverge from us, but at this current point in time we don't have enough information.
3) If the remote `FORK_HASH` is a superset of the local past forks and can be completed with locally known future forks, connect.
   - Local node is currently syncing. It might eventually diverge from the remote, but at this current point in time we don't have enough information.
4) Reject in all other cases.
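The four rules above can be sketched in Go roughly as follows. This is one possible reading of the rules, not a reference implementation: `checksums` and `validate` are hypothetical names, the error names are borrowed from the test cases later in this document, and the chain in `main` (genesis bytes `toy-genesis`, forks at blocks 10, 20 and 30) is made up purely for illustration. The fork list passed in is assumed to be ascending and already deduplicated.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// ID is a remote or local fork identifier.
type ID struct {
	Hash uint32 // FORK_HASH
	Next uint64 // FORK_NEXT
}

var (
	ErrRemoteStale              = errors.New("remote needs update")
	ErrLocalIncompatibleOrStale = errors.New("local incompatible or needs update")
)

// checksums returns FORK_HASH after genesis (index 0) and after each fork
// (index i+1). forks must be ascending and free of duplicates.
func checksums(genesis []byte, forks []uint64) []uint32 {
	sums := make([]uint32, len(forks)+1)
	sums[0] = crc32.ChecksumIEEE(genesis)
	for i, fork := range forks {
		var blob [8]byte
		binary.BigEndian.PutUint64(blob[:], fork)
		sums[i+1] = crc32.Update(sums[i], crc32.IEEETable, blob[:])
	}
	return sums
}

// validate cross-checks a remote fork ID against the local chain config.
func validate(genesis []byte, forks []uint64, head uint64, remote ID) error {
	sums := checksums(genesis, forks)

	// Count locally passed forks; sums[passed] is the local FORK_HASH.
	passed := 0
	for passed < len(forks) && forks[passed] <= head {
		passed++
	}
	// Rule 1: same fork state, future forks are not relevant yet.
	if sums[passed] == remote.Hash {
		return nil
	}
	// Rule 2: remote is on a subset of our past forks, so its next fork
	// must be the one that followed that state locally.
	for j := 0; j < passed; j++ {
		if sums[j] == remote.Hash {
			if forks[j] != remote.Next {
				return ErrRemoteStale
			}
			return nil
		}
	}
	// Rule 3: remote is ahead of us, but only via forks we already know of.
	for j := passed + 1; j < len(sums); j++ {
		if sums[j] == remote.Hash {
			return nil
		}
	}
	// Rule 4: no exact, subset or superset match.
	return ErrLocalIncompatibleOrStale
}

func main() {
	genesis := []byte("toy-genesis") // hypothetical chain, not a real network
	forks := []uint64{10, 20, 30}
	sums := checksums(genesis, forks)
	head := uint64(25) // local passed forks 10 and 20, knows about 30

	fmt.Println("same state:      ", validate(genesis, forks, head, ID{Hash: sums[2], Next: 30}))
	fmt.Println("remote syncing:  ", validate(genesis, forks, head, ID{Hash: sums[1], Next: 20}))
	fmt.Println("remote stale:    ", validate(genesis, forks, head, ID{Hash: sums[1], Next: 0}))
	fmt.Println("local syncing:   ", validate(genesis, forks, head, ID{Hash: sums[3], Next: 0}))
}
```

Precomputing every checksum up front keeps each validation to a handful of integer comparisons, which matters if the check is run against every discovered peer.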
#### Stale software examples
The examples below try to exhaust the fork combination possibilities that arise when nodes do not run matching software versions, but otherwise follow the same chain (mainnet nodes, testnet nodes, etc).
| Past forks | Future forks | Remote `FORK_HASH` | Remote `FORK_NEXT` | Connect | Reason |
|:---:|:---:|:---:|:---:|:---:|:---:|
| A | | A | | Yes (1) | Same forks, same sync state. |
| A | | A | B | Yes (1) | Remote is advertising a future fork, but that is uncertain. |
| A | B | A | | Yes (1) | Local knows about a future fork, but that is uncertain. |
| A | B | A | B | Yes (1) | Both know about a future fork, but that is uncertain. |
| A | B1 | A | B2 | Yes (1) | Both know about differing future forks, but those are uncertain. |
| [A,B] | | A | B | Yes (2) | Remote out of sync. |
| [A,B,C] | | A | B | Yes¹ (2) | Remote out of sync. Remote will need a software update, but we don't know it yet. |
| A | B | A ⊕ B | | Yes (3) | Local out of sync. |
| A | B,C | A ⊕ B | | Yes (3) | Local out of sync. Local also knows about a future fork, but that is uncertain yet. |
| A | | A ⊕ B | | No (4) | Local needs software update. |
| A | B | A ⊕ B ⊕ C | | No² (4) | Local needs software update. |
| [A,B] | | A | | No (4) | Remote needs software update. |
*Note, there's one asymmetry in the table, marked with ¹ and ². Since we don't have access to a remote node's future fork list (just the next one), we can't detect that its software is stale until it syncs up. This is acceptable as 1) the remote node will disconnect from us anyway, and 2) this is a temporary fluke during sync, not permanent with a leftover node.*
#### Mismatching chain examples (local perspective)
TODO: Give some examples as to what happens if the nodes follow different forks.
## Rationale
##### Why flatten `FORK_HASH` into 4 bytes? Why not share the entire genesis and fork list?
Whilst the `eth` devp2p protocol permits arbitrarily much data to be transmitted, the discovery protocol's total space allowance for all ENR entries is 300 bytes.
Reducing the `FORK_HASH` into a 4-byte checksum ensures that we leave ample room in the ENR for future extensions; and 4 bytes is more than enough for arbitrarily many Ethereum networks from a (practical) collision perspective.
##### Why use IEEE CRC32 as the checksum instead of Keccak256?
We need a mechanism that can flatten arbitrary data into 4 bytes, without ignoring any of the input. Any other checksum or hashing algorithm would work, but since nodes can lie at any time, there's no value in cryptographic hash functions.
Instead of just taking the first 4 bytes of a Keccak256 hash (seems odd) or XOR-ing all the 4-byte groups (messy), CRC32 is a better alternative, as this is exactly what it was designed for. IEEE CRC32 is also used by Ethernet, gzip, zip, png, etc, so support in every programming language should not be a problem.
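One practical consequence of picking CRC32, sketched below using only Go's standard `hash/crc32` package: the checksum of a concatenation can be extended incrementally, so a node can keep a running `FORK_HASH` and fold in each fork block as it passes. The helper names `oneShot` and `incremental` and the sample input are illustrative only.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// oneShot checksums the concatenation a || b in a single call.
func oneShot(a, b []byte) uint32 {
	joined := append(append([]byte(nil), a...), b...)
	return crc32.ChecksumIEEE(joined)
}

// incremental starts from the checksum of a and extends it with b.
func incremental(a, b []byte) uint32 {
	return crc32.Update(crc32.ChecksumIEEE(a), crc32.IEEETable, b)
}

func main() {
	a, b := []byte("12345"), []byte("6789")
	fmt.Printf("one shot:    0x%08x\n", oneShot(a, b))
	fmt.Printf("incremental: 0x%08x\n", incremental(a, b))
}
```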
##### We're not using `FORK_NEXT` for much, can't we get rid of it somehow?
We need to be able to differentiate whether a remote node is out of sync or whether its software is stale. Sharing only the past forks cannot tell us if the node is legitimately behind or stuck.
##### Why advertise only one next fork, instead of "hashing" all known future ones like the `FORK_HASH`?
As opposed to past forks that have already passed (for us locally) and can be considered immutable, we don't know anything about future ones. Maybe we're out of sync or maybe the fork didn't pass yet. If it didn't pass yet, it might be postponed, so enforcing it would split the network apart. It could also happen that we're not yet aware of all future forks (haven't updated our software in a while).
## Backwards Compatibility
This EIP only defines an identity scheme; it does not define functional changes.
## Test Cases
Here's a full suite of tests for all possible fork IDs that Mainnet, Ropsten, Rinkeby and Görli can advertise given the Petersburg fork cap (time of writing).
```go
type testcase struct {
	head uint64
	want ID
}
tests := []struct {
	config  *params.ChainConfig
	genesis common.Hash
	cases   []testcase
}{
	// Mainnet test cases
	{
		params.MainnetChainConfig,
		params.MainnetGenesisHash,
		[]testcase{
			{0, ID{Hash: 0xfc64ec04, Next: 1150000}}, // Unsynced
			{1149999, ID{Hash: 0xfc64ec04, Next: 1150000}}, // Last Frontier block
			{1150000, ID{Hash: 0x97c2c34c, Next: 1920000}}, // First Homestead block
			{1919999, ID{Hash: 0x97c2c34c, Next: 1920000}}, // Last Homestead block
			{1920000, ID{Hash: 0x91d1f948, Next: 2463000}}, // First DAO block
			{2462999, ID{Hash: 0x91d1f948, Next: 2463000}}, // Last DAO block
			{2463000, ID{Hash: 0x7a64da13, Next: 2675000}}, // First Tangerine block
			{2674999, ID{Hash: 0x7a64da13, Next: 2675000}}, // Last Tangerine block
			{2675000, ID{Hash: 0x3edd5b10, Next: 4370000}}, // First Spurious block
			{4369999, ID{Hash: 0x3edd5b10, Next: 4370000}}, // Last Spurious block
			{4370000, ID{Hash: 0xa00bc324, Next: 7280000}}, // First Byzantium block
			{7279999, ID{Hash: 0xa00bc324, Next: 7280000}}, // Last Byzantium block
			{7280000, ID{Hash: 0x668db0af, Next: 0}}, // First and last Constantinople, first Petersburg block
			{7987396, ID{Hash: 0x668db0af, Next: 0}}, // Today Petersburg block
		},
	},
	// Ropsten test cases
	{
		params.TestnetChainConfig,
		params.TestnetGenesisHash,
		[]testcase{
			{0, ID{Hash: 0x30c7ddbc, Next: 10}}, // Unsynced, last Frontier, Homestead and first Tangerine block
			{9, ID{Hash: 0x30c7ddbc, Next: 10}}, // Last Tangerine block
			{10, ID{Hash: 0x63760190, Next: 1700000}}, // First Spurious block
			{1699999, ID{Hash: 0x63760190, Next: 1700000}}, // Last Spurious block
			{1700000, ID{Hash: 0x3ea159c7, Next: 4230000}}, // First Byzantium block
			{4229999, ID{Hash: 0x3ea159c7, Next: 4230000}}, // Last Byzantium block
			{4230000, ID{Hash: 0x97b544f3, Next: 4939394}}, // First Constantinople block
			{4939393, ID{Hash: 0x97b544f3, Next: 4939394}}, // Last Constantinople block
			{4939394, ID{Hash: 0xd6e2149b, Next: 0}}, // First Petersburg block
			{5822692, ID{Hash: 0xd6e2149b, Next: 0}}, // Today Petersburg block
		},
	},
	// Rinkeby test cases
	{
		params.RinkebyChainConfig,
		params.RinkebyGenesisHash,
		[]testcase{
			{0, ID{Hash: 0x3b8e0691, Next: 1}}, // Unsynced, last Frontier block
			{1, ID{Hash: 0x60949295, Next: 2}}, // First and last Homestead block
			{2, ID{Hash: 0x8bde40dd, Next: 3}}, // First and last Tangerine block
			{3, ID{Hash: 0xcb3a64bb, Next: 1035301}}, // First Spurious block
			{1035300, ID{Hash: 0xcb3a64bb, Next: 1035301}}, // Last Spurious block
			{1035301, ID{Hash: 0x8d748b57, Next: 3660663}}, // First Byzantium block
			{3660662, ID{Hash: 0x8d748b57, Next: 3660663}}, // Last Byzantium block
			{3660663, ID{Hash: 0xe49cab14, Next: 4321234}}, // First Constantinople block
			{4321233, ID{Hash: 0xe49cab14, Next: 4321234}}, // Last Constantinople block
			{4321234, ID{Hash: 0xafec6b27, Next: 0}}, // First Petersburg block
			{4586649, ID{Hash: 0xafec6b27, Next: 0}}, // Today Petersburg block
		},
	},
	// Goerli test cases
	{
		params.GoerliChainConfig,
		params.GoerliGenesisHash,
		[]testcase{
			{0, ID{Hash: 0xa3f5ab08, Next: 0}}, // Unsynced, last Frontier, Homestead, Tangerine, Spurious, Byzantium, Constantinople and first Petersburg block
			{795329, ID{Hash: 0xa3f5ab08, Next: 0}}, // Today Petersburg block
		},
	},
}
```
Here's a suite of tests of the different states a Mainnet node might be in and the different remote fork identifiers it might be required to validate and decide to accept or reject:
```go
tests := []struct {
	head uint64
	id   ID
	err  error
}{
	// Local is mainnet Petersburg, remote announces the same. No future fork is announced.
	{7987396, ID{Hash: 0x668db0af, Next: 0}, nil},
	// Local is mainnet Petersburg, remote announces the same. Remote also announces a next fork
	// at block 0xffffffffffffffff, but that is uncertain.
	{7987396, ID{Hash: 0x668db0af, Next: math.MaxUint64}, nil},
	// Local is mainnet currently in Byzantium only (so it's aware of Petersburg), remote announces
	// also Byzantium, but it's not yet aware of Petersburg (e.g. non updated node before the fork).
	// In this case we don't know if Petersburg passed yet or not.
	{7279999, ID{Hash: 0xa00bc324, Next: 0}, nil},
	// Local is mainnet currently in Byzantium only (so it's aware of Petersburg), remote announces
	// also Byzantium, and it's also aware of Petersburg (e.g. updated node before the fork). We
	// don't know if Petersburg passed yet (will pass) or not.
	{7279999, ID{Hash: 0xa00bc324, Next: 7280000}, nil},
	// Local is mainnet currently in Byzantium only (so it's aware of Petersburg), remote announces
	// also Byzantium, and it's also aware of some random fork (e.g. misconfigured Petersburg). As
	// neither fork has passed at either node, they may mismatch, but we still connect for now.
	{7279999, ID{Hash: 0xa00bc324, Next: math.MaxUint64}, nil},
	// Local is mainnet Petersburg, remote announces Byzantium + knowledge about Petersburg. Remote
	// is simply out of sync, accept.
	{7987396, ID{Hash: 0xa00bc324, Next: 7280000}, nil},
	// Local is mainnet Petersburg, remote announces Spurious + knowledge about Byzantium. Remote
	// is definitely out of sync. It may or may not need the Petersburg update, we don't know yet.
	{7987396, ID{Hash: 0x3edd5b10, Next: 4370000}, nil},
	// Local is mainnet Byzantium, remote announces Petersburg. Local is out of sync, accept.
	{7279999, ID{Hash: 0x668db0af, Next: 0}, nil},
	// Local is mainnet Spurious, remote announces Byzantium, but is not aware of Petersburg. Local
	// out of sync. Local also knows about a future fork, but that is uncertain yet.
	{4369999, ID{Hash: 0xa00bc324, Next: 0}, nil},
	// Local is mainnet Petersburg, remote announces Byzantium but is not aware of further forks.
	// Remote needs software update.
	{7987396, ID{Hash: 0xa00bc324, Next: 0}, ErrRemoteStale},
	// Local is mainnet Petersburg, and isn't aware of more forks. Remote announces Petersburg +
	// 0xffffffff. Local needs software update, reject.
	{7987396, ID{Hash: 0x5cddc0e1, Next: 0}, ErrLocalIncompatibleOrStale},
	// Local is mainnet Byzantium, and is aware of Petersburg. Remote announces Petersburg +
	// 0xffffffff. Local needs software update, reject.
	{7279999, ID{Hash: 0x5cddc0e1, Next: 0}, ErrLocalIncompatibleOrStale},
	// Local is mainnet Petersburg, remote is Rinkeby Petersburg.
	{7987396, ID{Hash: 0xafec6b27, Next: 0}, ErrLocalIncompatibleOrStale},
}
```
Here's a couple of tests to verify the proper RLP encoding (since `FORK_HASH` is a 4 byte binary but `FORK_NEXT` is an 8 byte quantity):
```go
tests := []struct {
	id   ID
	want []byte
}{
	{
		ID{Hash: checksumToBytes(0), Next: 0},
		common.Hex2Bytes("c6840000000080"),
	},
	{
		ID{Hash: checksumToBytes(0xdeadbeef), Next: 0xBADDCAFE},
		common.Hex2Bytes("ca84deadbeef84baddcafe"),
	},
	{
		ID{Hash: checksumToBytes(math.MaxUint32), Next: math.MaxUint64},
		common.Hex2Bytes("ce84ffffffff88ffffffffffffffff"),
	},
}
```
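For readers without an RLP library at hand, the encoding vectors above can be reproduced with a small hand-rolled encoder. The sketch below is deliberately minimal and only covers what a fork identifier needs (byte strings shorter than 56 bytes and a single short list); `rlpBytes`, `rlpUint` and `encodeForkID` are hypothetical names, not part of any existing library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// rlpBytes encodes a byte string shorter than 56 bytes. A single byte below
// 0x80 is its own encoding; everything else gets a 0x80+len prefix.
func rlpBytes(b []byte) []byte {
	if len(b) == 1 && b[0] < 0x80 {
		return []byte{b[0]}
	}
	return append([]byte{0x80 + byte(len(b))}, b...)
}

// rlpUint encodes an integer as a big endian byte string without leading
// zero bytes, so zero becomes the empty string (0x80).
func rlpUint(v uint64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], v)
	i := 0
	for i < len(buf) && buf[i] == 0 {
		i++
	}
	return rlpBytes(buf[i:])
}

// encodeForkID returns RLP([FORK_HASH, FORK_NEXT]). FORK_HASH is a fixed
// 4 byte binary, so its leading zeroes are kept; FORK_NEXT is a quantity.
func encodeForkID(hash [4]byte, next uint64) []byte {
	payload := append(rlpBytes(hash[:]), rlpUint(next)...)
	// The payload is at most 5 + 9 bytes, so the short list prefix suffices.
	return append([]byte{0xc0 + byte(len(payload))}, payload...)
}

func main() {
	fmt.Printf("%x\n", encodeForkID([4]byte{0, 0, 0, 0}, 0))
	fmt.Printf("%x\n", encodeForkID([4]byte{0xde, 0xad, 0xbe, 0xef}, 0xBADDCAFE))
	fmt.Printf("%x\n", encodeForkID([4]byte{0xff, 0xff, 0xff, 0xff}, ^uint64(0)))
}
```

The difference between the two fields is the point of these vectors: an all-zero `FORK_HASH` still occupies four bytes, whereas a zero `FORK_NEXT` collapses to the single byte `0x80`.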
## Implementation
https://github.com/ethereum/go-ethereum/pull/19738
## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).