As part of the discussions surrounding EIP-7594 (PeerDAS), it was
highlighted that during sampling and/or data requests, the sampler does
not have timing information for when a samplee will have data available.
It is desirable not to introduce a deadline, since this artificially
introduces latency in the typical scenario where data becomes available
earlier than an agreed-upon deadline.
Similarly, when a client issues a request for blocks, it often does not
know the rate limiting policy of the serving end and must either
pessimistically rate limit itself or run the risk of getting
disconnected for spamming the server - outcomes which lead to
unnecessarily slow syncing as well as messy testnet behaviour with peer
scoring and disconnection issues.
This PR solves both problems by:
* removing the time-to-first-byte and response timeouts, allowing
requesters to optimistically queue requests - to this date, the timeouts
have not been fully implemented in clients
* introducing a hard limit on the number of concurrent requests that a
client may issue per protocol (see the sketch after this list)
* introducing a recommendation for rate limiting that allows optimal
bandwidth usage without protocol changes or additional messaging
roundtrips
On the server side, an "open" request does not consume significant
resources while it sits idle, meaning that it is safe to let the server
manage resource allocation by slowing down data serving, as long as
concurrency is adequately limited.
On the client side, clients must already be prepared to handle slow
servers, and they can simply apply their existing strategy to both
uncertainty and rate-limiting scenarios (how long to wait before timing
out, what to do with "slow" peers).
Token / leaky buckets are a classic option for rate limiting, with
desirable properties both when we're sending requests to many clients
concurrently (getting good burst performance) and when the requestees
are busy (keeping long-term resource usage in check and serving clients
fairly).
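For concreteness, a minimal token bucket sketch in Python (the `TokenBucket` name and the `rate` / `capacity` parameters are illustrative, not part of the spec):

```python
import time

class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`, so short
    bursts are served quickly while long-term usage stays bounded."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_consume(self, cost: float = 1.0) -> bool:
        # Refill based on elapsed time, then spend `cost` tokens if available.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        # Caller should slow down serving (or delay its next request).
        return False
```

A server could, for example, draw one token per block served and simply delay further responses while the bucket is empty, rather than disconnecting the peer.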
https://github.com/ethereum/consensus-specs/pull/3360 effectively
extends the valid lifetime of an attestation/aggregate to 2 epochs -
this means that an aggregate published at the beginning of a slot is now
valid per the gossip rules up to 2 epochs later.
The net effect of the above change is that peers are allowed to
republish old aggregates and attestations and libp2p will not stop the
spread with the settings we recommend - instead the messages will have
to be stopped with the "attestation cover rule" or similar, even though
they have been observed already.
Significant amounts of this kind of spam have been observed, on the
aggregate channel in particular, leading to a 5x increase in aggregate
traffic as some clients republish these old messages in spite of the
"attestation cover rule" that should have stopped them - this simple
change provides an additional layer of protection against such bugs.
A couple of gossip validation rules are only specced out for single
unaggregated attestations, but are also checked by implementations
for aggregates. This adds a copy of the missing gossip validation rules
to the aggregated attestation docs.
Update gossip validation rules to use the highest `finalized_checkpoint`
across _all_ branches (`store.finalized_checkpoint`), instead of the one
on the currently selected branch (`state.finalized_checkpoint`), when
deciding whether to ignore a block / blob because it is already
finalized.
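A minimal sketch of the updated check, assuming the usual pyspec types and helpers (`Store`, `Slot`, `compute_start_slot_at_epoch`); the function name is hypothetical:

```python
def is_already_finalized(block_slot: Slot, store: Store) -> bool:
    # [IGNORE] the block / blob unless it is from a slot greater than the
    # latest finalized slot, using store.finalized_checkpoint (the highest
    # finalized checkpoint across all branches) rather than the
    # finalized_checkpoint of the currently selected branch's state.
    finalized_slot = compute_start_slot_at_epoch(store.finalized_checkpoint.epoch)
    return block_slot <= finalized_slot
```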