From 54b8ea3b66046537e3404c4e038cdb28d7ad474b Mon Sep 17 00:00:00 2001 From: Eric <5089238+emizzle@users.noreply.github.com> Date: Fri, 26 Apr 2024 18:53:28 +1000 Subject: [PATCH] Initial commit of proposals #1 and #2 --- design/slot-reservations.md | 271 ++++++++++++++++++++++++++++++++++++ 1 file changed, 271 insertions(+) create mode 100644 design/slot-reservations.md diff --git a/design/slot-reservations.md b/design/slot-reservations.md new file mode 100644 index 0000000..f6c4685 --- /dev/null +++ b/design/slot-reservations.md @@ -0,0 +1,271 @@ +# Preventing node and network overload during slot filling and slot repair + +When a new storage request is created, slots in the request can be filled by the +first storage provider (SP) to downloaded the slot data, generate a storage +proof, then supply the proof and collateral to the onchain contract. This is +inherently a race and all SPs except the one who "won" will have wasted funds +downloading data that they ulimately did not need, which may lead eventually +lead to higher costs of storage. Additionally, clients will have to serve data +requests for all the racing SPs downloading data for all the slots in the +request. This not only could cause issues with nodes failing to handle the +request load, but also creates unneccessary congestion on the network. + +## Proposed solution #1: Fill reward, Pay2Play, and sliding window rate of expansion + +One proposed solution to this problem involves using [expanding window +mechanism](https://github.com/status-im/codex-research/blob/ad41558900ff8be91811aa5de355148d8d78404f/design/marketplace.md#dispersal) +as the sole means of throttling node request overload and network congestion. +The expanding window will have a randomised rate of expansion factor to prevent +waiting and collusion. Additionally, a fill reward incentivises fast fills and +disincentivises waiting. Pay2Play is a collateral that is put up by the client +to ensure they don't withhold data from the network. + +### Fill reward + +A fill reward will be issued to the SP that fills the slot. The reward will +offset the cost of the SPs collateral by a little bit. The fill reward must be +much less than the collateral. + +This reward will decrease exponentially over time, so that it is inversely +proportional to the expanding window. This means that while the field of +eligible addresses is small, the fill reward will be high. Over time, the field +of eligible addresses will increase exponentially as the fill reward decreases +exponentially. This incentivises SPs to fill the slot as fast as possible, with +the lucky few SPs that closest to the source point of the expanding window +getting a bigger reward. + +### Sliding window randomised rate of expansion factor + +The sliding window mechanism starts off with a random source address, defined as +$hash(nonce || slot number)$ and a +distance, $d = 0$, defined as $xor(a, b)$. Over time, $d$ increases and $2^d$ addresses will be eligible +to participate. At time $t_0$, $d == 0$, so only 1 address (the source address) +is eligible. At time $t_1$, the distance increases to 1, and 2 addresses will be +included. At time $t_2$ and kademlia distance of 2, there are 4 addresses, etc, +until eventually the entire address space of SPs participating in the network +are eligible to fill a slot. + +The client can set the rate of sliding window expansion by setting the +[dispersal +parameter](https://github.com/status-im/codex-research/blob/ad41558900ff8be91811aa5de355148d8d78404f/design/marketplace.md#dispersal), $dp$ +when the storage request is created. Therefore the allowed +distance, $d_a$, for eligible SPs can be defined as: + +$d_a = t_e * dp$, + +where $t_e$ is the +elapsed time. + +This presents an issue where if an SP could pre-calculate how long it would take +for themselves or other SPs to be eligible to fill a slot. This is relevant in a +case where an SP gets "lucky" and is close to the source of the expanding +window. They also know that other big data provider addresses are not close to +the source, meaning there is some before other providers will be eligible to +fill the slot. In that case, they could potentially wait an amount of time +before filling the slot, in the hopes that a better opportunity arises. The goal +should always be to fill slots as fast as possible, so any "waiting" behavior +should be discouraged. + +A waiting SP is already disincentivsed by the exponentially decreasing fill +reward, however this may not be a large enough reward to create a change in +behavior. To fully dismantle this attack, the SP's pre-calculation can be +completely prevented by introduce a random factor into the rate calculation that +changes for each time/distance increase. The random factor, $r$, will be a factor +between 0.5 and 1.5, as an example. This means that the expanding window rate +set by the client in the storage request will be randomly increased (up to 150%) +or decreased (50%). The source of randomness for this could be the block hash, +so the value could change on each block. Therefore the allowed +distance for eligible SPs can then be defined as + +$d_a = t_e * dp * r$ + +## Proposed solution #2: slot reservations aka "bloem method" + +A proposed solution to this problem is to limit the number of SPs racing to +download the data to fill the slot, by allowing SPs to first reserve a slot +before any data needs to be downloaded. Reserving a slot requires some +collateral, so there is an initial race for SPs who can can deposit collateral +first to secure a reservation, then a second race amongst the SPs with a +reservation to fill the slot (with collateral and the generated proof). However, +not all SPs in the network can reserve a slot initially: the [expanding window +mechanism](https://github.com/status-im/codex-research/blob/ad41558900ff8be91811aa5de355148d8d78404f/design/marketplace.md#dispersal) +dictates which SPs are eligible to reserve the slot. As time progresses for an +unreserved slot (or a slot with less than $R$ reservations), more SPs will be +allowed to reserve the slot, until eventually any SP in the network can reserve +the slot. This ensures fair participation opportunity across SPs in the network. + +### Reservation collateral and reward + +Reservation collateral is paid to reserve a slot. This will decreases over time to +incentivise participation in aging slots. + +Reservation reward is paid to the SP who reserves and eventually fills the slot. This +increases over time, to incentivise participation in aging slots. + +[TODO: INSERT GRAPH OF RESERVATION COLLATERAL AND REWARD OVER TIME] + +### Reservations expiry + +Reservations can expiry after some time to prevent opportunity loss for other +SPs willing to participate. + +### Reservation retries + +After expire, an SP can retry a slot reservation, but if it was the last SP to +reserve the slot, it can only retry once. In other words, SPs can reserve the same slot +only twice in a row. + +### Reservations per slot + +Each slot is allowed to have three reservations, which effectively limits the +quantity of racing to three SPs. + +### Expanding windows per slot + +Each slot will have three reservations per slot, and each reservation will have +its own expanding window (these windows will have a unique starting point). The +purpose of this is to distribute the reservation potentials + +### Attacks + +Name | Attack description +:-------------|:-------------------------------------------------------------------- +Clever SP | SP drops reservation when a better opportunity presents itself +Lazy SP | SP reserves a slot, but doesn't fill it +Censoring SP | acts like a lazy SP for specific CIDs that it tries to censor +Hooligan host | acts like a lazy SP for many request to damage to the network +Greedy SP | SP tries to fill multiple slots in a request +Lazy client | client doesn't release content on the network + +#### Clever SP attack + +In this attack, an SP could reserve a slot, then if a better opporunity comes +along, forfeit the reservation collateral by reserving and filling another slot, +with the idea that the reward earned in the new opportunity would make the +reservation collateral loss from the original slot worthwhile. + +This attack is mitigated by allowing for multiple reservations per slot. All SPs +that have secured a reservation (capped at three) will race to fill the slot. +Thus, if one or more SPs that have reserved the slot decide to pursue other +opportunities, the other SPs that have reserved the slot will still be able to +fill the slot. + +In addition, the expanding window mechanism allows for more slots +to participate (reserve/fill) as time progresses, so there will be a larger pool +of SPs that could potentially fill the slot. + +The increasing reservation reward over time also incentivises late comers to +reserve slots, so if there are initially SPs that reserve slots but fail to fill +the slot due to better opprotunities elsewhere, other SPs will be incentivised +to participate. + +After some time, the slot reservation of the attcking SP will expire, and other +SPs wills will be allowed to reserve and fill the slot. As time will have passed +in this scenario, increased reservation rewards and decreased collateral +requiremnts will incentivise other SPs to participate. + +#### Lazy SP attack + +The "lazy SP attack" is when an SP reserves a slot, but does not fill it. The +vector is very similar to the "clever SP attack". The slot reservations +mechanism mititgates this attack in the same ways, please see the "Clever SP +attack" section above. + +#### Censoring SP attack + +A "censoring SP attack" is performed by an SP that wants to disrupt storage of +particular CIDs by reserving a slot and then not filling it. + +Mitigation of this attack is exactly the same as the "lazy SP attack". + +#### Hooligan SP attack + +In this attack, an SP would attempt to disrupt the network by reserving and +failing to fill random slots in the network + +#### Greedy SP attack + +A "greedy SP attack" is when one SP tries to fill more than M slots (and up to K +slots) of a request in an attempt to control whether or not the contract +fails. In the case of M slots controlled, the attacker could cause the contract +to fail and the client would get only funds not already spent on proof provision +back. All SPs in the contract would forfeit their collateral in this case, +however, so this attack does have a significant cost associated. + +In the case of K slots, the SPs could withhold data from the network, and if no +other SPs or caching nodes hold this data, could prevent retrieval and repair of +the data. + +This particular attack is difficult to mitigate because there is a sybil +component to it where an entity could control many nodes in the network but all +those nodes could collude on the attack. + +At this time, slot reservations does not mitigate against this attack, nor does +it incentivise behaviour that would prevent it. + +#### Lazy client attack + +This attack happens when a client creates a request for storage, but ultimately +does not release the data to the network when it requested. SPs may reserve the slot, +with collateral, and yet would never be able to fill the slot as they cannot +download the data. The result of this attack is that any SPs who reserve the +slot may lose their collateral. + +At this time, slot reservations does not mitigate against this attack, nor does +it incentivise behaviour that would prevent it. + +### Open questions + +#### Reservation collateral goes to 0 at time of contract expiry? + +#### Fill race to start at the same time? + +When SPs reserve a slot, the first three SPs can race to fill the slot. However, +if there is no point in time at which all three SPs can start the race, then it +would allow any SP that secures a reservation to immediately start downloading +and filling the slot. + +Perso note: no, want to fill slots as fast as possible. + +### Trade offs + +The main advantage to this design is that nodes and the network would not be overloaded at +the outset of slots being availble for SP participation. + +The downside of this proposal is that an SP would have to participate in two +races: one for reserving the slot and another for filling the slot once +reserved, which brings additional complexities in the smart contract. +Additionally, there are additional complexities introduced with the reservation +collateral and reward "dutch aucitons" that change over time. It remain unclear +if the additional complexity in the smart contracts for benefits that may not be +substationally greater than having the sliding mechanism window on its own. + +In addition, there are two attack vectors, the "greedy SP attack" and the "lazy +client attack" that are not well covered in the slot reservation design. There +could be even more complexities added to the design to accomodate these two +attacks (see the other proposed solution for the mitigation of these attacks). + +// TODO: add slot start point source of randomness (block hash) into +slot start point in the disperal/sliding window design + +// THOUGHTS: Perhaps the expanding window mechanism should be network-aware such +that there are always a minimum of 2 SPs in a window at a given time, to +encourage competition. If not, an SP could watch the network, and given a new +opportunity to fill a slot, understand that it could wait some time before +reserving the slot knowning that it is the only SP in the allowed address space +in the expanding window for some time. + +The slot reservation mechanism should be able to mitigate client and SP attacks +on the network. + + + + +Allowing slots to be claimed by a storage provider (SP) offering both collateral and proof of +storage creates a race condition where the fastest SP to provide these "wins" +the slot. . In other words, a host would need to download the data before it's +allowed to fill a slot. + +This can be problematic. Because hosts are racing to fill a slot, by all downloading the data, they could overwhelm the capacity of the client that is offering the data to the network. Bandwidth incentives and [dispersal](https://github.com/status-im/codex-research/blob/ad41558900ff8be91811aa5de355148d8d78404f/design/marketplace.md#dispersal) could alleviate this, but this is far from certain. + +This is why it's useful to investigate an alternative: allow a host to claim a slot before it starts downloading the data. A host can claim a slot by providing collateral. This changes the dynamics quite a bit; hosts are now racing to put down collateral, and it's not immediately clear what should happen when a contract fails. The incentives need to be carefully changed to avoid unwanted behavior. \ No newline at end of file