feat: order list proposal

2025-01-10 02:36:09 +00:00 · 2023-11-01 08:48:57 +01:00 · 2023-11-01 08:48:57 +01:00 · 069a2ff1bd
commit 069a2ff1bd
parent f0c0d5bb38
1 changed files with 88 additions and 62 deletions
--- a/design/sales.md
+++ b/design/sales.md
@@ -32,12 +32,41 @@ to sell and under which conditions.
      amount
      maximum duration
      minimum price
      maximum collateral price
 Availabilities consist of an amount of storage, the maximum duration and minimum
 price to sell it for. They represent storage that is for sale, but not yet sold.
 This is information local to the node that can be altered without affecting
 global state.
 ## Selling strategy
 We ship a basic algorithm for optimizing the selling process. It does not aim to be 
 the best algorithm, as different users may want to adopt different strategies.
 Instead, we chose to provide a basic implementation and, in the future, expose the 
 internals of decision-making to the users through an API. This will give them the option 
 to plug in more robust and custom strategies. The main design goal of our selling strategy 
 is the maximization of utilized capacity with the most profitable requests. 
 This decision leads to several behaviors that we describe below.
 We do not wait for potentially more profitable requests to arrive sometime later on,
 as defining this waiting period is not trivial and, while waiting, we might miss
 out on current opportunities.
 When we have availability for sale, we aim to fill slots right away. This means it 
 is crucial to have the current market state available to choose from at the moment 
 when the availability is added or returned.
 If the size of our currently available storage space does not suffice to fill the most 
 profitable slots, we choose less profitable ones that fit into our free space.
 Availabilities probably won't ever be fully utilized, as the probability of finding
 a slot of exactly the right size is very low. The algorithm needs to take this into account.
 All the previously mentioned behaviors are limited by user-specified constraints, 
 which are passed as parameters when creating availability. Most of this behavior 
 is implemented through an ordered slot list, as described below.
 ## Adding availability
 When a user adds availability, then the reservations module will check whether
@@ -135,34 +164,34 @@ the state is kept on local disk by the Repo and the Datastore. How much space is
 reserved to be sold is persisted on disk by the Repo. The availabilities are
 persisted on disk by the Datastore.
-## Slot queue
+## Ordered slot list
-Once a new request for storage is created on chain, all hosts will receive a
+The ordered slot list is a list where slots that are currently seeking a host to 
-contract event announcing the storage request and decide if they want to act on
+fill them are stored. It is capped at a certain capacity and ordered by 
-the request by matching their availabilities with the incoming request. Because
+profitability (which will be described later). The most profitable slots are at
-there will be many requests being announced over time, each host will create a
+the beginning, while the least profitable ones are at the end.
 queue of matching request slots, adding each new storage slot to the queue.
-### Adding slots to the queue
+### Adding slots to the list
-Slots will be added to the queue when request for storage events are received
+Slots will be added to the list when requests for storage events are received
 from the contracts. Additionally, when slots are freed, a contract event will
-also be received, and the slot will be added to the queue. Duplicates are
+also be received, and the slot will be added to the list. Duplicates are
 ignored.
 When all slots of a request are added to the queue, the order should be randomly
-shuffled, as there will be many hosts in the network that could potentially pick
+shuffled. This is because there will be many hosts in the network that could 
-up the request and will process the first slot in the queue at the same time.
+potentially pick up the request and process the first slot in the queue
-This should avoid some clashes in slot indices chosen by competing hosts.
+simultaneously. Randomly shuffling the order will help avoid clashes in slot
 indices chosen by competing hosts.
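
A minimal sketch of the shuffling step, assuming slots are identified by their index within a request. `shuffled_slot_indices` is a hypothetical helper used only for illustration, not a name from the design:

```python
import random


def shuffled_slot_indices(slot_count, rng=None):
    """Return the slot indices 0..slot_count-1 in a random order."""
    rng = rng or random.Random()
    indices = list(range(slot_count))
    rng.shuffle(indices)  # shuffles the list in place
    return indices


# Each host draws its own order, so competing hosts are less likely
# to start with the same slot index.
order = shuffled_slot_indices(5)
```
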
-Before slots can be added to the queue, availabilities must be checked to ensure
+If the list were to exceed its capacity with the new slot, the tail would be 
-a matching availability exists. This filtering prevents all slots in the network
+removed, but only if the tail's profitability is lower than that of the new slot.
-from entering the queue.
+Otherwise, the new slot is discarded.
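
The capacity rule above could be sketched as follows. This is an illustration under assumptions, not the actual implementation: the `Slot` fields, the `OrderedSlotList` class and its method names are hypothetical, and profitability is taken as an already computed number.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    request_id: str      # hypothetical identifier fields
    slot_index: int
    profitability: int   # assumed to be computed elsewhere


class OrderedSlotList:
    """Capped list of slots, most profitable first."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._slots = []

    def slots(self):
        return list(self._slots)

    def add(self, slot):
        """Insert a slot; return True if kept, False if ignored or discarded."""
        key = (slot.request_id, slot.slot_index)
        if any((s.request_id, s.slot_index) == key for s in self._slots):
            return False  # duplicates are ignored
        if len(self._slots) >= self.capacity:
            # Evict the tail only if it is strictly less profitable.
            if self._slots[-1].profitability >= slot.profitability:
                return False  # the new slot is discarded
            self._slots.pop()
        # Negated keys are ascending, so bisect can find the position.
        keys = [-s.profitability for s in self._slots]
        position = bisect.bisect_right(keys, -slot.profitability)
        self._slots.insert(position, slot)
        return True
```

Inserting after equally profitable slots (`bisect_right`) keeps earlier arrivals ahead of later ones; that tie-breaking choice is an assumption of this sketch.
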
-### Removing slots from the queue
+### Removing slots from the list
 Hosts will also receive contract events for when any contract is started,
-failed, or cancelled. In all of these cases, slots in the queue pertaining to
+failed, or cancelled. In all of these cases, slots in the list pertaining to
 these requests should be removed as they are no longer fillable by the host.
 Note: expired request slots will be checked when a request is processed and its
 state is validated.
@@ -179,62 +208,59 @@ Slots in the queue should be sorted in the following order:
  involve bandwidth incentives, profit can be estimated as `duration * reward`
  for now.
-Note: datset size may eventually be included in the profit algorithm and may not
+Note: dataset size may eventually be included in the profit algorithm and may not
 need to be included on its own in the future. Additionally, data dispersal may
-also impact the datset size to be downloaded by the host, and consequently the
+also impact the dataset size to be downloaded by the host, and consequently the
 profitability of servicing a storage request, which will need to be considered
 in the future once profitability can be calculated.
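
As a rough illustration of the interim `duration * reward` estimate, the sketch below ranks a few made-up requests by estimated profit only. The dictionary keys and the `estimated_profit` helper are hypothetical, units are left unspecified, and any other sort criteria are omitted.

```python
def estimated_profit(duration, reward):
    """Interim profit estimate: duration multiplied by reward."""
    return duration * reward


# Made-up example values, for illustration only.
requests = [
    {"id": "r1", "duration": 100, "reward": 3},
    {"id": "r2", "duration": 50, "reward": 10},
    {"id": "r3", "duration": 200, "reward": 1},
]

# Most profitable first.
ranked = sorted(
    requests,
    key=lambda r: estimated_profit(r["duration"], r["reward"]),
    reverse=True,
)
```
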
-### Queue processing
+## Slot list processing
-Queue processing will be started only once, when the sales module starts and
+Slot list processing is triggered by three cases:
 will process slots continuously, in order, until the queue is empty. If the
 queue is empty, processing of the queue will resume once items have been added
 to the queue. If the queue is not empty, but there are no availabilities, queue
 processing will resume once availabilities have been added.
-As soon as items are available in the queue, and there are workers available for
+1. When the node is starting.
-processing, an item is popped from the queue and processed.
+2. When there is a change to the availabilities set, either a new one is added 
 or the capacity is changed.
 3. When new slots are added to the slot list.
-When a slot is processed, it is first checked to ensure there is a matching
+Processing works using a pool of workers, which are there to control the speed at
-availability, as these availabilities will have changed over time. Then, the
+which the slots are filled. There are limitations on the number of slots that can 
-sales process will begin. The start of the sales process should ensure that the
+be filled simultaneously due to bandwidth constraints, etc. Each worker fills only
-slot being processed is indeed available (slot state is "free") before
+one slot at a time. The number of workers should be configurable.
 continuing. If it is not available, the sales process will exit and the host
 will continue to process the top slot in the queue. The start of the sales
 process should also check to ensure the host is allowed to fill the slot, due to
 the [sliding window
 mechanism](https://github.com/codex-storage/codex-research/blob/master/design/marketplace.md#dispersal).
 If the host is not allowed to fill the slot, the sales process will exit and the
 host will process the top slot in the queue.
-#### Queue workers
+When processing is triggered, a worker starts iterating through the slot list from 
-Each time an item in the queue is processed, it is assigned to a workers. The
+the beginning (i.e., the most profitable slots) and matches them against the node's
-number of allowed workers can be specified during queue creation. Specifying a
+availabilities. If there is a match, it will mark that given slot as reserved 
-limited number of workers allows the number of concurrent items being processed
+(to prevent other workers from double-processing it) and start the state machine. 
-to be capped to prevent too many slots from being processed at once.
+Once the state machine reaches the Filled state, the worker is returned to the 
 worker pool along with the successful result. As long as the previous result was successful,
 this process repeats. It stops when a result is "failure", which occurs when there
 is no match for any of the slots. The process is guaranteed to finish because the
 number of availabilities and their capacities are limited.
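
The matching loop described above might look roughly like this single-worker, synchronous sketch. All names (`Availability`, `Slot`, `matches`, `fill_next`, `process`) and the matching conditions are assumptions for illustration; collateral is left out, and decrementing the free amount stands in for the state machine reaching the Filled state.

```python
from dataclasses import dataclass


@dataclass
class Availability:
    amount: int        # free storage left to sell
    max_duration: int
    min_price: int


@dataclass
class Slot:
    size: int
    duration: int
    price: int
    reserved: bool = False


def matches(slot, availability):
    """Assumed matching rule: the slot fits the user's constraints."""
    return (slot.size <= availability.amount
            and slot.duration <= availability.max_duration
            and slot.price >= availability.min_price)


def fill_next(slots, availabilities):
    """One pass over the list, which must be ordered most profitable first.

    Returns True on success and False ("failure") when nothing matches.
    """
    for slot in slots:
        if slot.reserved:
            continue  # already taken by an earlier pass or another worker
        for availability in availabilities:
            if matches(slot, availability):
                slot.reserved = True
                availability.amount -= slot.size  # stand-in for filling
                return True
    return False


def process(slots, availabilities):
    """Repeat until a pass fails; returns the number of filled slots."""
    filled = 0
    while fill_next(slots, availabilities):
        filled += 1
    return filled
```

With one availability of 100 units and slots of size 60, 50 and 40 in profitability order, the sketch fills the first slot, skips the second because it no longer fits, and fills the third. Every successful pass reserves one more slot, so the loop always terminates.
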
-During queue processing, only when there is a free worker will an item be popped
+### Asynchronicity
-from the queue and processed. Each time an item is popped and processed, a
+The implementer should keep in mind that there are problems regarding the asynchronicity
-worker is removed from the available workers. If there are no available workers,
+of triggering the processing and accessing/modifying the slot list.
 queue processing will resume once there are workers available.
-#### Adding availabilities
+The first problem is that the processing can be triggered from 
-When a host adds an availability, a signal is triggered in the slot queue with
+multiple points, and as the processing might be quite time-consuming when filling
-information about the availability. This triggers a lookup of past request for
+slots, multiple processing runs could end up executing at the
-storage events, capped at a certain number of past events or blocks. The slots
+same time, which should not be the case. Realistically, the processing will be 
-of the requests in each of these events are added to the queue, where slots
+mostly called from the "new slots added to list" point.
-without matching availabilities are filtered out (see [Adding slots
+There is a possibility of using locks, but that might lead to growing the async 
-to the queue](#adding-slots-to-the-queue) above). Additionally, when slots of
+dispatcher queue and potential issues connected with that.
-these requests are processed in the queue, they will be checked to ensure that
+Another option could be to have a function called `scheduleProcessing()` which would
-the slots are not filled (see [Queue processing](#queue-processing) above).
+behave similarly to a singleton, allowing only one processing run at a time.
 However, it should allow scheduling one more processing run if a run is currently
 in progress. This is because the running processing might be iterating in the middle
 of the ordered list and would not take into consideration changes that were
 introduced at the beginning of the list.
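
One possible shape for such a function, sketched with Python's `asyncio` purely for illustration. The `ProcessingScheduler` class, its attributes and the `process` callback are hypothetical; only the behavior (one active run, at most one queued re-run) comes from the text above.

```python
import asyncio


class ProcessingScheduler:
    """Allows one processing run at a time, plus at most one pending re-run."""

    def __init__(self, process):
        self._process = process        # async callable doing one full run
        self._running = False
        self._rerun_requested = False
        self.runs = 0                  # counter kept only for illustration

    async def schedule_processing(self):
        if self._running:
            # A run is in progress: remember that one more is needed.
            self._rerun_requested = True
            return
        self._running = True
        try:
            while True:
                self._rerun_requested = False
                await self._process()
                self.runs += 1
                if not self._rerun_requested:
                    break
        finally:
            self._running = False
```

Any number of triggers arriving during a run collapse into a single follow-up run, so nothing accumulates in the dispatcher queue the way waiters on a lock would.
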
-### Implementation tips
+The second problem is related to mutations of the slot list while processing it, 
-
+where the list could be changing under the "worker's hand" as the changes come from
-Request queue implementations should keep in mind that requests will likely need
+blockchain events. A potentially sufficient mitigation for this could be to keep the 
-to be accessed randomly (by key, eg request id) and by index (for sorting), so
+iteration over the slot list completely synchronous. Perform all the asynchronous 
-implemented structures should handle these types of operations in as little time
+data fetching before the list iteration and avoid yielding in the loop.
 as possible.
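
A minimal sketch of the "fetch first, then iterate synchronously" mitigation, again using Python's `asyncio` for illustration. `CHAIN_STATE`, `fetch_slot_state` and `pick_fillable` are hypothetical stand-ins; a real node would query the contracts instead of a dictionary.

```python
import asyncio

# Made-up on-chain slot states, for illustration only.
CHAIN_STATE = {"a": "filled", "b": "free", "c": "free"}


async def fetch_slot_state(slot_id):
    """Hypothetical async lookup standing in for a contract call."""
    await asyncio.sleep(0)
    return CHAIN_STATE.get(slot_id, "unknown")


async def pick_fillable(slot_list):
    # 1. All awaiting happens up front, against a snapshot of the ids.
    ids = list(slot_list)
    fetched = await asyncio.gather(*(fetch_slot_state(i) for i in ids))
    states = dict(zip(ids, fetched))

    # 2. No `await` below this line: control is never yielded, so event
    #    handlers cannot mutate `slot_list` while it is being iterated.
    for slot_id in slot_list:
        if states.get(slot_id) == "free":
            return slot_id
    return None
```

Slots added to the list while the prefetch was in flight have no entry in `states` and are skipped here; they would be picked up by the next processing run.
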
 ## Repo