Commit Graph

10 Commits

Author SHA1 Message Date
Eric 570a1f7b67
[marketplace] Availability improvements (#535)
## Problem
When Availabilities are created, the amount of bytes in the Availability are reserved in the repo, so those bytes on disk cannot be written to otherwise. When a request for storage is received by a node, if a previously created Availability is matched, an attempt will be made to fill a slot in the request (more accurately, the request's slots are added to the SlotQueue, and eventually those slots will be processed). During download, bytes that were reserved for the Availability were released (as they were written to disk). To prevent more bytes from being released than were reserved in the Availability, the Availability was marked as used during the download, so that no other requests would match the Availability, and therefore no new downloads (and byte releases) would begin. The unfortunate downside to this, is that the number of Availabilities a node has determines the download concurrency capacity. If, for example, a node creates a single Availability that covers all available disk space the operator is willing to use, that single Availability would mean that only one download could occur at a time, meaning the node could potentially miss out on storage opportunities.

## Solution
To alleviate the concurrency issue, each time a slot is processed, a Reservation is created, which takes size (aka reserved bytes) away from the Availability and stores them in the Reservation object. This can be done as many times as needed as long as there are enough bytes remaining in the Availability. Therefore, concurrent downloads are no longer limited by the number of Availabilities. Instead, they would more likely be limited to the SlotQueue's `maxWorkers`.

From a database design perspective, an Availability has zero or more Reservations.

Reservations are persisted in the RepoStore's metadata, along with Availabilities. The metadata store key path for Reservations is ` meta / sales / reservations / <availabilityId> / <reservationId>`, while Availabilities are stored one level up, eg `meta / sales / reservations / <availabilityId> `, allowing all Reservations for an Availability to be queried (this is not currently needed, but may be useful when work to restore Availability size is implemented, more on this later).

### Lifecycle
When a reservation is created, its size is deducted from the Availability, and when a reservation is deleted, any remaining size (bytes not written to disk) is returned to the Availability. If the request finishes, is cancelled (expired), or an error occurs, the Reservation is deleted (and any undownloaded bytes returned to the Availability). In addition, when the Sales module starts, any Reservations that are not actively being used in a filled slot, are deleted.

Having a Reservation persisted until after a storage request is completed, will allow for the originally set Availability size to be reclaimed once a request contract has been completed. This is a feature that is yet to be implemented, however the work in this PR is a step in the direction towards enabling this.

### Unknowns
Reservation size is determined by the `StorageAsk.slotSize`. If during download, more bytes than `slotSize` are attempted to be downloaded than this, then the Reservation update will fail, and the state machine will move to a `SaleErrored` state, deleting the Reservation. This will likely prevent the slot from being filled.

### Notes
Based on #514
2023-09-29 14:33:08 +10:00
Eric 1d161d383e
Slot queue (#455)
## Slot queue
Adds a slot queue, as per the [slot queue design](https://github.com/codex-storage/codex-research/blob/master/design/sales.md#slot-queue).

Any time storage is requested, all slots from that request are immediately added to the queue. Finished, Canclled, Failed requests remove all slots with that request id from the queue. SlotFreed events add a new slot to the queue and SlotFilled events remove the slot from the queue. This allows popping of a slot each time one is processed, making things much simpler.

When an entire request of slots is added to the queue, the slot indices are shuffled randomly to hopefully prevent nodes that pick up the same storage requested event from clashing on the first processed slot index. This allowed removal of assigning a random slot index in the SalePreparing state and it also ensured that all SalesAgents will have a slot index assigned to them at the start thus the removal of the optional slotIndex.

Remove slotId from SlotFreed event as it was not being used. RequestId and slotIndex were added to the SlotFreed event earlier and those are now being used

The slot queue invariant that prioritises queue items added to the queue relies on a scoring mechanism to sort them based on the [sort order in the design document](https://github.com/codex-storage/codex-research/blob/master/design/sales.md#sort-order).

When a storage request is handled by the sales module, a slot index was randomly assigned and then the slot was filled. Now, a random slot index is only assigned when adding an entire request to the slot queue. Additionally, the slot is checked that its state is `SlotState.Free` before continuing with the download process.

SlotQueue should always ensure the underlying AsyncHeapQueue has one less than the maximum items, ensuring the SlotQueue can always have space to add an additional item regardless if it’s full or not.

Constructing `SlotQueue.workers` in `SlotQueue.new` calls `newAsyncQueue` which causes side effects, so the construction call had to be moved to `SlotQueue.start`.

Prevent loading request from contract (network request) if there is an existing item in queue for that request.

Check availability before adding request to queue.

Add ability to query market contract for past events. When new availabilities are added, the `onReservationAdded` callback is triggered in which past `StorageRequested` events are queried, and those slots are added to the queue (filtered by availability on `push` and filtered by state in `SalePreparing`).

#### Request Workers
Limit the concurrent requests being processed in the queue by using a limited pool of workers (default = 3). Workers are in a data structure of type `AsyncQueue[SlotQueueWorker]`. This allows us to await a `popFirst` for available workers inside of the main SlotQueue event loop

Add an `onCleanUp` that stops the agents and removes them from the sales module agent list. `onCleanUp` is called from sales end states (eg ignored, cancelled, finished, failed, errored).

Add a `doneProcessing` future to `SlotQueueWorker` to be completed in the `OnProcessSlot` callback. Each `doneProcessing` future created is cancelled and awaited in `SlotQueue.stop` (thanks to `TrackableFuturees`), which forced `stop` to become async.
  - Cancel dispatched workers and the `onProcessSlot` callbacks, prevents zombie callbacks

#### Add TrackableFutures
Allow tracking of futures in a module so they can be cancelled at a later time. Useful for asyncSpawned futures, but works for any future.

### Sales module
The sales module needed to subscribe to request events to ensure that the request queue was managed correctly on each event. In the process of doing this, the sales agents were updated to avoid subscribing to events in each agent, and instead dispatch received events from the sales module to all created sales agents. This would prevent memory leaks on having too many eventemitters subscribed to.
  - prevent removal of agents from sales module while stopping, otherwise the agents seq len is modified while iterating

An additional sales agent state was added, `SalePreparing`, that handles all state machine setup, such as retrieving the request and subscribing to events that were previously in the `SaleDownloading` state.

Once agents have parked in an end state (eg ignored, cancelled, finished, failed, errored), they were not getting cleaned up and the sales module was keeping a handle on their reference. An `onCleanUp` callback was created to be called after the state machine enters an end state, which could prevent a memory leak if the number of requests coming in is high.

Move the SalesAgent callback raises pragmas from the Sales module to the proc definition in SalesAgent. This avoids having to catch `Exception`.
  - remove unneeded error handling as pragmas were moved

Move sales.subscriptions from an object containing named subscriptions to a `seq[Subscription]` directly on the sales object.

Sales tests: shut down repo after sales stop, to fix SIGABRT in CI

### Add async Promise API
  - modelled after JavaScript Promise API
  - alternative to `asyncSpawn` that allows handling of async calls in a synchronous context (including access to the synchronous closure) with less additional procs to be declared
  - Write less code, catch errors that would otherwise defect in asyncspawn, and execute a callback after completion
  - Add cancellation callbacks to utils/then, ensuring cancellations are handled properly

## Dependencies
- bump codex-contracts-eth to support slot queue (https://github.com/codex-storage/codex-contracts-eth/pull/61)
- bump nim-ethers to 0.5.0
- Bump nim-json-rpc submodule to 0bf2bcb

---------

Co-authored-by: Jaremy Creechley <creechley@gmail.com>
2023-07-25 12:50:30 +10:00
markspanbroek d56eb6aee1
Validator (#387)
* [contracts] Add SlotFreed event

* [integration] allow test node to be stopped twice

* [cli] add --validator option

* [contracts] remove dead code

* [contracts] instantiate OnChainMarket and OnChainClock only once

* [contracts] add Validation

* [sales] remove duplicate import

* [market] add missing import

* [market] subscribe to all SlotFilled events

* [market] add freeSlot()

* [sales] fix warnings

* [market] subscribe to SlotFreed events

* [contracts] fix warning

* [validator] keep track of filled slots

* [validation] remove slots that have ended

* [proving] absorb Proofs into Market

Both Proofs and Market are abstractions around
the Marketplace contract, having them separately
is more trouble than it's worth at the moment.

* [market] add markProofAsMissing()

* [clock] speed up waiting for clock in tests

* [validator] mark proofs as missing

* [timer] fix error on node shutdown

* [cli] handle --persistence and --validator separately

* [market] allow retrieval of proof timeout value

* [validator] do not subscribe to SlotFreed events

Freed slots are already handled in
removeSlotsThatHaveEnded(), and onSlotsFreed()
interfered with its iterator.

* [validator] Start validation at the start of a new period

To decrease the likelihood that we hit the validation timeout.

* [validator] do not mark proofs as missing after timeout

* [market] check whether proof can be marked as missing

* [validator] simplify validation

Simulate a transaction to mark proof as missing, instead
of trying to keep track of all the conditions that may
lead to a proof being marked as missing.

* [build] use nim-ethers PR #40

Uses "pending" blocktag instead of "latest" blocktag
for better simulation of transactions before sending
them.

https://github.com/status-im/nim-ethers/pull/40

* [integration] integration test for validator

* [validator] monitor a maximum number of slots

Adds cli parameter --validator-max-slots.

* [market] fix missing collateral argument

After rebasing, add the new argument to fillSlot calls.

* [build] update to nim-ethers 0.2.5

* [validator] use Set instead of Table to keep track of slots

* [validator] add logging

* [validator] add test for slot failure

* [market] use "pending" blocktag to use more up to date block time

* [contracts] remove unused import

* [validator] fix: wait until after period ends

The smart contract checks that 'end < block.timestamp',
so we need to wait until the block timestamp is greater
than the period end.
2023-04-19 15:06:00 +02:00
Adam Uhlíř 131d003a0c
feat: collateral per slot (#390)
Co-authored-by: Eric Mastro <github@egonat.me>
2023-04-14 11:04:17 +02:00
markspanbroek 4ffe7b8e06
Generate proofs when required (#383)
* [maintenance] speedup integration test

* [rest api] add proofProbability parameter to storage requests

* [integration] negotiation test ends when contract starts

* [integration] reusable 2 node setup for tests

* [integration] introduce CodexClient for tests

* [node] submit storage proofs when required

* [contracts] Add Slot type

* [proving] replace onProofRequired & submitProof with onProve

Removes duplication between Sales.onProve() and
Proving.onProofRequired()
2023-03-27 15:47:25 +02:00
Ben Bierens da79660f8e
Enable stylecheck (#353)
* applying styleCheck

* stuck on vendor folder

* Applies style check

* Turns styleCheck back off

* switches to stylecheck:usages

* Fixes empty template casing

* rolls up nim-blscurve, nim-datastore, nim-ethers, nim-leopard, and nim-taskpools.

* bumps nim-confutils and removes unused import from fileutils.nim

* Unused using in fileutils.nim is required by CI

* Reverts bump of nim-confutils module
2023-03-10 08:02:54 +01:00
markspanbroek 4175689745
Load purchase state from chain (#283)
* [purchasing] Simplify test

* [utils] Move StorageRequest.example up one level

* [purchasing] Load purchases from market

* [purchasing] load purchase states

* Implement myRequest() and getState() methods for OnChainMarket

* [proofs] Fix intermittently failing tests

Ensures that examples of proofs in tests are never of length 0;
these are considered invalid proofs by the smart contract logic.

* [contracts] Fix failing test

With the new solidity contracts update, a contract can only
be paid out after it started.

* [market] Add method to get request end time

* [purchasing] wait until purchase is finished

Purchase.wait() would previously wait until purchase
was started, now we wait until it is finished.

* [purchasing] Handle 'finished' and 'failed' states

* [marketplace] move to failed state once request fails

- Add support for subscribing to request failure events.
- Add supporting contract tests for subscribing to request failure events.
- Allow the PurchaseStarted state to move to PurchaseFailure once a request failure event is emitted
- Add supporting tests for moving from PurchaseStarted to PurchaseFailure
- Add state transition tests for PurchaseUnknown.

* [marketplace] Fix test with longer sleepAsync

* [integration] Add function to restart a codex node

* [purchasing] Set client address before requesting storage

To prevent the purchase id (which equals the request id)
from changing once it's been submitted.

* [contracts] Fix: OnChainMarket.getState()

Had the wrong method signature before

* [purchasing] Load purchases on node start

* [purchasing] Rename state 'PurchaseError' to 'PurchaseErrored'

Allows for an exception type called 'PurchaseError'

* [purchasing] Load purchases in background

No longer calls market.getRequest() for every purchase
on node start.

* [contracts] Add `$` for RequestId, SlotId and Nonce

To aid with debugging

* [purchasing] Add Purchasing.stop()

To ensure that all contract interactions have both a
start() and a stop() for

* [tests] Remove sleepAsync where possible

Use `eventually` loop instead, to make sure that we're
not waiting unnecessarily.

* [integration] Fix: handle non-json response in test

* [purchasing] Add purchase state to json

* [integration] Ensure that purchase is submitted before restart

Fixes test failure on slower CI

* [purchasing] re-implement `description` as method

Allows description to be set in the same module where the
state type is defined.

Co-authored-by: Eric Mastro <eric.mastro@gmail.com>

* [contracts] fix typo

Co-authored-by: Eric Mastro <eric.mastro@gmail.com>

* [market] Use more generic error type

Should we decide to change the provider type later

Co-authored-by: Eric Mastro <eric.mastro@gmail.com>

Co-authored-by: Eric Mastro <eric.mastro@gmail.com>
2022-11-08 08:10:17 +01:00
Eric Mastro 8f116cd01e [chore] make RequestId, SlotId, Nonce, PurcahseId distinct
Make RequestId, SlotId, Nonce, PurcahseId distinct types.

Add/modify conversions to support the distinct type (ABI encoding/decoding, JSON encoding, REST decoding).

Update tests
2022-08-26 13:29:09 +10:00
markspanbroek b88561e090
Subscribe to proof submissions (#83)
* Update dagger-contracts

* [proving] rename ProofTiming -> Proofs

* Update nim-ethers to 0.1.4

* [proving] Subscribe to proof submissions

* [proving] support proof submission through the Proving abstraction
2022-04-13 10:41:48 -06:00
Mark Spanbroek bd8f4d5d74 [tests] Extract common examples into separate module 2022-04-04 10:03:46 +02:00