diff --git a/specs/marketplace.md b/specs/marketplace.md
index 954d92e..0a47af6 100644
--- a/specs/marketplace.md
+++ b/specs/marketplace.md
@@ -1,3 +1,5 @@
+# Codex Marketplace Spec
+
 ---
 title: CODEX-MARKETPLACE
 name: Codex Storage Marketplace
@@ -14,253 +16,349 @@ contributors:
 ## Abstract

 Codex Marketplace and its interactions are defined by a smart contract deployed on an EVM-compatible blockchain.
-This specification describes these interactions for all the different roles in the network.
-
-The specification is meant for a Codex client implementor.
-The goal is to create a storage marketplace that promotes durability.
-
-## Motivation
-The Codex network aims to create a peer-to-peer storage engine with strong data durability,
-data persistence guarantees and node storage incentives.
-Support for resource restricted devices, like mobile devices should also be embraced.
-The protocol should remove complexity to allow for a simple implementation and
-simplify incentive mechanisms.
+This specification describes the flows for all the different roles in the network.
+The specification is meant for a Codex client implementor.

 ## Semantics

-The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”,
+The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”,
 “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [2119](https://www.ietf.org/rfc/rfc2119.txt).

 ### Definitions

-| Terminology | Description |
-| --------------- | --------- |
-| storage providers | A Codex node that provides storage services to the marketplace. |
-| validator nodes | A Codex node that checks for missing storage proofs and triggers for a reward. |
-| client nodes | The most common Codex node that interacts with other nodes to store, locate and retrieve data. |
-| slots | Created by client nodes when a new dataset is requested to be stored. Discussed further in the [slots section](#slots). |
+| Terminology | Description |
+|-----------------------------|-------------|
+| storage provider (SP) nodes | A node that provides storage services to the marketplace. |
+| validator nodes | A node that checks for missing storage proofs and triggers a contract call for a reward. |
+| client nodes | The most common node that interacts with other nodes to store, locate and retrieve data. |
+| storage request | Created by a client node when it wants to persist data on the network. Represents the dataset and its persistence configuration. |
+| slots | The dataset of a storage request is split into several pieces (slots) that are then distributed over different SPs. |

-### Storage Request
+## Motivation

-Client nodes can create storage requests on the Codex network via the Codex marketplace.
-The marketplace handles storage requests, the storage slot state,
-storage provider rewards, storage provider collaterals, and storage proof state.
+The Codex network aims to create a peer-to-peer storage engine with strong data durability,
+data persistence guarantees, and node storage incentives.

-To create a request to store a dataset on the Codex network,
-client nodes MUST split the dataset into data chunks, $(c_1, c_2, c_3, \ldots, c_{n})$.
-Using an erasure coding technique,
-the data chunks are encoded and placed into separate slots.
-The erasure coding technique SHOULD be the [Reed-Soloman algorithm](https://hackmd.io/FB58eZQoTNm-dnhu0Y1XnA).
+An important component of the Codex network is the Marketplace. It mediates negotiations between all parties
+in order to provide persistence in the network. It also provides ways to enforce agreements and **TODO** with repair.

-When the client node is prompted by the user to create a storage request,
-it MUST submit a transaction with the desired request parameters.
-Once a request is created via the transaction,
-all slots MUST be filled by storage providers before the request is officially started.
-If the request does not attract enough storage providers after a time defined by `expiry` runs out,
-the request is `canceled`.
-If canceled, the storage provider SHOULD initiate a transaction call in order to receive its `collateral` along with a portion of the `reward`.
-The remaining `reward` is returned to the requester.
-The requester MAY create a new request with different values to restart the process.
+The Marketplace is defined by a smart contract deployed to an EVM-compatible blockchain. It has several flows, each
+linked to a role in the network that a participating node takes upon itself. A node can take one role or multiple roles at the same time.
+This specification describes these flows.

-In order to submit the new storage request with the transaction,
-the following parameters MUST be specified in the transaction call:
+The Marketplace handles storage requests, the storage slot state,
+storage provider rewards, storage provider collaterals, and storage proof state.

-```solidity
+If a node implementation wants to participate in the persistence layer of Codex, it needs to choose which role(s) it wants
+to support and implement the proper flows; otherwise it won't be compatible with the rest of the Codex network.

-    // the Codex node requesting storage
-    address client;
+### Roles

-    // content identifier
-    string cid;
+There are 3 main roles in the network: client, storage provider (SP) and validator.

-    // merkle root of the dataset, used to verify storage proofs
-    byte32 merkleRoot;
+A client is a potentially short-lived node in the network that interacts with it mainly to persist
+its data in the network.

-    // amount of token from the requester to reward storage providers
-    uint256 reward;
+A storage provider is a long-term participant in the network that stores others' data for profit. It needs to provide a proof
+to the smart contract that it possesses the data from time to time.

-    // amount of tokens required for collateral by storage providers
-    uint256 collateral;
+A validator is a node that helps enforce the storage providers' duty to comply with the storage requests they
+accepted. When it detects that an SP should have submitted a proof but none was submitted on-chain, it triggers an on-chain
+function that handles this case. The validator is rewarded for correct invocation of this function.

-    // frequency that proofs are checked by validator nodes
-    uint256 proofProbability;
-
-    // amount of desired time for stoage request
-    uint256 duration;
-
-    // the number of requested slots
-    uint64 slots;
-
-    // amount of storage per slot
-    uint256 slotSize;
-
-    // Amount of time before request expires
-    uint256 expiry;
-
-    // random value to differentiate from other requests
-    byte32 nonce;
+## Storage Request Lifecycle
 ```
-
-`cid`
-
-An identifier used to locate the dataset
-- MUST be a [CIDv1](https://github.com/multiformats/cid#cidv1) with sha-256 based [multihash](https://github.com/multiformats/multihash)
-- MUST be generated by the client node
-
-`reward`
-
-- it is an REQUIRED amount to be included in the transaction for a storage request.
-- it SHOULD be amount of tokens offered per byte per second
-- it MUST be a token known to the network.
-After tokens are recevied by the Codex Marketplace,
-it MUST be released to storage providers who successfully fill slots until the storage request is complete.
-
-`collateral`
-
-All storage providers MUST provide token collateral before being able to fill a storage slot.
-The following is related to storage provider who has offered `collateral`.
-
-If a storage provider, filling a slot,
-fails to provide enough proofs of storage, the `collateral` MUST be forfeited.
-This MAY be managed by updating a smart contract object that tracks the number of missed proofs,
-percentage of `collateral` already slashed, or number of slashed `collateral` for slot to be freed.
-The storage provider MAY be able to fill the same failed slot,
-but MUST replace any `collateral` that was already forfeited.
-
-A portion of the `collateral` MUST be offered as a reward to validator nodes,
-and a portion SHOULD be offered as a reward to other storage providers that repair freed [slots](#slots).
-
-`proofProbability`
-
-Determines the inverse probability that a proof is required in a period.
-The probability MUST be:
-
-$\frac{1}{proofProbability}$
-
-- Storage providers are REQUIRED to provide proofs of storage per period that are submited to the marketplace smart contract and verified by validator nodes.
-- The requester SHOULD provide the value for the frequency of proofs provided by storage providers.
-
-`duration`
-
-- it SHOULD be in seconds
-- Once the `reward` has depleted from periodic storage provider payments,
-the storage request SHOULD end.
-The requester MAY renew the storage request by creating a new request with the same `cid` value.
-Different storage providers MAY fulfill the request.
-- Data MAY be considered lost during contract `duration` when no other storage providers decide to fill empty slots.
-
-### Fulfilling Requests
-In order for a storage request to start,
-storage providers MUST enter a storage contract with the requester via the marketplace smart contract.
-
-When storage providers are selected to fill a slot for the request,
-storage providers MUST NOT abandon the slot, unless the slot state is `cancelled`, `complete` or `failed`.
-If too many slots are abandoned, the slot state SHOULD be changed to `failed`.
-
-
-Below is the smart contract lifecycle for a storage request:
+                     ┌───────────┐
+                     │ Cancelled │
+                     └───────────┘
+                           ▲
+                           │ Not all
+                           │ Slots filled
+                           │
+┌───────────┐    ┌─────────┴──────────┐           ┌─────────┐
+│ Submitted ├───►│ Slots Being Filled ├──────────►│ Started │
+└───────────┘    └────────────────────┘ All Slots └────┬────┘
+                                        Filled         │
+                                                       │
+                        ┌──────────────────────────────┘
+                Proving ▼
+    ┌──────────────────────────────────────────────────────────┐
+    │                                                          │
+    │                  Proof submitted                         │
+    │     ┌──────────────────────────► All good                │
+    │     │                                                    │
+    │ Proof required                                           │
+    │     │                                                    │
+    │     │            Proof missed                            │
+    │     └──────────────────────────► After some time         │
+    │                                  slashed, eventually     │
+    │                                  Slot freed              │
+    │                                                          │
+    └───────┬─┬────────────────────────────────────────────────┘
+            │ │                                   ▲
+            │ │                                   │
+            │ │ SP kicked out, Slot freed ┌───────┴────────┐
+All good    │ ├──────────────────────────►│ Repair process │
+Time ran out│ │                           └────────────────┘
+            │ │
+            │ │ Too many Slots freed      ┌────────┐
+            │ └──────────────────────────►│ Failed │
+            ▼                             └────────┘
+       ┌──────────┐
+       │ Finished │
+       └──────────┘
+```

 ![image](./images/request-contract.png)

-### Slots
-Slots is a method used by the Codex network to distribute data chucks amongst storage providers.
-Data chucks, created by clients nodes, MUST use a method of distributing the dataset for data resiliency.
-- Client nodes SHOULD decide how many nodes should fill the slots of a storage contract.
-- Storage providers MUST be selected before filling a slot,
+## Client role

-Each slot represents a chunk of a dataset provided during the storage request.
-The first state of a slot is `free`, meaning that the slot is waiting to be reserved by a storage provider.
-The Codex marketplace using a slot dispersal mechanism to decide what storage providers can reserve a slot,
-see [dispersal section below](#dispersal).
+The client role represents nodes that mediate persisting data inside the Codex network.
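The Storage Request Lifecycle diagram can be read as a small state machine that a client node tracks while it monitors its Requests. The Python sketch below is a minimal illustration under stated assumptions: the state names are invented for readability and are not the contract's identifiers, and the Proving and Repair boxes are treated as activity inside `STARTED`, which is one reading of the diagram rather than something the spec states.

```python
from enum import Enum, auto


class RequestState(Enum):
    """Request states drawn in the Storage Request Lifecycle diagram.

    The names are illustrative, not the contract's identifiers. Proving and
    repair are treated as activity that happens while a Request is STARTED.
    """

    SUBMITTED = auto()
    SLOTS_BEING_FILLED = auto()
    CANCELLED = auto()
    STARTED = auto()
    FINISHED = auto()
    FAILED = auto()


# Edges of the diagram; terminal states have no outgoing edges.
TRANSITIONS = {
    RequestState.SUBMITTED: {RequestState.SLOTS_BEING_FILLED},
    # "All Slots Filled" -> Started; not all slots filled before expiry -> Cancelled
    RequestState.SLOTS_BEING_FILLED: {RequestState.STARTED, RequestState.CANCELLED},
    # "All good, time ran out" -> Finished; "Too many Slots freed" -> Failed
    RequestState.STARTED: {RequestState.FINISHED, RequestState.FAILED},
    RequestState.CANCELLED: set(),
    RequestState.FINISHED: set(),
    RequestState.FAILED: set(),
}

# States in which the client SHOULD call withdrawFunds(requestId).
WITHDRAWABLE = {RequestState.CANCELLED, RequestState.FAILED}


def advance(current: RequestState, new: RequestState) -> RequestState:
    """Return `new` if the diagram allows current -> new, otherwise raise ValueError."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


def is_terminal(state: RequestState) -> bool:
    """True for states with no outgoing edge (Cancelled, Finished, Failed)."""
    return not TRANSITIONS[state]


if __name__ == "__main__":
    state = advance(RequestState.SUBMITTED, RequestState.SLOTS_BEING_FILLED)
    state = advance(state, RequestState.CANCELLED)  # expiry passed with slots still free
    print(state.name, is_terminal(state), state in WITHDRAWABLE)  # CANCELLED True True
```

Rejecting transitions the diagram does not draw (for example `SUBMITTED -> STARTED`) lets a client catch inconsistent event handling early instead of silently ending up in the wrong state.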
-After a slot reservation is secured, the storage provider MUST:
-- provide token collateral and proof of storage to fill the slot
-- provide proofs of storage periodically
-Once filled, the slot state SHOULD be changed from `reserved` to `filled`.
+There are 2 parts to the client role:

-The `reward` payout SHOULD be calculated as periodic payments until the request `duration` is complete.
-Once complete, the slot state SHOULD be changed to `finished` and payout occurs.
+ - Requesting storage from the network: creating a storage request.
+ - Withdrawing funds from storage requests.

-A slot MUST become empty after the storage provider fails to provide proofs of storage to the marketplace.
-The state of the slot SHOULD change from `filled` to `free` when validator nodes see the slot is missing proofs.
+### Creating storage requests

-The storage provider assigned to that slot MUST forfeit its `collateral`.
-Other storage providers can earn a small portion of the forfeited `collateral` by providing a new proof of storage and `collateral`,
-this is referred to as repairing the empty slot.
+When the client node is prompted by the user to create a storage request,
+it SHOULD receive the input parameters for the storage request from the user.

-The slot lifecycle of a storage provider that has filled a slot is demonstrated below:
+To create a request to persist a dataset on the Codex network,
+client nodes MUST split the dataset into data chunks, $(c_1, c_2, c_3, \ldots, c_{n})$.
+Using an erasure coding technique and the input parameters, the data chunks are encoded and placed into separate slots.
+The erasure coding technique MUST be the [Reed-Solomon algorithm](https://hackmd.io/FB58eZQoTNm-dnhu0Y1XnA).
+The final slots' roots and other metadata MUST be placed into the Manifest. The Manifest's CID is then used as the `cid` of the
+stored dataset.
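To make the chunk-to-slot step concrete, the sketch below computes one possible slot layout for `k` data slots and `m` parity slots. It assumes a systematic Reed-Solomon code applied row by row (each row holds `k` data chunks plus `m` parity chunks, one chunk per slot) and assumes the last row is zero-padded; the chunk size, `k`, `m` and the helper names are hypothetical inputs, not values fixed by this spec, and the Reed-Solomon encoding itself is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotLayout:
    chunks: int           # n: data chunks c_1..c_n before encoding
    chunks_per_slot: int  # rows of k data chunks (the last row is zero-padded)
    slots: int            # k data slots + m parity slots -> Ask.slots
    slot_size: int        # bytes held by each slot -> Ask.slotSize
    max_slot_loss: int    # largest value Ask.maxSlotLoss can safely take


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without going through floats."""
    return -(-a // b)


def plan_slots(dataset_size: int, chunk_size: int, k: int, m: int) -> SlotLayout:
    """Hypothetical helper: slot parameters for a row-wise RS(k + m, k) layout.

    Each row holds k data chunks and m parity chunks, one chunk per slot, so
    losing any m slots still leaves k chunks per row to decode from.
    """
    if dataset_size <= 0 or chunk_size <= 0 or k <= 0 or m < 0:
        raise ValueError("dataset_size, chunk_size and k must be positive; m must be >= 0")
    n = ceil_div(dataset_size, chunk_size)
    rows = ceil_div(n, k)
    return SlotLayout(chunks=n, chunks_per_slot=rows, slots=k + m,
                      slot_size=rows * chunk_size, max_slot_loss=m)


if __name__ == "__main__":
    # 10 MiB dataset, 64 KiB chunks, 4 data slots + 2 parity slots
    print(plan_slots(10 * 1024 * 1024, 64 * 1024, k=4, m=2))
    # SlotLayout(chunks=160, chunks_per_slot=40, slots=6, slot_size=2621440, max_slot_loss=2)
```

Because Reed-Solomon is a maximum-distance-separable code, any `k` of the `k + m` chunks in a row are enough to decode it, which is why `m` is the natural upper bound for `maxSlotLoss` in this layout.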
------------
+After the dataset is prepared, the client node MUST submit a transaction with the desired request parameters, which are represented
+as a `Request` object and its sub-objects. Its properties are described below:

- proof & proof &
- collateral reserved proof missed collateral missed
- | | | | | |
- v v v v v v
- -------------------------------------------------------------------------
- slot: |///////////////////////////////| |///////////////////////|
- ------------------------------------------------------------------------
- | |
- v v
- Update Check maxNumOfSlash
- slashCriterion is reached
- Lost Collateral
- (number of proofs missed)
+```solidity
+struct Request {
+    // The Codex node requesting storage
+    address client;
+    // Ask describing the parameters of the Request
+    Ask ask;
+
+    // Content describing the dataset that will be hosted with the Request
+    Content content;
+    // Timeout in seconds during which all the slots have to be filled, otherwise the Request will get cancelled
+    uint256 expiry;

-
- ---------------- time ---------------->
+    // Random value to differentiate from other requests with the same parameters
+    bytes32 nonce;
+}
+
+struct Ask {
+    // Amount of tokens that will be awarded to storage providers for finishing the storage request.
+    // Reward is per slot per second.
+    uint256 reward;

-#### Slot Dispersal
+    // Amount of tokens required for collateral by storage providers
+    uint256 collateral;

-Storage providers compete with one another to store data from storage requests.
-Before a storage provider can download the data, they MUST obtain a reseversation for a slot.
-The Codex network uses an expanding window based on the Kademlia distance function to select storage providers that are allowed to reserve a slot.
+    // Inverse probability that a storage provider needs to submit a proof of storage in a period
+    uint256 proofProbability;

-This starts with a random source address hash function that can be contructed as:
+    // Desired duration of the storage request in seconds
+    uint256 duration;

- hash(blockHash, requestId, slotIndex, reservationIndex);
+    // The number of requested slots
+    uint64 slots;

-`blockHash`: unique identifier for a specific EVM-compatible block
+    // Amount of storage per slot in bytes
+    uint256 slotSize;

-`requestId`: unique identifier for storage request
+    // Maximum number of slots that can be lost before the data is considered lost
+    uint64 maxSlotLoss;
+}

-`slotIndex`: index of current empty slot
+struct Content {
+    // Content identifier
+    string cid;

-`reservationIndex`: index of current slot reservation
+    // Merkle root of the dataset, used to verify storage proofs
+    bytes32 merkleRoot;
+}

-The unique source address, along with the storage provider's blockchain address,
-is used to calculate the expanding window.
-The distance between the two addresses can be defined by:
+```

-$$ XOR(A,A_0) $$
+Notes about some of the parameters:

-The allowed distance over time $t_1$, can be defined as $2^{256} * F(t_1)$.
-When the storage provider's distance is greater than the allowed distance,
-the storage provider SHOULD be eligible to to obtain a slot reservation.
+`cid`

-- Note after eligiblity, the storage provider MUST provide `collateral` and
-storage proofs to make slot state change `reserved` to `filled`.
+An identifier used to locate the Manifest representing the dataset.
+- MUST be a [CIDv1](https://github.com/multiformats/cid#cidv1) with sha-256 based [multihash](https://github.com/multiformats/multihash).
+- The data it represents SHOULD be discoverable in the network; otherwise the Request will get cancelled.

-### Filling the Slot
+`reward`

-When the value of the allowed distance increases,
-more storage providers SHOULD be elgiblable to participate in reserving a slot.
-The Codex network allows a storage provider is allowed to fill a slot after calculating the storage provider's Kademlia distance is less than the allowed distance.
-The total value storage providers MUST obtain can be defined as:
+- It is a REQUIRED amount to be included in the transaction for a storage request.
+- It SHOULD be an amount of tokens offered per slot per second.
+- The client's address MUST have [approval](https://docs.openzeppelin.com/contracts/2.x/api/token/erc20#IERC20-approve-address-uint256-) for the transfer of at least the full amount the Request pays out (`reward` × `duration` × `slots`) in the ERC20-based token that the network utilizes.

-$$ XOR(A,A_0) < 2^{256} * F(t_1) $$
-- XOR(A,A_0) represents Kademlia distance function
-- 2^{256} represents the total number of 256-bit addresses in the address space
-- F(t_1) represents the expansion function over time
-Eligible storage providers represented below:
+`collateral`

- start point
- | Kademlia distance
- t=3 t=2 t=1 v
- <------(------(------(------·------)------)------)------>
- ^ ^
- | |
- this provider is this provider is
- allowed at t=2 allowed at t=3
+- Amount of tokens that a storage provider MUST submit when it fills a slot.
+- The collateral is slashed or forfeited if the storage provider fails to provide the service requested by the Request (more information below).
+
+`proofProbability`
+
+Determines the inverse probability that a proof is required in a period: $\frac{1}{proofProbability}$
+
+- Storage providers are REQUIRED to provide proofs of storage to the marketplace smart contract when prompted by the smart contract.
+- The frequency is non-deterministic in order to prevent pre-calculation attacks, but it is affected by this parameter.
+
+`expiry`
+
+- The parameter is specified as a duration in seconds; the final deadline timestamp is therefore calculated at the moment the transaction is mined.
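A worked example of the `reward`, `expiry` and `proofProbability` semantics above, written as a minimal Python sketch. It assumes that the client's total payment is `reward` × `duration` × `slots` (following from `reward` being per slot per second), that `expiry` is added to the timestamp of the block that mines the transaction, and that a proof is drawn independently in each proof period; the period length is taken as an input because its value comes from the contract configuration, and all function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ask:
    reward: int             # tokens per slot per second
    collateral: int         # tokens a storage provider locks when filling a slot
    proof_probability: int  # a proof is required in a period with probability 1/proof_probability
    duration: int           # seconds
    slots: int
    slot_size: int          # bytes
    max_slot_loss: int


def total_price(ask: Ask) -> int:
    """Tokens paid out over the whole Request (assumed reward * duration * slots)."""
    return ask.reward * ask.duration * ask.slots


def expiry_deadline(mined_at: int, expiry: int) -> int:
    """Unix timestamp after which a Request whose slots are not all filled is cancelled."""
    return mined_at + expiry


def expected_proofs_per_slot(ask: Ask, period_seconds: int) -> float:
    """Expected number of required proofs for one slot over the Request's duration."""
    periods = ask.duration // period_seconds
    return periods / ask.proof_probability


if __name__ == "__main__":
    ask = Ask(reward=10, collateral=1_000, proof_probability=4,
              duration=24 * 60 * 60, slots=6, slot_size=2_621_440, max_slot_loss=2)
    print(total_price(ask))                         # 5184000
    print(expiry_deadline(1_700_000_000, 60 * 60))  # 1700003600
    print(expected_proofs_per_slot(ask, 600))       # 36.0
```

A larger `proofProbability` means fewer expected proofs (cheaper for storage providers, weaker guarantees), while the randomness keeps providers from knowing in advance which periods will require one.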
+
+`nonce`
+
+- It SHOULD be a random value and SHOULD NOT be empty (all zeros).
+
+#### Renewal of Storage Request
+
+It should be noted that the Marketplace does not support extending Requests. If the user wants to
+extend the Request's duration, it is REQUIRED that somebody submits a new Request transaction with the same CID **well before the original
+Request finishes**. This way, the data is still persisted in the network at the time when new (or the current) storage providers
+retrieve the dataset in order to fill the slots of the new Request.
+
+### Withdrawing funds
+
+The client node SHOULD monitor the status of the Requests it created. The node can use on-chain state to
+fetch the list of active Requests linked to the client node's blockchain address using the function `myRequests()`, which
+returns an array of `RequestId`s. This list is kept up to date by the smart contract itself.
+
+When a Request reaches the `Cancelled` state (not all slots filled before the `expiry` timeout) or the `Failed` state (too many slots freed and the data is unrecoverable),
+the client node SHOULD initiate withdrawal of the remaining funds from the contract using the function `withdrawFunds(requestId)`.
+
+ - The `Cancelled` state MAY be detected using the timeout returned by `requestExpiresAt(requestId)` **and** not observing an emitted `RequestFulfilled(requestId)` event.
+ - The `Failed` state MAY be detected by the `RequestFailed(requestId)` event emitted by the smart contract.
+ - The `Finished` state MAY be detected by setting a timeout based on `getRequestEnd(requestId)`.
+
+## Storage Provider role
+
+The Storage Provider (SP) role represents nodes that persist data across the network by hosting the Slots of Requests
+that client nodes requested.
+
+There are several parts to hosting a slot:
+
+ - Filling a slot
+ - Proving
+ - Repairing slots
+ - Collecting the Request's reward and collateral
+
+### Filling slot
+
+When a new Request is created, the `StorageRequested(requestId, ask, expiry)` event is emitted with the following properties:
+
+ - `requestId` - ID of the Request.
+ - `ask` - Specification of the Request parameters. For details, see above.
+ - `expiry` - Unix timestamp that specifies when the Request will be cancelled if not all slots are filled by then.
+
+It is then up to the Storage Provider node to decide, based on the emitted parameters, whether it wants to participate in the
+Request and try to fill its slot(s). This decision SHOULD be based on parameters specified by the node operator.
+If the node decides to ignore this Request, no action is needed; otherwise the node MUST follow the remaining steps.
+
+The node MUST decide which slot, identified by the slot's index, it wants to try to fill. The node MAY try filling multiple
+slots. In order to fill a slot, the node MUST first download the slot's data using the slot's root, which can be retrieved
+from the Manifest specified in `request.content.cid` (**TODO: Manifest RFC**). The `Request` object can be retrieved from the smart contract using `getRequest(requestId)`.
+Then the node MUST generate a proof over the downloaded data (**TODO: Proving RFC**).
+
+When the proof is ready, the node MUST create a transaction for the smart contract call `fillSlot()` with the following REQUIRED:
+
+ - Parameters:
+   - `requestId` - ID of the Request.
+   - `slotIndex` - Index of the slot that the node is trying to fill.
+   - `proof` - `Groth16Proof` proof structure, generated over the downloaded data.
+ - The Ethereum address of the node from which the transaction originates MUST have [approval](https://docs.openzeppelin.com/contracts/2.x/api/token/erc20#IERC20-approve-address-uint256-) for the transfer of at least the amount required as collateral for the Request in the ERC20-based token that the network utilizes.
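The participation decision and slot choice described above can be sketched as follows. The `OperatorConfig` knobs (minimum reward, maximum collateral, maximum duration, free space) are hypothetical examples of the "parameters specified by the node operator"; the spec does not define them. Picking a random unfilled slot is one possible strategy that spreads competing providers across slots, not a mandated one.

```python
import random
from dataclasses import dataclass
from typing import AbstractSet, Optional


@dataclass(frozen=True)
class OperatorConfig:
    """Hypothetical operator settings; the Marketplace spec does not define them."""

    min_reward: int      # minimum tokens per slot per second
    max_collateral: int  # largest collateral the node will lock for one slot
    max_duration: int    # longest Request duration accepted, in seconds
    free_space: int      # bytes available for new slots


@dataclass(frozen=True)
class AskView:
    """The subset of the emitted `ask` that this sketch looks at."""

    reward: int
    collateral: int
    duration: int
    slots: int
    slot_size: int


def should_participate(ask: AskView, expiry: int, now: int, cfg: OperatorConfig) -> bool:
    """Return True if the Request has not expired and fits every operator limit."""
    return (
        now < expiry
        and ask.reward >= cfg.min_reward
        and ask.collateral <= cfg.max_collateral
        and ask.duration <= cfg.max_duration
        and ask.slot_size <= cfg.free_space
    )


def pick_slot(ask: AskView, filled: AbstractSet[int], rng: random.Random) -> Optional[int]:
    """Pick a random slot index not yet seen in a SlotFilled event, or None if all are taken."""
    candidates = [i for i in range(ask.slots) if i not in filled]
    return rng.choice(candidates) if candidates else None


if __name__ == "__main__":
    cfg = OperatorConfig(min_reward=5, max_collateral=1_000, max_duration=30 * 86_400, free_space=10**9)
    ask = AskView(reward=10, collateral=500, duration=86_400, slots=4, slot_size=2_621_440)
    if should_participate(ask, expiry=2_000, now=1_000, cfg=cfg):
        print(pick_slot(ask, filled={0, 1, 3}, rng=random.Random(1)))  # 2
```

Before sending `fillSlot()`, the node would still approve at least `ask.collateral` tokens as described above, and it would call `pick_slot` again with an updated `filled` set if a `SlotFilled` event arrives for the slot it is working on.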
+
+If the proof is invalid, or the slot was already filled by another node, the transaction
+reverts; otherwise the `SlotFilled(requestId, slotIndex)` event is emitted. If the transaction is successful, the
+node SHOULD transition into the __proving__ state, as it will need to submit proofs of data possession when prompted by the
+contract.
+
+It should be noted that if the node sees `SlotFilled` emitted for a slot whose dataset it is downloading or
+generating a proof for, the node SHOULD stop and choose a different, unfilled slot to try to fill.
+
+### Proving
+
+Once a node fills a slot, it MUST periodically, yet non-deterministically, provide proofs to the smart contract that it
+stores the data it should. The node MAY detect that a proof is required using `isProofRequired(slotId)`, or that one will
+be required using `willProofBeRequired(slotId)` in case the node is in [downtime](https://github.com/codex-storage/codex-research/blob/41c4b4409d2092d0a5475aca0f28995034e58d14/design/storage-proof-timing.md).
+
+Once the node knows it has to provide a proof, it MUST obtain the proof challenge using `getChallenge(slotId)`, which then
+MUST be incorporated into the proof generation as described in the Proving RFC (**TODO: Proving RFC**).
+
+#### Slashing
+
+There is a slashing scheme in place, orchestrated by the smart contract, to incentivize correct behavior
+and proper proof submission by storage provider nodes. This scheme is configured at the smart contract level and is
+the same for all participants in the network. The concrete values of this scheme can be obtained with the `getConfig()` contract call.
+
+The slashing works as follows:
+
+ - A node MAY miss at most `config.collateral.slashCriterion` proofs before it is slashed.
+ - It is then slashed by `config.collateral.slashPercentage` percent **of the originally asked collateral**.
+ - If the number of times the node has been slashed exceeds `config.collateral.maxNumberOfSlashes`, the slot is freed, the remainder of the node's collateral is burned, and the slot is offered to other nodes for repair. The contract also emits the `SlotFreed(requestId, slotIndex)` event.
+
+If the number of concurrently freed slots exceeds `request.ask.maxSlotLoss`, the dataset is lost and the Request fails.
+The collateral of all the nodes that hosted the Request's slots is burned and the `RequestFailed(requestId)` event is emitted.
+
+### Repair
+
+When a slot is freed because of too many missed proofs, which SHOULD be detected by listening for the `SlotFreed(requestId, slotIndex)` event,
+storage provider nodes can decide whether they want to participate in repairing the slot. As with filling a slot, the node SHOULD
+consider the node operator's configuration when making the decision.
+
+The repair process is the same as filling a slot, with one difference: the node MUST use erasure coding to
+reconstruct the original data. As this requires retrieving more of the dataset from the network, the node that
+successfully fills the freed slot will be granted an additional reward. (**TODO: Implementation**)
+
+The repair process is then as follows:
+
+1. The node detects the `SlotFreed` event and decides to repair the slot.
+1. The node MUST download the required chunks and MUST use the [Reed-Solomon algorithm](https://hackmd.io/FB58eZQoTNm-dnhu0Y1XnA) to reconstruct the original slot's data.
+1. The node MUST generate a proof over the reconstructed data.
+1. The node MUST submit a transaction calling `fillSlot()` with the same parameters and collateral allowance as described in [Filling slot](#filling-slot).
+
+### Collecting funds
+
+A Storage Provider node SHOULD monitor the Requests and slots it hosts. In case it needs to discover which slots it is hosting,
+for example because the node had to restart, it SHOULD use the contract call `mySlots()`, which returns the slot IDs
+associated with the Ethereum address from which the contract call originates. This list is kept up to date by the smart contract itself.
+
+When the Request of a hosted slot reaches the `Cancelled`, `Finished` or `Failed` state, the node SHOULD call the contract's `freeSlot(slotId)` function.
+These states can be detected as follows:
+
+ - The `Cancelled` state MAY be detected by setting a timeout using `expiry` **and** not observing the `RequestFulfilled(requestId)` event. A `RequestCancelled` event is also emitted, but it is not guaranteed to be emitted at the time of expiry.
+ - The `Finished` state MAY be detected by setting a timeout based on `getRequestEnd(requestId)`.
+ - The `Failed` state MAY be detected by listening for the emitted `RequestFailed(requestId)` event.
+
+For each of these states, different funds are collected:
+
+- For `Cancelled`, the collateral is returned together with a proportional payout based on the time the node actually hosted the dataset before the expiry was reached.
+- For `Finished`, the full reward for hosting the slot is collected together with the collateral.
+- For `Failed`, no funds are collected, as the reward is returned to the client and the collateral is burned; the call still removes the slot from the `mySlots()` tracking.
+
+## Validator role
+
+The validator role represents nodes that verify that Storage Provider nodes submit proofs when they are required.
+
+This role is needed because a blockchain cannot act on things that **do not happen**: somebody needs to create a transaction
+for the smart contract to act on it. Validator nodes are rewarded each time they correctly
+mark a proof as missing.
+
+Validator nodes MUST observe the slot space by listening for the `SlotFilled` event, which SHOULD prompt the validator
+to add the slot to its watched slots. Then, after the end of every period, the validator has at most `config.proofs.timeout` seconds
+(the config can be retrieved with `getConfig()`) to validate all the slots. If it finds a slot that missed its proof,
+it SHOULD submit a transaction calling `markProofAsMissing(slotId, period)`, which validates the claim
+and, if it is correct, rewards the validator.

 ## Copyright