From 8fd4e07776fffdbd8aef57a7fef03e4cf20ca9f8 Mon Sep 17 00:00:00 2001
From: Slava <20563034+veaceslavdoina@users.noreply.github.com>
Date: Sat, 28 Sep 2024 23:03:27 +0300
Subject: [PATCH] Change image tag to img (#17)

---
 learn/whitepaper.md | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/learn/whitepaper.md b/learn/whitepaper.md
index f2408a9..3f783f6 100644
--- a/learn/whitepaper.md
+++ b/learn/whitepaper.md
@@ -1,3 +1,10 @@
+$$
+\newcommand{\sp}{\text{SP}}
+\newcommand{\sc}{\text{SC}}
+\newcommand{\postwo}[0]{\text{Poseidon2}}
+$$
+
+

## 1. Introduction

Data production has been growing at an astounding pace, with significant implications. Data is a critical asset for businesses, driving decision-making, strategic planning, and innovation. Individuals increasingly intertwine their physical lives with the digital world, meticulously documenting every aspect of their existence: taking pictures and videos, sharing their views and perspectives on current events, and using digital means for communication and artistic expression. Digital personas have become as important as their physical counterparts, and this tendency is only increasing.

@@ -134,7 +141,7 @@ Erasure coding plays two main roles in Codex: _i)_ allowing data to be recovered

**Erasure Coding for Redundancy.** As described before, a dataset $D$ is initially split into $k$ slots of size $s = \left\lceil \frac{b}{k} \right\rceil$ blocks (Figure 1). Since $b$ may not be evenly divisible by $k$, Codex adds _padding blocks_ as required so that the number of blocks in $D$ is $b_p = s \times k$.
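The padding arithmetic above can be sketched as follows. The function name, the block representation, the contents of `PAD`, and the assumption that each slot holds a contiguous block range are all illustrative, not the Codex implementation:

```python
import math

# Hypothetical placeholder for a padding block's content.
PAD = b"\x00"

def split_into_slots(blocks, k):
    """Pad a dataset of b blocks so it divides evenly into k slots, then
    split it: slot size s = ceil(b / k), padded length b_p = s * k."""
    b = len(blocks)
    s = math.ceil(b / k)                 # blocks per slot
    b_p = s * k                          # block count after padding
    padded = blocks + [PAD] * (b_p - b)
    # Slot i holds the (assumed contiguous) block range [i*s, (i+1)*s).
    return [padded[i * s:(i + 1) * s] for i in range(k)]
```

For example, $b = 10$ blocks with $k = 4$ gives $s = 3$ and $b_p = 12$, so two padding blocks are appended.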
**Figure 1.** A padded dataset $D$ split into $k$ slots. @@ -142,7 +149,7 @@ Erasure coding plays two main roles in Codex: _i)_ allowing data to be recovered Codex then erasure-codes $D$ by _interleaving_ blocks taken from each slot (Figure 2), one at a time. The procedure runs $s$ interleaving steps, where $s$ is the number of blocks in a slot.
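The interleaving procedure can be illustrated with the sketch below. Only the step structure ($s$ steps, one codeword per block index) follows the text; `xor_parity` is a deliberately simplified stand-in for the real erasure encoder (e.g. a Reed-Solomon code), used here just to keep the example self-contained:

```python
from functools import reduce

def xor_parity(codeword_blocks, m):
    """Toy stand-in for a real erasure encoder: returns m parity blocks,
    here simply m copies of the XOR of the inputs (illustration only)."""
    x = reduce(lambda a, b: bytes(p ^ q for p, q in zip(a, b)), codeword_blocks)
    return [x] * m

def interleave_encode(slots, m, encode=xor_parity):
    """Run s interleaving steps: at step j, the j-th block of every data
    slot forms one codeword, and its parity blocks land at position j of
    the m new parity slots."""
    k, s = len(slots), len(slots[0])
    parity_slots = [[None] * s for _ in range(m)]
    for j in range(s):                            # one step per block index
        codeword = [slots[i][j] for i in range(k)]
        for p, block in enumerate(encode(codeword, m)):
            parity_slots[p][j] = block
    return slots + parity_slots                   # k + m slots in total
```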
**Figure 2.** Erasure-coded dataset $D_e$ with $k + m$ slots and interleaving process.

@@ -160,7 +167,7 @@ A smarter approach would be by _sampling_: instead of downloading the entire fil

Although the decay is always geometric, the impact of a loss fraction that is low (e.g. less than $1\%$) can be significant: as depicted in Figure 3, for $l_i = 0.01$ we get a $p_{\text{detect}}$ that remains below $0.5$ even after drawing $50$ samples. If that does not sound too bad, consider an adversarial setting in which an SP purposefully drops a very small fraction of a large file, perhaps a single block out of a million. For fractions that small ($10^{-6}$), a verifier would need hundreds of thousands of samples just to reach $p_{\text{detect}} > 0.5$, and millions for high detection probabilities, e.g. $p_{\text{detect}} > 0.99$.
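Assuming uniform random sampling with replacement, the geometric decay above is $p_{\text{detect}} = 1 - (1 - l_i)^j$, which can be checked numerically (a sketch, not Codex's verifier):

```python
import math

def p_detect(l, j):
    """Chance that at least one of j uniform random samples (drawn with
    replacement) hits a missing block when a fraction l is lost."""
    return 1.0 - (1.0 - l) ** j

def samples_needed(l, p_target):
    """Smallest j such that p_detect(l, j) >= p_target."""
    return math.ceil(math.log(1.0 - p_target) / math.log(1.0 - l))
```

With $l_i = 0.01$, `p_detect(0.01, 50)` is about $0.4$, matching Figure 3; with $l_i = 10^{-6}$, reaching $p_{\text{detect}} > 0.99$ takes `samples_needed(1e-6, 0.99)`, roughly $4.6$ million draws.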
**Figure 3.** Number of samples $j$ required by a verifier to assert data loss ($p_{\text{detect}}$) for various loss fractions ($l_i$). @@ -224,7 +231,7 @@ Datasets stored in Codex need to be advertised over a Distributed Hash Table (DH A CID unequivocally identifies a piece of data by encoding a flavour of a hash of its content together with the type of hashing method used to compute it. In the case of a Codex dataset $D_e$ (Figure 4), this hash is taken to be the root of the SHA256 Merkle tree constructed over its blocks $\{b_1, \cdots, b_{s \times (k + m)}\}$.
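The Merkle root over the dataset's blocks can be sketched as below. Two details the excerpt does not fix are assumptions here: odd levels duplicate their last node, and no leaf/inner-node domain separation is applied; the exact Codex tree construction may differ.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Root of a binary SHA256 Merkle tree over a dataset's blocks.
    Simplified: odd levels duplicate their last node, and leaves are
    hashed without domain separation."""
    level = [sha256(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```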
**Figure 4.** CIDs for Codex datasets.

@@ -275,7 +282,7 @@ An SC that wishes Codex to store a dataset $D_e$ needs to provide $5$ main param

As discussed in Sec. 5, these parameters may impact durability guarantees directly, and the system offers complete flexibility so that applications can tailor spending and parameters to specific needs. Applications built on Codex will need to guide their users in choosing parameters that fit their needs, much as Ethereum wallets help users determine gas fees.
**Figure 6.** Storage requests and their processing by SPs. @@ -287,7 +294,7 @@ As depicted in Figure 6, every storage request posted by an SC gets recorded on- To help mitigate these issues, the Codex marketplace implements a time-based, _expanding window_ mechanism to allow SPs to compete for slots. As depicted in Figure 7, each storage request is assigned a random position in a $z$-bit ID space by taking a hashing function $h$ and computing, for slot $S_i$, the value $h(u\,\|\, i)$, where $u$ is a random nonce. This will effectively disperse storage requests for slots approximately uniformly at random over the ID space.
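The slot-placement computation $h(u \,\|\, i)$ can be sketched as follows, with SHA-256 standing in for the unspecified hash $h$ and an 8-byte big-endian encoding of $i$ as an assumed serialization:

```python
import hashlib

def slot_position(u: bytes, i: int, z: int) -> int:
    """Position of slot S_i in the z-bit ID space, computed as h(u || i)
    truncated to z bits. SHA-256 and the big-endian encoding of i are
    illustrative assumptions."""
    digest = hashlib.sha256(u + i.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << z)
```

Because the hash output is (approximately) uniform, the resulting positions disperse slots roughly uniformly over the $2^z$ IDs, as the text describes.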
**Figure 7.** Slots placed at random in a $z$-bit space. @@ -295,7 +302,7 @@ To help mitigate these issues, the Codex marketplace implements a time-based, _e We then allow only hosts whose blockchain IDs are within a certain "distance" of a slot to compete in filling it (Figure 8).
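A sketch of the expanding-window eligibility rule: the "distance" metric (ring distance on the ID space) and the linear growth schedule are illustrative choices the excerpt leaves open.

```python
def eligible(sp_id: int, slot_pos: int, elapsed: float,
             z: int, growth_rate: float) -> bool:
    """Expanding-window sketch: an SP may compete for a slot once the
    window -- which grows with the time elapsed since the request was
    posted -- covers the distance from its ID to the slot's position."""
    space = 1 << z
    d = abs(sp_id - slot_pos)
    distance = min(d, space - d)               # distance on the z-bit ID ring
    window = min(space, growth_rate * elapsed)
    return distance <= window
```

Early on only nearby SPs qualify; as time passes without the slot being filled, the window widens until eventually any SP may fill it.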
**Figure 8.** SP eligibility as a function of time and its distance to a slot. @@ -381,7 +388,7 @@ We model the system using a CTMC with a multi-dimensional state space representi States $S_{N-K+1,f}$ for each $f$ are absorbing states. By calculating the expected time of absorption, we can quantify the reliability of the system.
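The expected-time-of-absorption computation can be illustrated on a drastically simplified one-dimensional chain: the state is just the number of failed slots $f$, with assumed failure rate $(n - f)\lambda$, a single repair at rate $\mu$, and absorption (data loss) once more than $n - k$ slots have failed. The actual model's multi-dimensional state space is reduced away here; this only shows the first-step-equation mechanics.

```python
def mean_time_to_data_loss(n, k, lam, mu):
    """Expected absorption time of a toy birth-death CTMC, from 0 failures.
    Transient states f = 0..n-k; absorbing state f = n-k+1 (too few slots
    left to reconstruct). First-step equations:
        (lam_f + mu_f) * t_f - lam_f * t_{f+1} - mu_f * t_{f-1} = 1,
    with lam_f = (n - f) * lam, mu_f = mu for f > 0, t_{n-k+1} = 0.
    Solved by naive Gaussian elimination with partial pivoting."""
    m = n - k + 1                          # number of transient states
    A = [[0.0] * m for _ in range(m)]
    rhs = [1.0] * m
    for f in range(m):
        lam_f = (n - f) * lam
        mu_f = mu if f > 0 else 0.0
        A[f][f] = lam_f + mu_f
        if f + 1 < m:
            A[f][f + 1] = -lam_f           # transition into t_{n-k+1} = 0 drops out
        if f > 0:
            A[f][f - 1] = -mu_f
    for c in range(m):                     # forward elimination
        piv = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        rhs[c], rhs[piv] = rhs[piv], rhs[c]
        for r in range(c + 1, m):
            fct = A[r][c] / A[c][c]
            for cc in range(c, m):
                A[r][cc] -= fct * A[c][cc]
            rhs[r] -= fct * rhs[c]
    t = [0.0] * m                          # back substitution
    for r in range(m - 1, -1, -1):
        t[r] = (rhs[r] - sum(A[r][cc] * t[cc] for cc in range(r + 1, m))) / A[r][r]
    return t[0]
```

As a sanity check, with $n = k = 1$ the answer reduces to $1/\lambda$, and adding repairs ($\mu > 0$) strictly increases the expected time to data loss.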
**Figure 9.** $p_{\text{loss}}$ (y axis) as a function of $n$ for various values of $R_0$ and expansion factors ($R_{\text{inv}}$).