## 1. Purpose and Scope

The **Store Module** serves as the core storage abstraction in **Nim-Codex**, providing a unified interface for storing and retrieving **content-addressed blocks** and associated metadata.

The primary design goal is to decouple storage operations from the underlying datastore semantics by introducing the `BlockStore` interface. This interface standardizes methods for storing and retrieving both **ephemeral** and **persistent** blocks, ensuring a consistent API across different storage backends.

Additionally, the module integrates a **maintenance engine** responsible for cleaning up expired ephemeral data according to configured policies.

This module is built on top of the generic [`DataStore (DS)` interface](https://github.com/codex-storage/nim-datastore/blob/master/datastore/datastore.nim), which is implemented by multiple backends such as SQLite, LevelDB, and the filesystem.

The DS provides a **KV-store** abstraction (`Get`, `Put`, `Delete`, `Query`) with backend-dependent guarantees. At a minimum, row-level consistency and basic batching are expected.

It also supports:

- **Namespace mounting** for isolating backend usage
- **Layering backends** (e.g., caching in front of persistent stores)
- Flexible stacking and composition of storage proxies (see the sketch below)
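As a minimal, self-contained Nim sketch of these ideas (not the actual `nim-datastore` API), the following shows a tiny key-value interface, an in-memory backend, and a layered store that consults a cache before falling through to a persistent backend; namespace isolation is approximated here by key prefixes such as `repo/blocks` and `meta/ttl`. All names in this sketch are illustrative.

```nim
import std/[tables, options]

type
  # Illustrative KV interface; the real DataStore is async and Result-based.
  KvStore = ref object of RootObj

  MemStore = ref object of KvStore
    data: Table[string, seq[byte]]

  # A cache layered in front of a slower persistent backend.
  LayeredStore = ref object of KvStore
    cache, backing: KvStore

method get(s: KvStore, key: string): Option[seq[byte]] {.base.} =
  none(seq[byte])

method put(s: KvStore, key: string, value: seq[byte]) {.base.} =
  discard

method get(s: MemStore, key: string): Option[seq[byte]] =
  if key in s.data: some(s.data[key]) else: none(seq[byte])

method put(s: MemStore, key: string, value: seq[byte]) =
  s.data[key] = value

method get(s: LayeredStore, key: string): Option[seq[byte]] =
  # Serve from the cache when possible, otherwise fall through and backfill.
  result = s.cache.get(key)
  if result.isNone:
    result = s.backing.get(key)
    if result.isSome:
      s.cache.put(key, result.get)

method put(s: LayeredStore, key: string, value: seq[byte]) =
  # Write-through: keep cache and backing store in sync.
  s.cache.put(key, value)
  s.backing.put(key, value)

when isMainModule:
  # Keys carry a namespace prefix, mimicking namespace mounting.
  let layered = LayeredStore(cache: MemStore(), backing: MemStore())
  layered.put("/repo/blocks/Ab/bafy...Ab", @[1'u8, 2, 3])
  doAssert layered.get("/repo/blocks/Ab/bafy...Ab").isSome
```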
---

## 2. Limitations

The current implementation has several shortcomings:

- **No dataset-level operations** or advanced batching support
- **Lack of consistent locking and concurrency control**, which may lead to inconsistencies during:
    - Crashes
    - Long-running operations on block groups (e.g., reference count updates, expiration updates)
---

## 3. `BlockStore` Interface

| Method | Description | Input | Output |
| --- | --- | --- | --- |
| `getBlock(cid: Cid)` | Retrieve block by CID | CID | `Future[?!Block]` |
| `getBlock(treeCid: Cid, index: Natural)` | Retrieve block from a Merkle tree by leaf index | Tree CID, index | `Future[?!Block]` |
| `getBlock(address: BlockAddress)` | Retrieve block via unified address | BlockAddress | `Future[?!Block]` |
| `getBlockAndProof(treeCid: Cid, index: Natural)` | Retrieve block with Merkle proof | Tree CID, index | `Future[?!(Block, CodexProof)]` |
| `getCid(treeCid: Cid, index: Natural)` | Retrieve leaf CID from tree metadata | Tree CID, index | `Future[?!Cid]` |
| `getCidAndProof(treeCid: Cid, index: Natural)` | Retrieve leaf CID with inclusion proof | Tree CID, index | `Future[?!(Cid, CodexProof)]` |
| `putBlock(blk: Block, ttl: Duration)` | Store block with quota enforcement | Block, optional TTL | `Future[?!void]` |
| `putCidAndProof(treeCid: Cid, index: Natural, blkCid: Cid, proof: CodexProof)` | Store leaf metadata with ref counting | Tree CID, index, block CID, proof | `Future[?!void]` |
| `hasBlock(...)` | Check block existence (CID or tree leaf) | CID / Tree CID + index | `Future[?!bool]` |
| `delBlock(...)` | Delete block/tree leaf (with ref count checks) | CID / Tree CID + index | `Future[?!void]` |
| `ensureExpiry(...)` | Update expiry for block/tree leaf | CID / Tree CID + index, expiry timestamp | `Future[?!void]` |
| `listBlocks(blockType: BlockType)` | Iterate over stored blocks | Block type | `Future[?!SafeAsyncIter[Cid]]` |
| `getBlockExpirations(maxNumber, offset)` | Retrieve block expiry metadata | Pagination params | `Future[?!SafeAsyncIter[BlockExpiration]]` |
| `blockRefCount(cid: Cid)` | Get block reference count | CID | `Future[?!Natural]` |
| `reserve(bytes: NBytes)` | Reserve storage quota | Bytes | `Future[?!void]` |
| `release(bytes: NBytes)` | Release reserved quota | Bytes | `Future[?!void]` |
| `start()` | Initialize store | — | `Future[void]` |
| `stop()` | Gracefully shut down store | — | `Future[void]` |
| `close()` | Close underlying datastores | — | `Future[void]` |
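
As an illustration of how a caller drives this interface, the sketch below stores a block, checks its presence, and reads it back by CID. It assumes the signatures in the table above plus the `chronos` and `questionable` helpers (`=?`, `without`, `errorOption`) used throughout Nim-Codex; it is not taken from the codebase.

```nim
import pkg/chronos
import pkg/questionable/results

# `store` is any BlockStore implementation (RepoStore, CacheStore, NetworkStore).
proc storeAndFetch(store: BlockStore, blk: Block): Future[?!Block] {.async.} =
  # Store with the default TTL; quota limits are enforced inside putBlock.
  if err =? (await store.putBlock(blk)).errorOption:
    return failure(err)

  # Existence check by CID.
  without exists =? await store.hasBlock(blk.cid), err:
    return failure(err)
  doAssert exists

  # Retrieve the block back, now by its CID.
  return await store.getBlock(blk.cid)
```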
---

## 4. Functional Requirements

### Available Today

- **Atomic Block Operations**
    - Store, retrieve, and delete operations must be atomic.
    - Support retrieval via:
        - Direct CID
        - Tree-based addressing (`treeCid + index`)
        - Unified block address

- **Metadata Management**
    - Store protocol-level metadata (e.g., storage proofs, quota usage).
    - Store block-level metadata (e.g., reference counts, total block count).

- **Multi-Datastore Support**
    - Pluggable datastore interface supporting various backends.
    - Typed datastore operations for metadata type safety.

- **Lifecycle & Maintenance**
    - BlockMaintainer service for removing expired data.
    - Configurable maintenance intervals (default: 10 min).
    - Batch processing (default: 1000 blocks/cycle), as sketched below.
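
A minimal sketch of such a maintenance cycle, assuming `chronos` timers and a hypothetical `expiredBlocks` helper in place of the real paginated `getBlockExpirations` scan:

```nim
import pkg/chronos

const
  DefaultInterval = 10.minutes   # default maintenance interval
  DefaultBatchSize = 1000        # blocks inspected per cycle

# `expiredBlocks` is an illustrative helper standing in for paginated
# getBlockExpirations calls combined with the clock comparison.
proc runMaintenanceLoop(store: BlockStore) {.async.} =
  while true:
    for cid in await store.expiredBlocks(maxNumber = DefaultBatchSize):
      # delBlock performs its own refCount / expiry checks before removal.
      discard await store.delBlock(cid)
    await sleepAsync(DefaultInterval)
```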

### Future Requirements

- **Transaction Rollback & Error Recovery**
    - Rollback support for failed multi-step operations.
    - Consistent state restoration after failures.

- **Dataset-Level Operations**
    - Handle dataset-level metadata.
    - Batch operations for dataset block groups.

- **Concurrency Control**
    - Consistent locking and coordination mechanisms to prevent inconsistencies during crashes or long-running operations.

- **Lifecycle & Maintenance**
    - Cooperative scheduling to avoid blocking.
    - State tracking for large datasets.
---

## 5. Non-Functional Requirements

### Available Today

- **Security**
    - Verify block content integrity upon retrieval.
    - Enforce quotas to prevent disk exhaustion.
    - Safe orphaned data cleanup.

- **Scalability**
    - Configurable storage quotas (default: 20 GiB).
    - Pagination for metadata queries.
    - Reference counting–based garbage collection.

- **Reliability**
    - Metrics collection (`codex_repostore_*`).
    - Graceful shutdown with resource cleanup.

### Future Requirements

- **Performance**
    - Batch metadata updates.
    - Efficient key lookups with configurable prefix lengths.
    - Support for both fast and slower storage tiers.
    - Streaming APIs optimized for extremely large datasets.

- **Security**
    - Finer-grained quota enforcement across tenants/namespaces.

- **Reliability**
    - Stronger rollback semantics for multi-node consistency.
    - Auto-recovery from inconsistent states.
---

## 6. Internal Design

### Store Implementations

The Store module provides **three concrete implementations** of the `BlockStore` interface, each optimized for a specific role in the Nim-Codex architecture: **RepoStore**, **NetworkStore**, and **CacheStore**.

#### RepoStore

The **RepoStore (RS)** is a persistent `BlockStore` implementation that interfaces directly with low-level storage backends, such as hard drives and databases.

It uses two distinct `DataStore (DS)` backends:

- **FileSystem** — for storing raw block data
- **LevelDB** — for storing associated metadata

This separation ensures optimal performance, allowing block data operations to run efficiently while metadata updates benefit from a fast key-value database.

**Characteristics**:

- Persistent storage via datastore backends
- Quota management with precise usage tracking
- TTL (time-to-live) support with automated expiration
- Metadata storage for block size, reference count, and expiry
- Transaction-like operations implemented through reference counting

**Configuration**:

- `quotaMaxBytes`: Maximum storage quota
- `blockTtl`: Default TTL for stored blocks
- `postFixLen`: CID key postfix length for sharding

```
┌───────────────────────────────────────────────────────┐
│                       RepoStore                        │
├───────────────────────────────────────────────────────┤
│  ┌─────────────┐        ┌──────────────────────────┐  │
│  │   repoDs    │        │          metaDs          │  │
│  │ (Datastore) │        │     (TypedDatastore)     │  │
│  │             │        │                          │  │
│  │ Block Data: │        │ Metadata:                │  │
│  │ - Raw bytes │        │ - BlockMetadata          │  │
│  │ - CID-keyed │        │ - LeafMetadata           │  │
│  │             │        │ - QuotaUsage             │  │
│  │             │        │ - Block counts           │  │
│  └─────────────┘        └──────────────────────────┘  │
└───────────────────────────────────────────────────────┘
```
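
RepoStore's quota bookkeeping around `putBlock`, `reserve`, and `release` can be pictured with the following self-contained sketch; `QuotaTracker` and its procs are hypothetical and only mirror the used/reserved counters described in this document (default quota: 20 GiB).

```nim
type
  QuotaTracker = object
    maxBytes: int      # quotaMaxBytes
    used: int          # bytes consumed by stored blocks (meta/quota/used)
    reserved: int      # bytes promised but not yet written (meta/quota/reserved)

proc canFit(q: QuotaTracker, size: int): bool =
  q.used + q.reserved + size <= q.maxBytes

proc reserve(q: var QuotaTracker, size: int): bool =
  ## Pre-allocate space for an upcoming write (cf. BlockStore.reserve).
  if not q.canFit(size): return false
  q.reserved += size
  true

proc release(q: var QuotaTracker, size: int) =
  ## Give back a prior reservation (cf. BlockStore.release).
  q.reserved = max(0, q.reserved - size)

proc commitBlock(q: var QuotaTracker, size: int): bool =
  ## Account for a block that was actually persisted (putBlock path).
  if not q.canFit(size): return false
  q.used += size
  true

proc deleteBlock(q: var QuotaTracker, size: int) =
  ## Reclaim space when a block is removed (delBlock path).
  q.used = max(0, q.used - size)

when isMainModule:
  var q = QuotaTracker(maxBytes: 20 * 1024 * 1024 * 1024)  # default 20 GiB
  doAssert q.reserve(4096)
  q.release(4096)
  doAssert q.commitBlock(4096)
```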
---

#### NetworkStore

The **NetworkStore** is a composite `BlockStore` that combines local persistence with network-based retrieval for distributed content access.

It follows a **local-first** strategy — attempting to retrieve or store blocks locally first, and falling back to network retrieval via the Block Exchange Engine if the block is not available locally.
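
A minimal sketch of that local-first fallback, assuming the `localStore` and `engine` fields listed under Data Models and the `questionable` helpers; the `requestBlock` behaviour is simplified and this is not the actual implementation:

```nim
proc getBlock(self: NetworkStore, cid: Cid): Future[?!Block] {.async.} =
  ## Local-first: serve from the local store when possible, otherwise
  ## ask the Block Exchange Engine to fetch the block from peers.
  without blk =? await self.localStore.getBlock(cid), err:
    if not (err of BlockNotFoundError):
      return failure(err)
    # Not available locally: resolve over the network.
    return await self.engine.requestBlock(cid)
  return success(blk)
```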

**Characteristics**:

- Integrates local storage with network retrieval
- Works seamlessly with the block exchange engine for peer-to-peer access
- Transparent block fetching from remote sources
- Local caching of blocks retrieved from the network for future access

```
┌────────────────────────────────────────────────────────────┐
│                        NetworkStore                        │
├────────────────────────────────────────────────────────────┤
│                                                            │
│   ┌─────────────────┐      ┌──────────────────────┐       │
│   │ LocalStore - RS │      │   BlockExcEngine     │       │
│   │ • Store blocks  │      │ • Request blocks     │       │
│   │ • Get blocks    │      │ • Resolve blocks     │       │
│   └─────────────────┘      └──────────────────────┘       │
│            │                          │                   │
│            └─────────────┬────────────┘                   │
│                          │                                │
│                   ┌─────────────┐                         │
│                   │BS Interface │                         │
│                   │             │                         │
│                   │ • getBlock  │                         │
│                   │ • putBlock  │                         │
│                   │ • hasBlock  │                         │
│                   │ • delBlock  │                         │
│                   └─────────────┘                         │
└────────────────────────────────────────────────────────────┘
```
---

#### CacheStore

The **CacheStore** is an in-memory `BlockStore` implementation designed for fast access to frequently used blocks.

This store maintains **two separate LRU caches** (sketched below):

1. **Block Cache** — `LruCache[Cid, Block]`
    - Stores actual block data indexed by CID
    - Acts as the primary cache for block content
2. **CID/Proof Cache** — `LruCache[(Cid, Natural), (Cid, CodexProof)]`
    - Maps `(treeCid, index)` to `(blockCid, proof)`
    - Supports direct access to block proofs keyed by `treeCid` and index
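
The two caches can be pictured with a toy, self-contained LRU sketch (standard library only; the real CacheStore uses the `lrucache` package and returns async `?!` results):

```nim
import std/[options, tables]

type
  LruCache[K, V] = object
    capacity: int
    entries: OrderedTable[K, V]   # insertion order doubles as recency order

proc initLru[K, V](capacity: int): LruCache[K, V] =
  LruCache[K, V](capacity: capacity)

proc get[K, V](c: var LruCache[K, V], key: K): Option[V] =
  if key notin c.entries: return none(V)
  let value = c.entries[key]
  # Re-insert to mark the entry as most recently used.
  c.entries.del(key)
  c.entries[key] = value
  some(value)

proc put[K, V](c: var LruCache[K, V], key: K, value: V) =
  if key in c.entries:
    c.entries.del(key)
  elif c.entries.len >= c.capacity:
    # Evict the least recently used entry (front of the ordered table).
    var oldest: K
    for k in c.entries.keys:
      oldest = k
      break
    c.entries.del(oldest)
  c.entries[key] = value

type
  # Stand-ins for the real Cid / Block / CodexProof types, illustration only.
  Cid = string
  Blk = string
  Proof = string

  CacheStoreSketch = object
    blocks: LruCache[Cid, Blk]              # block cache, keyed by CID
    proofs: LruCache[string, (Cid, Proof)]  # keyed by "treeCid/index" here;
                                            # the real cache keys on (Cid, Natural)

when isMainModule:
  var s = CacheStoreSketch(
    blocks: initLru[Cid, Blk](capacity = 2),
    proofs: initLru[string, (Cid, Proof)](capacity = 2))
  s.blocks.put("bafy...Ab", "raw bytes")
  s.proofs.put("bafy...tree/42", ("bafy...Ab", "proof"))
  doAssert s.blocks.get("bafy...Ab").isSome
```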

**Characteristics**:

- O(1) access times for cached data
- LRU eviction policy for memory management
- Configurable maximum cache size
- No persistence — cache contents are lost on restart
- No TTL — blocks remain in cache until evicted

**Configuration**:

- `cacheSize`: Maximum total cache size (bytes)
- `chunkSize`: Minimum block size unit
---

### Storage Layout

| Key Pattern | Data Type | Description | Example |
| --- | --- | --- | --- |
| `repo/manifests/{XX}/{full-cid}` | Raw bytes | Manifest block data | `repo/manifests/Cd/bafy...Cd → [data]` |
| `repo/blocks/{XX}/{full-cid}` | Raw bytes | Block data | `repo/blocks/Ab/bafy...Ab → [data]` |
| `meta/ttl/{cid}` | BlockMetadata | Expiry, size, refCount | `meta/ttl/bafy... → {...}` |
| `meta/proof/{treeCid}/{index}` | LeafMetadata | Merkle proof for leaf | `meta/proof/bafy.../42 → {...}` |
| `meta/total` | Natural | Total stored blocks | `meta/total → 12039` |
| `meta/quota/used` | NBytes | Used quota | `meta/quota/used → 52428800` |
| `meta/quota/reserved` | NBytes | Reserved quota | `meta/quota/reserved → 104857600` |
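
The `{XX}` shard prefix is derived from the block CID itself (the examples above suggest the trailing `postFixLen` characters). The following self-contained sketch shows one way such keys could be built; it illustrates the layout and is not the actual key-derivation code.

```nim
import std/strformat

proc blockKey(cid: string, postFixLen: int = 2): string =
  ## Shard block keys by a short CID-derived postfix so that a single
  ## directory never has to hold every block.
  let shard = cid[^postFixLen .. ^1]       # e.g. "Ab" for "bafy...Ab"
  &"repo/blocks/{shard}/{cid}"

proc ttlKey(cid: string): string =
  &"meta/ttl/{cid}"

proc proofKey(treeCid: string, index: int): string =
  &"meta/proof/{treeCid}/{index}"

when isMainModule:
  doAssert blockKey("bafy...Ab") == "repo/blocks/Ab/bafy...Ab"
  doAssert proofKey("bafy...tree", 42) == "meta/proof/bafy...tree/42"
```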
---

### Workflows

The following flowchart summarises how the put, get, and delete operations interact with the shared block storage, metadata store, and quota management systems.

```mermaid
flowchart TD
    A[Block Operations] --> B[putBlock]
    A --> C[getBlock]
    A --> D[delBlock]

    B --> E[Store to RepoDS]
    B --> F[Update Metadata]
    B --> G[Update Quota]

    C --> H[Query RepoDS]
    C --> I[Handle Tree Access]
    C --> J[Return Block Data]

    D --> K[Check Reference Count]
    D --> L[Remove from RepoDS]
    D --> M[Update Counters]

    E --> N[(Block Storage)]
    H --> N
    L --> N

    F --> O[(Metadata Store)]
    I --> O
    K --> O

    G --> P[(Quota Management)]
    M --> P
```

#### PutBlock

The following flowchart shows how a block is stored: the block is checked for duplicates, metadata is created, the data is written, and quota and counters are updated, with a rollback if the quota update fails.

```mermaid
flowchart TD
    A[putBlock: blk, ttl] --> B[Calculate expiry = now + ttl]
    B --> C[storeBlock: blk, expiry]
    C --> D{Block empty?}
    D -->|Yes| E[Return AlreadyInStore]
    D -->|No| F[Create metadata & block keys]
    F --> G{Block metadata exists?}
    G -->|Yes| H{Size matches?}
    H -->|Yes| I[Return AlreadyInStore]
    H -->|No| J[Return Error]
    G -->|No| K[Create new metadata]
    K --> L[Store block data]
    L --> M{Store successful?}
    M -->|No| N[Return Error]
    M -->|Yes| O[Update quota usage]
    O --> P{Quota update OK?}
    P -->|No| Q[Rollback: Delete block]
    Q --> R[Return Error]
    P -->|Yes| S[Update total blocks count]
    S --> T[Trigger onBlockStored callback]
    T --> U[Return Success]
```
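
In code form the same flow reads roughly as below; `hasMetadata`, `storeData`, `updateQuota`, `deleteData`, and `incTotalBlocks` are hypothetical stand-ins for the real RepoStore internals.

```nim
proc putBlockSketch(self: RepoStore, blk: Block, ttl: Duration): Future[?!void] {.async.} =
  if blk.isEmpty:
    return success()                          # empty block: treat as already stored

  if await self.hasMetadata(blk.cid):
    return success()                          # AlreadyInStore: idempotent put

  let expiry = self.clock.now() + ttl.seconds
  if err =? (await self.storeData(blk, expiry)).errorOption:
    return failure(err)                       # writing raw bytes or metadata failed

  if err =? (await self.updateQuota(blk.data.len.NBytes)).errorOption:
    await self.deleteData(blk.cid)            # rollback the write on quota failure
    return failure(err)

  await self.incTotalBlocks()
  self.onBlockStored(blk.cid)                 # notify listeners
  return success()
```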
---

#### GetBlock

The following flowchart explains how a block is retrieved by CID or tree reference, resolving leaf metadata if necessary, and returning the block or an error.

```mermaid
flowchart TD
    A[getBlock: cid/address] --> B{Input type?}
    B -->|BlockAddress with leaf| C[getLeafMetadata: treeCid, index]
    B -->|CID| D[Direct CID access]
    C --> E{Leaf metadata found?}
    E -->|No| F[Return BlockNotFoundError]
    E -->|Yes| G[Extract block CID from metadata]
    G --> D
    D --> H{CID empty?}
    H -->|Yes| I[Return empty block]
    H -->|No| J[Create prefix key]
    J --> K[Query datastore: repoDs.get]
    K --> L{Block found?}
    L -->|No| M{Error type?}
    M -->|DatastoreKeyNotFound| N[Return BlockNotFoundError]
    M -->|Other| O[Return Error]
    L -->|Yes| P[Create Block with verification]
    P --> Q[Return Block]
```
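
A rough code-level counterpart (the helpers `getLeafMetadata` and `readData`, and the exact `Block.new` signature, are illustrative):

```nim
proc getBlockSketch(self: RepoStore, address: BlockAddress): Future[?!Block] {.async.} =
  # Resolve tree addressing (treeCid + index) to a concrete block CID first.
  var cid: Cid
  if address.leaf:
    without leaf =? await self.getLeafMetadata(address.treeCid, address.index), err:
      return failure(newException(BlockNotFoundError, err.msg))
    cid = leaf.blkCid
  else:
    cid = address.cid

  # Read the raw bytes from repoDs and rebuild the block, which re-verifies
  # the content against the CID.
  without data =? await self.readData(cid), err:
    return failure(err)            # DatastoreKeyNotFound maps to BlockNotFoundError
  return Block.new(cid, data)
```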
---

#### DelBlock

The following flowchart shows how a block is deleted when it is unused or expired, including metadata cleanup and quota/counter updates.

```mermaid
flowchart TD
    A[delBlock: cid] --> B[delBlockInternal: cid]
    B --> C{CID empty?}
    C -->|Yes| D[Return Deleted]
    C -->|No| E[tryDeleteBlock: cid, now]
    E --> F{Metadata exists?}
    F -->|No| G[Check if block exists in repo]
    G --> H{Block exists?}
    H -->|Yes| I[Warn & remove orphaned block]
    H -->|No| J[Return NotFound]
    I --> J
    F -->|Yes| K{refCount = 0 OR expired?}
    K -->|No| L[Return InUse]
    K -->|Yes| M[Delete block & metadata]
    M --> N[Return Deleted]
    D --> O[Handle result]
    J --> O
    L --> O
    N --> O
    O --> P{Result type?}
    P -->|InUse| Q[Return Error: Cannot delete dataset block]
    P -->|NotFound| R[Return Success: Ignore]
    P -->|Deleted| S[Update total blocks count]
    S --> T[Update quota usage]
    T --> U[Return Success]
```
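
The deletion guard (a block is only removed once it is unreferenced or expired) reads roughly like this sketch; the `DeleteResultSketch` enum and the helper procs are illustrative, not the actual implementation.

```nim
type
  DeleteResultSketch = enum
    Deleted, NotFound, InUse

proc tryDeleteSketch(self: RepoStore, cid: Cid,
                     now: SecondsSince1970): Future[?!DeleteResultSketch] {.async.} =
  without md =? await self.getBlockMetadata(cid), err:
    # No metadata: either nothing is stored, or an orphaned data entry is cleaned up.
    discard await self.removeOrphanedData(cid)
    return success(NotFound)

  if md.refCount == 0 or md.expiry < now:
    # Safe to delete: no dataset references the block and/or its TTL has elapsed.
    if err =? (await self.removeBlockAndMetadata(cid)).errorOption:
      return failure(err)
    return success(Deleted)

  return success(InUse)                        # still referenced by a dataset
```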
---

## 7. Data Models

### Stores

```nim
RepoStore* = ref object of BlockStore
  postFixLen*: int
  repoDs*: Datastore
  metaDs*: TypedDatastore
  clock*: Clock
  quotaMaxBytes*: NBytes
  quotaUsage*: QuotaUsage
  totalBlocks*: Natural
  blockTtl*: Duration
  started*: bool

NetworkStore* = ref object of BlockStore
  engine*: BlockExcEngine
  localStore*: BlockStore

CacheStore* = ref object of BlockStore
  currentSize*: NBytes
  size*: NBytes
  cache: LruCache[Cid, Block]
  cidAndProofCache: LruCache[(Cid, Natural), (Cid, CodexProof)]
```

### Metadata Types

```nim
BlockMetadata* {.serialize.} = object
  expiry*: SecondsSince1970
  size*: NBytes
  refCount*: Natural

LeafMetadata* {.serialize.} = object
  blkCid*: Cid
  proof*: CodexProof

BlockExpiration* {.serialize.} = object
  cid*: Cid
  expiry*: SecondsSince1970

QuotaUsage* {.serialize.} = object
  used*: NBytes
  reserved*: NBytes
```
---

## 8. Dependencies

### External Dependencies

| Package | Purpose | Used For | Components Using |
| --- | --- | --- | --- |
| `pkg/chronos` | Async runtime | Async procedures, futures, cancellation handling | All stores (async operations) |
| `pkg/libp2p` | P2P networking | CID types, multicodec, multihash | BlockStore, NetworkStore, KeyUtils |
| `pkg/questionable` | Error handling | `?!` Result types, optional values | All components (error propagation) |
| `pkg/datastore` | Storage abstraction | Key-value storage interface, queries | RepoStore (main storage) |
| `pkg/datastore/typedds` | Typed datastore | Type-safe serialization/deserialization | RepoStore (metadata storage) |
| `pkg/lrucache` | Memory caching | LRU cache implementation | CacheStore (block caching) |
| `pkg/metrics` | Monitoring | Prometheus metrics, gauges, counters | RepoStore (quota tracking) |
| `pkg/serde/json` | Serialization | JSON encoding/decoding | RepoStore coders |
| `pkg/stew/byteutils` | Byte utilities | Hex conversion, byte arrays | RepoStore coders |
| `pkg/stew/endians2` | Endian conversion | Little/big endian conversion | RepoStore coders |
| `pkg/chronicles` | Logging | Structured logging | QueryIterHelper |
| `pkg/upraises` | Exception safety | Exception effect tracking | KeyUtils, TreeHelper |
| `std/options` | Optional types | Option[T] for nullable values | CacheStore |
| `std/sugar` | Syntax sugar | Lambda expressions, proc shortcuts | KeyUtils, RepoStore coders |

### Internal Dependencies

| Module | Purpose | Used For | Components Using |
| --- | --- | --- | --- |
| `../blocktype` | Block types | Block/Manifest type definitions, codec info | All stores (type checking) |
| `../blockexchange` | Block exchange | Network block exchange protocols | NetworkStore |
| `../merkletree` | Merkle trees | Tree structures, proofs, hashing | BlockStore, RepoStore, TreeHelper |
| `../manifest` | Data manifests | Manifest structures, CID identification | KeyUtils, CacheStore |
| `../clock` | Time abstraction | Clock interface for time operations | BlockStore, RepoStore |
| `../systemclock` | System clock | Wall-clock time implementation | RepoStore, Maintenance |
| `../units` | Units/Types | NBytes, Duration, time units | RepoStore (quota/TTL) |
| `../errors` | Custom errors | CodexError, BlockNotFoundError | RepoStore |
| `../namespaces` | Key namespaces | Database key namespace constants | KeyUtils |
| `../utils` | General utils | Common utility functions | BlockStore, RepoStore |
| `../utils/asynciter` | Async iterators | Async iteration patterns | TreeHelper, QueryIterHelper |
| `../utils/safeasynciter` | Safe async iter | Error-safe async iteration | NetworkStore, Maintenance |
| `../utils/asyncheapqueue` | Async queue | Priority queue for async operations | NetworkStore |
| `../utils/timer` | Timer utilities | Periodic timer operations | Maintenance |
| `../utils/json` | JSON utilities | JSON helper functions | RepoStore coders |
| `../chunker` | Data chunking | Breaking data into chunks | CacheStore |
| `../logutils` | Logging utils | Logging macros and utilities | All stores |