# Store Module Specifications (#14)

## 1. Purpose and Scope
The **Store Module** serves as the core storage abstraction in **Nim-Codex**, providing a unified interface for storing and retrieving **content-addressed blocks** and associated metadata.
The primary design goal is to decouple storage operations from the underlying datastore semantics by introducing the `BlockStore` interface. This interface standardizes methods for storing and retrieving both **ephemeral** and **persistent** blocks, ensuring a consistent API across different storage backends.
Additionally, the module integrates a **maintenance engine** responsible for cleaning up expired ephemeral data according to configured policies.
This module is built on top of the generic [`DataStore (DS)` interface](https://github.com/codex-storage/nim-datastore/blob/master/datastore/datastore.nim), which is implemented by multiple backends such as SQLite, LevelDB, and the filesystem.
The DS provides a **KV-store** abstraction (`Get`, `Put`, `Delete`, `Query`), with backend-dependent guarantees. At a minimum, row-level consistency and basic batching are expected.
It also supports:
- **Namespace mounting** for isolating backend usage
- **Layering backends** (e.g., caching in front of persistent stores)
- Flexible stacking and composition of storage proxies
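A minimal sketch of this KV contract, assuming a nim-datastore development environment; `Key.init` and the method shapes below follow that repository's interface but may differ between versions:
```nim
import pkg/chronos
import pkg/questionable/results
import pkg/datastore

proc kvRoundTrip(ds: Datastore) {.async.} =
  # Keys are namespaced paths, which is what namespace mounting builds on
  let key = Key.init("/blocks/example").tryGet()
  assert (await ds.put(key, @[1'u8, 2, 3])).isOk   # Put
  let value = (await ds.get(key)).tryGet()         # Get
  assert value == @[1'u8, 2, 3]
  assert (await ds.delete(key)).isOk               # Delete
```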
---
## 2. Limitations
The current implementation has several shortcomings:
- **No dataset-level operations** or advanced batching support
- **Lack of consistent locking and concurrency control**, which may lead to inconsistencies during:
- Crashes
- Long-running operations on block groups (e.g., reference count updates, expiration updates)
---
## 3. `BlockStore` Interface
| Method | Description | Input | Output |
| --- | --- | --- | --- |
| `getBlock(cid: Cid)` | Retrieve block by CID | CID | `Future[?!Block]` |
| `getBlock(treeCid: Cid, index: Natural)` | Retrieve block from a Merkle tree by leaf index | Tree CID, index | `Future[?!Block]` |
| `getBlock(address: BlockAddress)` | Retrieve block via unified address | BlockAddress | `Future[?!Block]` |
| `getBlockAndProof(treeCid: Cid, index: Natural)` | Retrieve block with Merkle proof | Tree CID, index | `Future[?!(Block, CodexProof)]` |
| `getCid(treeCid: Cid, index: Natural)` | Retrieve leaf CID from tree metadata | Tree CID, index | `Future[?!Cid]` |
| `getCidAndProof(treeCid: Cid, index: Natural)` | Retrieve leaf CID with inclusion proof | Tree CID, index | `Future[?!(Cid, CodexProof)]` |
| `putBlock(blk: Block, ttl: Duration)` | Store block with quota enforcement | Block, optional TTL | `Future[?!void]` |
| `putCidAndProof(treeCid: Cid, index: Natural, blkCid: Cid, proof: CodexProof)` | Store leaf metadata with ref counting | Tree CID, index, block CID, proof | `Future[?!void]` |
| `hasBlock(...)` | Check block existence (CID or tree leaf) | CID / Tree CID + index | `Future[?!bool]` |
| `delBlock(...)` | Delete block/tree leaf (with ref count checks) | CID / Tree CID + index | `Future[?!void]` |
| `ensureExpiry(...)` | Update expiry for block/tree leaf | CID / Tree CID + index, expiry timestamp | `Future[?!void]` |
| `listBlocks(blockType: BlockType)` | Iterate over stored blocks | Block type | `Future[?!SafeAsyncIter[Cid]]` |
| `getBlockExpirations(maxNumber, offset)` | Retrieve block expiry metadata | Pagination params | `Future[?!SafeAsyncIter[BlockExpiration]]` |
| `blockRefCount(cid: Cid)` | Get block reference count | CID | `Future[?!Natural]` |
| `reserve(bytes: NBytes)` | Reserve storage quota | Bytes | `Future[?!void]` |
| `release(bytes: NBytes)` | Release reserved quota | Bytes | `Future[?!void]` |
| `start()` | Initialize store | — | `Future[void]` |
| `stop()` | Gracefully shut down store | — | `Future[void]` |
| `close()` | Close underlying datastores | — | `Future[void]` |
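A minimal round-trip sketch against the interface above, assuming a nim-codex development environment; the import paths and the `Block.new` helper are assumptions, and `store` may be any `BlockStore` implementation:
```nim
import pkg/chronos
import pkg/questionable/results
import pkg/codex/blocktype
import pkg/codex/stores

proc roundTrip(store: BlockStore, data: seq[byte]) {.async.} =
  let blk = Block.new(data).tryGet()       # content-address the payload
  assert (await store.putBlock(blk)).isOk  # store with the default TTL
  assert (await store.hasBlock(blk.cid)).tryGet()
  let fetched = (await store.getBlock(blk.cid)).tryGet()
  assert fetched.data == data
```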
---
## 4. Functional Requirements
### Available Today
- **Atomic Block Operations**
- Store, retrieve, and delete operations must be atomic.
- Support retrieval via:
- Direct CID
- Tree-based addressing (`treeCid + index`)
- Unified block address
- **Metadata Management**
- Store protocol-level metadata (e.g., storage proofs, quota usage).
- Store block-level metadata (e.g., reference counts, total block count).
- **Multi-Datastore Support**
- Pluggable datastore interface supporting various backends.
- Typed datastore operations for metadata type safety.
- **Lifecycle & Maintenance** (a scheduling sketch follows this list)
- BlockMaintainer service for removing expired data.
- Configurable maintenance intervals (default: 10 min).
- Batch processing (default: 1000 blocks/cycle).
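A simplified stand-in for the maintenance cycle described above; the real BlockMaintainer API is not shown in this spec, so the loop below only illustrates the interval and batch-size defaults:
```nim
import pkg/chronos

const BlocksPerCycle = 1000           # default batch size per cycle
let maintenanceInterval = 10.minutes  # default maintenance interval

proc maintenanceLoop(
    cleanExpired: proc(batch: int): Future[void] {.gcsafe.}) {.async.} =
  ## Periodically remove up to one batch of expired blocks.
  while true:
    await cleanExpired(BlocksPerCycle)
    await sleepAsync(maintenanceInterval)
```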
### Future Requirements
- **Transaction Rollback & Error Recovery**
- Rollback support for failed multi-step operations.
- Consistent state restoration after failures.
- **Dataset-Level Operations**
  - Handle dataset-level metadata.
- Batch operations for dataset block groups.
- **Concurrency Control**
- Consistent locking and coordination mechanisms to prevent inconsistencies during crashes or long-running operations.
- **Lifecycle & Maintenance**
- Cooperative scheduling to avoid blocking.
- State tracking for large datasets.
---
## 5. Non-Functional Requirements
### Available Today
- **Security**
- Verify block content integrity upon retrieval.
- Enforce quotas to prevent disk exhaustion.
- Safe orphaned data cleanup.
- **Scalability**
- Configurable storage quotas (default: 20 GiB).
- Pagination for metadata queries.
  - Reference-counting-based garbage collection.
- **Reliability**
- Metrics collection (`codex_repostore_*`).
- Graceful shutdown with resource cleanup.
### Future Requirements
- **Performance**
- Batch metadata updates.
- Efficient key lookups with configurable prefix lengths.
- Support for both fast and slower storage tiers.
- Streaming APIs optimized for extremely large datasets.
- **Security**
- Finer-grained quota enforcement across tenants/namespaces.
- **Reliability**
- Stronger rollback semantics for multi-node consistency.
- Auto-recovery from inconsistent states.
---
## 6. Internal Design
### Store Implementations
The Store module provides **three concrete implementations** of the `BlockStore` interface, each optimized for a specific role in the Nim-Codex architecture: **RepoStore**, **NetworkStore**, and **CacheStore**.
#### RepoStore
The **RepoStore (RS)** is a persistent `BlockStore` implementation that interfaces directly with low-level storage backends, such as hard drives and databases.
It uses two distinct `DataStore (DS)` backends:
- **FileSystem** — for storing raw block data
- **LevelDB** — for storing associated metadata
This separation ensures optimal performance, allowing block data operations to run efficiently while metadata updates benefit from a fast key-value database.
**Characteristics**:
- Persistent storage via datastore backends
- Quota management with precise usage tracking
- TTL (time-to-live) support with automated expiration
- Metadata storage for block size, reference count, and expiry
- Transaction-like operations implemented through reference counting
**Configuration**:
- `quotaMaxBytes`: Maximum storage quota
- `blockTtl`: Default TTL for stored blocks
- `postFixLen`: CID key postfix length for sharding
```text
┌─────────────────────────────────────────────────────────────┐
│ RepoStore │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌──────────────────────────┐ │
│ │ repoDs │ │ metaDs │ │
│ │ (Datastore) │ │ (TypedDatastore) │ │
│ │ │ │ │ │
│ │ Block Data: │ │ Metadata: │ │
│ │ - Raw bytes │ │ - BlockMetadata │ │
│ │ - CID-keyed │ │ - LeafMetadata │ │
│ │ │ │ - QuotaUsage │ │
│ │ │ │ - Block counts │ │
│ └─────────────┘ └──────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
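A construction sketch tying the two backends together, assuming a nim-codex development environment; the exact `RepoStore.new` signature and the 24-hour TTL value are assumptions based on the configuration fields above:
```nim
import pkg/chronos
import pkg/datastore
import pkg/codex/stores
import pkg/codex/units

proc makeRepo(repoDs, metaDs: Datastore): RepoStore =
  RepoStore.new(
    repoDs = repoDs,     # filesystem backend for raw block data
    metaDs = metaDs,     # LevelDB backend for metadata
    quotaMaxBytes = NBytes(20 * 1024 * 1024 * 1024), # 20 GiB default quota
    blockTtl = 24.hours, # default block TTL (assumed value)
    postFixLen = 2)      # CID key postfix length for sharding
```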
---
#### NetworkStore
The **NetworkStore** is a composite `BlockStore` that combines local persistence with network-based retrieval for distributed content access.
It follows a **local-first** strategy — attempting to retrieve or store blocks locally first, and falling back to network retrieval via the Block Exchange Engine if the block is not available locally.
**Characteristics**:
- Integrates local storage with network retrieval
- Works seamlessly with the block exchange engine for peer-to-peer access
- Transparent block fetching from remote sources
- Local caching of blocks retrieved from the network for future access
```text
┌────────────────────────────────────────────────────────────┐
│ NetworkStore │
├────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌──────────────────────┐ │
│ │ LocalStore - RS │ │ BlockExcEngine │ │
│ │ • Store blocks │ │ • Request blocks │ │
│ │ • Get blocks │ │ • Resolve blocks │ │
│ └─────────────────┘ └──────────────────────┘ │
│ │ │ │
│ └──────────────┬───────────────┘ │
│ │ │
│ ┌─────────────┐ │
│ │BS Interface │ │
│ │ │ │
│ │ • getBlock │ │
│ │ • putBlock │ │
│ │ • hasBlock │ │
│ │ • delBlock │ │
│ └─────────────┘ │
└────────────────────────────────────────────────────────────┘
```
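A condensed sketch of the local-first fallback, assuming a nim-codex development environment; the `localStore` and `engine` fields come from the data model in Section 7, while `requestBlock` is the assumed name for the engine call shown in the diagram:
```nim
import pkg/chronos
import pkg/questionable/results
import pkg/codex/blocktype
import pkg/codex/stores

proc localFirstGet(store: NetworkStore, cid: Cid): Future[?!Block] {.async.} =
  if blk =? (await store.localStore.getBlock(cid)):
    return success blk   # hit: served from the local store
  # miss: fall back to the Block Exchange Engine
  return await store.engine.requestBlock(cid)
```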
---
#### CacheStore
The **CacheStore** is an in-memory `BlockStore` implementation designed for fast access to frequently used blocks.
This store maintains **two separate LRU caches**:
1. **Block Cache**: `LruCache[Cid, Block]`
- Stores actual block data indexed by CID
- Acts as the primary cache for block content
2. **CID/Proof Cache**: `LruCache[(Cid, Natural), (Cid, CodexProof)]`
- Maps `(treeCid, index)` to `(blockCid, proof)`
- Supports direct access to block proofs keyed by `treeCid` and index
**Characteristics**:
- O(1) access times for cached data
- LRU eviction policy for memory management
- Configurable maximum cache size
- No persistence — cache contents are lost on restart
- No TTL — blocks remain in cache until evicted
**Configuration** (see the sketch below):
- `cacheSize`: Maximum total cache size (bytes)
- `chunkSize`: Minimum block size unit
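A construction sketch using the options above, assuming a nim-codex development environment; the parameter names are assumptions, and the 256 MiB / 64 KiB values are illustrative only:
```nim
import pkg/codex/stores
import pkg/codex/units

let cache = CacheStore.new(
  cacheSize = NBytes(256 * 1024 * 1024), # total in-memory budget: 256 MiB
  chunkSize = NBytes(64 * 1024))         # minimum block size unit: 64 KiB
# No persistence: everything cached here is lost on restart.
```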
---
### Storage Layout
| Key Pattern | Data Type | Description | Example |
| --- | --- | --- | --- |
| `repo/manifests/{XX}/{full-cid}` | Raw bytes | Manifest block data | `repo/manifests/Cd/bafy...Cd → [data]` |
| `repo/blocks/{XX}/{full-cid}` | Raw bytes | Block data | `repo/blocks/Ab/bafy...Ab → [data]` |
| `meta/ttl/{cid}` | BlockMetadata | Expiry, size, refCount | `meta/ttl/bafy... → {...}` |
| `meta/proof/{treeCid}/{index}` | LeafMetadata | Merkle proof for leaf | `meta/proof/bafy.../42 → {...}` |
| `meta/total` | Natural | Total stored blocks | `meta/total → 12039` |
| `meta/quota/used` | NBytes | Used quota | `meta/quota/used → 52428800` |
| `meta/quota/reserved` | NBytes | Reserved quota | `meta/quota/reserved → 104857600` |
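A self-contained illustration of the `{XX}/{full-cid}` sharding in the table above, assuming (consistently with the examples) that the shard component is the last `postFixLen` characters of the CID string; the CID below is a placeholder, not a real CID:
```nim
proc blockKey(cid: string, postFixLen = 2): string =
  ## Derive the datastore key for a raw block from its CID.
  "repo/blocks/" & cid[^postFixLen .. ^1] & "/" & cid

when isMainModule:
  echo blockKey("bafy1234Ab")  # -> repo/blocks/Ab/bafy1234Ab
```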
---
### Workflows
The following flow chart summarises how put, get, and delete operations interact with the shared block storage, metadata store, and quota management systems.
```mermaid
flowchart TD
A[Block Operations] --> B[putBlock]
A --> C[getBlock]
A --> D[delBlock]
B --> E[Store to RepoDS]
B --> F[Update Metadata]
B --> G[Update Quota]
C --> H[Query RepoDS]
C --> I[Handle Tree Access]
C --> J[Return Block Data]
D --> K[Check Reference Count]
D --> L[Remove from RepoDS]
D --> M[Update Counters]
E --> N[(Block Storage)]
H --> N
L --> N
F --> O[(Metadata Store)]
I --> O
K --> O
G --> P[(Quota Management)]
M --> P
```
#### PutBlock
The following flow chart shows how a block is stored, including expiry calculation, duplicate detection, and quota/counter updates.
```mermaid
flowchart TD
A[putBlock: blk, ttl] --> B[Calculate expiry = now + ttl]
B --> C[storeBlock: blk, expiry]
C --> D{Block empty?}
D -->|Yes| E[Return AlreadyInStore]
D -->|No| F[Create metadata & block keys]
F --> G{Block metadata exists?}
G -->|Yes| H{Size matches?}
H -->|Yes| I[Return AlreadyInStore]
H -->|No| J[Return Error]
G -->|No| K[Create new metadata]
K --> L[Store block data]
L --> M{Store successful?}
M -->|No| N[Return Error]
M -->|Yes| O[Update quota usage]
O --> P{Quota update OK?}
P -->|No| Q[Rollback: Delete block]
Q --> R[Return Error]
P -->|Yes| S[Update total blocks count]
S --> T[Trigger onBlockStored callback]
T --> U[Return Success]
```
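A self-contained sketch of the rollback step above (store first, then undo the write if the quota update fails); the in-memory `Repo` is a stand-in for the real datastores:
```nim
import std/tables

type Repo = object
  blocks: Table[string, seq[byte]]
  quotaUsed, quotaMax: int

proc putBlock(repo: var Repo, cid: string, data: seq[byte]): bool =
  repo.blocks[cid] = data                        # store block data
  if repo.quotaUsed + data.len > repo.quotaMax:  # quota update fails
    repo.blocks.del(cid)                         # rollback: delete block
    return false
  repo.quotaUsed += data.len                     # commit quota usage
  true
```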
---
#### GetBlock
The following flow chart explains how a block is retrieved by CID or tree reference, resolving metadata if necessary, and returning the block or an error.
```mermaid
flowchart TD
A[getBlock: cid/address] --> B{Input type?}
B -->|BlockAddress with leaf| C[getLeafMetadata: treeCid, index]
B -->|CID| D[Direct CID access]
C --> E{Leaf metadata found?}
E -->|No| F[Return BlockNotFoundError]
E -->|Yes| G[Extract block CID from metadata]
G --> D
D --> H{CID empty?}
H -->|Yes| I[Return empty block]
H -->|No| J[Create prefix key]
J --> K[Query datastore: repoDs.get]
K --> L{Block found?}
L -->|No| M{Error type?}
M -->|DatastoreKeyNotFound| N[Return BlockNotFoundError]
M -->|Other| O[Return Error]
L -->|Yes| P[Create Block with verification]
P --> Q[Return Block]
```
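A self-contained sketch of the leaf-resolution branch above: a `(treeCid, index)` address is first mapped to a block CID via `LeafMetadata`, then fetched like a direct CID; tables stand in for the metadata and block stores:
```nim
import std/[options, tables]

type Repo = object
  leaves: Table[(string, int), string]  # (treeCid, index) -> block CID
  blocks: Table[string, seq[byte]]      # block CID -> raw bytes

proc getBlock(repo: Repo, treeCid: string, index: int): Option[seq[byte]] =
  if (treeCid, index) notin repo.leaves:
    return none(seq[byte])   # BlockNotFoundError in the real store
  let blockCid = repo.leaves[(treeCid, index)]
  if blockCid in repo.blocks:
    some(repo.blocks[blockCid])
  else:
    none(seq[byte])
```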
---
#### DelBlock
The following flow chart shows how a block is deleted when it is unused or expired, including metadata cleanup and quota/counter updates.
```mermaid
flowchart TD
A[delBlock: cid] --> B[delBlockInternal: cid]
B --> C{CID empty?}
C -->|Yes| D[Return Deleted]
C -->|No| E[tryDeleteBlock: cid, now]
E --> F{Metadata exists?}
F -->|No| G[Check if block exists in repo]
G --> H{Block exists?}
H -->|Yes| I[Warn & remove orphaned block]
H -->|No| J[Return NotFound]
I --> J
F -->|Yes| K{refCount = 0 OR expired?}
K -->|No| L[Return InUse]
K -->|Yes| M[Delete block & metadata]
M --> N[Return Deleted]
D --> O[Handle result]
J --> O
L --> O
N --> O
O --> P{Result type?}
P -->|InUse| Q[Return Error: Cannot delete dataset block]
P -->|NotFound| R[Return Success: Ignore]
P -->|Deleted| S[Update total blocks count]
S --> T[Update quota usage]
T --> U[Return Success]
```
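The deletability gate in the chart above reduces to a single predicate over `BlockMetadata`; a runnable sketch using the spec's field names (with plain integers standing in for the real units):
```nim
import std/times

type BlockMetadata = object
  expiry: int64   # SecondsSince1970
  size: int64     # NBytes in the real type
  refCount: int

proc canDelete(md: BlockMetadata, now: int64): bool =
  ## A block may be deleted only when unreferenced or expired.
  md.refCount == 0 or md.expiry < now

when isMainModule:
  let now = getTime().toUnix()
  assert canDelete(BlockMetadata(expiry: now - 10, refCount: 2), now)       # expired
  assert canDelete(BlockMetadata(expiry: now + 3600, refCount: 0), now)     # unreferenced
  assert not canDelete(BlockMetadata(expiry: now + 3600, refCount: 1), now) # in use
```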
---
## 7. Data Models
### Stores
```nim
type
  RepoStore* = ref object of BlockStore
    postFixLen*: int
    repoDs*: Datastore
    metaDs*: TypedDatastore
    clock*: Clock
    quotaMaxBytes*: NBytes
    quotaUsage*: QuotaUsage
    totalBlocks*: Natural
    blockTtl*: Duration
    started*: bool

  NetworkStore* = ref object of BlockStore
    engine*: BlockExcEngine
    localStore*: BlockStore

  CacheStore* = ref object of BlockStore
    currentSize*: NBytes
    size*: NBytes
    cache: LruCache[Cid, Block]
    cidAndProofCache: LruCache[(Cid, Natural), (Cid, CodexProof)]
```
### Metadata Types
```nim
type
  BlockMetadata* {.serialize.} = object
    expiry*: SecondsSince1970
    size*: NBytes
    refCount*: Natural

  LeafMetadata* {.serialize.} = object
    blkCid*: Cid
    proof*: CodexProof

  BlockExpiration* {.serialize.} = object
    cid*: Cid
    expiry*: SecondsSince1970

  QuotaUsage* {.serialize.} = object
    used*: NBytes
    reserved*: NBytes
```
---
## 8. Dependencies
### External Dependencies
| Package | Purpose | Used For | Components Using |
| --- | --- | --- | --- |
| `pkg/chronos` | Async runtime | Async procedures, futures, cancellation handling | All stores (async operations) |
| `pkg/libp2p` | P2P networking | CID types, multicodec, multihash | BlockStore, NetworkStore, KeyUtils |
| `pkg/questionable` | Error handling | `?!` Result types, optional values | All components (error propagation) |
| `pkg/datastore` | Storage abstraction | Key-value storage interface, queries | RepoStore (main storage) |
| `pkg/datastore/typedds` | Typed datastore | Type-safe serialization/deserialization | RepoStore (metadata storage) |
| `pkg/lrucache` | Memory caching | LRU cache implementation | CacheStore (block caching) |
| `pkg/metrics` | Monitoring | Prometheus metrics, gauges, counters | RepoStore (quota tracking) |
| `pkg/serde/json` | Serialization | JSON encoding/decoding | RepoStore coders |
| `pkg/stew/byteutils` | Byte utilities | Hex conversion, byte arrays | RepoStore coders |
| `pkg/stew/endians2` | Endian conversion | Little/big endian conversion | RepoStore coders |
| `pkg/chronicles` | Logging | Structured logging | QueryIterHelper |
| `pkg/upraises` | Exception safety | Exception effect tracking | KeyUtils, TreeHelper |
| `std/options` | Optional types | Option[T] for nullable values | CacheStore |
| `std/sugar` | Syntax sugar | Lambda expressions, proc shortcuts | KeyUtils, RepoStore coders |
### Internal Dependencies
| Module | Purpose | Used For | Components Using |
| --- | --- | --- | --- |
| `../blocktype` | Block types | Block/Manifest type definitions, codec info | All stores (type checking) |
| `../blockexchange` | Block exchange | Network block exchange protocols | NetworkStore |
| `../merkletree` | Merkle trees | Tree structures, proofs, hashing | BlockStore, RepoStore, TreeHelper |
| `../manifest` | Data manifests | Manifest structures, CID identification | KeyUtils, CacheStore |
| `../clock` | Time abstraction | Clock interface for time operations | BlockStore, RepoStore |
| `../systemclock` | System clock | Wall-clock time implementation | RepoStore, Maintenance |
| `../units` | Units/Types | NBytes, Duration, time units | RepoStore (quota/TTL) |
| `../errors` | Custom errors | CodexError, BlockNotFoundError | RepoStore |
| `../namespaces` | Key namespaces | Database key namespace constants | KeyUtils |
| `../utils` | General utils | Common utility functions | BlockStore, RepoStore |
| `../utils/asynciter` | Async iterators | Async iteration patterns | TreeHelper, QueryIterHelper |
| `../utils/safeasynciter` | Safe async iter | Error-safe async iteration | NetworkStore, Maintenance |
| `../utils/asyncheapqueue` | Async queue | Priority queue for async operations | NetworkStore |
| `../utils/timer` | Timer utilities | Periodic timer operations | Maintenance |
| `../utils/json` | JSON utilities | JSON helper functions | RepoStore coders |
| `../chunker` | Data chunking | Breaking data into chunks | CacheStore |
| `../logutils` | Logging utils | Logging macros and utilities | All stores |