1. Purpose and Scope
The Store Module serves as the core storage abstraction in Nim-Codex, providing a unified interface for storing and retrieving content-addressed blocks and associated metadata.
The primary design goal is to decouple storage operations from the underlying datastore semantics by introducing the BlockStore interface. This interface standardizes methods for storing and retrieving both ephemeral and persistent blocks, ensuring a consistent API across different storage backends.
Additionally, the module integrates a maintenance engine responsible for cleaning up expired ephemeral data according to configured policies.
This module is built on top of the generic DataStore (DS) interface, which is implemented by multiple backends such as SQLite, LevelDB, and the filesystem.
The DS provides a KV-store abstraction (Get, Put, Delete, Query), with backend-dependent guarantees. At a minimum, row-level consistency and basic batching are expected.
It also supports:
- Namespace mounting for isolating backend usage
- Layering backends (e.g., caching in front of persistent stores)
- Flexible stacking and composition of storage proxies
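To make the layering idea concrete, here is a minimal toy sketch of a cache layer composed in front of a slower backend. The `KvStore`, `MemStore`, and `TieredStore` names are illustrative only and are not the actual `pkg/datastore` API:

```nim
# Toy sketch of layered KV stores: reads check the fast layer first and
# fall back to the slow layer. Names are ours, not pkg/datastore's.
import std/[tables, options]

type
  KvStore = ref object of RootObj

method get(s: KvStore, key: string): Option[seq[byte]] {.base.} =
  none(seq[byte])

method put(s: KvStore, key: string, val: seq[byte]) {.base.} =
  discard

type
  MemStore = ref object of KvStore
    data: Table[string, seq[byte]]

method get(s: MemStore, key: string): Option[seq[byte]] =
  if key in s.data: some(s.data[key]) else: none(seq[byte])

method put(s: MemStore, key: string, val: seq[byte]) =
  s.data[key] = val

type
  TieredStore = ref object of KvStore
    fast, slow: KvStore   # e.g., an in-memory cache over a persistent backend

method get(s: TieredStore, key: string): Option[seq[byte]] =
  result = s.fast.get(key)          # check the fast layer first
  if result.isNone:
    result = s.slow.get(key)        # fall back to the slower layer
    if result.isSome:
      s.fast.put(key, result.get)   # warm the cache on the way back

method put(s: TieredStore, key: string, val: seq[byte]) =
  s.fast.put(key, val)              # write through both layers
  s.slow.put(key, val)
```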
2. Limitations
The current implementation has several shortcomings:
- No dataset-level operations or advanced batching support
- Lack of consistent locking and concurrency control, which may lead to inconsistencies during:
  - Crashes
  - Long-running operations on block groups (e.g., reference count updates, expiration updates)
3. BlockStore Interface
| Method | Description | Input | Output |
|---|---|---|---|
| `getBlock(cid: Cid)` | Retrieve block by CID | CID | `Future[?!Block]` |
| `getBlock(treeCid: Cid, index: Natural)` | Retrieve block from a Merkle tree by leaf index | Tree CID, index | `Future[?!Block]` |
| `getBlock(address: BlockAddress)` | Retrieve block via unified address | BlockAddress | `Future[?!Block]` |
| `getBlockAndProof(treeCid: Cid, index: Natural)` | Retrieve block with Merkle proof | Tree CID, index | `Future[?!(Block, CodexProof)]` |
| `getCid(treeCid: Cid, index: Natural)` | Retrieve leaf CID from tree metadata | Tree CID, index | `Future[?!Cid]` |
| `getCidAndProof(treeCid: Cid, index: Natural)` | Retrieve leaf CID with inclusion proof | Tree CID, index | `Future[?!(Cid, CodexProof)]` |
| `putBlock(blk: Block, ttl: Duration)` | Store block with quota enforcement | Block, optional TTL | `Future[?!void]` |
| `putCidAndProof(treeCid: Cid, index: Natural, blkCid: Cid, proof: CodexProof)` | Store leaf metadata with ref counting | Tree CID, index, block CID, proof | `Future[?!void]` |
| `hasBlock(...)` | Check block existence (CID or tree leaf) | CID / Tree CID + index | `Future[?!bool]` |
| `delBlock(...)` | Delete block/tree leaf (with ref count checks) | CID / Tree CID + index | `Future[?!void]` |
| `ensureExpiry(...)` | Update expiry for block/tree leaf | CID / Tree CID + index, expiry timestamp | `Future[?!void]` |
| `listBlocks(blockType: BlockType)` | Iterate over stored blocks | Block type | `Future[?!SafeAsyncIter[Cid]]` |
| `getBlockExpirations(maxNumber, offset)` | Retrieve block expiry metadata | Pagination params | `Future[?!SafeAsyncIter[BlockExpiration]]` |
| `blockRefCount(cid: Cid)` | Get block reference count | CID | `Future[?!Natural]` |
| `reserve(bytes: NBytes)` | Reserve storage quota | Bytes | `Future[?!void]` |
| `release(bytes: NBytes)` | Release reserved quota | Bytes | `Future[?!void]` |
| `start()` | Initialize store | — | `Future[void]` |
| `stop()` | Gracefully shut down store | — | `Future[void]` |
| `close()` | Close underlying datastores | — | `Future[void]` |
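As a usage illustration, the sketch below drives `putBlock` and `getBlock` through the `?!` result convention. It assumes a concrete `BlockStore` instance plus the Codex `Block` type from the surrounding modules, and relies on `pkg/chronos` and `pkg/questionable` as listed under Dependencies:

```nim
import pkg/chronos
import pkg/questionable/results
# Block and BlockStore are assumed to come from the codex blocktype and
# stores modules; this is a usage sketch, not a standalone program.

proc storeAndFetch(store: BlockStore, blk: Block) {.async.} =
  # putBlock returns Future[?!void]; the TTL parameter is optional.
  if err =? (await store.putBlock(blk)).errorOption:
    echo "putBlock failed: ", err.msg
    return

  # Retrieval by direct CID (the first getBlock overload in the table).
  without fetched =? await store.getBlock(blk.cid), err:
    echo "getBlock failed: ", err.msg
    return
  echo "fetched block ", fetched.cid
```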
4. Functional Requirements
Available Today
- Atomic Block Operations
  - Store, retrieve, and delete operations must be atomic.
  - Support retrieval via:
    - Direct CID
    - Tree-based addressing (`treeCid` + `index`)
    - Unified block address
- Metadata Management
  - Store protocol-level metadata (e.g., storage proofs, quota usage).
  - Store block-level metadata (e.g., reference counts, total block count).
- Multi-Datastore Support
  - Pluggable datastore interface supporting various backends.
  - Typed datastore operations for metadata type safety.
- Lifecycle & Maintenance
  - BlockMaintainer service for removing expired data (sketched below).
  - Configurable maintenance intervals (default: 10 min).
  - Batch processing (default: 1000 blocks/cycle).
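A minimal sketch of that maintenance cadence, assuming the defaults above; `deleteExpiredBatch` is a hypothetical stand-in for the BlockMaintainer's actual work:

```nim
# Toy sketch of the BlockMaintainer cadence: one batch per cycle, using
# the documented defaults (10 min interval, 1000 blocks per batch).
import pkg/chronos

let
  maintenanceInterval = 10.minutes   # documented default cycle
  batchSize = 1000                   # documented blocks per cycle

proc deleteExpiredBatch(maxBlocks: int) {.async.} =
  # Hypothetical stand-in: would page through expiry metadata and delete
  # any blocks whose expiry has passed.
  discard

proc maintenanceLoop() {.async.} =
  while true:
    await deleteExpiredBatch(batchSize)
    await sleepAsync(maintenanceInterval)
```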
Future Requirements
- Transaction Rollback & Error Recovery
  - Rollback support for failed multi-step operations.
  - Consistent state restoration after failures.
- Dataset-Level Operations
  - Handle dataset-level metadata.
  - Batch operations for dataset block groups.
- Concurrency Control
  - Consistent locking and coordination mechanisms to prevent inconsistencies during crashes or long-running operations.
- Lifecycle & Maintenance
  - Cooperative scheduling to avoid blocking.
  - State tracking for large datasets.
5. Non-Functional Requirements
Available Today
- Security
  - Verify block content integrity upon retrieval.
  - Enforce quotas to prevent disk exhaustion.
  - Safe orphaned data cleanup.
- Scalability
  - Configurable storage quotas (default: 20 GiB).
  - Pagination for metadata queries.
  - Reference counting–based garbage collection.
- Reliability
  - Metrics collection (`codex_repostore_*`).
  - Graceful shutdown with resource cleanup.
Future Requirements
- Performance
  - Batch metadata updates.
  - Efficient key lookups with configurable prefix lengths.
  - Support for both fast and slower storage tiers.
  - Streaming APIs optimized for extremely large datasets.
- Security
  - Finer-grained quota enforcement across tenants/namespaces.
- Reliability
  - Stronger rollback semantics for multi-node consistency.
  - Auto-recovery from inconsistent states.
6. Internal Design
Store Implementations
The Store module provides three concrete implementations of the BlockStore interface, each optimized for a specific role in the Nim-Codex architecture: RepoStore, NetworkStore, and CacheStore.
RepoStore
The RepoStore (RS) is a persistent BlockStore implementation that interfaces directly with low-level storage backends, such as hard drives and databases.
It uses two distinct DataStore (DS) backends:
- FileSystem — for storing raw block data
- LevelDB — for storing associated metadata
This separation keeps bulk block I/O on the filesystem while frequent, small metadata updates benefit from a fast key-value database.
Characteristics:
- Persistent storage via datastore backends
- Quota management with precise usage tracking
- TTL (time-to-live) support with automated expiration
- Metadata storage for block size, reference count, and expiry
- Transaction-like operations implemented through reference counting
Configuration:
- `quotaMaxBytes`: Maximum storage quota
- `blockTtl`: Default TTL for stored blocks
- `postFixLen`: CID key postfix length for sharding
┌─────────────────────────────────────────────────────────────┐
│ RepoStore │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌──────────────────────────┐ │
│ │ repoDs │ │ metaDs │ │
│ │ (Datastore) │ │ (TypedDatastore) │ │
│ │ │ │ │ │
│ │ Block Data: │ │ Metadata: │ │
│ │ - Raw bytes │ │ - BlockMetadata │ │
│ │ - CID-keyed │ │ - LeafMetadata │ │
│ │ │ │ - QuotaUsage │ │
│ │ │ │ - Block counts │ │
│ └─────────────┘ └──────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
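Expressed as values, those knobs might look as follows. The 20 GiB quota is the documented default from the Scalability section; the TTL shown is only an example, and `postFixLen = 2` matches the two-character shard segment in the Storage Layout table:

```nim
# Illustrative RepoStore configuration values, not the exact API.
import pkg/chronos

let
  quotaMaxBytes = 20'u64 * 1024 * 1024 * 1024  # 20 GiB default quota
  blockTtl      = 24.hours                     # example TTL only, not a documented default
  postFixLen    = 2                            # CID suffix length used for key sharding
```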
NetworkStore
The NetworkStore is a composite BlockStore that combines local persistence with network-based retrieval for distributed content access.
It follows a local-first strategy — attempting to retrieve or store blocks locally first, and falling back to network retrieval via the Block Exchange Engine if the block is not available locally.
Characteristics:
- Integrates local storage with network retrieval
- Works seamlessly with the block exchange engine for peer-to-peer access
- Transparent block fetching from remote sources
- Local caching of blocks retrieved from the network for future access
┌────────────────────────────────────────────────────────────┐
│ NetworkStore │
├────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌──────────────────────┐ │
│ │ LocalStore - RS │ │ BlockExcEngine │ │
│ │ • Store blocks │ │ • Request blocks │ │
│ │ • Get blocks │ │ • Resolve blocks │ │
│ └─────────────────┘ └──────────────────────┘ │
│ │ │ │
│ └──────────────┬───────────────┘ │
│ │ │
│ ┌─────────────┐ │
│ │BS Interface │ │
│ │ │ │
│ │ • getBlock │ │
│ │ • putBlock │ │
│ │ • hasBlock │ │
│ │ • delBlock │ │
│ └─────────────┘ │
└────────────────────────────────────────────────────────────┘
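A toy model of the local-first strategy, with simplified stand-ins for the local store and the exchange engine:

```nim
# Toy model of NetworkStore's local-first getBlock: try the local store,
# fall back to the network, then persist the fetched block locally.
import std/options

type
  Cid = string
  Block = object
    cid: Cid

var localBlocks: seq[Block]   # stand-in for the local RepoStore

proc getLocal(cid: Cid): Option[Block] =
  for b in localBlocks:
    if b.cid == cid:
      return some(b)
  none(Block)

proc requestFromNetwork(cid: Cid): Block =
  # Stand-in for BlockExcEngine resolving the block from peers.
  Block(cid: cid)

proc getBlockLocalFirst(cid: Cid): Block =
  let local = getLocal(cid)
  if local.isSome:
    return local.get
  let blk = requestFromNetwork(cid)   # network fallback
  localBlocks.add blk                 # cache locally for future access
  blk
```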
CacheStore
The CacheStore is an in-memory BlockStore implementation designed for fast access to frequently used blocks.
This store maintains two separate LRU caches:
- Block Cache — `LruCache[Cid, Block]`
  - Stores actual block data indexed by CID
  - Acts as the primary cache for block content
- CID/Proof Cache — `LruCache[(Cid, Natural), (Cid, CodexProof)]`
  - Maps `(treeCid, index)` to `(blockCid, proof)`
  - Supports direct access to block proofs keyed by `treeCid` and `index`
Characteristics:
- O(1) access times for cached data
- LRU eviction policy for memory management
- Configurable maximum cache size
- No persistence — cache contents are lost on restart
- No TTL — blocks remain in cache until evicted
Configuration:
- `cacheSize`: Maximum total cache size (bytes)
- `chunkSize`: Minimum block size unit
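The two-cache lookup path can be modeled with plain tables (the real store uses `LruCache` with eviction; names other than the cache shapes are ours):

```nim
# Toy model of the two caches above; only the lookup path is modeled.
import std/[tables, options]

type
  Cid = string
  CodexProof = object
  Block = object
    cid: Cid

var
  blockCache: Table[Cid, Block]                      # Cid -> Block
  proofCache: Table[(Cid, int), (Cid, CodexProof)]   # (treeCid, index) -> (blkCid, proof)

proc getBlockByLeaf(treeCid: Cid, index: int): Option[Block] =
  # Resolve the leaf's block CID first, then hit the block cache.
  if (treeCid, index) in proofCache:
    let (blkCid, _) = proofCache[(treeCid, index)]
    if blkCid in blockCache:
      return some(blockCache[blkCid])
  none(Block)
```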
Storage Layout
| Key Pattern | Data Type | Description | Example |
|---|---|---|---|
| `repo/manifests/{XX}/{full-cid}` | Raw bytes | Manifest block data | `repo/manifests/Cd/bafy...Cd → [data]` |
| `repo/blocks/{XX}/{full-cid}` | Raw bytes | Block data | `repo/blocks/Ab/bafy...Ab → [data]` |
| `meta/ttl/{cid}` | BlockMetadata | Expiry, size, refCount | `meta/ttl/bafy... → {...}` |
| `meta/proof/{treeCid}/{index}` | LeafMetadata | Merkle proof for leaf | `meta/proof/bafy.../42 → {...}` |
| `meta/total` | Natural | Total stored blocks | `meta/total → 12039` |
| `meta/quota/used` | NBytes | Used quota | `meta/quota/used → 52428800` |
| `meta/quota/reserved` | NBytes | Reserved quota | `meta/quota/reserved → 104857600` |
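For illustration, the `{XX}` shard segment is the last `postFixLen` characters of the CID string, as the `repo/blocks/Ab/bafy...Ab` example shows; the helper name below is ours:

```nim
# Illustrative key derivation for the layout above: the CID suffix
# selects the shard directory.
proc blockKey(cid: string, postFixLen = 2): string =
  let shard = cid[^postFixLen .. ^1]   # last postFixLen characters
  "repo/blocks/" & shard & "/" & cid

when isMainModule:
  doAssert blockKey("bafy...Ab") == "repo/blocks/Ab/bafy...Ab"
```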
Workflows
The following flow chart summarizes how put, get, and delete operations interact with the shared block storage, metadata store, and quota management systems:
```mermaid
flowchart TD
A[Block Operations] --> B[putBlock]
A --> C[getBlock]
A --> D[delBlock]
B --> E[Store to RepoDS]
B --> F[Update Metadata]
B --> G[Update Quota]
C --> H[Query RepoDS]
C --> I[Handle Tree Access]
C --> J[Return Block Data]
D --> K[Check Reference Count]
D --> L[Remove from RepoDS]
D --> M[Update Counters]
E --> N[(Block Storage)]
H --> N
L --> N
F --> O[(Metadata Store)]
I --> O
K --> O
G --> P[(Quota Management)]
    M --> P
```
PutBlock
The following flow chart shows how a block is stored, including expiry calculation, duplicate detection, metadata creation, and quota updates with rollback on failure:
```mermaid
flowchart TD
A[putBlock: blk, ttl] --> B[Calculate expiry = now + ttl]
B --> C[storeBlock: blk, expiry]
C --> D{Block empty?}
D -->|Yes| E[Return AlreadyInStore]
D -->|No| F[Create metadata & block keys]
F --> G{Block metadata exists?}
G -->|Yes| H{Size matches?}
H -->|Yes| I[Return AlreadyInStore]
H -->|No| J[Return Error]
G -->|No| K[Create new metadata]
K --> L[Store block data]
L --> M{Store successful?}
M -->|No| N[Return Error]
M -->|Yes| O[Update quota usage]
O --> P{Quota update OK?}
P -->|No| Q[Rollback: Delete block]
Q --> R[Return Error]
P -->|Yes| S[Update total blocks count]
S --> T[Trigger onBlockStored callback]
    T --> U[Return Success]
```
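The rollback branch of this flow can be reduced to a few lines; everything here is a simplified stand-in for the RepoStore internals:

```nim
# Toy version of the rollback step above: if the quota update fails
# after the block was written, the write is undone.
import std/tables

var
  repo: Table[string, seq[byte]]   # toy block storage
  quotaUsed = 0
  quotaMax = 1024

proc putBlockSketch(key: string, data: seq[byte]): bool =
  repo[key] = data                       # store block data
  if quotaUsed + data.len > quotaMax:    # quota update fails ...
    repo.del(key)                        # ... roll back the stored block
    return false
  quotaUsed += data.len                  # quota update succeeds
  true
```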
GetBlock
The following flow chart explains how a block is retrieved by CID or tree reference, resolving leaf metadata when necessary, and returning the block or an error:
```mermaid
flowchart TD
A[getBlock: cid/address] --> B{Input type?}
B -->|BlockAddress with leaf| C[getLeafMetadata: treeCid, index]
B -->|CID| D[Direct CID access]
C --> E{Leaf metadata found?}
E -->|No| F[Return BlockNotFoundError]
E -->|Yes| G[Extract block CID from metadata]
G --> D
D --> H{CID empty?}
H -->|Yes| I[Return empty block]
H -->|No| J[Create prefix key]
J --> K[Query datastore: repoDs.get]
K --> L{Block found?}
L -->|No| M{Error type?}
M -->|DatastoreKeyNotFound| N[Return BlockNotFoundError]
M -->|Other| O[Return Error]
L -->|Yes| P[Create Block with verification]
    P --> Q[Return Block]
```
DelBlock
The following flow chart shows how a block is deleted when it is unused or expired, including metadata cleanup and quota/counter updates:
```mermaid
flowchart TD
A[delBlock: cid] --> B[delBlockInternal: cid]
B --> C{CID empty?}
C -->|Yes| D[Return Deleted]
C -->|No| E[tryDeleteBlock: cid, now]
E --> F{Metadata exists?}
F -->|No| G[Check if block exists in repo]
G --> H{Block exists?}
H -->|Yes| I[Warn & remove orphaned block]
H -->|No| J[Return NotFound]
I --> J
F -->|Yes| K{refCount = 0 OR expired?}
K -->|No| L[Return InUse]
K -->|Yes| M[Delete block & metadata]
M --> N[Return Deleted]
D --> O[Handle result]
J --> O
L --> O
N --> O
O --> P{Result type?}
P -->|InUse| Q[Return Error: Cannot delete dataset block]
P -->|NotFound| R[Return Success: Ignore]
P -->|Deleted| S[Update total blocks count]
S --> T[Update quota usage]
    T --> U[Return Success]
```
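The core delete decision reduces to a single predicate; the enum mirrors the result types in the chart above:

```nim
# Toy version of the delete decision: a block may only be removed when
# it is unreferenced or expired; otherwise the result is InUse.
type DelResult = enum Deleted, NotFound, InUse

proc tryDeleteSketch(refCount: int, expiry, now: int64): DelResult =
  if refCount == 0 or expiry <= now:
    Deleted   # safe to delete block data and metadata
  else:
    InUse     # still referenced by a dataset and not yet expired
```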
7. Data Models
Stores
```nim
RepoStore* = ref object of BlockStore
  postFixLen*: int
  repoDs*: Datastore
  metaDs*: TypedDatastore
  clock*: Clock
  quotaMaxBytes*: NBytes
  quotaUsage*: QuotaUsage
  totalBlocks*: Natural
  blockTtl*: Duration
  started*: bool

NetworkStore* = ref object of BlockStore
  engine*: BlockExcEngine
  localStore*: BlockStore

CacheStore* = ref object of BlockStore
  currentSize*: NBytes
  size*: NBytes
  cache: LruCache[Cid, Block]
  cidAndProofCache: LruCache[(Cid, Natural), (Cid, CodexProof)]
```
Metadata Types
```nim
BlockMetadata* {.serialize.} = object
  expiry*: SecondsSince1970
  size*: NBytes
  refCount*: Natural

LeafMetadata* {.serialize.} = object
  blkCid*: Cid
  proof*: CodexProof

BlockExpiration* {.serialize.} = object
  cid*: Cid
  expiry*: SecondsSince1970

QuotaUsage* {.serialize.} = object
  used*: NBytes
  reserved*: NBytes
```
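As a small illustration of how these types interact (simplified copies, not the real module): storing a block derives `expiry` from the TTL, and storing leaf metadata bumps the referenced block's `refCount`:

```nim
# Simplified metadata lifecycle sketch; field types reduced to plain ints.
type
  BlockMetadata = object
    expiry: int64      # SecondsSince1970
    size: int          # NBytes, simplified to int here
    refCount: Natural

proc onPut(now, ttlSeconds: int64, size: int): BlockMetadata =
  # putBlock computes expiry = now + ttl before persisting metadata.
  BlockMetadata(expiry: now + ttlSeconds, size: size, refCount: 0)

proc onLeafReference(md: var BlockMetadata) =
  # putCidAndProof stores leaf metadata and increments the ref count.
  inc md.refCount
```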
8. Dependencies
External Dependencies
| Package | Purpose | Used For | Components Using |
|---|---|---|---|
| `pkg/chronos` | Async runtime | Async procedures, futures, cancellation handling | All stores (async operations) |
| `pkg/libp2p` | P2P networking | CID types, multicodec, multihash | BlockStore, NetworkStore, KeyUtils |
| `pkg/questionable` | Error handling | `?!` Result types, optional values | All components (error propagation) |
| `pkg/datastore` | Storage abstraction | Key-value storage interface, queries | RepoStore (main storage) |
| `pkg/datastore/typedds` | Typed datastore | Type-safe serialization/deserialization | RepoStore (metadata storage) |
| `pkg/lrucache` | Memory caching | LRU cache implementation | CacheStore (block caching) |
| `pkg/metrics` | Monitoring | Prometheus metrics, gauges, counters | RepoStore (quota tracking) |
| `pkg/serde/json` | Serialization | JSON encoding/decoding | RepoStore coders |
| `pkg/stew/byteutils` | Byte utilities | Hex conversion, byte arrays | RepoStore coders |
| `pkg/stew/endians2` | Endian conversion | Little/big endian conversion | RepoStore coders |
| `pkg/chronicles` | Logging | Structured logging | QueryIterHelper |
| `pkg/upraises` | Exception safety | Exception effect tracking | KeyUtils, TreeHelper |
| `std/options` | Optional types | `Option[T]` for nullable values | CacheStore |
| `std/sugar` | Syntax sugar | Lambda expressions, proc shortcuts | KeyUtils, RepoStore coders |
Internal Dependencies
| Module | Purpose | Used For | Components Using |
|---|---|---|---|
| `../blocktype` | Block types | Block/Manifest type definitions, codec info | All stores (type checking) |
| `../blockexchange` | Block exchange | Network block exchange protocols | NetworkStore |
| `../merkletree` | Merkle trees | Tree structures, proofs, hashing | BlockStore, RepoStore, TreeHelper |
| `../manifest` | Data manifests | Manifest structures, CID identification | KeyUtils, CacheStore |
| `../clock` | Time abstraction | Clock interface for time operations | BlockStore, RepoStore |
| `../systemclock` | System clock | Wall-clock time implementation | RepoStore, Maintenance |
| `../units` | Units/Types | NBytes, Duration, time units | RepoStore (quota/TTL) |
| `../errors` | Custom errors | CodexError, BlockNotFoundError | RepoStore |
| `../namespaces` | Key namespaces | Database key namespace constants | KeyUtils |
| `../utils` | General utils | Common utility functions | BlockStore, RepoStore |
| `../utils/asynciter` | Async iterators | Async iteration patterns | TreeHelper, QueryIterHelper |
| `../utils/safeasynciter` | Safe async iter | Error-safe async iteration | NetworkStore, Maintenance |
| `../utils/asyncheapqueue` | Async queue | Priority queue for async operations | NetworkStore |
| `../utils/timer` | Timer utilities | Periodic timer operations | Maintenance |
| `../utils/json` | JSON utilities | JSON helper functions | RepoStore coders |
| `../chunker` | Data chunking | Breaking data into chunks | CacheStore |
| `../logutils` | Logging utils | Logging macros and utilities | All stores |