From e0835e0b26966a89793beb2492069839194af219 Mon Sep 17 00:00:00 2001
From: munna0908 <88337208+munna0908@users.noreply.github.com>
Date: Wed, 17 Sep 2025 11:17:04 +0530
Subject: [PATCH] Store Module Specifications (#14)

* Initial draft of store module specs
* improve requirements section
---
 .../Specs/Component Specification - Store.md | 483 ++++++++++++++++++
 1 file changed, 483 insertions(+)
 create mode 100644 10 Notes/Specs/Component Specification - Store.md

diff --git a/10 Notes/Specs/Component Specification - Store.md b/10 Notes/Specs/Component Specification - Store.md
new file mode 100644
index 0000000..6dc01b2
--- /dev/null
+++ b/10 Notes/Specs/Component Specification - Store.md
@@ -0,0 +1,483 @@
+## 1. Purpose and Scope
+
+The **Store Module** serves as the core storage abstraction in **Nim-Codex**, providing a unified interface for storing and retrieving **content-addressed blocks** and associated metadata.
+
+The primary design goal is to decouple storage operations from the underlying datastore semantics by introducing the `BlockStore` interface. This interface standardizes methods for storing and retrieving both **ephemeral** and **persistent** blocks, ensuring a consistent API across different storage backends.
+
+Additionally, the module integrates a **maintenance engine** responsible for cleaning up expired ephemeral data according to configured policies.
+
+This module is built on top of the generic [`DataStore (DS)` interface](https://github.com/codex-storage/nim-datastore/blob/master/datastore/datastore.nim), which is implemented by multiple backends such as SQLite, LevelDB, and the filesystem.
+
+The DS provides a **KV-store** abstraction (`Get`, `Put`, `Delete`, `Query`), with backend-dependent guarantees. At a minimum, row-level consistency and basic batching are expected.
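As a rough illustration of this key-value contract, the sketch below models a toy in-memory backend in Python rather than Nim. The class name `InMemoryDatastore`, the lowercase method names, and the prefix-based `query` are assumptions made for the example; they are not taken from nim-datastore.

```python
from typing import Dict, Iterator, Optional, Tuple


class InMemoryDatastore:
    """Toy key-value backend with get/put/delete/query (hypothetical API)."""

    def __init__(self) -> None:
        self._rows: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Overwrites any existing value stored under the same key.
        self._rows[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._rows.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op in this sketch.
        self._rows.pop(key, None)

    def query(self, prefix: str) -> Iterator[Tuple[str, bytes]]:
        # Yields (key, value) pairs whose key starts with the prefix, in key order.
        for key in sorted(self._rows):
            if key.startswith(prefix):
                yield key, self._rows[key]


ds = InMemoryDatastore()
ds.put("repo/blocks/a", b"one")
ds.put("repo/blocks/b", b"two")
ds.put("meta/total", b"2")
block_keys = [key for key, _ in ds.query("repo/blocks/")]
```

Here `block_keys` holds only the two keys under `repo/blocks/`, which is the kind of prefix scan a block listing could be built on.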
+
+The DS also supports:
+
+- **Namespace mounting** for isolating backend usage
+- **Layering backends** (e.g., caching in front of persistent stores)
+- Flexible stacking and composition of storage proxies
+
+---
+
+## 2. Limitations
+
+The current implementation has several shortcomings:
+
+- **No dataset-level operations** or advanced batching support
+- **Lack of consistent locking and concurrency control**, which may lead to inconsistencies during:
+  - Crashes
+  - Long-running operations on block groups (e.g., reference count updates, expiration updates)
+
+---
+
+## 3. `BlockStore` Interface
+
+| Method | Description | Input | Output |
+| --- | --- | --- | --- |
+| `getBlock(cid: Cid)` | Retrieve block by CID | CID | `Future[?!Block]` |
+| `getBlock(treeCid: Cid, index: Natural)` | Retrieve block from a Merkle tree by leaf index | Tree CID, index | `Future[?!Block]` |
+| `getBlock(address: BlockAddress)` | Retrieve block via unified address | BlockAddress | `Future[?!Block]` |
+| `getBlockAndProof(treeCid: Cid, index: Natural)` | Retrieve block with Merkle proof | Tree CID, index | `Future[?!(Block, CodexProof)]` |
+| `getCid(treeCid: Cid, index: Natural)` | Retrieve leaf CID from tree metadata | Tree CID, index | `Future[?!Cid]` |
+| `getCidAndProof(treeCid: Cid, index: Natural)` | Retrieve leaf CID with inclusion proof | Tree CID, index | `Future[?!(Cid, CodexProof)]` |
+| `putBlock(blk: Block, ttl: Duration)` | Store block with quota enforcement | Block, optional TTL | `Future[?!void]` |
+| `putCidAndProof(treeCid: Cid, index: Natural, blkCid: Cid, proof: CodexProof)` | Store leaf metadata with ref counting | Tree CID, index, block CID, proof | `Future[?!void]` |
+| `hasBlock(...)` | Check block existence (CID or tree leaf) | CID / Tree CID + index | `Future[?!bool]` |
+| `delBlock(...)` | Delete block/tree leaf (with ref count checks) | CID / Tree CID + index | `Future[?!void]` |
+| `ensureExpiry(...)` | Update expiry for block/tree leaf | CID / Tree CID + index, expiry timestamp | `Future[?!void]` |
+| `listBlocks(blockType: BlockType)` | Iterate over stored blocks | Block type | `Future[?!SafeAsyncIter[Cid]]` |
+| `getBlockExpirations(maxNumber, offset)` | Retrieve block expiry metadata | Pagination params | `Future[?!SafeAsyncIter[BlockExpiration]]` |
+| `blockRefCount(cid: Cid)` | Get block reference count | CID | `Future[?!Natural]` |
+| `reserve(bytes: NBytes)` | Reserve storage quota | Bytes | `Future[?!void]` |
+| `release(bytes: NBytes)` | Release reserved quota | Bytes | `Future[?!void]` |
+| `start()` | Initialize store | none | `Future[void]` |
+| `stop()` | Gracefully shut down store | none | `Future[void]` |
+| `close()` | Close underlying datastores | none | `Future[void]` |
+
+---
+
+## 4. Functional Requirements
+
+### Available Today
+
+- **Atomic Block Operations**
+  - Store, retrieve, and delete operations must be atomic.
+  - Support retrieval via:
+    - Direct CID
+    - Tree-based addressing (`treeCid + index`)
+    - Unified block address
+
+- **Metadata Management**
+  - Store protocol-level metadata (e.g., storage proofs, quota usage).
+  - Store block-level metadata (e.g., reference counts, total block count).
+
+- **Multi-Datastore Support**
+  - Pluggable datastore interface supporting various backends.
+  - Typed datastore operations for metadata type safety.
+
+- **Lifecycle & Maintenance**
+  - BlockMaintainer service for removing expired data.
+  - Configurable maintenance intervals (default: 10 min).
+  - Batch processing (default: 1000 blocks/cycle).
+
+### Future Requirements
+
+- **Transaction Rollback & Error Recovery**
+  - Rollback support for failed multi-step operations.
+  - Consistent state restoration after failures.
+
+- **Dataset-Level Operations**
+  - Handle dataset-level metadata.
+  - Batch operations for dataset block groups.
+
+- **Concurrency Control**
+  - Consistent locking and coordination mechanisms to prevent inconsistencies during crashes or long-running operations.
+
+- **Lifecycle & Maintenance**
+  - Cooperative scheduling to avoid blocking.
+  - State tracking for large datasets.
+
+---
+
+## 5. Non-Functional Requirements
+
+### Available Today
+
+- **Security**
+  - Verify block content integrity upon retrieval.
+  - Enforce quotas to prevent disk exhaustion.
+  - Safe cleanup of orphaned data.
+
+- **Scalability**
+  - Configurable storage quotas (default: 20 GiB).
+  - Pagination for metadata queries.
+  - Reference counting–based garbage collection.
+
+- **Reliability**
+  - Metrics collection (`codex_repostore_*`).
+  - Graceful shutdown with resource cleanup.
+
+### Future Requirements
+
+- **Performance**
+  - Batch metadata updates.
+  - Efficient key lookups with configurable prefix lengths.
+  - Support for both fast and slower storage tiers.
+  - Streaming APIs optimized for extremely large datasets.
+
+- **Security**
+  - Finer-grained quota enforcement across tenants/namespaces.
+
+- **Reliability**
+  - Stronger rollback semantics for multi-node consistency.
+  - Auto-recovery from inconsistent states.
+
+---
+
+## 6. Internal Design
+
+### Store Implementations
+
+The Store module provides **three concrete implementations** of the `BlockStore` interface, each optimized for a specific role in the Nim-Codex architecture: **RepoStore**, **NetworkStore**, and **CacheStore**.
+
+#### RepoStore
+
+The **RepoStore (RS)** is a persistent `BlockStore` implementation that interfaces directly with low-level storage backends, such as hard drives and databases.
+
+It uses two distinct `DataStore (DS)` backends:
+
+- **FileSystem**: for storing raw block data
+- **LevelDB**: for storing associated metadata
+
+This separation lets block data operations run directly against the filesystem while metadata updates go through a fast key-value database.
+
+**Characteristics**:
+
+- Persistent storage via datastore backends
+- Quota management with precise usage tracking
+- TTL (time-to-live) support with automated expiration
+- Metadata storage for block size, reference count, and expiry
+- Transaction-like operations implemented through reference counting
+
+**Configuration**:
+
+- `quotaMaxBytes`: Maximum storage quota
+- `blockTtl`: Default TTL for stored blocks
+- `postFixLen`: CID key postfix length for sharding
+
+```text
+┌─────────────────────────────────────────────────────────────┐
+│                          RepoStore                          │
+├─────────────────────────────────────────────────────────────┤
+│  ┌─────────────┐              ┌──────────────────────────┐  │
+│  │ repoDs      │              │ metaDs                   │  │
+│  │ (Datastore) │              │ (TypedDatastore)         │  │
+│  │             │              │                          │  │
+│  │ Block Data: │              │ Metadata:                │  │
+│  │ - Raw bytes │              │ - BlockMetadata          │  │
+│  │ - CID-keyed │              │ - LeafMetadata           │  │
+│  │             │              │ - QuotaUsage             │  │
+│  │             │              │ - Block counts           │  │
+│  └─────────────┘              └──────────────────────────┘  │
+└─────────────────────────────────────────────────────────────┘
+
+```
+
+---
+
+#### NetworkStore
+
+The **NetworkStore** is a composite `BlockStore` that combines local persistence with network-based retrieval for distributed content access.
+
+It follows a **local-first** strategy: it attempts to retrieve or store blocks locally first, and falls back to network retrieval via the Block Exchange Engine if the block is not available locally.
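The local-first lookup described above can be sketched as follows. This is an illustrative Python model, not the Nim implementation: `LocalStore`, `NetworkStore.get_block`, and the `fetch_remote` callable standing in for the Block Exchange Engine are hypothetical names, and storing the fetched block inside `get_block` is an assumption about where the local caching step happens.

```python
from typing import Callable, Dict, List, Optional


class LocalStore:
    """Hypothetical stand-in for the local persistent store."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def get_block(self, cid: str) -> Optional[bytes]:
        return self.blocks.get(cid)

    def put_block(self, cid: str, data: bytes) -> None:
        self.blocks[cid] = data


class NetworkStore:
    """Tries the local store first, then falls back to a remote fetch."""

    def __init__(self, local: LocalStore,
                 fetch_remote: Callable[[str], Optional[bytes]]) -> None:
        self.local = local
        self.fetch_remote = fetch_remote  # stands in for the exchange engine

    def get_block(self, cid: str) -> bytes:
        data = self.local.get_block(cid)
        if data is not None:
            return data  # local hit, no network traffic
        data = self.fetch_remote(cid)
        if data is None:
            raise KeyError(cid)  # neither local nor remote has the block
        self.local.put_block(cid, data)  # keep a local copy for next time
        return data


remote_blocks = {"cid-1": b"payload"}
remote_calls: List[str] = []


def fetch_remote(cid: str) -> Optional[bytes]:
    remote_calls.append(cid)  # record every network round trip
    return remote_blocks.get(cid)


store = NetworkStore(LocalStore(), fetch_remote)
first = store.get_block("cid-1")   # served from the network
second = store.get_block("cid-1")  # served from the local copy
```

Only the first call reaches `fetch_remote`; the second is answered locally, which is the behaviour the local-first strategy aims for.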
+
+**Characteristics**:
+
+- Integrates local storage with network retrieval
+- Uses the Block Exchange Engine for peer-to-peer access
+- Transparent block fetching from remote sources
+- Local caching of blocks retrieved from the network for future access
+
+```text
+┌────────────────────────────────────────────────────────────┐
+│                        NetworkStore                        │
+├────────────────────────────────────────────────────────────┤
+│                                                            │
+│  ┌─────────────────┐         ┌──────────────────────┐      │
+│  │ LocalStore - RS │         │ BlockExcEngine       │      │
+│  │ • Store blocks  │         │ • Request blocks     │      │
+│  │ • Get blocks    │         │ • Resolve blocks     │      │
+│  └─────────────────┘         └──────────────────────┘      │
+│           │                              │                 │
+│           └──────────────┬───────────────┘                 │
+│                          │                                 │
+│                   ┌─────────────┐                          │
+│                   │BS Interface │                          │
+│                   │             │                          │
+│                   │ • getBlock  │                          │
+│                   │ • putBlock  │                          │
+│                   │ • hasBlock  │                          │
+│                   │ • delBlock  │                          │
+│                   └─────────────┘                          │
+└────────────────────────────────────────────────────────────┘
+
+```
+
+---
+
+#### CacheStore
+
+The **CacheStore** is an in-memory `BlockStore` implementation designed for fast access to frequently used blocks.
+
+This store maintains **two separate LRU caches**:
+
+1. **Block Cache**: `LruCache[Cid, Block]`
+   - Stores actual block data indexed by CID
+   - Acts as the primary cache for block content
+2. **CID/Proof Cache**: `LruCache[(Cid, Natural), (Cid, CodexProof)]`
+   - Maps `(treeCid, index)` to `(blockCid, proof)`
+   - Supports direct access to block proofs keyed by `treeCid` and index
+
+**Characteristics**:
+
+- O(1) access times for cached data
+- LRU eviction policy for memory management
+- Configurable maximum cache size
+- No persistence: cache contents are lost on restart
+- No TTL: blocks remain in cache until evicted
+
+**Configuration**:
+
+- `cacheSize`: Maximum total cache size (bytes)
+- `chunkSize`: Minimum block size unit
+
+---
+
+### Storage Layout
+
+| Key Pattern | Data Type | Description | Example |
+| --- | --- | --- | --- |
+| `repo/manifests/{XX}/{full-cid}` | Raw bytes | Manifest block data | `repo/manifests/Cd/bafy...Cd → [data]` |
+| `repo/blocks/{XX}/{full-cid}` | Raw bytes | Block data | `repo/blocks/Ab/bafy...Ab → [data]` |
+| `meta/ttl/{cid}` | BlockMetadata | Expiry, size, refCount | `meta/ttl/bafy... → {...}` |
+| `meta/proof/{treeCid}/{index}` | LeafMetadata | Merkle proof for leaf | `meta/proof/bafy.../42 → {...}` |
+| `meta/total` | Natural | Total stored blocks | `meta/total → 12039` |
+| `meta/quota/used` | NBytes | Used quota | `meta/quota/used → 52428800` |
+| `meta/quota/reserved` | NBytes | Reserved quota | `meta/quota/reserved → 104857600` |
+
+---
+
+### Workflows
+The following flow chart summarises how put, get, and delete operations interact with the shared block storage, metadata store, and quota management systems.
+
+```mermaid
+flowchart TD
+    A[Block Operations] --> B[putBlock]
+    A --> C[getBlock]
+    A --> D[delBlock]
+
+    B --> E[Store to RepoDS]
+    B --> F[Update Metadata]
+    B --> G[Update Quota]
+
+    C --> H[Query RepoDS]
+    C --> I[Handle Tree Access]
+    C --> J[Return Block Data]
+
+    D --> K[Check Reference Count]
+    D --> L[Remove from RepoDS]
+    D --> M[Update Counters]
+
+    E --> N[(Block Storage)]
+    H --> N
+    L --> N
+
+    F --> O[(Metadata Store)]
+    I --> O
+    K --> O
+
+    G --> P[(Quota Management)]
+    M --> P
+```
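The interaction summarised in the chart above can be sketched as a small in-memory model. This is a simplified Python illustration, not the RepoStore code: `ToyRepoStore`, `BlockMeta`, the string return values, and the integer timestamps are hypothetical, and treating a block as expired when `expiry <= now` is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BlockMeta:
    size: int
    ref_count: int
    expiry: int  # seconds since the epoch


class ToyRepoStore:
    """In-memory stand-in for block storage, metadata store, and quota."""

    def __init__(self, quota_max_bytes: int) -> None:
        self.repo: Dict[str, bytes] = {}      # block storage
        self.meta: Dict[str, BlockMeta] = {}  # metadata store
        self.quota_max_bytes = quota_max_bytes
        self.quota_used = 0
        self.total_blocks = 0

    def put_block(self, cid: str, data: bytes, expiry: int) -> bool:
        if cid in self.meta:
            return False  # already stored, nothing to do
        self.repo[cid] = data
        if self.quota_used + len(data) > self.quota_max_bytes:
            del self.repo[cid]  # roll back the write when quota is exceeded
            raise RuntimeError("quota exceeded")
        self.meta[cid] = BlockMeta(size=len(data), ref_count=0, expiry=expiry)
        self.quota_used += len(data)
        self.total_blocks += 1
        return True

    def get_block(self, cid: str) -> bytes:
        return self.repo[cid]  # raises KeyError when the block is missing

    def del_block(self, cid: str, now: int) -> str:
        meta = self.meta.get(cid)
        if meta is None:
            self.repo.pop(cid, None)  # drop orphaned data, if any
            return "not_found"
        if meta.ref_count > 0 and meta.expiry > now:
            return "in_use"  # still referenced and not yet expired
        del self.repo[cid]
        del self.meta[cid]
        self.quota_used -= meta.size
        self.total_blocks -= 1
        return "deleted"


store = ToyRepoStore(quota_max_bytes=10)
stored = store.put_block("cid-a", b"hello", expiry=100)
store.meta["cid-a"].ref_count = 1  # pretend a dataset references the block
while_referenced = store.del_block("cid-a", now=50)
after_expiry = store.del_block("cid-a", now=150)
```

In this run the first delete is refused because the block is referenced and unexpired, while the second succeeds after expiry and returns the quota and block counter to zero.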
+
+#### PutBlock
+The following flow chart shows how a block is stored: the expiry is computed from the TTL, existing metadata is checked to detect duplicates, and quota usage and the total block count are updated, with the block write rolled back if the quota update fails.
+
+```mermaid
+flowchart TD
+    A[putBlock: blk, ttl] --> B[Calculate expiry = now + ttl]
+    B --> C[storeBlock: blk, expiry]
+    C --> D{Block empty?}
+    D -->|Yes| E[Return AlreadyInStore]
+    D -->|No| F[Create metadata & block keys]
+    F --> G{Block metadata exists?}
+    G -->|Yes| H{Size matches?}
+    H -->|Yes| I[Return AlreadyInStore]
+    H -->|No| J[Return Error]
+    G -->|No| K[Create new metadata]
+    K --> L[Store block data]
+    L --> M{Store successful?}
+    M -->|No| N[Return Error]
+    M -->|Yes| O[Update quota usage]
+    O --> P{Quota update OK?}
+    P -->|No| Q[Rollback: Delete block]
+    Q --> R[Return Error]
+    P -->|Yes| S[Update total blocks count]
+    S --> T[Trigger onBlockStored callback]
+    T --> U[Return Success]
+```
+
+---
+
+#### GetBlock
+The following flow chart explains how a block is retrieved by CID or tree reference, resolving metadata if necessary, and returning the block or an error.
+
+```mermaid
+flowchart TD
+    A[getBlock: cid/address] --> B{Input type?}
+    B -->|BlockAddress with leaf| C[getLeafMetadata: treeCid, index]
+    B -->|CID| D[Direct CID access]
+    C --> E{Leaf metadata found?}
+    E -->|No| F[Return BlockNotFoundError]
+    E -->|Yes| G[Extract block CID from metadata]
+    G --> D
+    D --> H{CID empty?}
+    H -->|Yes| I[Return empty block]
+    H -->|No| J[Create prefix key]
+    J --> K[Query datastore: repoDs.get]
+    K --> L{Block found?}
+    L -->|No| M{Error type?}
+    M -->|DatastoreKeyNotFound| N[Return BlockNotFoundError]
+    M -->|Other| O[Return Error]
+    L -->|Yes| P[Create Block with verification]
+    P --> Q[Return Block]
+```
+
+---
+
+#### DelBlock
+The following flow chart shows how a block is deleted when it is unused or expired, including metadata cleanup and quota/counter updates.
+
+```mermaid
+flowchart TD
+    A[delBlock: cid] --> B[delBlockInternal: cid]
+    B --> C{CID empty?}
+    C -->|Yes| D[Return Deleted]
+    C -->|No| E[tryDeleteBlock: cid, now]
+    E --> F{Metadata exists?}
+    F -->|No| G[Check if block exists in repo]
+    G --> H{Block exists?}
+    H -->|Yes| I[Warn & remove orphaned block]
+    H -->|No| J[Return NotFound]
+    I --> J
+    F -->|Yes| K{refCount = 0 OR expired?}
+    K -->|No| L[Return InUse]
+    K -->|Yes| M[Delete block & metadata]
+    M --> N[Return Deleted]
+    D --> O[Handle result]
+    J --> O
+    L --> O
+    N --> O
+    O --> P{Result type?}
+    P -->|InUse| Q[Return Error: Cannot delete dataset block]
+    P -->|NotFound| R[Return Success: Ignore]
+    P -->|Deleted| S[Update total blocks count]
+    S --> T[Update quota usage]
+    T --> U[Return Success]
+```
+
+---
+
+## 7. Data Models
+
+### Stores
+
+```nim
+RepoStore* = ref object of BlockStore
+  postFixLen*: int
+  repoDs*: Datastore
+  metaDs*: TypedDatastore
+  clock*: Clock
+  quotaMaxBytes*: NBytes
+  quotaUsage*: QuotaUsage
+  totalBlocks*: Natural
+  blockTtl*: Duration
+  started*: bool
+
+NetworkStore* = ref object of BlockStore
+  engine*: BlockExcEngine
+  localStore*: BlockStore
+
+CacheStore* = ref object of BlockStore
+  currentSize*: NBytes
+  size*: NBytes
+  cache: LruCache[Cid, Block]
+  cidAndProofCache: LruCache[(Cid, Natural), (Cid, CodexProof)]
+
+```
+
+### Metadata Types
+
+```nim
+BlockMetadata* {.serialize.} = object
+  expiry*: SecondsSince1970
+  size*: NBytes
+  refCount*: Natural
+
+LeafMetadata* {.serialize.} = object
+  blkCid*: Cid
+  proof*: CodexProof
+
+BlockExpiration* {.serialize.} = object
+  cid*: Cid
+  expiry*: SecondsSince1970
+
+QuotaUsage* {.serialize.} = object
+  used*: NBytes
+  reserved*: NBytes
+
+```
+
+---
+
+## 8. Dependencies
+
+### External Dependencies
+
+| Package | Purpose | Used For | Components Using |
+| --- | --- | --- | --- |
+| `pkg/chronos` | Async runtime | Async procedures, futures, cancellation handling | All stores (async operations) |
+| `pkg/libp2p` | P2P networking | CID types, multicodec, multihash | BlockStore, NetworkStore, KeyUtils |
+| `pkg/questionable` | Error handling | `?!` Result types, optional values | All components (error propagation) |
+| `pkg/datastore` | Storage abstraction | Key-value storage interface, queries | RepoStore (main storage) |
+| `pkg/datastore/typedds` | Typed datastore | Type-safe serialization/deserialization | RepoStore (metadata storage) |
+| `pkg/lrucache` | Memory caching | LRU cache implementation | CacheStore (block caching) |
+| `pkg/metrics` | Monitoring | Prometheus metrics, gauges, counters | RepoStore (quota tracking) |
+| `pkg/serde/json` | Serialization | JSON encoding/decoding | RepoStore coders |
+| `pkg/stew/byteutils` | Byte utilities | Hex conversion, byte arrays | RepoStore coders |
+| `pkg/stew/endians2` | Endian conversion | Little/big endian conversion | RepoStore coders |
+| `pkg/chronicles` | Logging | Structured logging | QueryIterHelper |
+| `pkg/upraises` | Exception safety | Exception effect tracking | KeyUtils, TreeHelper |
+| `std/options` | Optional types | Option[T] for nullable values | CacheStore |
+| `std/sugar` | Syntax sugar | Lambda expressions, proc shortcuts | KeyUtils, RepoStore coders |
+
+### Internal Dependencies
+
+| Module | Purpose | Used For | Components Using |
+| --- | --- | --- | --- |
+| `../blocktype` | Block types | Block/Manifest type definitions, codec info | All stores (type checking) |
+| `../blockexchange` | Block exchange | Network block exchange protocols | NetworkStore |
+| `../merkletree` | Merkle trees | Tree structures, proofs, hashing | BlockStore, RepoStore, TreeHelper |
+| `../manifest` | Data manifests | Manifest structures, CID identification | KeyUtils, CacheStore |
+| `../clock` | Time abstraction | Clock interface for time operations | BlockStore, RepoStore |
+| `../systemclock` | System clock | Wall-clock time implementation | RepoStore, Maintenance |
+| `../units` | Units/Types | NBytes, Duration, time units | RepoStore (quota/TTL) |
+| `../errors` | Custom errors | CodexError, BlockNotFoundError | RepoStore |
+| `../namespaces` | Key namespaces | Database key namespace constants | KeyUtils |
+| `../utils` | General utils | Common utility functions | BlockStore, RepoStore |
+| `../utils/asynciter` | Async iterators | Async iteration patterns | TreeHelper, QueryIterHelper |
+| `../utils/safeasynciter` | Safe async iter | Error-safe async iteration | NetworkStore, Maintenance |
+| `../utils/asyncheapqueue` | Async queue | Priority queue for async operations | NetworkStore |
+| `../utils/timer` | Timer utilities | Periodic timer operations | Maintenance |
+| `../utils/json` | JSON utilities | JSON helper functions | RepoStore coders |
+| `../chunker` | Data chunking | Breaking data into chunks | CacheStore |
+| `../logutils` | Logging utils | Logging macros and utilities | All stores |
\ No newline at end of file