Component Specification - Store

1. Purpose and Scope

The Store Module serves as the core storage abstraction in Nim-Codex, providing a unified interface for storing and retrieving content-addressed blocks and associated metadata.

The primary design goal is to decouple storage operations from the underlying datastore semantics by introducing the BlockStore interface. This interface standardizes methods for storing and retrieving both ephemeral and persistent blocks, ensuring a consistent API across different storage backends.

Additionally, the module integrates a maintenance engine responsible for cleaning up expired ephemeral data according to configured policies.

This module is built on top of the generic DataStore (DS) interface, which is implemented by multiple backends such as SQLite, LevelDB, and the filesystem.

The DS provides a KV-store abstraction (Get, Put, Delete, Query), with backend-dependent guarantees. At a minimum, row-level consistency and basic batching are expected.

It also supports:

  • Namespace mounting for isolating backend usage
  • Layering backends (e.g., caching in front of persistent stores)
  • Flexible stacking and composition of storage proxies (see the sketch below)
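
The following sketch illustrates that contract and the layering idea: a toy in-memory Datastore plus a read-through cache stacked in front of a persistent backing store. This is illustrative Nim only, not the pkg/datastore API, whose concrete types and signatures differ.

```nim
import std/[tables, options, strutils]

type Datastore = ref object of RootObj

method get(ds: Datastore, key: string): Option[seq[byte]] {.base.} =
  none(seq[byte])
method put(ds: Datastore, key: string, data: seq[byte]) {.base.} =
  discard
method delete(ds: Datastore, key: string) {.base.} =
  discard
method query(ds: Datastore, prefix: string): seq[string] {.base.} =
  @[]

# A trivial in-memory backend implementing the KV contract.
type MemStore = ref object of Datastore
  rows: Table[string, seq[byte]]

method get(ds: MemStore, key: string): Option[seq[byte]] =
  if key in ds.rows: some(ds.rows[key]) else: none(seq[byte])
method put(ds: MemStore, key: string, data: seq[byte]) =
  ds.rows[key] = data
method delete(ds: MemStore, key: string) =
  ds.rows.del(key)
method query(ds: MemStore, prefix: string): seq[string] =
  for k in ds.rows.keys:
    if k.startsWith(prefix): result.add(k)

# Layering: a read-through cache stacked in front of a persistent store.
type TieredStore = ref object of Datastore
  cache, backing: Datastore

method get(ds: TieredStore, key: string): Option[seq[byte]] =
  result = ds.cache.get(key)
  if result.isNone:
    result = ds.backing.get(key)
    if result.isSome:
      ds.cache.put(key, result.get)   # warm the cache on a backing hit

method put(ds: TieredStore, key: string, data: seq[byte]) =
  ds.cache.put(key, data)             # write through both layers
  ds.backing.put(key, data)
```

Namespace mounting follows the same pattern: a router Datastore dispatches each key to the backend mounted at the key's prefix.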

2. Limitations

The current implementation has several shortcomings:

  • No dataset-level operations or advanced batching support
  • Lack of consistent locking and concurrency control, which may lead to inconsistencies during:
    • Crashes
    • Long-running operations on block groups (e.g., reference count updates, expiration updates)

3. BlockStore Interface

| Method | Description | Input | Output |
| --- | --- | --- | --- |
| getBlock(cid: Cid) | Retrieve block by CID | CID | Future[?!Block] |
| getBlock(treeCid: Cid, index: Natural) | Retrieve block from a Merkle tree by leaf index | Tree CID, index | Future[?!Block] |
| getBlock(address: BlockAddress) | Retrieve block via unified address | BlockAddress | Future[?!Block] |
| getBlockAndProof(treeCid: Cid, index: Natural) | Retrieve block with Merkle proof | Tree CID, index | Future[?!(Block, CodexProof)] |
| getCid(treeCid: Cid, index: Natural) | Retrieve leaf CID from tree metadata | Tree CID, index | Future[?!Cid] |
| getCidAndProof(treeCid: Cid, index: Natural) | Retrieve leaf CID with inclusion proof | Tree CID, index | Future[?!(Cid, CodexProof)] |
| putBlock(blk: Block, ttl: Duration) | Store block with quota enforcement | Block, optional TTL | Future[?!void] |
| putCidAndProof(treeCid: Cid, index: Natural, blkCid: Cid, proof: CodexProof) | Store leaf metadata with ref counting | Tree CID, index, block CID, proof | Future[?!void] |
| hasBlock(...) | Check block existence (CID or tree leaf) | CID / Tree CID + index | Future[?!bool] |
| delBlock(...) | Delete block/tree leaf (with ref count checks) | CID / Tree CID + index | Future[?!void] |
| ensureExpiry(...) | Update expiry for block/tree leaf | CID / Tree CID + index, expiry timestamp | Future[?!void] |
| listBlocks(blockType: BlockType) | Iterate over stored blocks | Block type | Future[?!SafeAsyncIter[Cid]] |
| getBlockExpirations(maxNumber, offset) | Retrieve block expiry metadata | Pagination params | Future[?!SafeAsyncIter[BlockExpiration]] |
| blockRefCount(cid: Cid) | Get block reference count | CID | Future[?!Natural] |
| reserve(bytes: NBytes) | Reserve storage quota | Bytes | Future[?!void] |
| release(bytes: NBytes) | Release reserved quota | Bytes | Future[?!void] |
| start() | Initialize store | (none) | Future[void] |
| stop() | Gracefully shut down store | (none) | Future[void] |
| close() | Close underlying datastores | (none) | Future[void] |
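
For orientation, a put/get round-trip through this interface looks roughly as follows. The method names and result types come from the table above; the import paths and the Block.new constructor are assumptions about the surrounding codebase.

```nim
import pkg/chronos
import pkg/questionable/results
import pkg/stew/byteutils
import pkg/codex/blocktype   # assumed module path: Block, Cid
import pkg/codex/stores      # assumed module path: BlockStore

proc roundTrip(store: BlockStore) {.async.} =
  let blk = Block.new("hello".toBytes).tryGet()   # content-addressed block
  (await store.putBlock(blk)).tryGet()            # TTL falls back to the default
  let got = (await store.getBlock(blk.cid)).tryGet()
  doAssert got.cid == blk.cid
  (await store.delBlock(blk.cid)).tryGet()        # honours reference counts
```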

4. Functional Requirements

Available Today

  • Atomic Block Operations

    • Store, retrieve, and delete operations must be atomic.
    • Support retrieval via:
      • Direct CID
      • Tree-based addressing (treeCid + index)
      • Unified block address
  • Metadata Management

    • Store protocol-level metadata (e.g., storage proofs, quota usage).
    • Store block-level metadata (e.g., reference counts, total block count).
  • Multi-Datastore Support

    • Pluggable datastore interface supporting various backends.
    • Typed datastore operations for metadata type safety.
  • Lifecycle & Maintenance

    • BlockMaintainer service for removing expired data.
    • Configurable maintenance intervals (default: 10 min).
    • Batch processing (default: 1000 blocks/cycle); the cycle is sketched below.
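
A minimal sketch of that maintenance cycle, assuming only chronos; the names here are illustrative, not the BlockMaintainer API:

```nim
import pkg/chronos

let
  maintenanceInterval = 10.minutes   # default maintenance interval
  blocksPerCycle      = 1000         # default batch size per cycle

# `sweepExpired` stands in for the real pass over expiry metadata that
# deletes blocks whose TTL has lapsed.
proc maintenanceLoop(sweepExpired: proc (maxNumber: int): Future[void]) {.async.} =
  while true:
    await sweepExpired(blocksPerCycle)
    await sleepAsync(maintenanceInterval)
```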

Future Requirements

  • Transaction Rollback & Error Recovery

    • Rollback support for failed multi-step operations.
    • Consistent state restoration after failures.
  • Dataset-Level Operations

    • Handle dataset-level metadata.
    • Batch operations for dataset block groups.
  • Concurrency Control

    • Consistent locking and coordination mechanisms to prevent inconsistencies during crashes or long-running operations.
  • Lifecycle & Maintenance

    • Cooperative scheduling to avoid blocking.
    • State tracking for large datasets.

5. Non-Functional Requirements

Available Today

  • Security

    • Verify block content integrity upon retrieval.
    • Enforce quotas to prevent disk exhaustion.
    • Safe orphaned data cleanup.
  • Scalability

    • Configurable storage quotas (default: 20 GiB).
    • Pagination for metadata queries.
    • Reference-counting-based garbage collection.
  • Reliability

    • Metrics collection (codex_repostore_*).
    • Graceful shutdown with resource cleanup.

Future Requirements

  • Performance

    • Batch metadata updates.
    • Efficient key lookups with configurable prefix lengths.
    • Support for both fast and slower storage tiers.
    • Streaming APIs optimized for extremely large datasets.
  • Security

    • Finer-grained quota enforcement across tenants/namespaces.
  • Reliability

    • Stronger rollback semantics for multi-node consistency.
    • Auto-recovery from inconsistent states.

6. Internal Design

Store Implementations

The Store module provides three concrete implementations of the BlockStore interface, each optimized for a specific role in the Nim-Codex architecture: RepoStore, NetworkStore, and CacheStore.

RepoStore

The RepoStore (RS) is a persistent BlockStore implementation that interfaces directly with low-level storage backends, such as hard drives and databases.

It uses two distinct DataStore (DS) backends:

  • FileSystem — for storing raw block data
  • LevelDB — for storing associated metadata

This separation plays to each backend's strengths: bulk block I/O goes straight to the filesystem, while small, frequent metadata updates are served by a fast key-value database.

Characteristics:

  • Persistent storage via datastore backends
  • Quota management with precise usage tracking
  • TTL (time-to-live) support with automated expiration
  • Metadata storage for block size, reference count, and expiry
  • Transaction-like operations implemented through reference counting

Configuration:

  • quotaMaxBytes: Maximum storage quota
  • blockTtl: Default TTL for stored blocks
  • postFixLen: CID key postfix length for sharding
```
┌──────────────────────────────────────────────────────┐
│                      RepoStore                       │
├──────────────────────────────────────────────────────┤
│  ┌───────────────────┐    ┌───────────────────────┐  │
│  │      repoDs       │    │        metaDs         │  │
│  │   (Datastore)     │    │   (TypedDatastore)    │  │
│  │                   │    │                       │  │
│  │  Block Data:      │    │  Metadata:            │  │
│  │  - Raw bytes      │    │  - BlockMetadata      │  │
│  │  - CID-keyed      │    │  - LeafMetadata       │  │
│  │                   │    │  - QuotaUsage         │  │
│  │                   │    │  - Block counts       │  │
│  └───────────────────┘    └───────────────────────┘  │
└──────────────────────────────────────────────────────┘
```
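
A construction sketch using the field names from the data model in section 7. The backend values and the TTL shown are placeholders, and real code would go through a constructor rather than a raw object literal:

```nim
# Hypothetical wiring; `fsDs` and `typedMetaDs` stand for pre-built
# filesystem and LevelDB-backed datastores.
let store = RepoStore(
  repoDs: fsDs,                       # filesystem backend: raw block data
  metaDs: typedMetaDs,                # typed LevelDB backend: metadata
  quotaMaxBytes: NBytes(20'i64 * 1024 * 1024 * 1024),  # 20 GiB default quota
  blockTtl: 24.hours,                 # placeholder TTL for stored blocks
  postFixLen: 2)                      # CID postfix length used for sharding
```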


NetworkStore

The NetworkStore is a composite BlockStore that combines local persistence with network-based retrieval for distributed content access.

It follows a local-first strategy — attempting to retrieve or store blocks locally first, and falling back to network retrieval via the Block Exchange Engine if the block is not available locally.

Characteristics:

  • Integrates local storage with network retrieval
  • Works seamlessly with the block exchange engine for peer-to-peer access
  • Transparent block fetching from remote sources
  • Local caching of blocks retrieved from the network for future access
```
┌──────────────────────────────────────────────────────┐
│                     NetworkStore                     │
├──────────────────────────────────────────────────────┤
│                                                      │
│  ┌───────────────────┐      ┌─────────────────────┐  │
│  │  LocalStore (RS)  │      │   BlockExcEngine    │  │
│  │  - Store blocks   │      │  - Request blocks   │  │
│  │  - Get blocks     │      │  - Resolve blocks   │  │
│  └───────────────────┘      └─────────────────────┘  │
│            │                           │             │
│            └─────────────┬─────────────┘             │
│                          │                           │
│                  ┌───────────────┐                   │
│                  │ BS Interface  │                   │
│                  │  - getBlock   │                   │
│                  │  - putBlock   │                   │
│                  │  - hasBlock   │                   │
│                  │  - delBlock   │                   │
│                  └───────────────┘                   │
└──────────────────────────────────────────────────────┘
```
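
The local-first fallback can be sketched as below. This follows the strategy described above; the error names and the engine.requestBlock call are assumptions about the Block Exchange Engine's API:

```nim
# Sketch only, not the codex implementation.
proc getBlock(self: NetworkStore, address: BlockAddress): Future[?!Block] {.async.} =
  without blk =? (await self.localStore.getBlock(address)), err:
    if not (err of BlockNotFoundError):
      return failure(err)              # real failure: propagate
    # Miss: fetch from the network; the engine resolves the block and,
    # per the local-caching guarantee above, persists it so the next
    # read is served by the LocalStore.
    return await self.engine.requestBlock(address)
  return success(blk)
```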


CacheStore

The CacheStore is an in-memory BlockStore implementation designed for fast access to frequently used blocks.

This store maintains two separate LRU caches:

  1. Block cache: LruCache[Cid, Block]
    • Stores actual block data indexed by CID
    • Acts as the primary cache for block content
  2. CID/proof cache: LruCache[(Cid, Natural), (Cid, CodexProof)]
    • Maps (treeCid, index) to (blockCid, proof)
    • Supports direct access to block proofs keyed by treeCid and index

Characteristics:

  • O(1) access times for cached data
  • LRU eviction policy for memory management
  • Configurable maximum cache size
  • No persistence — cache contents are lost on restart
  • No TTL — blocks remain in cache until evicted

Configuration:

  • cacheSize: Maximum total cache size (bytes); eviction against this bound is sketched below
  • chunkSize: Minimum block size unit
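
A toy, self-contained rendering of the size-bounded LRU behaviour. The real store uses pkg/lrucache keyed by Cid; the names here are illustrative:

```nim
import std/tables

type ToyCache = object
  maxBytes, curBytes: int
  order: seq[string]                  # least-recently-used key first
  blocks: Table[string, seq[byte]]

proc touch(c: var ToyCache, key: string) =
  # move `key` to the most-recently-used position
  for i, k in c.order:
    if k == key:
      c.order.delete(i)
      break
  c.order.add(key)

proc put(c: var ToyCache, key: string, data: seq[byte]) =
  if key in c.blocks: return          # already cached
  # evict least-recently-used blocks until the new block fits
  while c.curBytes + data.len > c.maxBytes and c.order.len > 0:
    let victim = c.order[0]
    c.order.delete(0)
    c.curBytes -= c.blocks[victim].len
    c.blocks.del(victim)
  c.blocks[key] = data
  c.curBytes += data.len
  c.touch(key)

proc get(c: var ToyCache, key: string): seq[byte] =
  if key in c.blocks:
    c.touch(key)                      # a hit refreshes recency
    result = c.blocks[key]
```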

Storage Layout

| Key Pattern | Data Type | Description | Example |
| --- | --- | --- | --- |
| repo/manifests/{XX}/{full-cid} | Raw bytes | Manifest block data | repo/manifests/Cd/bafy...Cd → [data] |
| repo/blocks/{XX}/{full-cid} | Raw bytes | Block data | repo/blocks/Ab/bafy...Ab → [data] |
| meta/ttl/{cid} | BlockMetadata | Expiry, size, refCount | meta/ttl/bafy... → {...} |
| meta/proof/{treeCid}/{index} | LeafMetadata | Merkle proof for leaf | meta/proof/bafy.../42 → {...} |
| meta/total | Natural | Total stored blocks | meta/total → 12039 |
| meta/quota/used | NBytes | Used quota | meta/quota/used → 52428800 |
| meta/quota/reserved | NBytes | Reserved quota | meta/quota/reserved → 104857600 |
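
The {XX} shard directory is the last postFixLen characters of the CID, as the examples above show. A minimal sketch of that rule (real key building lives in KeyUtils):

```nim
proc blockKey(cid: string; postFixLen = 2): string =
  # shard by the *last* `postFixLen` characters of the CID
  "repo/blocks/" & cid[^postFixLen .. ^1] & "/" & cid

doAssert blockKey("bafy1234Ab") == "repo/blocks/Ab/bafy1234Ab"
```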

Workflows

The following flow chart summarises how the put, get, and delete operations interact with the shared block storage, metadata store, and quota-management systems.

```mermaid
flowchart TD
    A[Block Operations] --> B[putBlock]
    A --> C[getBlock]
    A --> D[delBlock]

    B --> E[Store to RepoDS]
    B --> F[Update Metadata]
    B --> G[Update Quota]

    C --> H[Query RepoDS]
    C --> I[Handle Tree Access]
    C --> J[Return Block Data]

    D --> K[Check Reference Count]
    D --> L[Remove from RepoDS]
    D --> M[Update Counters]

    E --> N[(Block Storage)]
    H --> N
    L --> N

    F --> O[(Metadata Store)]
    I --> O
    K --> O

    G --> P[(Quota Management)]
    M --> P
```

PutBlock

The following flow chart shows how a block is stored: the expiry is computed from the TTL, duplicates are detected, quota usage is updated, and the write is rolled back if the quota update fails.

```mermaid
flowchart TD
    A[putBlock: blk, ttl] --> B[Calculate expiry = now + ttl]
    B --> C[storeBlock: blk, expiry]
    C --> D{Block empty?}
    D -->|Yes| E[Return AlreadyInStore]
    D -->|No| F[Create metadata & block keys]
    F --> G{Block metadata exists?}
    G -->|Yes| H{Size matches?}
    H -->|Yes| I[Return AlreadyInStore]
    H -->|No| J[Return Error]
    G -->|No| K[Create new metadata]
    K --> L[Store block data]
    L --> M{Store successful?}
    M -->|No| N[Return Error]
    M -->|Yes| O[Update quota usage]
    O --> P{Quota update OK?}
    P -->|No| Q[Rollback: Delete block]
    Q --> R[Return Error]
    P -->|Yes| S[Update total blocks count]
    S --> T[Trigger onBlockStored callback]
    T --> U[Return Success]
```
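
Because multi-step writes are not transactional (see Limitations), the rollback is done by hand: if the quota update fails after the block was written, the block is deleted again so block data and quota metadata stay consistent. A toy, self-contained rendering of just that step, not the codex code:

```nim
import std/tables

type ToyRepo = object
  blocks: Table[string, seq[byte]]
  quotaUsed, quotaMax: int

proc putBlock(r: var ToyRepo, key: string, data: seq[byte]): bool =
  if key in r.blocks:
    return true                       # AlreadyInStore
  r.blocks[key] = data                # 1. write the block data
  if r.quotaUsed + data.len > r.quotaMax:
    r.blocks.del(key)                 # 2. quota update failed: roll back
    return false
  r.quotaUsed += data.len             # 3. commit the quota usage
  true
```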

GetBlock

The following flow chart explains how a block is retrieved by CID or tree reference, resolving leaf metadata where necessary and returning either the block or an error.

```mermaid
flowchart TD
    A[getBlock: cid/address] --> B{Input type?}
    B -->|BlockAddress with leaf| C[getLeafMetadata: treeCid, index]
    B -->|CID| D[Direct CID access]
    C --> E{Leaf metadata found?}
    E -->|No| F[Return BlockNotFoundError]
    E -->|Yes| G[Extract block CID from metadata]
    G --> D
    D --> H{CID empty?}
    H -->|Yes| I[Return empty block]
    H -->|No| J[Create prefix key]
    J --> K[Query datastore: repoDs.get]
    K --> L{Block found?}
    L -->|No| M{Error type?}
    M -->|DatastoreKeyNotFound| N[Return BlockNotFoundError]
    M -->|Other| O[Return Error]
    L -->|Yes| P[Create Block with verification]
    P --> Q[Return Block]
```
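
The error mapping in the lookup step can be rendered as a toy (illustrative only): a datastore miss is turned into a distinct not-found result rather than a generic error.

```nim
import std/[tables, options]

type GetOutcome = object
  blk: Option[seq[byte]]
  error: string                       # empty on success

proc getBlock(repo: Table[string, seq[byte]], key: string): GetOutcome =
  if key notin repo:
    # DatastoreKeyNotFound is mapped to a distinct not-found error
    GetOutcome(error: "BlockNotFoundError")
  else:
    GetOutcome(blk: some(repo[key]))  # CID verification happens upstream
```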

DelBlock

The following flow chart shows how a block is deleted when it is unused or expired, including metadata cleanup and quota/counter updates.

```mermaid
flowchart TD
    A[delBlock: cid] --> B[delBlockInternal: cid]
    B --> C{CID empty?}
    C -->|Yes| D[Return Deleted]
    C -->|No| E[tryDeleteBlock: cid, now]
    E --> F{Metadata exists?}
    F -->|No| G[Check if block exists in repo]
    G --> H{Block exists?}
    H -->|Yes| I[Warn & remove orphaned block]
    H -->|No| J[Return NotFound]
    I --> J
    F -->|Yes| K{refCount = 0 OR expired?}
    K -->|No| L[Return InUse]
    K -->|Yes| M[Delete block & metadata]
    M --> N[Return Deleted]
    D --> O[Handle result]
    J --> O
    L --> O
    N --> O
    O --> P{Result type?}
    P -->|InUse| Q[Return Error: Cannot delete dataset block]
    P -->|NotFound| R[Return Success: Ignore]
    P -->|Deleted| S[Update total blocks count]
    S --> T[Update quota usage]
    T --> U[Return Success]
```
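
The central decision is the deletability check: a block is removed only when nothing references it or its TTL has lapsed. A one-line toy version:

```nim
proc canDelete(refCount: Natural; expiry, now: int64): bool =
  # deletable when unreferenced OR past its expiry timestamp
  refCount == 0 or expiry <= now

doAssert canDelete(0, expiry = 100, now = 50)       # unreferenced
doAssert canDelete(3, expiry = 100, now = 200)      # expired
doAssert not canDelete(3, expiry = 100, now = 50)   # still in use
```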

7. Data Models

Stores

```nim
RepoStore* = ref object of BlockStore
  postFixLen*: int
  repoDs*: Datastore
  metaDs*: TypedDatastore
  clock*: Clock
  quotaMaxBytes*: NBytes
  quotaUsage*: QuotaUsage
  totalBlocks*: Natural
  blockTtl*: Duration
  started*: bool

NetworkStore* = ref object of BlockStore
  engine*: BlockExcEngine
  localStore*: BlockStore

CacheStore* = ref object of BlockStore
  currentSize*: NBytes
  size*: NBytes
  cache: LruCache[Cid, Block]
  cidAndProofCache: LruCache[(Cid, Natural), (Cid, CodexProof)]
```

Metadata Types

```nim
BlockMetadata* {.serialize.} = object
  expiry*: SecondsSince1970
  size*: NBytes
  refCount*: Natural

LeafMetadata* {.serialize.} = object
  blkCid*: Cid
  proof*: CodexProof

BlockExpiration* {.serialize.} = object
  cid*: Cid
  expiry*: SecondsSince1970

QuotaUsage* {.serialize.} = object
  used*: NBytes
  reserved*: NBytes
```
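
For orientation, the expiry stored in BlockMetadata is derived exactly as in the PutBlock flow (expiry = now + ttl). A self-contained sketch using a local mirror of the type above:

```nim
import std/times

type BlockMetadata = object           # mirrors the spec type above
  expiry: int64                       # SecondsSince1970
  size: int                           # NBytes in codex
  refCount: Natural

let
  ttl  = initDuration(hours = 24)     # illustrative TTL
  meta = BlockMetadata(
    expiry: getTime().toUnix + ttl.inSeconds,   # expiry = now + ttl
    size: 65_536,
    refCount: 0)
```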


8. Dependencies

External Dependencies

| Package | Purpose | Used For | Components Using |
| --- | --- | --- | --- |
| pkg/chronos | Async runtime | Async procedures, futures, cancellation handling | All stores (async operations) |
| pkg/libp2p | P2P networking | CID types, multicodec, multihash | BlockStore, NetworkStore, KeyUtils |
| pkg/questionable | Error handling | ?! Result types, optional values | All components (error propagation) |
| pkg/datastore | Storage abstraction | Key-value storage interface, queries | RepoStore (main storage) |
| pkg/datastore/typedds | Typed datastore | Type-safe serialization/deserialization | RepoStore (metadata storage) |
| pkg/lrucache | Memory caching | LRU cache implementation | CacheStore (block caching) |
| pkg/metrics | Monitoring | Prometheus metrics, gauges, counters | RepoStore (quota tracking) |
| pkg/serde/json | Serialization | JSON encoding/decoding | RepoStore coders |
| pkg/stew/byteutils | Byte utilities | Hex conversion, byte arrays | RepoStore coders |
| pkg/stew/endians2 | Endian conversion | Little/big endian conversion | RepoStore coders |
| pkg/chronicles | Logging | Structured logging | QueryIterHelper |
| pkg/upraises | Exception safety | Exception effect tracking | KeyUtils, TreeHelper |
| std/options | Optional types | Option[T] for nullable values | CacheStore |
| std/sugar | Syntax sugar | Lambda expressions, proc shortcuts | KeyUtils, RepoStore coders |

Internal Dependencies

| Module | Purpose | Used For | Components Using |
| --- | --- | --- | --- |
| ../blocktype | Block types | Block/Manifest type definitions, codec info | All stores (type checking) |
| ../blockexchange | Block exchange | Network block exchange protocols | NetworkStore |
| ../merkletree | Merkle trees | Tree structures, proofs, hashing | BlockStore, RepoStore, TreeHelper |
| ../manifest | Data manifests | Manifest structures, CID identification | KeyUtils, CacheStore |
| ../clock | Time abstraction | Clock interface for time operations | BlockStore, RepoStore |
| ../systemclock | System clock | Wall-clock time implementation | RepoStore, Maintenance |
| ../units | Units/Types | NBytes, Duration, time units | RepoStore (quota/TTL) |
| ../errors | Custom errors | CodexError, BlockNotFoundError | RepoStore |
| ../namespaces | Key namespaces | Database key namespace constants | KeyUtils |
| ../utils | General utils | Common utility functions | BlockStore, RepoStore |
| ../utils/asynciter | Async iterators | Async iteration patterns | TreeHelper, QueryIterHelper |
| ../utils/safeasynciter | Safe async iter | Error-safe async iteration | NetworkStore, Maintenance |
| ../utils/asyncheapqueue | Async queue | Priority queue for async operations | NetworkStore |
| ../utils/timer | Timer utilities | Periodic timer operations | Maintenance |
| ../utils/json | JSON utilities | JSON helper functions | RepoStore coders |
| ../chunker | Data chunking | Breaking data into chunks | CacheStore |
| ../logutils | Logging utils | Logging macros and utilities | All stores |