Adds bunch of documents about uploading codex content and codex block exchange protocol

This commit is contained in:
Marcin Czenko 2025-02-17 12:18:36 +01:00
parent ef3ce6a2f6
commit 336d9068a3
No known key found for this signature in database
GPG Key ID: 33DEA0C8E30937C0
10 changed files with 681 additions and 3 deletions

View File

@ -8,6 +8,7 @@
"finished": "checkbox",
"finished-date": "text",
"related-to": "multitext",
"authors": "multitext"
"authors": "multitext",
"related": "multitext"
}
}

View File

@ -46,7 +46,7 @@ openssl sha1 experiment-3/data40k.bin
SHA1(experiment-3/data40k.bin)= 1cc46da027e7ff6f1970a2e58880dbc6a08992a0
```
The info, sha1 hash of the `info` dictionary is: `1902d602db8c350f4f6d809ed01eff32f030da95`.
The `info` hash (sha1) of the bencoded `info` dictionary is: `1902d602db8c350f4f6d809ed01eff32f030da95`.
Now, let's assume our input is a magnet link corresponding to the above torrent file:
@ -314,4 +314,8 @@ From the above analysis, we see that for Codex protocol to be able to directly h
2. To handle BitTorrent version 1 traffic, it is sufficient to support [[BEP9 - Extension for Peers to Send Metadata Files]].
3. We already use Merkle inclusion proofs, so we have great deal of flexibility of how we want to support torrents version 2. The `info` dictionary already provides us with the original file roots via `pieces root`, so we basically have to make sure we can provide the relevant intermediate layers aligned to the piece length (which basically means the leaves of our Merkle trees need to be computed over the chunks of the same size as in BitTorrent - `16 kiB`) and the `pieces root` will be built on top of that. Moreover, having inclusion proofs in place we should be able to improve version 1 torrents as well. With original pieces hashes coming from the `info` dictionary, we can secure authenticity of the content, and with Codex inclusion proofs we can enhance torrents version 1 with early validation.
More detailed discussion will follow after learning more low level details of the Codex client.
More detailed discussion will follow after learning more low level details of the Codex client.
### Followup
[[Uploading and downloading content in Codex]] is were we document the contant upload and download in the Codex client.

42
10 Notes/Block Storage.md Normal file
View File

@ -0,0 +1,42 @@
---
tags:
- codex/block-storage
related:
- "[[Uploading and downloading content in Codex]]"
- "[[Codex Blocks]]"
- "[[Codex Block Exchange Protocol]]"
---
#codex/block-storage
| related | [[Uploading and downloading content in Codex]], [[Codex Blocks]], [[Codex Block Exchange Protocol]] |
| ------- | --------------------------------------------------------------------------------------------------- |
To store blocks and the corresponding metadata, we use `RepoStore`.
`RepoStore` is a proxy to two underlying stores:
- `repoDS` - to store the blocks themselves - by default it is `FSDatastore` as indicated by option `repoKind` in `CodexConf` (`codex/conf.nim`). Other types of storage are also available: `SQLiteDatastore`, `LevelDbDatastore`.
- `metaDS` - to store the blocks' metadata - `LevelDbDatastore` (`vendor/nim-datastore/datastore/leveldb/leveldbds.nim`).
The stores are initialized in `CodexServer.new` (`codex/codex.nim`) and injected into `repoStore` (type `RepoStore` defined in `codex/stores/repostore/types.nim`):
```nim
repoStore = RepoStore.new(
repoDs = repoData,
metaDs = LevelDbDatastore.new(config.dataDir / CodexMetaNamespace).expect(
"Should create metadata store!"
),
quotaMaxBytes = config.storageQuota,
blockTtl = config.blockTtl,
)
```
The default value for `storageQuota` is given by `config.storageQuota` and `config.blockTtl` (`codex/stores/repostore/types.nim`):
```nim
const
DefaultBlockTtl* = 24.hours
DefaultQuotaBytes* = 8.GiBs
```
`repoStore` together with `engine` (`BlockExcEngine`) are parts of `NetworkStore`, which together with `switch`, `engine`, `discovery`, and `prover` is then provided to `codexNode` (`CodexNodeRef`).

View File

@ -0,0 +1,355 @@
---
tags:
- codex/block-exchange
- codex/libp2p
- libp2p
related:
- "[[Codex Blocks]]"
- "[[Uploading and downloading content in Codex]]"
- "[[Protocol of data exchange between Codex nodes]]"
---
| related | [[Codex Blocks]], [[Uploading and downloading content in Codex]], [[Protocol of data exchange between Codex nodes]] |
| ------- | ------------------------------------------------------------------------------------------------------------------- |
Codex block exchange protocol is built on top of [[libp2p]].
To understand how Codex Block Exchange protocol is built on top of libp2p (Codex protocol for short), is it good to grasp some basics of how protocols are generally implemented on top of libp2p. [Simple ping tutorial](https://vacp2p.github.io/nim-libp2p/docs/tutorial_1_connect/) and [Custom protocol in libp2p](https://vacp2p.github.io/nim-libp2p/docs/tutorial_2_customproto/), and [Protobuf usage](https://vacp2p.github.io/nim-libp2p/docs/tutorial_3_protobuf/) together with introduction to the [Switch](https://docs.libp2p.io/concepts/multiplex/switch/) component are good introductory reads, without which it may be kind of hard to understand the high-level structure of the Codex client.
To quickly summarize, starting a P2P node with a custom protocol using libp2p can be described as follows.
We derive our protocol type (e.g. `TestProto`) from `LPProtocol`:
```nim
const TestCodec = "/test/proto/1.0.0"
type TestProto = ref object of LPProtocol
```
We initialize our `TestProto` providing our `codecs` and a `handler`, which will be called for each incoming peer asking for this protocol:
```nim
proc new(T: typedesc[TestProto]): T =
# every incoming connections will in be handled in this closure
proc handle(conn: Connection, proto: string) {.async.} =
# Read up to 1024 bytes from this connection, and transform them into
# a string
echo "Got from remote - ", string.fromBytes(await conn.readLp(1024))
# We must close the connections ourselves when we're done with it
await conn.close()
return T.new(codecs = @[TestCodec], handler = handle)
```
Then, we create a *switch*, e.g:
```nim
proc createSwitch(ma: MultiAddress, rng: ref HmacDrbgContext): Switch =
var switch = SwitchBuilder
.new()
.withRng(rng)
# Give the application RNG
.withAddress(ma)
# Our local address(es)
.withTcpTransport()
# Use TCP as transport
.withMplex()
# Use Mplex as muxer
.withNoise()
# Use Noise as secure manager
.build()
return switch
```
And finally, we tie everything up and trigger a simple communication between two peers:
```nim
proc main() {.async.} =
let
rng = newRng()
localAddress = MultiAddress.init("/ip4/0.0.0.0/tcp/0").tryGet()
testProto = TestProto.new()
switch1 = createSwitch(localAddress, rng)
switch2 = createSwitch(localAddress, rng)
switch1.mount(testProto)
await switch1.start()
await switch2.start()
let conn =
await switch2.dial(switch1.peerInfo.peerId, switch1.peerInfo.addrs, TestCodec)
await testProto.hello(conn)
# We must close the connection ourselves when we're done with it
await conn.close()
await allFutures(switch1.stop(), switch2.stop())
# close connections and shutdown all transports
```
Now, let's find the analogous steps in our Codex client.
### Codex
In `codex.nim`, we have:
```nim
let
keyPath =
if isAbsolute(config.netPrivKeyFile):
config.netPrivKeyFile
else:
config.dataDir / config.netPrivKeyFile
privateKey = setupKey(keyPath).expect("Should setup private key!")
server = try:
CodexServer.new(config, privateKey)
except Exception as exc:
error "Failed to start Codex", msg = exc.msg
quit QuitFailure
```
and later:
```nim
waitFor server.start()
```
It is in `CodexServer.new` (`codex/codex.nim`) where we create our *switch*:
```nim
proc new*(
T: type CodexServer, config: CodexConf, privateKey: CodexPrivateKey
): CodexServer =
## create CodexServer including setting up datastore, repostore, etc
let switch = SwitchBuilder
.new()
.withPrivateKey(privateKey)
.withAddresses(config.listenAddrs)
.withRng(Rng.instance())
.withNoise()
.withMplex(5.minutes, 5.minutes)
.withMaxConnections(config.maxPeers)
.withAgentVersion(config.agentString)
.withSignedPeerRecord(true)
.withTcpTransport({ServerFlags.ReuseAddr})
.build()
```
A moment later, in the same proc, we have:
```nim
let network = BlockExcNetwork.new(switch)
```
and then close to the end:
```nim
switch.mount(network)
```
Finally, in `CodexServer.start`
```nim
proc start*(s: CodexServer) {.async.} =
trace "Starting codex node", config = $s.config
await s.repoStore.start()
s.maintenance.start()
await s.codexNode.switch.start()
```
Thus, `BlockExcNetwork` is our protocol type (`codex/blockexchange/network/network.nim`):
```nim
type BlockExcNetwork* = ref object of LPProtocol
peers*: Table[PeerId, NetworkPeer]
switch*: Switch
handlers*: BlockExcHandlers
request*: BlockExcRequest
getConn: ConnProvider
inflightSema: AsyncSemaphore
```
In the constructor, `BlockExcNetwork.new`, a couple of request functions are defined and attached to `self.request` (`BlockExcRequest` type). Then, `self.init()` is called:
```nim
method init*(b: BlockExcNetwork) =
## Perform protocol initialization
##
proc peerEventHandler(peerId: PeerId, event: PeerEvent) {.async.} =
if event.kind == PeerEventKind.Joined:
b.setupPeer(peerId)
else:
b.dropPeer(peerId)
b.switch.addPeerEventHandler(peerEventHandler, PeerEventKind.Joined)
b.switch.addPeerEventHandler(peerEventHandler, PeerEventKind.Left)
proc handle(conn: Connection, proto: string) {.async, gcsafe, closure.} =
let peerId = conn.peerId
let blockexcPeer = b.getOrCreatePeer(peerId)
await blockexcPeer.readLoop(conn) # attach read loop
b.handler = handle
b.codec = Codec
```
Here we see the familiar `handler` and `codec` being set. Let's have closer look at `getOrCreatePeer`:
```nim
proc getOrCreatePeer(b: BlockExcNetwork, peer: PeerId): NetworkPeer =
## Creates or retrieves a BlockExcNetwork Peer
##
if peer in b.peers:
return b.peers.getOrDefault(peer, nil)
var getConn: ConnProvider = proc(): Future[Connection] {.async, gcsafe, closure.} =
try:
return await b.switch.dial(peer, Codec)
except CancelledError as error:
raise error
except CatchableError as exc:
trace "Unable to connect to blockexc peer", exc = exc.msg
if not isNil(b.getConn):
getConn = b.getConn
let rpcHandler = proc(p: NetworkPeer, msg: Message) {.async.} =
b.rpcHandler(p, msg)
# create new pubsub peer
let blockExcPeer = NetworkPeer.new(peer, getConn, rpcHandler)
debug "Created new blockexc peer", peer
b.peers[peer] = blockExcPeer
return blockExcPeer
```
Here we recognize the familiar `dial` operation, and we see a new abstraction - `NetworkPeer` - representing the peer. In the `NetworkPeer` we find the `readLoop` defined above in the protocol handler:
```nim
proc readLoop*(b: NetworkPeer, conn: Connection) {.async.} =
if isNil(conn):
return
try:
while not conn.atEof or not conn.closed:
let
data = await conn.readLp(MaxMessageSize.int)
msg = Message.protobufDecode(data).mapFailure().tryGet()
await b.handler(b, msg)
except CancelledError:
trace "Read loop cancelled"
except CatchableError as err:
warn "Exception in blockexc read loop", msg = err.msg
finally:
await conn.close()
```
We read from the connection, decode the message (see [[Protocol of data exchange between Codex nodes]]), forward it down to the `handler`, which is the `rpcHandler` we see above in `getOrCreatePeer`. `rpcHandler` is then defined at the protocol level (`BlockExcNetwork`) as:
```nim
proc rpcHandler(b: BlockExcNetwork, peer: NetworkPeer, msg: Message) {.raises: [].} =
## handle rpc messages
##
if msg.wantList.entries.len > 0:
asyncSpawn b.handleWantList(peer, msg.wantList)
if msg.payload.len > 0:
asyncSpawn b.handleBlocksDelivery(peer, msg.payload)
if msg.blockPresences.len > 0:
asyncSpawn b.handleBlockPresence(peer, msg.blockPresences)
if account =? Account.init(msg.account):
asyncSpawn b.handleAccount(peer, account)
if payment =? SignedState.init(msg.payment):
asyncSpawn b.handlePayment(peer, payment)
```
Let's focus for a moment on `handleBlocksDelivery`:
```nim
proc handleBlocksDelivery(
b: BlockExcNetwork, peer: NetworkPeer, blocksDelivery: seq[BlockDelivery]
) {.async.} =
## Handle incoming blocks
##
if not b.handlers.onBlocksDelivery.isNil:
await b.handlers.onBlocksDelivery(peer.id, blocksDelivery)
```
What are `handlers`? It is an instance of `BlockExcHandlers` set in `BlockExcEngine.new` (`codex/blockexchange/engine/engine.nim`). There, `onBlockDelivery` member is set to:
```nim
proc blocksDeliveryHandler(
peer: PeerId, blocksDelivery: seq[BlockDelivery]
): Future[void] {.gcsafe.} =
engine.blocksDeliveryHandler(peer, blocksDelivery)
```
which forwards the `seq[BlockDelivery]` payload to:
```nim
proc blocksDeliveryHandler*(
b: BlockExcEngine, peer: PeerId, blocksDelivery: seq[BlockDelivery]
) {.async.} =
trace "Received blocks from peer", peer, blocks = (blocksDelivery.mapIt(it.address))
var validatedBlocksDelivery: seq[BlockDelivery]
for bd in blocksDelivery:
logScope:
peer = peer
address = bd.address
if err =? b.validateBlockDelivery(bd).errorOption:
warn "Block validation failed", msg = err.msg
continue
if err =? (await b.localStore.putBlock(bd.blk)).errorOption:
error "Unable to store block", err = err.msg
continue
if bd.address.leaf:
without proof =? bd.proof:
error "Proof expected for a leaf block delivery"
continue
if err =? (
await b.localStore.putCidAndProof(
bd.address.treeCid, bd.address.index, bd.blk.cid, proof
)
).errorOption:
error "Unable to store proof and cid for a block"
continue
validatedBlocksDelivery.add(bd)
await b.resolveBlocks(validatedBlocksDelivery)
codex_block_exchange_blocks_received.inc(validatedBlocksDelivery.len.int64)
let peerCtx = b.peers.get(peer)
if peerCtx != nil:
await b.payForBlocks(peerCtx, blocksDelivery)
## shouldn't we remove them from the want-list instead of this:
peerCtx.cleanPresence(blocksDelivery.mapIt(it.address))
```
Here we see that each received block is [[Codex Block Validation|validated]] and then `resolveBlocks` is called:
```nim
proc resolveBlocks*(b: BlockExcEngine, blocksDelivery: seq[BlockDelivery]) {.async.} =
b.pendingBlocks.resolve(blocksDelivery)
await b.scheduleTasks(blocksDelivery)
await b.cancelBlocks(blocksDelivery.mapIt(it.address))
```

39
10 Notes/Codex Blocks.md Normal file
View File

@ -0,0 +1,39 @@
---
tags:
- codex/blocks
related:
- "[[Codex Block Exchange Protocol]]"
- "[[Uploading and downloading content in Codex]]"
---
#codex/blocks
| related | [[Codex Block Exchange Protocol]], [[Uploading and downloading content in Codex]] |
| ------- | --------------------------------------------------------------------------------- |
In the Codex client, blocks are represented by the following data types (`codex/blocktype.nim`):
```nim
type
Block* = ref object of RootObj
cid*: Cid
data*: seq[byte]
BlockAddress* = object
case leaf*: bool
of true:
treeCid* {.serialize.}: Cid
index* {.serialize.}: Natural
else:
cid* {.serialize.}: Cid
```
And then we have also *block metadata* (`codex/stores/repostore/types.nim`):
```nim
BlockMetadata* {.serialize.} = object
expiry*: SecondsSince1970
size*: NBytes
refCount*: Natural
```

View File

@ -0,0 +1,62 @@
Defined in `codex/blockexchange/protobuf/message.proto`:
```
// Protocol of data exchange between Codex nodes.
// Extended version of https://github.com/ipfs/specs/blob/main/BITSWAP.md
syntax = "proto3";
package blockexc.message.pb;
message Message {
message Wantlist {
enum WantType {
wantBlock = 0;
wantHave = 1;
}
message Entry {
bytes block = 1; // the block cid
int32 priority = 2; // the priority (normalized). default to 1
bool cancel = 3; // whether this revokes an entry
WantType wantType = 4; // Note: defaults to enum 0, ie Block
bool sendDontHave = 5; // Note: defaults to false
}
repeated Entry entries = 1; // a list of wantlist entries
bool full = 2; // whether this is the full wantlist. default to false
}
message Block {
bytes prefix = 1; // CID prefix (cid version, multicodec and multihash prefix (type + length)
bytes data = 2;
}
enum BlockPresenceType {
presenceHave = 0;
presenceDontHave = 1;
}
message BlockPresence {
bytes cid = 1;
BlockPresenceType type = 2;
bytes price = 3; // Amount of assets to pay for the block (UInt256)
}
message AccountMessage {
bytes address = 1; // Ethereum address to which payments should be made
}
message StateChannelUpdate {
bytes update = 1; // Signed Nitro state, serialized as JSON
}
Wantlist wantlist = 1;
repeated Block payload = 3;
repeated BlockPresence blockPresences = 4;
int32 pendingBytes = 5;
AccountMessage account = 6;
StateChannelUpdate payment = 7;
}
```

View File

@ -0,0 +1,159 @@
#codex/upload #codex/download
| related | [[Codex Blocks]], [[Block Storage]], [[Codex Block Exchange Protocol]] |
| ------- | ---------------------------------------------------------------------- |
We upload the content with API `/api/codex/v1/data`. The handler defined in `codex/rest/api.nim` calls `CodexNodeRef.store` and then returns the `Cid` of the manifest file corresponding to the contents:
```nim
without cid =? (
await node.store(
AsyncStreamWrapper.new(reader = AsyncStreamReader(reader)),
filename = filename,
mimetype = mimetype,
)
), error:
error "Error uploading file", exc = error.msg
return RestApiResponse.error(Http500, error.msg)
codex_api_uploads.inc()
trace "Uploaded file", cid
return RestApiResponse.response($cid)
```
> See [Using Codex](https://docs.codex.storage/learn/using) in the Codex docs on how to use the Codex client.
`node.store` (`codex/node.nim`) reads data from the stream, and from each `chunk` it does the following:
```nim
while (let chunk = await chunker.getBytes(); chunk.len > 0):
without mhash =? MultiHash.digest($hcodec, chunk).mapFailure, err:
return failure(err)
without cid =? Cid.init(CIDv1, dataCodec, mhash).mapFailure, err:
return failure(err)
without blk =? bt.Block.new(cid, chunk, verify = false):
return failure("Unable to init block from chunk!")
cids.add(cid)
if err =? (await self.networkStore.putBlock(blk)).errorOption:
error "Unable to store block", cid = blk.cid, err = err.msg
return failure(&"Unable to store block {blk.cid}")
```
The default chunk size is given by the default block size defined in `codex/codextypes.nim`:
```nim
const
# Size of blocks for storage / network exchange,
DefaultBlockSize* = NBytes 1024 * 64
```
### storing blocks
Now, the `netoworkStore.putBlock`:
```nim
method putBlock*(
self: NetworkStore, blk: Block, ttl = Duration.none
): Future[?!void] {.async.} =
## Store block locally and notify the network
##
let res = await self.localStore.putBlock(blk, ttl)
if res.isErr:
return res
await self.engine.resolveBlocks(@[blk])
return success()
```
We first store the stuff locally:
```nim
method putBlock*(
self: RepoStore, blk: Block, ttl = Duration.none
): Future[?!void] {.async.} =
## Put a block to the blockstore
##
logScope:
cid = blk.cid
let expiry = self.clock.now() + (ttl |? self.blockTtl).seconds
without res =? await self.storeBlock(blk, expiry), err:
return failure(err)
if res.kind == Stored:
trace "Block Stored"
if err =? (await self.updateQuotaUsage(plusUsed = res.used)).errorOption:
# rollback changes
without delRes =? await self.tryDeleteBlock(blk.cid), err:
return failure(err)
return failure(err)
if err =? (await self.updateTotalBlocksCount(plusCount = 1)).errorOption:
return failure(err)
if onBlock =? self.onBlockStored:
await onBlock(blk.cid)
else:
trace "Block already exists"
return success()
```
The `storeBlock` defined in `codex/stores/repostore/operations.nim` is where we store the block and the metadata.
```nim
proc storeBlock*(
self: RepoStore, blk: Block, minExpiry: SecondsSince1970
): Future[?!StoreResult] {.async.} =
if blk.isEmpty:
return success(StoreResult(kind: AlreadyInStore))
without metaKey =? createBlockExpirationMetadataKey(blk.cid), err:
return failure(err)
without blkKey =? makePrefixKey(self.postFixLen, blk.cid), err:
return failure(err)
await self.metaDs.modifyGet(
...
)
```
The `modifyGet` deserve separate treatment.
### modifyGet
`modifyGet` is called on the `metaDs`. `metaDs` has a wrapper type:
```nim
TypedDatastore* = ref object of RootObj
ds*: Datastore
```
And `ds` above is:
```nim
type
LevelDbDatastore* = ref object of Datastore
db: LevelDb
locks: TableRef[Key, AsyncLock]
```
There is a cascade of callbacks going from `RepoStore` through `TypedDatastore` down to `LevelDbDataStore` as presented on the following sequence diagram:
![[repostore_storeblock.svg]]
> The diagram above can also be viewed [online](https://www.mermaidchart.com/app/projects/2564c095-670f-4258-b8ea-1c5d0b546845/diagrams/b0ba3207-7833-4fd0-b5b1-9d076585d93a/version/v0.1/edit) (requires [Mermaid Chart](https://www.mermaidchart.com) account)
`LevelDbDataStore` directly interacts with the underlying storage and ensures atomicity of the `modifyGet` operation. `TypedDatastore` performs *encoding* and *decoding* of the data. Finally, `RepoStore` handles metadata creation or update, and also writes the actual block to the underlying block storage via its `repoDS` instance variable.
This concludes the local block storage. We leave the description of `engine.resolveBlocks(@[blk])` for later, when describing the block exchange protocol.
## Downloading content
TBD...

13
10 Notes/libp2p.md Normal file
View File

@ -0,0 +1,13 @@
---
tags:
- libp2p
- codex/libp2p
related:
- "[[Codex Block Exchange Protocol]]"
---
#libp2p #codex/libp2p
| related | [[Codex Block Exchange Protocol]] |
| ------- | --------------------------------- |
In Codex, we use [nim-libp2p](https://github.com/vacp2p/nim-libp2p) - a Nim implementation of [libp2p](https://libp2p.io). Accompanying [documentation](https://vacp2p.github.io/nim-libp2p/docs/) is provided to help Nim developers to get started with libp2p.

View File

@ -0,0 +1,2 @@
| related | |
| ------- | --- |

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 27 KiB