multicodec/multicodec-packed.md
2016-09-23 15:47:08 +08:00

multicodec-packed

Compact self-describing codecs. Save space by using predefined multicodec tables.

Motivation

Multicodecs are self-describing protocol/encoding streams. Multicodec-packed is a different representation of multicodec that uses an agreed-upon "protocol table". It is designed for use in short strings, such as keys or identifiers (e.g., CIDs).

Protocol Description - How does the protocol work?

multicodec-packed is a self-describing multiformat: it wraps other formats with a tiny bit of self-description. A multicodec-packed identifier is both a varint and the code identifying the data that follows. This means that the most significant bit of every byte of a multicodec-packed code is reserved to signal continuation.

This way, a chunk of data identified by multicodec-packed will look like this:

<multicodec-packed-varint><encoded-data>
# To reduce the cognitive load, we sometimes might write the same line as:
<mcp><data>
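
The layout above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (`encode_varint`, `decode_varint`, `pack`, `unpack`) are invented for this example, and the varint is assumed to follow multiformats/unsigned-varint (7 data bits per byte, least significant group first, high bit set on every byte except the last).

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)  # last byte: continuation bit is clear
            return bytes(out)


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Return (value, bytes consumed) for the varint at the start of buf."""
    value = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated varint")


def pack(code: int, data: bytes) -> bytes:
    """Build <mcp><data>: the varint code followed by the encoded data."""
    return encode_varint(code) + data


def unpack(buf: bytes) -> Tuple[int, bytes]:
    """Split <mcp><data> back into (code, data)."""
    code, consumed = decode_varint(buf)
    return code, buf[consumed:]


# 0x41 is the multihash code in the standard table of this document.
packed = pack(0x41, b"payload")
code, data = unpack(packed)
```

Because the code is a varint, `unpack` needs no length field: it reads bytes until it finds one with the high bit clear, and everything after that is the wrapped data.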

Another useful scenario is using multicodec-packed as part of the keys used to access data. For example:

# suppose we have a value and a key to retrieve it
"<key>" -> <value>

# we can use multicodec-packed with the key to know what codec the value is in
"<mcp><key>" -> <value>

It is worth noting that multicodec-packed works very well in conjunction with multihash and multiaddr, as you can prefix those values with a multicodec-packed code to tell what they are.
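
As a rough sketch of the key-prefix scenario, the snippet below uses a plain Python dict as a stand-in for a key-value store. The helper names `prefix_key` and `split_key` are hypothetical, and the sketch assumes the code is below 0x80 so that it fits in a single byte.

```python
RAW = 0x00  # "raw binary data" in the standard table of this document


def prefix_key(code, key):
    """Build "<mcp><key>" for a code small enough to fit in one byte."""
    if not 0 <= code < 0x80:
        raise ValueError("this sketch only handles single-byte codes")
    return bytes([code]) + key


def split_key(prefixed):
    """Split "<mcp><key>" back into (code, key)."""
    if not prefixed or prefixed[0] & 0x80:
        raise ValueError("expected a single-byte multicodec-packed code")
    return prefixed[0], prefixed[1:]


store = {}  # stand-in for any key-value store
store[prefix_key(RAW, b"my-key")] = b"\x01\x02\x03"

for prefixed, value in store.items():
    code, key = split_key(prefixed)
    # The code tells the reader which codec the value is in before decoding it.
    print(hex(code), key, value)
```

The cost is one extra byte per key, in exchange for knowing how to interpret each value without inspecting it.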

Multicodec-Packed Protocol Tables

Multicodec-packed uses "protocol tables" to agree upon the mapping from a multicodec-packed code (a single varint, written <mcp-code>) to a full multicodec protocol path. These tables can be application-specific, though, as with other multiformats, we will keep a globally agreed-upon table of common protocols and formats.

Standard mcp protocol table

This is the standard multicodec-packed protocol table.

WARNING: WIP. This table is not ready for wide use.

TODO:

  • See if IANA has a ready-made table for us to use here. Even just a listing of the most popular formats would be good enough.

code  codec

# Miscellaneous
0x00  raw binary data

# Multiformats
0x40  multicodec
0x41  multihash
0x42  multiaddr

# Serialization formats (cbor, ion, protobuf, etc)
# TODO

# VCS'es formats (git, hg, SVN, etc)
# TODO

# Blockchain block types (bitcoin, ethereum, stellar, etc)
# TODO
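
One way to hold such a table in code is a plain mapping from code to codec. The sketch below copies only the four entries listed above; the `APP_TABLE` entry `0x7e` and the function names are invented for illustration and are not part of any agreed table.

```python
# Codes and names copied from the standard table above (still a WIP).
STANDARD_TABLE = {
    0x00: "raw binary data",
    0x40: "multicodec",
    0x41: "multihash",
    0x42: "multiaddr",
}

# Hypothetical application-specific table: the 0x7e entry is made up for this example.
APP_TABLE = {**STANDARD_TABLE, 0x7E: "my-app-format"}


def codec_name(code, table=STANDARD_TABLE):
    """Look up the codec registered for a multicodec-packed code."""
    try:
        return table[code]
    except KeyError:
        raise LookupError(f"code {code:#04x} is not in this table") from None


def identify(buf, table=STANDARD_TABLE):
    """Name the codec of a packed buffer whose code fits in one byte."""
    if not buf:
        raise ValueError("empty buffer")
    if buf[0] & 0x80:
        raise NotImplementedError("multi-byte varint codes are not handled in this sketch")
    return codec_name(buf[0], table)


print(identify(b"\x41some-multihash-bytes"))
print(codec_name(0x7E, APP_TABLE))
```

Both sides must use the same table for a code to mean anything, which is why an unknown code raises an error here instead of falling back to a guess.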

Implementations

FAQ

Q. I have questions on multicodec, not listed here.

That's not a question. But, have you checked the proper multicodec FAQ? Maybe your question is answered there. This FAQ is only specifically for multicodec-packed.

Q. Why?

Because multicodec is too long for identifiers. We needed something shorter.

Q. Why varints?

So that there is no limit on the number of protocols. Implementation note: you do not need to implement varints until the standard table has codes above 127 (0x7f); up to that point every code is a single byte.
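
The implementation note can be checked with a small encoder. This is a sketch assuming the multiformats/unsigned-varint layout (7 data bits per byte, least significant group first); `encode_varint` is a name chosen for this example.

```python
def encode_varint(n):
    """Encode a non-negative integer as an unsigned varint."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # continuation bit set: more bytes follow
        n >>= 7
    out.append(n)  # final byte has the continuation bit clear
    return bytes(out)


# Codes up to 0x7f encode as themselves, so a plain byte comparison is enough.
assert encode_varint(0x42) == bytes([0x42])
assert encode_varint(0x7F) == bytes([0x7F])

# The first code that needs a second byte is 0x80.
assert encode_varint(0x80) == b"\x80\x01"
```

A reader that only knows single-byte codes can therefore stay correct by rejecting any leading byte with the high bit set, instead of misreading it as a code.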

Q. What kind of varints?

A most-significant-bit (MSB) unsigned varint, as defined by multiformats/unsigned-varint.

Q. Don't we have to agree on a table of protocols?

Yes, but we already have to agree on what protocols themselves are, so this is not so hard. The table even leaves some room for custom protocol paths, or you can use your own tables. The standard table is only for common things.

Maintainers

Captain: @jbenet.

Contribute

Contributions welcome. Please check out the issues.

Check out our contributing document for more information on how we work, and about contributing in general. Please be aware that all interactions related to multiformats are subject to the IPFS Code of Conduct.

License

MIT