# multicodec

[![](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](http://ipn.io)
[![](https://img.shields.io/badge/project-multiformats-blue.svg?style=flat-square)](http://github.com/multiformats/multiformats)
[![](https://img.shields.io/badge/freenode-%23ipfs-blue.svg?style=flat-square)](http://webchat.freenode.net/?channels=%23ipfs)

> Make data and streams self-described by prefixing them with human-readable or binary packed codecs. `multicodec` offers a base table, but it can also be extended with extra tables on a per-application basis.

## Table of Contents

- [Motivation](#motivation)
- [How does it work? - Protocol Description](#how-does-it-work---protocol-description)
- [The protocol path](#the-protocol-path)
- [Multicodec table](#multicodec-table)
- [Implementations](#implementations)
- [FAQ](#faq)
- [Maintainers](#maintainers)
- [Contribute](#contribute)
- [License](#license)

## Motivation

Multicodecs are self-describing protocol/encoding streams. (Note that a file is a stream.) Multicodec is designed to address the perennial problem:

> I have a bitstring, what codec is the data coded with!?

Instead of arguing about which data serialization library is the best, let's just pick the simplest one now, and build _upgradability_ into the system. Choices are never _forever_. Eventually all systems are changed. So, embrace this fact of reality, and build change into your system now.

Multicodec frees you from the tyranny of past mistakes. Instead of trying to figure it all out beforehand, or continuing to use something that we can all agree no longer fits, why not allow the system to _evolve_ and _grow_ with the use cases of today, not yesterday?

To decode an incoming stream of data, a program must either (a) know the format of the data a priori, or (b) learn the format from the data itself. (a) precludes running protocols that may carry any one of several formats without prior agreement on which one is in use. multistream makes (b) straightforward through self-description.

Moreover, this self-description allows straightforward layering of protocols without having to implement support in the parent (or encapsulating) one.
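
To make option (b) and the layering idea concrete, here is a minimal sketch of a decoder that learns the format from the prefix and dispatches on it, with one prefixed payload nested inside another. The registry, the helper names `prefixed` and `decode`, and the assumption that a `/b64/` payload is standard base64 text are all hypothetical and chosen for illustration. The length is read as a single-byte varint, which is enough for paths shorter than 128 bytes.

```JavaScript
// Hypothetical decoder registry keyed by codec path (illustration only).
const decoders = new Map([
  ['/json/', (data) => JSON.parse(data.toString('utf8'))],
  // assumption: a /b64/ payload is standard base64 text
  ['/b64/', (data) => Buffer.from(data.toString('ascii'), 'base64')]
])

// Prepend <len><path>. This sketch only supports paths shorter than
// 128 bytes, so the varint length always fits in a single byte.
function prefixed (path, data) {
  const p = Buffer.from(path, 'utf8')
  if (p.length >= 0x80) throw new RangeError('path too long for this sketch')
  return Buffer.concat([Buffer.from([p.length]), p, data])
}

// Learn the format from the data itself, then hand the rest to that decoder.
function decode (buf) {
  const len = buf[0]
  if (len === undefined || len >= 0x80) throw new RangeError('unsupported prefix')
  const path = buf.subarray(1, 1 + len).toString('utf8')
  const decoder = decoders.get(path)
  if (!decoder) throw new Error(`no decoder registered for ${path}`)
  return decoder(buf.subarray(1 + len))
}

// Layering: the /b64/ layer knows nothing about JSON; the inner bytes
// carry their own /json/ prefix.
const inner = prefixed('/json/', Buffer.from('{"hello":"world"}'))
const outer = prefixed('/b64/', Buffer.from(inner.toString('base64')))
console.log(decode(decode(outer))) // { hello: 'world' }
```

The outer decoder never needs to know what the inner payload is; it only unwraps its own layer and returns bytes that describe themselves.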

## How does it work? - Protocol Description

`multicodec` is a _self-describing multiformat_; it wraps other formats with a tiny bit of self-description:

```
<varint-len><multicodec><encoded-data>
```
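
The `<varint-len>` field is a variable-length integer. As a minimal sketch, assuming it is the unsigned LEB128 form (the same varint scheme Protocol Buffers uses: 7 payload bits per byte, least significant group first, high bit set on every byte except the last), encoding and decoding it could look like this. The function names are hypothetical.

```JavaScript
// ASSUMPTION: <varint-len> is an unsigned LEB128-style varint.
function encodeVarint (n) {
  if (!Number.isSafeInteger(n) || n < 0) throw new RangeError('expected a non-negative safe integer')
  const bytes = []
  while (n >= 0x80) {
    bytes.push((n & 0x7f) | 0x80) // low 7 bits, continuation bit set
    n = Math.floor(n / 128)
  }
  bytes.push(n) // last byte: continuation bit clear
  return Buffer.from(bytes)
}

// Returns the decoded value and how many bytes it occupied.
function decodeVarint (buf, offset = 0) {
  let value = 0
  let shift = 0
  for (let i = offset; i < buf.length; i++) {
    const byte = buf[i]
    value += (byte & 0x7f) * 2 ** shift
    shift += 7
    if ((byte & 0x80) === 0) return { value, length: i - offset + 1 }
  }
  throw new RangeError('truncated varint')
}

console.log(encodeVarint(6)) // <Buffer 06>
console.log(encodeVarint(300)) // <Buffer ac 02>
console.log(decodeVarint(Buffer.from([0xac, 0x02]))) // { value: 300, length: 2 }
```

Any path shorter than 128 bytes, which covers every path in the table, gets a one-byte length.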

For example, let's encode a json doc:

```JavaScript
// hypothetical module: the helper names below illustrate the format and may
// not match the API of any published package
const multicodec = require('multicodec')

// encode some json
const buf = Buffer.from(JSON.stringify({ hello: 'world' }))

const prefixedBuf = multicodec.addPrefix('json', buf) // prepends the multicodec ('/json/')
console.log(prefixedBuf)
// <Buffer 06 2f 6a 73 6f 6e 2f 7b 22 68 65 6c 6c 6f 22 3a 22 77 6f 72 6c 64 22 7d>

console.log(prefixedBuf.toString('hex'))
// 062f6a736f6e2f7b2268656c6c6f223a22776f726c64227d

// let's get the multicodec, and get the data back
const codec = multicodec.getMulticodec(prefixedBuf)
console.log(codec)
// json

console.log(multicodec.rmPrefix(prefixedBuf).toString())
// {"hello":"world"}
```

So, `prefixedBuf` is:

```
hex:   062f6a736f6e2f7b2268656c6c6f223a22776f726c64227d
ascii: /json/{"hello":"world"}
```

Note that the ascii version does not show the leading varint (`0x06`, the length of `/json/` in bytes) because it is not a printable character; account for it when reading the bytes.
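
To run the example above without depending on a library, the three helpers can be sketched in a few lines. This is a hypothetical implementation that reuses the example's names; it is not the API of any published package. It assumes a codec name maps to the path `/<name>/` and handles only the single-byte varint case (paths shorter than 128 bytes).

```JavaScript
// Hypothetical helpers mirroring the example's names (illustration only).
function addPrefix (name, data) {
  const path = Buffer.from(`/${name}/`, 'utf8')
  if (path.length >= 0x80) throw new RangeError('path too long for this sketch')
  return Buffer.concat([Buffer.from([path.length]), path, data])
}

// Reads <len><path> and reports where the encoded data starts.
function readPath (buf) {
  const len = buf[0]
  if (len === undefined || len >= 0x80 || buf.length < 1 + len) {
    throw new RangeError('malformed prefix')
  }
  return { path: buf.subarray(1, 1 + len).toString('utf8'), end: 1 + len }
}

function getMulticodec (buf) {
  return readPath(buf).path.slice(1, -1) // '/json/' -> 'json'
}

function rmPrefix (buf) {
  return buf.subarray(readPath(buf).end)
}

const prefixedBuf = addPrefix('json', Buffer.from(JSON.stringify({ hello: 'world' })))
console.log(prefixedBuf.toString('hex'))
// 062f6a736f6e2f7b2268656c6c6f223a22776f726c64227d
console.log(getMulticodec(prefixedBuf)) // json
console.log(rmPrefix(prefixedBuf).toString()) // {"hello":"world"}
```

`rmPrefix` returns a view into the same memory rather than a copy, which keeps unwrapping cheap for large payloads.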

For a binary packed version of the multicodecs, see [multicodec-packed](./multicodec-packed.md).

## The protocol path

`multicodec` allows us to specify different protocols in a universal namespace, which makes it easy to recognize, multiplex, and embed them. We use the notion of a `path` instead of an `id` because it is meant to be a Unix-friendly URI.

A good path name should be decipherable: if a machine or developer who has no idea about your protocol encounters the path string, they should be able to look it up and work out how to use it.

An example of a good path name is:

```
/bittorrent.org/1.0
```

Examples of _great_ path names are:

```
/ipfs/Qmaa4Rw81a3a1VEx4LxB7HADUAXvZFhCoRdBzsMZyZmqHD/ipfs.protocol
/http/w3id.org/ipfs/1.1.0
```

These path names happen to be resolvable not just in a multicodec muxer (e.g. [multistream](https://github.com/multiformats/multistream)) but on the internet as a whole, provided the program (or OS) knows how to use the `/ipfs` and `/http` protocols.
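
A program that wants to resolve such a path might start by splitting it into segments and dispatching on the first one. The sketch below is hypothetical: `splitPath`, `describeResolution`, and the resolver table are illustrative names, and the resolvers only describe what a real `/ipfs` or `/http` client would do.

```JavaScript
// Hypothetical helpers (illustration only): split a protocol path into
// segments and choose a resolver by its first segment.
function splitPath (path) {
  if (!path.startsWith('/')) throw new Error(`not a path: ${path}`)
  return path.split('/').filter((segment) => segment !== '')
}

// Stand-in resolvers: they only describe what a real /ipfs or /http
// client would do with the remaining segments.
const resolvers = new Map([
  ['ipfs', (rest) => `fetch /ipfs/${rest.join('/')} via IPFS`],
  ['http', (rest) => `GET http://${rest.join('/')}`]
])

function describeResolution (path) {
  const [scheme, ...rest] = splitPath(path)
  const resolve = resolvers.get(scheme)
  return resolve ? resolve(rest) : `no resolver for /${scheme}`
}

console.log(describeResolution('/http/w3id.org/ipfs/1.1.0'))
// GET http://w3id.org/ipfs/1.1.0
console.log(describeResolution('/bittorrent.org/1.0'))
// no resolver for /bittorrent.org
```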

## Multicodec table

| prefix                        | codec         | description             | [packed](https://github.com/multiformats/multicodec/blob/master/multicodec-packed.md)|
|-------------------------------|---------------|-------------------------|-----------|
| **Miscellaneous**                                                                   |
| 0x052f62696e2f                | /bin/         | raw binary              | 0x00      |
| **Base encodings**                                                                  |
| 0x042f62322f                  | /b2/          | ascii base2             |           |
| 0x052f6231362f                | /b16/         | ascii base16            |           |
| 0x052f6233322f                | /b32/         | ascii base32            |           |
| 0x052f6235382f                | /b58/         | ascii base58            |           |
| 0x052f6236342f                | /b64/         | ascii base64            |           |
| **Serialization formats**                                                           |
| 0x062f6a736f6e2f              | /json/        |                         |           |
| 0x062f63626f722f              | /cbor/        |                         |           |
| 0x062f62736f6e2f              | /bson/        |                         |           |
| 0x072f626a736f6e2f            | /bjson/       |                         |           |
| 0x082f75626a736f6e2f          | /ubjson/      |                         |           |
| 0x0a2f70726f746f6275662f      | /protobuf/    | Protocol Buffers        |           |
| 0x072f6361706e702f            | /capnp/       | Cap'n Proto             |           |
| 0x092f666c61746275662f        | /flatbuf/     | FlatBuffers             |           |
| 0x052f726c702f                | /rlp/         | recursive length prefix | 0x60      |
| **Multiformats**                                                                    |
| 0x0c2f6d756c7469636f6465632f  | /multicodec/  |                         | 0x40      |
| 0x0b2f6d756c7469686173682f    | /multihash/   |                         | 0x41      |
| 0x0b2f6d756c7469616464722f    | /multiaddr/   |                         | 0x42      |
| **Multihashes**                                                                     |
|                               |               |                         |           |
| **Multiaddrs**                                                                      |
|                               |               |                         |           |
| **Archiving formats**                                                               |
| 0x052f7461722f                | /tar/         |                         |           |
| 0x052f7a69702f                | /zip/         |                         |           |
| **Image formats**                                                                   |
| 0x052f706e672f                | /png/         |                         |           |
|                               | /jpg/         |                         |           |
| **Video formats**                                                                   |
|                               | /mp4/         |                         |           |
|                               | /mkv/         |                         |           |
| **Blockchain formats**                                                              |
| n/a                           | n/a           | n/a                     | n/a       |
| **VCS formats**                                                                     |
| n/a                           | n/a           | n/a                     | n/a       |
| **IPLD formats**                                                                    |
|                               | /dag-pb/      | MerkleDAG protobuf      |           |
|                               | /dag-cbor/    | MerkleDAG cbor          |           |
|                               | /eth-rlp/     | Ethereum Block RLP      |           |
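
The prefixes in the table are built from the codec path: one varint byte holding the path's length in bytes, followed by the path's ASCII bytes (for example, `/json/` is 6 bytes, giving `0x06` followed by `2f6a736f6e2f`). A minimal sketch that derives the prefix column from the codec column; the helper name `pathPrefixHex` is hypothetical, and only paths shorter than 128 bytes are handled.

```JavaScript
// Hypothetical helper: <varint-len><path> as a 0x-prefixed hex string.
// Assumes paths shorter than 128 bytes, so the length fits in one byte.
function pathPrefixHex (path) {
  const bytes = Buffer.from(path, 'utf8')
  if (bytes.length >= 0x80) throw new RangeError('path too long for this sketch')
  return '0x' + Buffer.concat([Buffer.from([bytes.length]), bytes]).toString('hex')
}

for (const path of ['/bin/', '/json/', '/protobuf/']) {
  console.log(`${path} -> ${pathPrefixHex(path)}`)
}
// /bin/ -> 0x052f62696e2f
// /json/ -> 0x062f6a736f6e2f
// /protobuf/ -> 0x0a2f70726f746f6275662f
```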


## Implementations

- multicodec
  - [go-multicodec](https://github.com/multiformats/go-multicodec)
  - [clj-multicodec](https://github.com/greglook/clj-multicodec)
- multicodec-packed
  - [go-multicodec-packed](https://github.com/multiformats/go-multicodec-packed)
  - [js-multicodec-packed](https://github.com/multiformats/js-multicodec-packed)
- multistream
  - [go-multistream](https://github.com/multiformats/go-multistream) - Implements multistream, which uses multicodec for stream negotiation
  - [js-multistream](https://github.com/multiformats/js-multistream) - Implements multistream, which uses multicodec for stream negotiation

## FAQ

> **Q. Why?**

Today, people speak many languages, and use common ones to interface. But every "common language" has evolved over time, or even fundamentally switched. Why should we expect programs to be any different?

And the reality is they're not. Programs use a variety of encodings. Today we like JSON. Yesterday, XML was all the rage. XDR solved everything, but it's kinda retro. Protobuf is still too cool for school. capnp ("cap and proto") is for cerealization hipsters.

The one problem is figuring out what we're speaking. Humans are pretty smart; we pick up all sorts of languages over time. And we can always resort to pointing and grunting (the ascii of humanity).

Programs have a harder time. You can't keep piping json into a protobuf decoder and hope they align. So we have to help them out a bit. That's what multicodec is for.

> **Q. Why "codec" and not "encoder" and "decoder"?**

Because they're the same thing. Which one of these is the encoder and which the decoder?

    5555 ----[ THING ]---> 8888
    5555 <---[ THING ]---- 8888

> **Q. Full paths are too big for my use case, is there something smaller?**

Yes, check out [multicodec-packed](./multicodec-packed.md). It uses a varint and a table to achieve the same thing.
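
To see how much a packed prefix saves, compare the two encodings for `/multihash/`. This sketch assumes the packed prefix is simply the unsigned varint of the codec's code from the multicodec table (`0x41` for `/multihash/`) and that varints are LEB128-style; the helper names are hypothetical.

```JavaScript
// ASSUMPTION: unsigned LEB128-style varint (7 bits per byte, low group first).
function encodeVarint (n) {
  const bytes = []
  while (n >= 0x80) {
    bytes.push((n & 0x7f) | 0x80)
    n = Math.floor(n / 128)
  }
  bytes.push(n)
  return Buffer.from(bytes)
}

// Full form: <varint-len><path>.
function pathPrefix (path) {
  const bytes = Buffer.from(path, 'utf8')
  return Buffer.concat([encodeVarint(bytes.length), bytes])
}

// Packed form (assumed): just the varint of the table code.
const packedPrefix = (code) => encodeVarint(code)

console.log(pathPrefix('/multihash/').length) // 12
console.log(packedPrefix(0x41).length) // 1
console.log(packedPrefix(0x41).toString('hex')) // 41
```

A full path costs one byte per character plus its length byte, while any packed code below `0x80` fits in a single byte.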

## Maintainers

Captain: [@jbenet](https://github.com/jbenet).

## Contribute

Contributions welcome. Please check out [the issues](https://github.com/multiformats/multicodec/issues).

Check out our [contributing document](https://github.com/multiformats/multiformats/blob/master/contributing.md) for more information on how we work, and about contributing in general. Please be aware that all interactions related to multiformats are subject to the IPFS [Code of Conduct](https://github.com/ipfs/community/blob/master/code-of-conduct.md).

## License

[MIT](LICENSE)