# [WIP] SimpleSerialize (SSZ) Spec
This is a **work in progress** document describing `simpleserialize`, the
serialization method currently selected for the Ethereum 2.0 Beacon Chain.
This document specifies the general information for serializing and
deserializing objects and data types.
## ToC
* [About](#about)
* [Terminology](#terminology)
* [Constants](#constants)
* [Overview](#overview)
    + [Serialize/Encode](#serializeencode)
        - [uint: 8/16/24/32/64/256](#uint-816243264256)
        - [Address](#address)
        - [Hash](#hash)
            * [Hash32](#hash32)
            * [Hash96](#hash96)
            * [Hash97](#hash97)
        - [Bytes](#bytes)
        - [List/Vectors](#listvectors)
        - [Container (TODO)](#container)
    + [Deserialize/Decode](#deserializedecode)
        - [uint: 8/16/24/32/64/256](#uint-816243264256-1)
        - [Address](#address-1)
        - [Hash](#hash-1)
            * [Hash32](#hash32-1)
            * [Hash96](#hash96-1)
            * [Hash97](#hash97-1)
        - [Bytes](#bytes-1)
        - [List/Vectors](#listvectors-1)
        - [Container (TODO)](#container-1)
* [Implementations](#implementations)
## About
`SimpleSerialize` was first proposed by Vitalik Buterin as the serialization
protocol for use in the Ethereum 2.0 Beacon Chain.

The core feature of `ssz` is its simplicity: serialization is straightforward
and adds little overhead.
## Terminology
| Term         | Definition                                                                                     |
|:-------------|:-----------------------------------------------------------------------------------------------|
| `big`        | Big Endian.                                                                                    |
| `byte_order` | Specifies [endianness](https://en.wikipedia.org/wiki/Endianness): Big Endian or Little Endian. |
| `len`        | Length/Number of Bytes.                                                                        |
| `to_bytes`   | Convert to bytes. Should take parameters ``size`` and ``byte_order``.                          |
| `from_bytes` | Convert from bytes to object. Should take ``bytes`` and ``byte_order``.                        |
| `value`      | The value to serialize.                                                                        |
| `rawbytes`   | Raw serialized bytes.                                                                          |
## Constants
| Constant | Value | Definition |
|:---------------|:-----:|:--------------------------------------------------------------------------------------|
| `LENGTH_BYTES` | 4 | Number of bytes used for the length added before a variable-length serialized object. |
## Overview
### Serialize/Encode
#### uint: 8/16/24/32/64/256
Convert the integer directly to bytes, using as many bytes as the size of the
integer type (e.g. ``uint16 = 2 bytes``).

All integers are serialized as **big endian**.

| Check to perform | Code |
|:-----------------------|:----------------------|
| Size is a byte integer | ``int_size % 8 == 0`` |
```python
assert(int_size % 8 == 0)
# Integer division: to_bytes requires an integer byte count.
buffer_size = int_size // 8
return value.to_bytes(buffer_size, 'big')
```
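
The fragment above can be wrapped into a standalone function. This is a minimal sketch, not a reference implementation: the name `serialize_uint` and the explicit range check on `value` are assumptions added for illustration.

```python
def serialize_uint(value: int, int_size: int) -> bytes:
    """Serialize an unsigned integer of int_size bits as big-endian bytes."""
    # The bit size must be a whole number of bytes.
    assert int_size % 8 == 0
    # Assumed extra check: the value must be non-negative and fit in int_size bits.
    assert 0 <= value < 2 ** int_size
    return value.to_bytes(int_size // 8, 'big')


print(serialize_uint(5, 16).hex())    # 0005
print(serialize_uint(258, 24).hex())  # 000102
```

The most significant byte comes first, so a `uint16` holding 5 is written as `0x00 0x05`.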
#### bool
Convert directly to a single 0x00 or 0x01 byte.
| Check to perform | Code |
|:------------------|:---------------------------|
| Value is boolean | ``value in (True, False)`` |
```python
assert(value in (True, False))
return b'\x01' if value else b'\x00'
```
#### Address
The address should already be in byte format. Ensure that its length is
**20** bytes.
| Check to perform | Code |
|:-----------------------|:---------------------|
| Length is correct (20) | ``len(value) == 20`` |
```python
assert( len(value) == 20 )
return value
```
#### Hash
| Hash Type | Usage |
|:---------:|:------------------------------------------------|
| `hash32`  | Hash size of ``keccak`` or `blake2b[0.. < 32]`. |
| `hash96` | BLS Public Key Size. |
| `hash97` | BLS Public Key Size with recovery bit. |
| Checks to perform | Code |
|:-----------------------------------|:---------------------|
| Length is correct (32) if `hash32` | ``len(value) == 32`` |
| Length is correct (96) if `hash96` | ``len(value) == 96`` |
| Length is correct (97) if `hash97` | ``len(value) == 97`` |
**Example all together**
```python
# Pseudocode: type(value) stands for the declared SSZ type of the value.
if (type(value) == 'hash32'):
    assert(len(value) == 32)
elif (type(value) == 'hash96'):
    assert(len(value) == 96)
elif (type(value) == 'hash97'):
    assert(len(value) == 97)
else:
    raise TypeError('Invalid hash type supplied')
return value
```
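
The three length checks can also be expressed as a table lookup. The following is a minimal sketch, assuming the hash type is passed explicitly as a string name; `HASH_LENGTHS` and `serialize_hash` are hypothetical names that do not appear in any reference implementation.

```python
# Expected byte length for each hash type listed in the table above.
HASH_LENGTHS = {'hash32': 32, 'hash96': 96, 'hash97': 97}


def serialize_hash(value: bytes, hash_type: str) -> bytes:
    """Check the length for the given hash type and return the bytes unchanged."""
    if hash_type not in HASH_LENGTHS:
        raise TypeError('Invalid hash type supplied')
    assert len(value) == HASH_LENGTHS[hash_type]
    return value


digest = bytes(32)  # 32 zero bytes standing in for a real hash
print(serialize_hash(digest, 'hash32') == digest)  # True
```

Hashes are fixed-length, so no length prefix is written; the serialized form is the value itself.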
##### Hash32
Ensure 32 byte length and return the bytes.
```python
assert(len(value) == 32)
return value
```
##### Hash96
Ensure 96 byte length and return the bytes.
```python
assert(len(value) == 96)
return value
```
##### Hash97
Ensure 97 byte length and return the bytes.
```python
assert(len(value) == 97)
return value
```
#### Bytes
For the general `bytes` type:

1. Get the length (number of bytes) of the value and encode it as a `4-byte` **big endian** integer.
2. Append the value to the length and return: ``[ length_bytes ] + [ value_bytes ]``

| Check to perform                     | Code                   |
|:-------------------------------------|:-----------------------|
| Length of bytes can fit into 4 bytes | ``len(value) < 2**32`` |

```python
assert(len(value) < 2**32)
byte_length = (len(value)).to_bytes(LENGTH_BYTES, 'big')
return byte_length + value
```
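
As a worked example of the length prefix, the sketch below wraps the steps in a function. The name `serialize_bytes` is an assumption for illustration; `LENGTH_BYTES` is the constant defined earlier in this document.

```python
LENGTH_BYTES = 4  # size of the length prefix, as defined under Constants


def serialize_bytes(value: bytes) -> bytes:
    """Prefix the value with its length as a 4-byte big-endian integer."""
    assert len(value) < 2 ** (8 * LENGTH_BYTES)
    return len(value).to_bytes(LENGTH_BYTES, 'big') + value


print(serialize_bytes(b'abc').hex())  # 00000003616263
```

The first four bytes (`00000003`) carry the length and the remaining three (`616263`) are the value itself.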
#### List/Vectors
A list is a collection of elements of the same type.

| Check to perform                            | Code                        |
|:--------------------------------------------|:----------------------------|
| Length of serialized list fits into 4 bytes | ``len(serialized) < 2**32`` |

1. Get the number of raw bytes to serialize: for fixed-size elements it is ``len(list) * sizeof(element)``.
    * Encode that as a `4-byte` **big endian** `uint32`.
2. Append the elements in a packed manner.
    * *Note on efficiency*: consider using a container that does not need to iterate over all elements to get its length. For example Python lists, C++ vectors or Rust Vec.

**Example in Python**

```python
serialized_list_string = b''
for item in value:
    serialized_list_string += serialize(item)
assert(len(serialized_list_string) < 2**32)
serialized_len = len(serialized_list_string).to_bytes(LENGTH_BYTES, 'big')
return serialized_len + serialized_list_string
```
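
To make the packed layout concrete, here is a minimal runnable sketch that serializes a list of `uint16` values. The helper names `serialize_uint16` and `serialize_list`, and passing the element serializer as an argument, are assumptions made for illustration.

```python
LENGTH_BYTES = 4  # size of the length prefix, as defined under Constants


def serialize_uint16(value: int) -> bytes:
    """Serialize one uint16 as 2 big-endian bytes."""
    return value.to_bytes(2, 'big')


def serialize_list(values, serialize_item) -> bytes:
    """Pack the serialized elements and prefix them with their total byte length."""
    body = b''.join(serialize_item(v) for v in values)
    assert len(body) < 2 ** (8 * LENGTH_BYTES)
    return len(body).to_bytes(LENGTH_BYTES, 'big') + body


print(serialize_list([1, 2, 3], serialize_uint16).hex())
# 00000006000100020003
```

The prefix counts bytes, not elements: three `uint16` values occupy 6 bytes, so the prefix is `00000006`.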
#### Container
```
########################################
TODO
########################################
```
### Deserialize/Decode
Decoding requires knowledge of the type of the item to be decoded. When
decoding an entire serialized string, it also requires knowledge of the order
in which the objects were serialized.

Note: Each return provides ``deserialized_object, new_index``, keeping track
of the new index.

At each step, the following checks should be made:

| Check to perform         | Code                                                    |
|:-------------------------|:--------------------------------------------------------|
| Ensure sufficient length | ``len(rawbytes) >= current_index + deserialize_length`` |

#### uint: 8/16/24/32/64/256
Convert directly from bytes into an integer, reading as many bytes as the size
of the integer type (e.g. ``uint16 == 2 bytes``).

All integers are interpreted as **big endian**.

```python
# int_size is in bits; convert it to a byte count before indexing.
assert(int_size % 8 == 0)
byte_length = int_size // 8
new_index = current_index + byte_length
assert(len(rawbytes) >= new_index)
return int.from_bytes(rawbytes[current_index:new_index], 'big'), new_index
```
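
The sketch below shows how the returned index chains from one value to the next when two integers are stored back to back. The function name `deserialize_uint` is an assumption for illustration.

```python
def deserialize_uint(rawbytes: bytes, current_index: int, int_size: int):
    """Read an int_size-bit big-endian integer starting at current_index."""
    assert int_size % 8 == 0
    byte_length = int_size // 8
    new_index = current_index + byte_length
    # Ensure the buffer holds enough bytes for this integer.
    assert len(rawbytes) >= new_index
    return int.from_bytes(rawbytes[current_index:new_index], 'big'), new_index


data = bytes.fromhex('00050102')
first, index = deserialize_uint(data, 0, 16)       # first == 5, index == 2
second, index = deserialize_uint(data, index, 16)  # second == 258, index == 4
print(first, second, index)  # 5 258 4
```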
#### Bool
Return True if 0x01, False if 0x00.
```python
assert(len(rawbytes) >= current_index + 1)
assert(rawbytes[current_index] in (0, 1))
new_index = current_index + 1
return rawbytes[current_index] == 1, new_index
```
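
A round trip through both directions can be sketched as follows. The function names are assumptions for illustration, and the sketch uses a stricter `isinstance` check than the table in the serialization section, so that integers such as `1` are rejected rather than treated as booleans.

```python
def serialize_bool(value: bool) -> bytes:
    """Serialize a boolean as a single 0x00 or 0x01 byte."""
    # Assumed stricter check: only genuine bool objects are accepted.
    assert isinstance(value, bool)
    return b'\x01' if value else b'\x00'


def deserialize_bool(rawbytes: bytes, current_index: int):
    """Read one byte at current_index and interpret it as a boolean."""
    new_index = current_index + 1
    assert len(rawbytes) >= new_index
    byte = rawbytes[current_index:new_index]
    # Any byte other than 0x00 or 0x01 is invalid.
    assert byte in (b'\x00', b'\x01')
    return byte == b'\x01', new_index


stream = serialize_bool(True) + serialize_bool(False)
print(deserialize_bool(stream, 0))  # (True, 1)
print(deserialize_bool(stream, 1))  # (False, 2)
```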
#### Address
Return the 20 bytes.
```python
assert(len(rawbytes) >= current_index + 20)
new_index = current_index + 20
return rawbytes[current_index:current_index+20], new_index
```
#### Hash
##### Hash32
Return the 32 bytes.
```python
assert(len(rawbytes) >= current_index + 32)
new_index = current_index + 32
return rawbytes[current_index:current_index+32], new_index
```
##### Hash96
Return the 96 bytes.
```python
assert(len(rawbytes) >= current_index + 96)
new_index = current_index + 96
return rawbytes[current_index:current_index+96], new_index
```
##### Hash97
Return the 97 bytes.
```python
assert(len(rawbytes) >= current_index + 97)
new_index = current_index + 97
return rawbytes[current_index:current_index+97], new_index
```
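
Address, Hash32, Hash96 and Hash97 all follow the same pattern and differ only in length, so a single helper can cover them. This is a sketch under that observation; `deserialize_fixed` is a hypothetical name, not part of the spec.

```python
def deserialize_fixed(rawbytes: bytes, current_index: int, length: int):
    """Return `length` bytes starting at current_index, plus the new index."""
    new_index = current_index + length
    # Ensure the buffer holds enough bytes for this object.
    assert len(rawbytes) >= new_index
    return rawbytes[current_index:new_index], new_index


# A 20-byte address followed by a 32-byte hash, decoded in serialization order.
stream = b'\xaa' * 20 + b'\xbb' * 32
address, index = deserialize_fixed(stream, 0, 20)
hash32, index = deserialize_fixed(stream, index, 32)
print(len(address), len(hash32), index)  # 20 32 52
```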
#### Bytes
Read the length prefix, then return that many bytes.

| Check to perform                                  | Code                                              |
|:--------------------------------------------------|:--------------------------------------------------|
| rawbytes has enough left for length               | ``len(rawbytes) >= current_index + LENGTH_BYTES`` |
| bytes to return not greater than serialized bytes | ``len(rawbytes) >= bytes_end``                    |

```python
assert(len(rawbytes) >= current_index + LENGTH_BYTES)
bytes_length = int.from_bytes(rawbytes[current_index:current_index + LENGTH_BYTES], 'big')
bytes_start = current_index + LENGTH_BYTES
bytes_end = bytes_start + bytes_length
new_index = bytes_end
assert(len(rawbytes) >= bytes_end)
return rawbytes[bytes_start:bytes_end], new_index
```
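
A round trip helps check the index arithmetic. The sketch below is illustrative only; the function names are assumptions, and `LENGTH_BYTES` is the constant defined earlier in this document.

```python
LENGTH_BYTES = 4  # size of the length prefix, as defined under Constants


def serialize_bytes(value: bytes) -> bytes:
    """Prefix the value with its length as a 4-byte big-endian integer."""
    assert len(value) < 2 ** (8 * LENGTH_BYTES)
    return len(value).to_bytes(LENGTH_BYTES, 'big') + value


def deserialize_bytes(rawbytes: bytes, current_index: int):
    """Read a length-prefixed byte string starting at current_index."""
    bytes_start = current_index + LENGTH_BYTES
    # The buffer must contain the whole length prefix.
    assert len(rawbytes) >= bytes_start
    bytes_length = int.from_bytes(rawbytes[current_index:bytes_start], 'big')
    bytes_end = bytes_start + bytes_length
    # The buffer must contain the whole value.
    assert len(rawbytes) >= bytes_end
    return rawbytes[bytes_start:bytes_end], bytes_end


stream = serialize_bytes(b'abc') + serialize_bytes(b'')
first, index = deserialize_bytes(stream, 0)       # first == b'abc', index == 7
second, index = deserialize_bytes(stream, index)  # second == b'', index == 11
print(first, second, index)  # b'abc' b'' 11
```

An empty value at the end of the buffer occupies exactly `LENGTH_BYTES` bytes, which is why the sketch compares with `>=` rather than `>`.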
#### List/Vectors
Deserialize each element in the list.

1. Get the length of the serialized list.
2. Loop through deserializing each item in the list until you reach the
   entire length of the list.

| Check to perform                          | Code                                                             |
|:------------------------------------------|:-----------------------------------------------------------------|
| rawbytes has enough left for length       | ``len(rawbytes) >= current_index + LENGTH_BYTES``                |
| list is not greater than serialized bytes | ``len(rawbytes) >= current_index + LENGTH_BYTES + total_length`` |

```python
assert(len(rawbytes) >= current_index + LENGTH_BYTES)
total_length = int.from_bytes(rawbytes[current_index:current_index + LENGTH_BYTES], 'big')
new_index = current_index + LENGTH_BYTES + total_length
assert(len(rawbytes) >= new_index)
item_index = current_index + LENGTH_BYTES
deserialized_list = []
while item_index < new_index:
    item, item_index = deserialize(rawbytes, item_index, item_type)
    deserialized_list.append(item)
return deserialized_list, new_index
```
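
The loop can be exercised with a list of `uint16` values. This is a minimal sketch: `deserialize_uint16` and `deserialize_list` are hypothetical helpers, the element deserializer is passed in as an argument instead of an `item_type`, and the final consistency check is an addition not required by the text above.

```python
LENGTH_BYTES = 4  # size of the length prefix, as defined under Constants


def deserialize_uint16(rawbytes: bytes, current_index: int):
    """Read one 2-byte big-endian integer."""
    new_index = current_index + 2
    assert len(rawbytes) >= new_index
    return int.from_bytes(rawbytes[current_index:new_index], 'big'), new_index


def deserialize_list(rawbytes: bytes, current_index: int, deserialize_item):
    """Read the byte-length prefix, then decode items until that length is consumed."""
    item_index = current_index + LENGTH_BYTES
    assert len(rawbytes) >= item_index
    total_length = int.from_bytes(rawbytes[current_index:item_index], 'big')
    new_index = item_index + total_length
    assert len(rawbytes) >= new_index
    items = []
    while item_index < new_index:
        item, item_index = deserialize_item(rawbytes, item_index)
        items.append(item)
    # Assumed extra check: the items must end exactly at the declared length.
    assert item_index == new_index
    return items, new_index


data = bytes.fromhex('00000006000100020003')
print(deserialize_list(data, 0, deserialize_uint16))  # ([1, 2, 3], 10)
```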
#### Container
```
########################################
TODO
########################################
```
## Implementations
| Language   | Implementation | Description |
|:----------:|----------------|-------------|
| Python     | [https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py](https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py) | Beacon chain reference implementation written in Python. |
| Rust       | [https://github.com/sigp/lighthouse/tree/master/beacon_chain/utils/ssz](https://github.com/sigp/lighthouse/tree/master/beacon_chain/utils/ssz) | Lighthouse (Rust Ethereum 2.0 Node) maintained SSZ. |
| Nim        | [https://github.com/status-im/nim-beacon-chain/blob/master/beacon_chain/ssz.nim](https://github.com/status-im/nim-beacon-chain/blob/master/beacon_chain/ssz.nim) | Nim Implementation maintained SSZ. |
| Rust       | [https://github.com/paritytech/shasper/tree/master/util/ssz](https://github.com/paritytech/shasper/tree/master/util/ssz) | Shasper implementation of SSZ maintained by ParityTech. |
| JavaScript | [https://github.com/ChainSafeSystems/ssz-js/blob/master/src/index.js](https://github.com/ChainSafeSystems/ssz-js/blob/master/src/index.js) | JavaScript Implementation maintained SSZ. |