From 4768ec89f680655574f68425802e5140fe5c8c6d Mon Sep 17 00:00:00 2001
From: Dankrad Feist
Date: Tue, 27 Aug 2019 11:45:17 +0100
Subject: [PATCH 1/2] SSZ clarifications on deserialization

---
 specs/simple-serialize.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/specs/simple-serialize.md b/specs/simple-serialize.md
index 588200f20..6e250fd81 100644
--- a/specs/simple-serialize.md
+++ b/specs/simple-serialize.md
@@ -129,7 +129,7 @@ return bytes(array)
 
 ### `Bitlist[N]`
 
-Note that from the offset coding, the length (in bytes) of the bitlist is known. An additional leading `1` bit is added so that the length in bits will also be known.
+Note that from the offset coding, the length (in bytes) of the bitlist is known. An additional `1` bit is added at position `N` where `N` is the legnth of the bitlist so that the length in bits will also be known.
 
 ```python
 array = [0] * ((len(value) // 8) + 1)
@@ -171,7 +171,15 @@ return serialized_type_index + serialized_bytes
 
 ## Deserialization
 
-Because serialization is an injective function (i.e. two distinct objects of the same type will serialize to different values) any bytestring has at most one object it could deserialize to. Efficient algorithms for computing this object can be found in [the implementations](#implementations).
+Because serialization is an injective function (i.e. two distinct objects of the same type will serialize to different values) any bytestring has at most one object it could deserialize to.
+
+Deserialization can be implemented using a recursive algorithm. The deserialization of basic objects is easy, and from there we can find a simple recursive algorithm for all fixed-size objects. For variable-size objects we have to do one of the following depending on what kind of object it is:
+
+* Vector/list of a variable-size object: The serialized data will start with offsets of all the serialized objects (`BYTES_PER_LENGTH_OFFSET` bytes each).
+    * Using the first offset, we can compute the length of the list (divide by `BYTES_PER_LENGTH_OFFSET`), as it gives us the total number of bytes in the offset data.
+    * The size of each object in the vector/list can be inferred from the difference of two offsets. To get the size of the last object, the total number of bytes has to be known (it is not generally possible to deserialize an SSZ object of unknown length)
+* Containers follow the same principles as vectors, with the difference that there may be fixed-size objects in a container as well. This means the `fixed_parts` data will contain offsets as well as fixed-size objects.
+* In the case of bitlists, the length in bits cannot be uniquely inferred from the number of bytes in the object. Because of this, they have a bit in position `N` where `N` is the length of the list that is always set. This bit has to be used to infer the size of the bitlist in bits.
 
 Note that deserialization requires hardening against invalid inputs. A non-exhaustive list:
 
@@ -179,6 +187,8 @@ Note that deserialization requires hardening against invalid inputs. A non-exhau
 - Scope: Extra unused bytes, not aligned with element size.
 - More elements than a list limit allows. Part of enforcing consensus.
 
+Efficient algorithms for computing this object can be found in [the implementations](#implementations).
+
 ## Merkleization
 
 We first define helper functions:

From e984d10a0cc3b19fc9c03b9ef731a0ccac5dc6b6 Mon Sep 17 00:00:00 2001
From: protolambda
Date: Fri, 25 Oct 2019 12:02:12 +0200
Subject: [PATCH 2/2] fix typo, and fix bitlist end-bit description

---
 specs/simple-serialize.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/specs/simple-serialize.md b/specs/simple-serialize.md
index 6e250fd81..fdd5a26ca 100644
--- a/specs/simple-serialize.md
+++ b/specs/simple-serialize.md
@@ -129,7 +129,7 @@ return bytes(array)
 
 ### `Bitlist[N]`
 
-Note that from the offset coding, the length (in bytes) of the bitlist is known. An additional `1` bit is added at position `N` where `N` is the legnth of the bitlist so that the length in bits will also be known.
+Note that from the offset coding, the length (in bytes) of the bitlist is known. An additional `1` bit is added to the end, at index `e` where `e` is the length of the bitlist (not the limit), so that the length in bits will also be known.
 
 ```python
 array = [0] * ((len(value) // 8) + 1)
@@ -179,7 +179,7 @@ Deserialization can be implemented using a recursive algorithm. The deserializat
     * Using the first offset, we can compute the length of the list (divide by `BYTES_PER_LENGTH_OFFSET`), as it gives us the total number of bytes in the offset data.
     * The size of each object in the vector/list can be inferred from the difference of two offsets. To get the size of the last object, the total number of bytes has to be known (it is not generally possible to deserialize an SSZ object of unknown length)
 * Containers follow the same principles as vectors, with the difference that there may be fixed-size objects in a container as well. This means the `fixed_parts` data will contain offsets as well as fixed-size objects.
-* In the case of bitlists, the length in bits cannot be uniquely inferred from the number of bytes in the object. Because of this, they have a bit in position `N` where `N` is the length of the list that is always set. This bit has to be used to infer the size of the bitlist in bits.
+* In the case of bitlists, the length in bits cannot be uniquely inferred from the number of bytes in the object. Because of this, they have a bit at the end that is always set. This bit has to be used to infer the size of the bitlist in bits.
 
 Note that deserialization requires hardening against invalid inputs. A non-exhaustive list:
 
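The two decoding rules this patch series describes (the bitlist end bit at index `e`, and the offset table at the start of a variable-size list) can be sketched in Python as below. This is a minimal illustration, not code from the spec or from any client. The names `serialize_bitlist`, `deserialize_bitlist` and `split_variable_size_list` are hypothetical, and the sketch assumes `BYTES_PER_LENGTH_OFFSET = 4` with little-endian offsets.

```python
from typing import List

# Assumption: offsets are 4-byte little-endian unsigned integers.
BYTES_PER_LENGTH_OFFSET = 4


def serialize_bitlist(bits: List[int]) -> bytes:
    # Pack bits little-endian within each byte, then set the end bit at index len(bits).
    array = [0] * ((len(bits) // 8) + 1)
    for i, bit in enumerate(bits):
        array[i // 8] |= bit << (i % 8)
    array[len(bits) // 8] |= 1 << (len(bits) % 8)
    return bytes(array)


def deserialize_bitlist(data: bytes, limit: int) -> List[int]:
    # The last byte always holds the end bit, so it can never be zero.
    if len(data) == 0 or data[-1] == 0:
        raise ValueError("bitlist is missing its end bit")
    # The highest set bit of the last byte marks the end; bits below it are the payload.
    length = (len(data) - 1) * 8 + data[-1].bit_length() - 1
    if length > limit:
        raise ValueError("bitlist exceeds its limit")
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(length)]


def split_variable_size_list(data: bytes) -> List[bytes]:
    # An empty list of variable-size elements serializes to zero bytes.
    if len(data) == 0:
        return []
    if len(data) < BYTES_PER_LENGTH_OFFSET:
        raise ValueError("truncated offset")
    first = int.from_bytes(data[:BYTES_PER_LENGTH_OFFSET], "little")
    # The first offset equals the size of the offset table, which gives the element count.
    if first == 0 or first % BYTES_PER_LENGTH_OFFSET != 0 or first > len(data):
        raise ValueError("invalid first offset")
    count = first // BYTES_PER_LENGTH_OFFSET
    offsets = [
        int.from_bytes(data[i * BYTES_PER_LENGTH_OFFSET:(i + 1) * BYTES_PER_LENGTH_OFFSET], "little")
        for i in range(count)
    ]
    # The last element ends where the input ends, so the total size must be known.
    offsets.append(len(data))
    # Ordered offsets starting at `first` and ending at len(data) are all in range.
    for start, end in zip(offsets, offsets[1:]):
        if end < start:
            raise ValueError("offsets out of order")
    return [data[start:end] for start, end in zip(offsets, offsets[1:])]
```

For example, `serialize_bitlist([1, 0, 1])` yields the single byte `0x0d`: bits 0 and 2 carry the payload and bit 3 is the end bit. `split_variable_size_list` only recovers each element's byte range; in the recursive scheme the patch describes, each range would then be deserialized with the element type.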