Merge pull request #1381 from ethereum/dankrad-patch-11

SSZ clarifications on deserialization
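
The two rules this change spells out in `specs/simple-serialize.md` (the trailing delimiter bit of a `Bitlist[N]`, and the offset table of a list of variable-size objects) can be illustrated with a minimal sketch. It is not taken from the spec or from any client implementation: the function names `decode_bitlist` and `split_variable_size_list` are hypothetical, and `BYTES_PER_LENGTH_OFFSET = 4` with little-endian offsets is assumed here.

```python
from typing import List

# Assumed value for illustration; the spec defines this constant elsewhere.
BYTES_PER_LENGTH_OFFSET = 4


def decode_bitlist(data: bytes, limit: int) -> List[int]:
    """Recover the bits of a Bitlist from its bytes (hypothetical helper).

    The highest set bit of the last byte is the delimiter bit; its index
    is the length of the bitlist in bits.
    """
    if len(data) == 0 or data[-1] == 0:
        raise ValueError("bitlist is missing its delimiter bit")
    length = (len(data) - 1) * 8 + data[-1].bit_length() - 1
    if length > limit:
        raise ValueError("bitlist exceeds its limit")
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(length)]


def split_variable_size_list(data: bytes) -> List[bytes]:
    """Split a serialized list of variable-size objects into element bytes."""
    if len(data) == 0:
        return []
    if len(data) < BYTES_PER_LENGTH_OFFSET:
        raise ValueError("not enough bytes for the first offset")
    first = int.from_bytes(data[:BYTES_PER_LENGTH_OFFSET], "little")
    # The first offset equals the total size of the offset table.
    if first == 0 or first % BYTES_PER_LENGTH_OFFSET != 0 or first > len(data):
        raise ValueError("invalid first offset")
    count = first // BYTES_PER_LENGTH_OFFSET
    offsets = [
        int.from_bytes(
            data[i * BYTES_PER_LENGTH_OFFSET:(i + 1) * BYTES_PER_LENGTH_OFFSET],
            "little",
        )
        for i in range(count)
    ]
    # The total byte length acts as the end marker for the last element.
    offsets.append(len(data))
    parts = []
    for start, end in zip(offsets, offsets[1:]):
        if start > end:
            raise ValueError("offsets are not monotonically increasing")
        parts.append(data[start:end])
    return parts


# Usage: a list holding the bitlists [1, 0, 1], [] and eight 1-bits.
serialized = (
    (12).to_bytes(4, "little")
    + (13).to_bytes(4, "little")
    + (14).to_bytes(4, "little")
    + b"\x0d"      # bits 1,0,1 plus delimiter at index 3
    + b"\x01"      # empty bitlist: only the delimiter at index 0
    + b"\xff\x01"  # eight 1-bits, delimiter spills into a second byte
)
decoded = [decode_bitlist(part, limit=16) for part in split_variable_size_list(serialized)]
```

The last element's size is only recoverable because the total length of `serialized` is known, which is the point the new text makes about objects of unknown length. The checks shown are a subset of the hardening the spec asks for, not a complete validator.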
2019-10-25 18:09:37 +08:00 · f1bf0bf85b
parent c9c4a6c823 e984d10a0c
commit f1bf0bf85b
1 changed file with 12 additions and 2 deletions
--- a/specs/simple-serialize.md
+++ b/specs/simple-serialize.md
@@ -139,7 +139,7 @@ return bytes(array)

 ### `Bitlist[N]`

-Note that from the offset coding, the length (in bytes) of the bitlist is known. An additional leading `1` bit is added so that the length in bits will also be known.
+Note that from the offset coding, the length (in bytes) of the bitlist is known. An additional `1` bit is added to the end, at index `e` where `e` is the length of the bitlist (not the limit), so that the length in bits will also be known.

 ```python
 array = [0] * ((len(value) // 8) + 1)
@@ -181,7 +181,15 @@ return serialized_type_index + serialized_bytes

 ## Deserialization

-Because serialization is an injective function (i.e. two distinct objects of the same type will serialize to different values) any bytestring has at most one object it could deserialize to. Efficient algorithms for computing this object can be found in [the implementations](#implementations).
+Because serialization is an injective function (i.e. two distinct objects of the same type will serialize to different values), any bytestring has at most one object it could deserialize to.
+
+Deserialization can be implemented using a recursive algorithm. The deserialization of basic objects is easy, and from there we can find a simple recursive algorithm for all fixed-size objects. For variable-size objects we have to do one of the following depending on what kind of object it is:
+
+* Vector/list of variable-size objects: The serialized data will start with the offsets of all the serialized objects (`BYTES_PER_LENGTH_OFFSET` bytes each).
+  * The first offset gives the total number of bytes in the offset data, so dividing it by `BYTES_PER_LENGTH_OFFSET` yields the length of the list.
+  * The size of each object in the vector/list can be inferred from the difference of two consecutive offsets. To get the size of the last object, the total number of bytes has to be known (it is not generally possible to deserialize an SSZ object of unknown length).
+* Containers follow the same principles as vectors, with the difference that there may be fixed-size objects in a container as well. This means the `fixed_parts` data will contain offsets as well as fixed-size objects.
+* In the case of bitlists, the length in bits cannot be uniquely inferred from the number of bytes in the object. Because of this, they have a bit at the end that is always set. This bit has to be used to infer the size of the bitlist in bits.

 Note that deserialization requires hardening against invalid inputs. A non-exhaustive list:

@@ -189,6 +197,8 @@ Note that deserialization requires hardening against invalid inputs. A non-exhau
 - Scope: Extra unused bytes, not aligned with element size.
 - More elements than a list limit allows. Part of enforcing consensus.

+Efficient algorithms for deserialization can be found in [the implementations](#implementations).
+
 ## Merkleization

 We first define helper functions: