From 000a0ecc525bbbfb564ef8bd4d819fa6fe001fb5 Mon Sep 17 00:00:00 2001 From: Jacek Sieka Date: Thu, 17 Sep 2020 23:23:54 +0200 Subject: [PATCH] initial e2store file format description (#1355) This is one way we could organize the flat file storage for blocks - the alternative would be to not do `type` in the file itself, but have a single type per file which arguably is simpler but may become annoying. Another potential restriction would be to require that blocks are ordered - with this format, it's a little bit more involved to recreate an index file, and it's easy to accidentally build in assumptions about the block order in the main data file. --- docs/e2store.md | 108 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 108 insertions(+) create mode 100644 docs/e2store.md diff --git a/docs/e2store.md b/docs/e2store.md new file mode 100644 index 000000000..6091338d4 --- /dev/null +++ b/docs/e2store.md @@ -0,0 +1,108 @@ +# Introduction + +The `e2store` (extension: `.e2s`) is a simple linear [TLV](https://en.wikipedia.org/wiki/Type-length-value) file for storing arbitrary items typically encoded using serialization techniques used in ethereum 2 in general: SSZ, varint, snappy. + +# General structure + +`e2s` files consist of repeated type-length-value records. Each record is variable-length, and unknown records can easily be skipped. In particular, `e2s` files are designed to: + +* allow trivial implementations that are easy to analyze +* allow append-only implementations +* allow future record types to be added + +The type and length are encoded in an 8-byte header which is directly followed by data. + +``` +record = header | data +header = type | length +type = Vector[byte, 2] +length = Vector[byte, 6] +``` + +The `length` is the first 6 bytes of a little-endian encoded `uint64`, not including the header itself. For example, the entry with header type `[0x22, 0x32]`, the length `4` and the bytes `[0x01, 0x02, 0x03, 0x04]` will be stored as the byte sequence `[0x22, 0x32, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x02, 0x03, 0x04]`. + +## Reading + +In a loop, the following pseudocode can be used to read the file: + +``` +while file.bytesRemaining > 0: + if file.bytesRemaining < 8: + abort("Header missing") + + header = read(file, 8) + type = header[0:2] + length = fromLittleEndian(header[2:8]) + + if file.bytesRemaining < length: + abort("Not enough data") + + data = read(file, length) + + if type == ...: + # process the data + else: + # Unkown record type, skip +``` + +## Writing + +`e2s` files are linear and append-only. To write a new entry, simply append it to the end of the file. In a separate transaction, the index file may be updated also. + +Since the files are append-only, `e2s` files are suitable in particular for finalized blocks only. + +# Known types + +## Version + +``` +type: [0x65, 0x32] +data: Vector[byte, 0] +``` + +The `version` type must be the first record in the file. Its type is `[0x65, 0x32]` (`e2` in ascii) and the length of its data field is always 0, thus the first 8 bytes of an `e2s` file are always `[0x65, 0x32, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]`. + +## CompressedSignedBeaconBlock + +``` +type: [0x01, 0x00] +data: snappyFramed(length-varint | ssz(SignedBeaconBlock)) +``` + +`CompressedSignedBeackBlock` entries are entries whose data field matches the payload of `BeaconBlocksByRange` and `BeaconBlocksByRoot` chunks in the phase0 p2p specification. In particular, the SignedBeaconBlock is serialized using SSZ, prefixed with a varint-length, then compressed using the snappy [framing format](https://github.com/google/snappy/blob/master/framing_format.txt). + +# Slot Index files + +Index files are files that store indices to linear histories of entries. They consist of offsets that point the the beginning of the corresponding record. Index files start with an 8-byte header, followed by a series of `uint64` encoded as little endian bytes. An index of 0 idicates that there is no data for the given slot. + +Each entry in the slot index is fixed-length, meaning that the entry for slot `N` can be found at index `(N * 8) + 8` in the index file. Index files only support linear histories, meaning that the blocks that they point to must have passed finalization. + +By convention, slot index files have the name `.e2i`. + +``` +header | index | index | index ... +``` + +## IndexVersion + +The `version` header of an index file consists of the bytes `[0x69, 0x32, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]`. + +## Index + +Index entries are `uint64` offsets, encoded as little-endian, from the beginning of the store file to the corresponding entry. + +## Reading + +``` + if failed(setpos(indexfile, slot * 8 + 8)): + abort("no data for the given slot") + + offset = fromLittleEndian(read(indexfile, 8)) + if offset == 0: + abort("no data for the given slot") + + if failed(setpos(datafile, offset)): + abort("index file corrupt, data not found at offset") + header = read(datafile, 8) + # as above +```