nimbus-eth2/docs/e2store.md
Jacek Sieka 000a0ecc52
initial e2store file format description (#1355)
This is one way we could organize the flat file storage for blocks - the
alternative would be to not do `type` in the file itself, but have a
single type per file which arguably is simpler but may become annoying.

Another potential restriction would be to require that blocks are
ordered - with this format, it's a little bit more involved to recreate
an index file, and it's easy to accidentally build in assumptions about
the block order in the main data file.
2020-09-17 23:23:54 +02:00


Introduction

The e2store (extension: .e2s) is a simple linear TLV (type-length-value) file format for storing arbitrary items, typically encoded with the serialization techniques used throughout Ethereum 2: SSZ, varint and snappy.

General structure

e2s files consist of repeated type-length-value records. Each record is variable-length, and unknown records can easily be skipped. In particular, e2s files are designed to:

  • allow trivial implementations that are easy to analyze
  • allow append-only implementations
  • allow future record types to be added

The type and length are encoded in an 8-byte header which is directly followed by data.

record = header | data
header = type | length
type = Vector[byte, 2]
length = Vector[byte, 6]

The length is the size of the data in bytes, not including the header itself, stored as the first 6 bytes of a little-endian encoded uint64. For example, an entry with header type [0x22, 0x32], length 4 and data bytes [0x01, 0x02, 0x03, 0x04] is stored as the byte sequence [0x22, 0x32, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x02, 0x03, 0x04].
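
The header layout above can be sketched in Python as follows. This is a minimal illustration, not the Nimbus implementation; the names `encode_record` and `MAX_LENGTH` are made up for this example.

```python
import struct

# Largest data size that fits in the 6-byte length field (hypothetical name).
MAX_LENGTH = (1 << 48) - 1


def encode_record(record_type: bytes, data: bytes) -> bytes:
    """Return header | data for a single e2s record."""
    if len(record_type) != 2:
        raise ValueError("record type must be exactly 2 bytes")
    if len(data) > MAX_LENGTH:
        raise ValueError("data does not fit in a 6-byte length field")
    # Encode the length as a little-endian uint64 and keep the first 6 bytes.
    length = struct.pack("<Q", len(data))[:6]
    return record_type + length + data


# The example from the text: type [0x22, 0x32] with 4 bytes of data.
example = encode_record(bytes([0x22, 0x32]), bytes([0x01, 0x02, 0x03, 0x04]))
```

Truncating the little-endian uint64 to 6 bytes drops only the two most significant bytes, which are zero for any length below 2^48.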

Reading

In a loop, the following pseudocode can be used to read the file:

while file.bytesRemaining > 0:
  if file.bytesRemaining < 8:
    abort("Header missing")

  header = read(file, 8)
  type = header[0:2]
  length = fromLittleEndian(header[2:8])

  if file.bytesRemaining < length:
    abort("Not enough data")

  data = read(file, length)

  if type == ...:
    # process the data
  else:
    # Unknown record type, skip
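
As a runnable counterpart to the pseudocode, the sketch below reads records from any buffered binary stream. The function name `read_records` is an assumption of this example, and an in-memory buffer stands in for a real .e2s file.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def read_records(stream: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, data) pairs until the stream is exhausted."""
    while True:
        header = stream.read(8)
        if len(header) == 0:
            return  # clean end of file
        if len(header) < 8:
            raise ValueError("Header missing")
        record_type = header[0:2]
        # The 6 length bytes are little-endian.
        length = int.from_bytes(header[2:8], "little")
        data = stream.read(length)
        if len(data) < length:
            raise ValueError("Not enough data")
        yield record_type, data


sample = io.BytesIO(
    bytes([0x65, 0x32, 0, 0, 0, 0, 0, 0])  # version record, no data
    + bytes([0x22, 0x32, 4, 0, 0, 0, 0, 0, 1, 2, 3, 4])  # 4 data bytes
)
records = list(read_records(sample))
```

A caller that does not recognize a record type can simply ignore the yielded pair; the length field has already moved the stream past the data.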

Writing

e2s files are linear and append-only. To write a new entry, simply append it to the end of the file. The index file may then be updated in a separate transaction.

Because the files are append-only, e2s files are best suited to data that no longer changes, in particular finalized blocks.
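
A minimal sketch of an append, assuming a hypothetical helper `append_record` that also writes the version record (described under Known types) when the file is new. It returns the offset at which the record starts, which is the value an index file would store.

```python
import os

# Version record: type [0x65, 0x32] with zero-length data.
VERSION_RECORD = bytes([0x65, 0x32, 0, 0, 0, 0, 0, 0])


def append_record(path: str, record_type: bytes, data: bytes) -> int:
    """Append one record to the file at path and return its starting offset."""
    if len(record_type) != 2:
        raise ValueError("record type must be exactly 2 bytes")
    # 6-byte little-endian length; raises OverflowError if data is too large.
    header = record_type + len(data).to_bytes(6, "little")
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            # A new file starts with the version record.
            f.write(VERSION_RECORD)
        offset = f.tell()
        f.write(header + data)
    return offset
```

This sketch does not sync to disk or guard against concurrent writers; a real implementation would need to decide how to handle both.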

Known types

Version

type: [0x65, 0x32]
data: Vector[byte, 0]

The version record must be the first record in the file. Its type is [0x65, 0x32] (e2 in ASCII) and the length of its data field is always 0, thus the first 8 bytes of an e2s file are always [0x65, 0x32, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00].

CompressedSignedBeaconBlock

type: [0x01, 0x00]
data: snappyFramed(length-varint | ssz(SignedBeaconBlock))

CompressedSignedBeaconBlock entries are entries whose data field matches the payload of BeaconBlocksByRange and BeaconBlocksByRoot chunks in the phase0 p2p specification. In particular, the SignedBeaconBlock is serialized using SSZ, prefixed with a varint-encoded length, then compressed using the snappy framing format.
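
The length prefix can be illustrated on its own. The sketch below assumes the varint is the protobuf-style unsigned LEB128 encoding; SSZ serialization and snappy framing are not in the Python standard library and are left out. The function names are hypothetical.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("varint values must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte, high bit clear
            return bytes(out)


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    value = 0
    shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")
```

For example, `encode_varint(300)` produces the two bytes `0xAC 0x02`.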

Slot Index files

Index files store indices into linear histories of entries. They consist of offsets that point to the beginning of the corresponding record. Index files start with an 8-byte header, followed by a series of uint64 values encoded as little-endian bytes. An offset of 0 indicates that there is no data for the given slot.

Each entry in the slot index is fixed-length, meaning that the entry for slot N can be found at byte position (N * 8) + 8 in the index file. Index files only support linear histories, meaning that the blocks that they point to must have passed finalization.

By convention, slot index files use the extension .e2i.

header | index | index | index ...

IndexVersion

The version header of an index file consists of the bytes [0x69, 0x32, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00].

Index

Index entries are uint64 offsets, encoded as little-endian, from the beginning of the store file to the corresponding entry.
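
To make the layout concrete, here is a small sketch that builds the bytes of an index file from a mapping of slot to offset. `build_index` and `entry_position` are names invented for this example, and slots missing from the mapping are written as 0.

```python
from typing import Dict

# Index version header: [0x69, 0x32] followed by six zero bytes.
INDEX_VERSION = bytes([0x69, 0x32, 0, 0, 0, 0, 0, 0])


def entry_position(slot: int) -> int:
    """Byte position of the entry for a slot within the index file."""
    return slot * 8 + 8


def build_index(offsets_by_slot: Dict[int, int], slot_count: int) -> bytes:
    """Return header | index | index ... for slots 0..slot_count-1."""
    out = bytearray(INDEX_VERSION)
    for slot in range(slot_count):
        # 0 marks a slot without data.
        offset = offsets_by_slot.get(slot, 0)
        out += offset.to_bytes(8, "little")
    return bytes(out)


# Slots 0 and 2 have blocks at offsets 8 and 300; slot 1 is empty.
index_bytes = build_index({0: 8, 2: 300}, 3)
```

Since every e2s file starts with the 8-byte version record, no real record can begin at offset 0, which is why 0 is free to mean "no data".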

Reading

  if failed(setpos(indexfile, slot * 8 + 8)):
    abort("no data for the given slot")

  offset = fromLittleEndian(read(indexfile, 8))
  if offset == 0:
    abort("no data for the given slot")

  if failed(setpos(datafile, offset)):
    abort("index file corrupt, data not found at offset")
  header = read(datafile, 8)
  # as above
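
The lookup can be written out in Python roughly as follows. This is a sketch under the assumptions of this document only; `read_slot` is a hypothetical name, and in-memory buffers stand in for the .e2i and .e2s files.

```python
import io
from typing import BinaryIO, Optional, Tuple


def read_slot(index_file: BinaryIO, data_file: BinaryIO,
              slot: int) -> Optional[Tuple[bytes, bytes]]:
    """Return (type, data) for the slot, or None if it has no data."""
    index_file.seek(slot * 8 + 8)
    raw = index_file.read(8)
    if len(raw) < 8:
        return None  # slot lies beyond the end of the index
    offset = int.from_bytes(raw, "little")
    if offset == 0:
        return None  # no data for the given slot
    data_file.seek(offset)
    header = data_file.read(8)
    if len(header) < 8:
        raise ValueError("index file corrupt, data not found at offset")
    length = int.from_bytes(header[2:8], "little")
    data = data_file.read(length)
    if len(data) < length:
        raise ValueError("Not enough data")
    return header[0:2], data


# Data file: version record, then one record of type [0x01, 0x00] at offset 8.
data_file = io.BytesIO(
    bytes([0x65, 0x32, 0, 0, 0, 0, 0, 0])
    + bytes([0x01, 0x00, 3, 0, 0, 0, 0, 0]) + b"abc"
)
# Index file: header, slot 0 empty, slot 1 pointing at offset 8.
index_file = io.BytesIO(
    bytes([0x69, 0x32, 0, 0, 0, 0, 0, 0])
    + (0).to_bytes(8, "little")
    + (8).to_bytes(8, "little")
)
found = read_slot(index_file, data_file, 1)
```

Returning `None` for both an empty slot and a slot past the end of the index mirrors the pseudocode, which reports "no data for the given slot" in both cases.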