NimYAML/doc/serialization.txt
Felix Krause 48aeff20c0 Use variant object types for heterogeneous data
* Made variant object types work (really this time)
 * Added `markAsImplicit`
 * Implemented implicit variant object types
 * Added documentation
2016-06-05 19:29:16 +02:00

341 lines
14 KiB
Plaintext

======================
Serialization Overview
======================
Introduction
============
NimYAML tries hard to make transforming YAML characters streams to native Nim
types and vice versa as easy as possible. In simple scenarios, you might not
need anything else than the two procs
`dump <yaml.html#dump,K,Stream,PresentationStyle,TagStyle,AnchorStyle,int>`_ and
`load <yaml.html#load,Stream,K>`_. On the other side, the process should be as
customizable as possible to allow the user to tightly control how the generated
YAML character stream will look and how a YAML character stream is interpreted.
An important thing to remember in NimYAML is that unlike in interpreted
languages like Ruby, Nim cannot load a YAML character stream without knowing the
resulting type beforehand. For example, if you want to load this piece of YAML:
.. code-block:: yaml
%YAML 1.2
--- !nim:system:seq(nim:system:int8)
- 1
- 2
- 3
You would need to know that it will load a ``seq[int8]`` *at compile time*. This
is not really a problem because without knowing which type you will load, you
cannot do anything useful with the result afterwards in the code. But it may be
unfamiliar for programmers who are used to the YAML libraries of Python or Ruby.
Supported Types
===============
NimYAML supports a growing number of types of Nim's ``system`` module and
standard library, and it also supports user-defined object, tuple and enum types
out of the box.
**Important**: NimYAML currently does not support polymorphism or variant
object types. This may be added in the future.
This also means that NimYAML is generally able to work with object, tuple and
enum types defined in the standard library or a third-party library without
further configuration.
Scalar Types
------------
The following integer types are supported by NimYAML: ``int``, ``int8``,
``int16``, ``int32``, ``int64``, ``uint8``, ``uint16``, ``uint32``, ``uint64``.
Note that the ``int`` type has a variable size dependent on the target
operation system. To make sure that it round-trips properly between 32-bit and
64-bit operating systems, it will be converted to an ``int32`` during loading
and dumping. This will raise an exception for values outside of the range
``int32.low .. int32.high``! If you define the types you serialize yourself,
always consider using an integer type with explicit length. The same goes for
``uint``.
The floating point types ``float``, ``float32`` and ``float64`` are also
supported. There is currently no problem with ``float``, because it is always a
``float64``.
``string`` is supported and one of the few Nim types which directly map to a
standard YAML type. NimYAML is able to handle strings that are ``nil``, they
will be serialized with the special tag ``!nim:nil:string``. ``char`` is also
supported.
To support new scalar types, you must implement the ``constructObject()`` and
``representObject()`` procs on that type (see below).
Collection Types
----------------
NimYAML supports Nim's ``array``, ``set``, ``seq``, ``Table`` and
``OrderedTable`` types out of the box. Unlike the native YAML types ``!!seq``
and ``!!map``, Nim's collection types define the type of all their contained
items (or keys and values). So YAML objects with heterogeneous types in them
cannot be loaded to Nim collection types. For example, this sequence:
.. code-block:: yaml
%YAML 1.2
--- !!seq
- !!int 1
- !!string foo
Cannot be loaded to a Nim ``seq``. For this reason, you cannot load YAML's
native ``!!map`` and ``!!seq`` types directly into Nim types. However, you can
use variant object types to process heterogeneous value lists, see below.
Nim ``seq`` types may be ``nil``. This is handled by serializing them to an
empty scalar with the tag ``!nim:nil:seq``.
Reference Types
---------------
A reference to any supported non-reference type (including user defined types,
see below) is supported by NimYAML. A reference type will be treated like its
base type, but NimYAML is able to detect multiple references to the same object
and dump the structure properly with anchors and aliases in place. It is
possible to dump and load cyclic data structures without further configuration.
It is possible for reference types to hold a ``nil`` value, which will be mapped
to the ``!!null`` YAML scalar type.
Pointer types are not supported because it seems dangerous to automatically
allocate memory which the user must then manually deallocate.
User Defined Types
------------------
For an object or tuple type to be directly usable with NimYAML, the following
conditions must be met:
- Every type contained in the object/tuple must be supported
- All fields of an object type must be accessible from the code position where
you call NimYAML. If an object has non-public member fields, it can only be
processed in the module where it is defined.
- The object may not have a generic parameter
NimYAML will present enum types as YAML scalars, and tuple and object types as
YAML maps. Some of the conditions above may be loosened in future releases.
Variant Object Types
....................
A *variant object type* is an object type that contains one or more ``case``
clauses. NimYAML currently supports variant object types. However, this feature
is **highly experimental**. Only the currently accessible fields of a variant
object type are dumped, and only those may be present when loading. The
discriminator field(s) are treated like all other fields. The value of a
discriminator field must occur before any value of a field that depends on it.
This violates the YAML specification and therefore will be changed in the
future.
While dumping variant object types directly is currently not production ready,
you can use them for processing heterogeneous data sets. For example, if you
have a YAML document which contains differently typed values in the same list
like this:
.. code-block:: yaml
%YAML 1.2
---
- 42
- this is a string
- !!null
You can define a variant object type that can hold all types that occur in this
list in order to load it:
.. code-block:: nim
import yaml
type
ContainerKind = enum
ckInt, ckString, ckNone
Container = object
case kind: ContainerKind
of ckInt:
intVal: int
of ckString:
strVal: string
of ckNone:
discard
markAsImplicit(Container)
var
list: seq[Container]
s = newFileStream("in.yaml")
load(s, list)
``markAsImplicit`` tells NimYAML that you want to use the type ``Container``
implicitly, i.e. its fields are not visible in YAML, and are set dependent on
the value type that gets loaded into it. The type ``Container`` must fullfil the
following requirements:
- It must contain exactly one ``case`` clause, and nothing else.
- Each branch of the ``case`` clause must contain exactly one field, with one
exception: There may be at most one branch that contains no field at all.
- It must not be a derived object type (this is currently not enforced)
When loading the sequence, NimYAML writes the value into the first field that
can hold the value's type. All complex values (i.e. non-scalar values) *must*
have a tag in the YAML source, because NimYAML would otherwise be unable to
determine their type. The type of scalar values will be guessed if no tag is
available, but be aware that ``42`` can fit in both ``int8`` and ``int16``, so
in the case you have fields for both types, you should annotate the value.
When dumping the sequence, NimYAML will always annotate a tag to each value it
outputs. This is to avoid possible ambiguity when loading. If a branch without
a field exists, it is represented as a ``!!null`` value.
Tags
====
NimYAML uses local tags to represent Nim types that do not map directly to a
YAML type. For example, ``int8`` is presented with the tag ``!nim:system:int8``.
Tags are mostly unnecessary when loading YAML data because the user already
defines the target Nim type which usually defines all types of the structure.
However, there is one case where a tag is necessary: A reference type with the
value ``nil`` is represented in YAML as a ``!!null`` scalar. This will be
automatically detected by type guessing, but if it is for example a reference to
a string with the value ``"~"``, it must be tagged with ``!!string``, because
otherwise, it would be loaded as ``nil``.
As you might have noticed in the example above, the YAML tag of a ``seq``
depends on its generic type parameter. The same applies to ``Table``. So, a
table that maps ``int8`` to string sequences would be presented with the tag
``!nim:tables:Table(nim:system:int8,nim:system:seq(tag:yaml.org,2002:string))``.
These tags are generated on the fly based on the types you instantiate
``Table`` or ``seq`` with.
You may customize the tags used for your types by using the template
`setTagUri <yaml.html#setTagUri.t,typedesc,string>`_. It may not
be applied to scalar and collection types implemented by NimYAML, but you can
for example use it on a certain ``seq`` type:
.. code-block:: nim
setTagUri(seq[string], "!nim:my:seq")
Customize Serialization
=======================
It is possible to customize the serialization of a type. For this, you need to
implement two procs, ``constructObject̀`` and ``representObject``. If you only
need to process the type in one direction (loading or dumping), you can omit
the other proc.
constructObject
---------------
.. code-block:: nim
proc constructObject*(s: var YamlStream, c: ConstructionContext,
result: var MyObject)
{.raises: [YamlConstructionError, YamlStreamError.}
This proc should construct the type from a ``YamlStream``. Follow the following
guidelines when implementing a custom ``constructObject`` proc:
- For constructing a value from a YAML scalar, consider using the
``constructScalarItem`` template, which will automatically catch exceptions
and wrap them with a ``YamlConstructionError``, and also will assure that the
item you use for construction is a ``yamlScalar``. See below for an example.
- For constructing a value from a YAML sequence or map, you **must** use the
``constructChild`` proc for child values if you want to use their
``constructObject`` implementation. This will check their tag and anchor.
Always try to construct child values that way.
- For non-scalars, make sure that the last value you remove from the stream is
the object's ending event (``yamlEndMap`` or ``yamlEndSequence``)
- Use `peek <yaml.html#peek,YamlStream>`_ for inspecting the next event in
the ``YamlStream`` without removing it.
- Never write a ``constructObject`` proc for a ``ref`` type. ``ref`` types are
always handled by NimYAML itself. You can only customize the construction of
the underlying object.
The following example for constructing from a YAML scalar value is the actual
implementation of constructing ``int`` types:
.. code-block:: nim
proc constructObject*[T: int8|int16|int32|int64](
s: var YamlStream, c: ConstructionContext, result: var T)
{.raises: [YamlConstructionError, YamlStreamError].} =
var item: YamlStreamEvent
constructScalarItem(s, item, name(T)):
result = T(parseBiggestInt(item.scalarContent))
The following example for constructiong from a YAML non-scalar is the actual
implementation of constructing ``seq`` types:
.. code-block:: nim
proc constructObject*[T](s: var YamlStream, c: ConstructionContext,
result: var seq[T])
{.raises: [YamlConstructionError, YamlStreamError].} =
let event = s.next()
if event.kind != yamlStartSequence:
raise newException(YamlConstructionError, "Expected sequence start")
result = newSeq[T]()
while s.peek().kind != yamlEndSequence:
var item: T
constructChild(s, c, item)
result.add(item)
discard s.next()
representObject
---------------
.. code-block:: nim
proc representObject*(value: MyObject, ts: TagStyle = tsNone,
c: SerializationContext): RawYamlStream {.raises: [].}
This proc should return an iterator over ``YamlStreamEvent`` which represents
the type. Follow the following guidelines when implementing a custom
``representObject`` proc:
- You can use the helper template
`presentTag <yaml.html#presentTag,typedesc,TagStyle>`_ for outputting the
tag.
- Always output the first tag with a ``yAnchorNone``. Anchors will be set
automatically by ``ref`` type handling.
- When outputting non-scalar types, you should use the ``representObject``
implementation of the child types, if possible.
- Check if the given ``TagStyle`` equals ``tsRootOnly`` and if yes, change it
to ``tsNone`` for the child values.
- Never write a ``representObject`` proc for ``ref`` types.
The following example for representing to a YAML scalar is the actual
implementation of representing ``int`` types:
.. code-block:: nim
proc representObject*[T: uint8|uint16|uint32|uint64](
value: T, ts: TagStyle, c: SerializationContext):
RawYamlStream {.raises: [].} =
result = iterator(): YamlStreamEvent =
yield scalarEvent($value, presentTag(T, ts), yAnchorNone)
The following example for representing to a YAML non-scalar is the actual
implementation of representing ``seq`` types:
.. code-block:: nim
proc representObject*[T](value: seq[T], ts: TagStyle,
c: SerializationContext): RawYamlStream {.raises: [].} =
result = iterator(): YamlStreamEvent =
let childTagStyle = if ts == tsRootOnly: tsNone else: ts
yield YamlStreamEvent(kind: yamlStartSequence,
seqTag: presentTag(seq[T], ts),
seqAnchor: yAnchorNone)
for item in value:
var events = representObject(item, childTagStyle, c)
while true:
let event = events()
if finished(events): break
yield event
yield YamlStreamEvent(kind: yamlEndSequence)