Serialization

Serializing your data structures using immer::persist allows you to preserve the structural sharing across sessions of your application.

This has multiple practical use cases, like storing the undo history or the clipboard of a complex application, or applying advanced logging techniques.

The library serializes multiple containers together via the notion of a pool. Pools are produced automatically, and in the JSON they represent the internal structure (the trees) that implements the Immer containers.

Example

For this example, we’ll use a document type that contains two immer vectors.

// Set the BL constant to 1, so that each leaf stores only 2 elements.
// This makes it possible to demonstrate structural sharing even in vectors
// with just a few elements.
using vector_one =
    immer::vector<int, immer::default_memory_policy, immer::default_bits, 1>;

struct document
{
    vector_one ints;
    vector_one ints2;

    auto tie() const { return std::tie(ints, ints2); }

    friend bool operator==(const document& left, const document& right)
    {
        return left.tie() == right.tie();
    }

    // Make the struct serializable with cereal as usual, nothing special
    // related to immer::persist.
    template <class Archive>
    void serialize(Archive& ar)
    {
        ar(CEREAL_NVP(ints), CEREAL_NVP(ints2));
    }
};

using json_t = nlohmann::json;

Let’s say we have two vectors v1 and v2, where v2 is derived from v1 so that it shares data with it:

    const auto v1    = vector_one{1, 2, 3};
    const auto v2    = v1.push_back(4).push_back(5).push_back(6);
    const auto value = document{v1, v2};
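To make the effect of BL = 1 concrete, the following minimal sketch computes how many full tree leaves and how many tail elements a vector of n elements would have. It uses only the standard library. The names toy_layout, leaf_capacity and layout_for are hypothetical, and the rule that the last 1 to 2^BL elements live in the tail is an assumption that is consistent with the pool JSON shown later in this section, not a statement about immer's internals.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of how n elements are split (not an immer type).
struct toy_layout
{
    std::size_t tree_leaves; // full leaves reachable from the root
    std::size_t tail_size;   // elements kept in the tail leaf
};

// A leaf holds 2^BL elements, so BL = 1 gives 2 elements per leaf.
constexpr std::size_t leaf_capacity(unsigned bl)
{
    return std::size_t{1} << bl;
}

// Assumption: the tail always holds the last 1..capacity elements.
inline toy_layout layout_for(std::size_t n, unsigned bl)
{
    assert(n > 0 && "this sketch only covers non-empty vectors");
    const auto capacity = leaf_capacity(bl);
    const auto tail     = (n - 1) % capacity + 1;
    return toy_layout{(n - tail) / capacity, tail};
}
```

Under these assumptions, layout_for(3, 1) describes v1 as one tree leaf plus a one-element tail, and layout_for(6, 1) describes v2 as two tree leaves plus a two-element tail.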

We can serialize the document using cereal with this:

    auto os = std::ostringstream{};
    {
        // The archive finishes writing the JSON when it goes out of scope.
        auto ar = cereal::JSONOutputArchive{os};
        ar(value);
    }
    return os.str();

This generates JSON like the following:

{"value0": {"ints": [1, 2, 3], "ints2": [1, 2, 3, 4, 5, 6]}}

As you can see, ints and ints2 each contain the full linearization of their vector. The structural sharing between these two data structures is not represented in the serialized form.

Using pools

First, let’s make the document struct compatible with boost::hana. This way, the persist library can automatically determine which pool types are needed and name the pools.

BOOST_HANA_ADAPT_STRUCT(document, ints, ints2);

Then using immer::persist we can serialize it with:

    const auto policy =
        immer::persist::hana_struct_auto_member_name_policy(document{});
    const auto str = immer::persist::cereal_save_with_pools(value, policy);

This generates JSON like the following:

        const auto expected_json = json_t::parse(R"(
{
  "value0": {"ints": 0, "ints2": 1},
  "pools": {
    "ints": {
      "B": 5,
      "BL": 1,
      "inners": [
        [0, {"children": [2], "relaxed": false}],
        [3, {"children": [2, 5], "relaxed": false}]
      ],
      "leaves": [[1, [3]], [2, [1, 2]], [4, [5, 6]], [5, [3, 4]]],
      "vectors": [{"root": 0, "tail": 1}, {"root": 3, "tail": 4}]
    }
  }
}
        )");

As you can see, the value is serialized with every immer container replaced by an identifier. This identifier is a key into a pool, which is serialized right after the value.
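To illustrate how an identifier resolves through the pool, here is a minimal standard-library sketch that mirrors the "ints" pool from the JSON above as plain data and walks it back into a flat list of elements. The types toy_pool, toy_inner and toy_vector and the functions collect, linearize and example_pool are hypothetical names made up for this illustration. The sketch ignores the "B", "BL" and "relaxed" fields and is not how immer::persist loads containers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical plain-data mirror of the "ints" pool (not immer::persist types).
struct toy_inner  { std::vector<int> children; }; // child node ids
struct toy_vector { int root; int tail; };         // node ids of root and tail

struct toy_pool
{
    std::map<int, std::vector<int>> leaves; // node id -> leaf elements
    std::map<int, toy_inner> inners;        // node id -> inner node
    std::vector<toy_vector> vectors;        // container id -> root and tail
};

// Append every element reachable from a node id, depth first.
inline void collect(const toy_pool& pool, int node, std::vector<int>& out)
{
    const auto leaf = pool.leaves.find(node);
    if (leaf != pool.leaves.end()) {
        out.insert(out.end(), leaf->second.begin(), leaf->second.end());
        return;
    }
    const auto inner = pool.inners.find(node);
    assert(inner != pool.inners.end() && "unknown node id");
    for (int child : inner->second.children)
        collect(pool, child, out);
}

// Resolve a container id (the number stored under "value0") to its elements.
inline std::vector<int> linearize(const toy_pool& pool, std::size_t id)
{
    const auto& v = pool.vectors.at(id);
    auto out      = std::vector<int>{};
    collect(pool, v.root, out); // elements stored in the tree
    collect(pool, v.tail, out); // elements stored in the tail leaf
    return out;
}

// The pool from the JSON above, written out by hand.
inline toy_pool example_pool()
{
    auto pool    = toy_pool{};
    pool.leaves  = {{1, {3}}, {2, {1, 2}}, {4, {5, 6}}, {5, {3, 4}}};
    pool.inners  = {{0, toy_inner{{2}}}, {3, toy_inner{{2, 5}}}};
    pool.vectors = {toy_vector{0, 1}, toy_vector{3, 4}};
    return pool;
}
```

With this model, linearize(example_pool(), 0) yields 1, 2, 3 and linearize(example_pool(), 1) yields 1, 2, 3, 4, 5, 6. Leaf 2, which holds [1, 2], is reached from both roots, which is the sharing the pool records.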

Note

Currently, immer-persist makes a distinction between pools used for saving containers (output pools) and for loading containers (input pools), similar to cereal with its InputArchive and OutputArchive distinction.

immer-persist currently focuses on JSON as the serialization format and uses the cereal library internally. In principle, other formats and serialization libraries could be supported in the future.

You can see in the output that the nodes of the trees that make up the immer containers are directly represented in the JSON. Because all the containers are represented as a whole, nodes that are referenced from multiple trees are stored only once. The same structure is preserved when reading the pool back from disk and reconstructing the vectors (and other containers) from it, which is what preserves the structural sharing across sessions.
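The idea of storing a shared node only once can be sketched with the standard library alone. In the toy model below, a "tree" is just a list of reference-counted leaves, and a pool hands out one id per distinct leaf address. Every name here (toy_leaf, toy_tree, make_leaf, toy_output_pool) is hypothetical, and identifying nodes by address is an assumption of this sketch rather than a description of the immer::persist implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-ins, not immer types: two trees share a leaf by
// holding the same pointer.
using toy_leaf = std::shared_ptr<const std::vector<int>>;
using toy_tree = std::vector<toy_leaf>;

inline toy_leaf make_leaf(std::initializer_list<int> values)
{
    return std::make_shared<std::vector<int>>(values);
}

struct toy_output_pool
{
    std::map<const void*, std::size_t> ids; // node address -> pool id
    std::vector<toy_leaf> leaves;           // pool id -> node, stored once

    // Return the id of a leaf, registering it only the first time it is seen.
    std::size_t add_leaf(const toy_leaf& leaf)
    {
        assert(leaf && "leaves must not be null");
        const auto found = ids.find(leaf.get());
        if (found != ids.end())
            return found->second;
        const auto id = leaves.size();
        ids.emplace(leaf.get(), id);
        leaves.push_back(leaf); // keeps the node alive, so its address stays unique
        return id;
    }

    // Replace a tree by the ids of its leaves.
    std::vector<std::size_t> add_tree(const toy_tree& tree)
    {
        auto out = std::vector<std::size_t>{};
        for (const auto& leaf : tree)
            out.push_back(add_leaf(leaf));
        return out;
    }
};
```

If one tree holds a single leaf and a second tree holds that same leaf plus two more, adding both trees produces the id lists {0} and {0, 1, 2} while the pool stores only three leaves. A per-container serialization of the same data would write the shared leaf twice.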

Custom policies

We can use a policy to control the naming of the pools for each container.

For this example, let’s define a new document type doc_2. It will also contain another type extra_data with a vector of strings in it. To demonstrate the responsibilities of the policy, the doc_2 type will not be a boost::hana::Struct and will not allow for compile-time reflection.

using vector_str = immer::
    vector<std::string, immer::default_memory_policy, immer::default_bits, 1>;

struct extra_data
{
    vector_str comments;

    friend bool operator==(const extra_data& left, const extra_data& right)
    {
        return left.comments == right.comments;
    }

    template <class Archive>
    void serialize(Archive& ar)
    {
        ar(CEREAL_NVP(comments));
    }
};

struct doc_2
{
    vector_one ints;
    vector_one ints2;
    vector_str strings;
    extra_data extra;

    auto tie() const { return std::tie(ints, ints2, strings, extra); }

    friend bool operator==(const doc_2& left, const doc_2& right)
    {
        return left.tie() == right.tie();
    }

    template <class Archive>
    void serialize(Archive& ar)
    {
        ar(CEREAL_NVP(ints),
           CEREAL_NVP(ints2),
           CEREAL_NVP(strings),
           CEREAL_NVP(extra));
    }
};

We define the doc_2_policy as follows:

struct doc_2_policy
{
    template <class T>
    auto get_pool_types(const T&) const
    {
        return boost::hana::tuple_t<vector_one, vector_str>;
    }

    template <class Archive>
    void save(Archive& ar, const doc_2& doc2_value) const
    {
        ar(CEREAL_NVP(doc2_value));
    }

    template <class Archive>
    void load(Archive& ar, doc_2& doc2_value) const
    {
        ar(CEREAL_NVP(doc2_value));
    }

    auto get_pool_name(const vector_one&) const { return "vector_of_ints"; }
    auto get_pool_name(const vector_str&) const { return "vector_of_strings"; }
};

The get_pool_types function returns the types of containers that should be serialized with pools; in this case, both the vector of ints and the vector of strings. The save and load functions control the name of the document node, which is doc2_value in this case. The overloaded get_pool_name functions supply the name of the pool for each corresponding immer container. To create and serialize a value of doc_2, you can use the following approach:

    const auto v1   = vector_one{1, 2, 3};
    const auto v2   = v1.push_back(4).push_back(5).push_back(6);
    const auto str1 = vector_str{"one", "two"};
    const auto str2 =
        str1.push_back("three").push_back("four").push_back("five");
    const auto value = doc_2{v1, v2, str1, extra_data{str2}};

    const auto str =
        immer::persist::cereal_save_with_pools(value, doc_2_policy{});

The serialized JSON looks like this:

    const auto expected_json = json_t::parse(R"(
{
  "doc2_value": {"ints": 0, "ints2": 1, "strings": 0, "extra": {"comments": 1}},
  "pools": {
    "vector_of_ints": {
      "B": 5,
      "BL": 1,
      "leaves": [[1, [3]], [2, [1, 2]], [4, [5, 6]], [5, [3, 4]]],
      "inners": [
        [0, {"children": [2], "relaxed": false}],
        [3, {"children": [2, 5], "relaxed": false}]
      ],
      "vectors": [{"root": 0, "tail": 1}, {"root": 3, "tail": 4}]
    },
    "vector_of_strings": {
      "B": 5,
      "BL": 1,
      "leaves": [[1, ["one", "two"]], [3, ["five"]], [4, ["three", "four"]]],
      "inners": [
        [0, {"children": [], "relaxed": false}],
        [2, {"children": [1, 4], "relaxed": false}]
      ],
      "vectors": [{"root": 0, "tail": 1}, {"root": 2, "tail": 3}]
    }
  }
}
    )");

It can also be loaded from JSON like this:

    const auto loaded_value =
        immer::persist::cereal_load_with_pools<doc_2>(str, doc_2_policy{});

This example also demonstrates a scenario in which the main document type doc_2 contains another type extra_data with a vector. As you can see in the resulting JSON, nested types are also serialized with pools: "extra": {"comments": 1}. Only the ID of the comments vector is serialized instead of its content.
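The JSON above places both strings and the nested comments vector in the "vector_of_strings" pool, which suggests that the pool is chosen by container type rather than by where the member sits in the document. The following standard-library sketch shows that kind of type-based selection through ordinary overload resolution. The names toy_ints, toy_strs, toy_extra, toy_doc, toy_policy and pool_name_of are hypothetical stand-ins, and how immer::persist itself calls the policy is not shown in this text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for vector_one and vector_str.
using toy_ints = std::vector<int>;
using toy_strs = std::vector<std::string>;

struct toy_extra { toy_strs comments; };

struct toy_doc
{
    toy_ints ints;
    toy_ints ints2;
    toy_strs strings;
    toy_extra extra;
};

// Same shape as the get_pool_name overloads of doc_2_policy.
struct toy_policy
{
    std::string get_pool_name(const toy_ints&) const { return "vector_of_ints"; }
    std::string get_pool_name(const toy_strs&) const { return "vector_of_strings"; }
};

// Hypothetical helper: the pool a given member would be stored in.
// The overload is picked from the static type of the member alone.
template <class Policy, class Container>
std::string pool_name_of(const Policy& policy, const Container& member)
{
    const auto name = policy.get_pool_name(member);
    assert(!name.empty() && "a pool needs a name");
    return name;
}
```

In this sketch, pool_name_of returns "vector_of_strings" for both doc.strings and doc.extra.comments, and "vector_of_ints" for both doc.ints and doc.ints2. A container type without a matching overload would fail to compile instead of silently falling into some default pool.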