This is a library of immutable containers.
These containers have all their methods marked const
. Instead of
mutating them in place, they provide manipulation functions that
return a new transformed value, leaving the original value
unaltered. In the context of data-structures, this property of
preserving old values is called persistence.
Most of these containers use data-structures in which these operations can be done efficiently. In particular, not all data is copied when a new value is produced. Instead, the new values may share, internally, common data with other objects. We sometimes refer to this property as structural sharing. This behaviour is transparent to the user.
We are sorry, we lied. These containers provide one mutating
operation: assignment — i.e. operator=
.
There is a good reason: without operator=
everything becomes
complicated in C++. For example, one may not contain non-assignable
types in many standard containers, assignment would also be disabled
from your custom types holding immutable containers, and so on and so
forth.
C++ is a multi-paradigm language with an imperative core. Thus, it is
built on the foundation that variables can be mutated —
i.e. assigned to. We don’t want to ride against this tide. What we
want to prevent is in-place object manipulation. Because of C++
semantics, variable assignment is defined in terms of object
mutation, so we have to provide this very particular mutating
operation, but nothing else. Of course, you are free to mark your
variables const
to completely forbid assignment.
Warning
Assignment is not thread safe. When a mutable variable is shared across multiple threads, protect access using some other mechanism.
For obvious reasons, all other methods, which are const
, are
thread-safe. It is safe to share immutable state across multiple
threads.
const
or not to const
¶Many C++ programmers, influenced by functional programming, are trying
to escape the evils of mutability by using const
whenever
possible. We also do it ourselves in many of our examples to
reinforce the property of immutability.
While in general this is a good practice backed up with very good
intentions, it has one caveat: it disables moveability. It does so
even when std::move()
is used. This makes sense, since moving from
an object may mutate it, and const
, my friends, prevents all
mutations. For example:
immer::vector<int> do_stuff(const immer::vector<int> v)
{
return std::move(v).push_back(42);
}
One may think that the variable v
is moved into the
push_back()
call. This is not the case, because the variable
v
is marked const
. Of course, one may enable the move by
removing it, as in:
immer::vector<int> do_stuff_better(immer::vector<int> v)
{
return std::move(v).push_back(42);
}
So, is it bad style then to use const
as much as possible? I
wouldn’t say so, and it is advisable when std::move()
is not used.
An alternative style is to not use const
but adopt an AAA-style
(Almost Always use Auto). This way, it is easy to look for
mutations by looking for lines that contain =
but no auto
.
Remember that when using our immutable containers operator=
is the
only way to mutate a variable.
Why does const
prevent move semantics?
For those adventurous into the grainy details C++, here is why.
std::move()
does not move anything, it is just a cast from
normal l-value references (T&
) to r-value reference
(T&&
). This is, you pass it a variable, and it returns a
reference to its object disguised as an intermediate result. In
exchange, you promise not to do anything with this variable later
[1]. It is the role of the thing that receives the moved-from
value (in the previous example, push_back
) to actually do
anything interesting with it — for example, steal its contents
😈.
So if you pass a T&
to std::move()
you get a T&&
and,
unsurprisingly, if you pass a const T&
you get a const T&&
.
But the receivers of the moved-from value (like constructors or our
push_back()
) maybe be moved-into because they provide an
overload that expects T&&
— without the const
! Since a
const T&&
can not be converted into a T&&
, the compiler
looks up for you another viable overload, and most often finds a
copy constructor or something alike that expects a const T&
or
just T
, to which a const T&&
can be converted. The code
compiles and works correctly, but it is less efficient than we
expected. Our call to std::move()
was fruitless.
[1] | For the sake of completeness: it is actually allowed to do stuff with the variable after another value is assigned to it. |
When using reference counting (which is the default) mutating operations can often be faster when operating on r-value references (temporaries and moved-from values). Note that this removes persistence, since one can not access the moved-from value anymore! However, this may be a good idea when doing a chain of operations where the intermediate values are not important to us.
For example, let’s say we want to write a function that inserts all integers in the range \([first, last)\) into an immutable vector. From the point of view of the caller of the function, this function is a transaction. Whatever intermediate vectors are generated inside of it can be discarded since the caller can only see the initial vector (the one passed in as argument) and the vector with all the elements. We may write such function like this:
immer::vector<int> myiota(immer::vector<int> v, int first, int last)
{
for (auto i = first; i < last; ++i)
v = std::move(v).push_back(i);
return v;
}
The intermediate values are moved into the next push_back()
call. They are going to be discarded anyways, this little
std::move
just makes the whole thing faster, letting push_back
mutate part of the internal data structure in place when possible.
If you don’t like this syntax, transients may be used to obtain similar performance benefits.
Assignment guarantees
From the language point of view, the only requirement on moved from values is that they should still be destructible. We provide the following two additional guarantees:
v =
std::move(v)
are well-defined.Most containers will fail to be instantiated with a type of unknown size, this is, an incomplete type. This prevents using them for building recursive types. The following code fails to compile:
struct my_type
{
int data;
immer::vector<my_type> children;
};
However, we can easily workaround this by using an immer::box
to wrap
the elements in the vector, as in:
struct my_type
{
int data;
immer::vector<immer::box<my_type>> children;
};
Standard containers and incomplete types
While the first example might seem to compile when using some
implementations of std::vector
instead of immer::vector
, such
use is actually forbidden by the standard:
17.6.4.8 Other functions (…) 2. the effects are undefined in the following cases: (…) In particular—if an incomplete type (3.9) is used as a template argument when instantiating a template component, unless specifically allowed for that component.
Sometimes you may write a function that needs to do multiple changes to a container. Like most code you write with this library, this function is pure: it takes one container value in, and produces a new container value out, no side effects.
Let’s say we want to write a function that inserts all integers in the range \([first, last)\) into an immutable vector:
immer::vector<int> myiota(immer::vector<int> v, int first, int last)
{
for (auto i = first; i < last; ++i)
v = v.push_back(i);
return v;
}
This function works as expected, but it is slower than necessary. On every loop iteration, a new value is produced, just to be forgotten in the next iteration.
Instead, we can grab a mutable view on the value, a Transients. Then, we manipulate it in-place. When we are done with it, we extract back an immutable value from it. The code now looks like this:
immer::vector<int> myiota(immer::vector<int> v, int first, int last)
{
auto t = v.transient();
for (auto i = first; i < last; ++i)
t.push_back(i);
return t.persistent();
}
Both conversions are \(O(1)\). Note that calling transient()
does not break the immutability of the variable it is called on. The
new mutable object will adopt its contents, but when a mutation is
performed, it will copy the data necessary using copy on write.
Subsequent manipulations may hit parts that have already been copied,
and these changes are done in-place. Because of this, it does not
make sense to use transients to do only one change.
Tip
Note that move semantics can be used instead to support a similar use-case. However, transients optimise updates even when reference counting is disabled.
While the immutable containers provide an interface that follows a functional style, this is incompatible with what the standard library algorithms sometimes expect. transients try to provide an interface as similar as possible to similar standard library containers. Thus, can also be used to interoperate with standard library components.
For example the myiota() function above may as well be written using standard library tools:
immer::vector<int> myiota(immer::vector<int> v, int first, int last)
{
auto t = v.transient();
std::generate_n(
std::back_inserter(t), last - first, [&] { return first++; });
return t.persistent();
}