PhD topic: Efficient management of memory and storage for CRDTs
Marc Shapiro (Sorbonne-Université)
The main output of this PhD will be the design and implementation of a
highly-available geo-replicated industry-grade file system, by
combining the best features of Antidote and RingFS. Antidote is a
CRDT-based database, designed for geo-replication and high
availability. Scality's RingFS is a battle-proven, high-throughput,
failure-resilient file system.
Managing memory and storage is one of the main challenges. Currently,
Antidote stores its updates in a journal that grows without bound. In
this project, we need instead to bound the size of the journal, and to
introduce persistent storage of checkpoints and of file system state;
we also need to avoid redundant copying, for performance reasons.
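As an illustration of what bounding the journal could look like, here is a minimal in-memory sketch in which a checkpoint absorbs the journal's entries so that they can be discarded. It is not Antidote's API or design: the class BoundedJournal, its fields, and the choice of integer increments as updates are all hypothetical, and a real checkpoint would be written to persistent storage rather than kept in memory.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BoundedJournal:
    """Toy journal of integer increments with checkpoint-based truncation.

    Hypothetical names throughout; this is not Antidote's interface.
    """
    max_entries: int
    checkpoint_seq: int = 0    # last sequence number covered by the checkpoint
    checkpoint_state: int = 0  # state materialised at checkpoint_seq
    entries: List[Tuple[int, int]] = field(default_factory=list)  # (seq, delta)

    def append(self, delta: int) -> int:
        """Log one update; checkpoint when the journal exceeds its bound."""
        seq = self.checkpoint_seq + len(self.entries) + 1
        self.entries.append((seq, delta))
        if len(self.entries) > self.max_entries:
            self.checkpoint()
        return seq

    def checkpoint(self) -> None:
        """Fold every journal entry into the checkpoint, then drop them."""
        for seq, delta in self.entries:
            self.checkpoint_state += delta
            self.checkpoint_seq = seq
        self.entries.clear()

    def read(self) -> int:
        """Current state = checkpoint state + replay of the remaining journal."""
        return self.checkpoint_state + sum(delta for _, delta in self.entries)


journal = BoundedJournal(max_entries=3)
for delta in [1, 2, 3, 4, 5]:
    journal.append(delta)
```

Note that this sketch discards every version older than the checkpoint, which is safe only if no reader can still need one of them.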
In order to maintain correctness, we will identify the invariants of
each individual component, and the global invariants that link them
together. For instance, a crucial invariant is that every version
that may be needed by the file system must persist in the storage
layers. We will then write pseudocode of the system, and apply
verification tools to guarantee that the invariants hold. This will
be combined with a practical implementation that conforms to the
pseudocode, which we will validate experimentally.
This is a particular instance of a more general problem, which we plan
to address: the efficient management of memory and storage for CRDTs
in general. CRDTs are more challenging than classical objects for
two reasons. First, an update is not restricted to an assignment, but
can be an arbitrarily complex operation. Second, concurrent updates
are allowed and must be merged, which requires managing multiple
versions that are not totally ordered. Managing this complexity while
avoiding redundant copies is especially challenging.
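To make "versions that are not totally ordered" concrete, the following sketch uses a grow-only counter, a standard textbook CRDT chosen here only for brevity; it is far simpler than the arbitrarily complex operations mentioned above and is not taken from Antidote. The replica names and the helper functions leq, concurrent, merge and value are illustrative assumptions.

```python
from typing import Dict

# A grow-only counter: one non-negative count per replica.
# The same map doubles as a version vector for ordering versions.
GCounter = Dict[str, int]


def leq(a: GCounter, b: GCounter) -> bool:
    """True if version a is included in version b (component-wise <=)."""
    return all(count <= b.get(replica, 0) for replica, count in a.items())


def concurrent(a: GCounter, b: GCounter) -> bool:
    """Two versions are concurrent when neither includes the other."""
    return not leq(a, b) and not leq(b, a)


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Least upper bound of two versions: component-wise maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}


def value(counter: GCounter) -> int:
    """The counter's value is the sum of all per-replica counts."""
    return sum(counter.values())


# Two replicas update independently, producing unordered versions.
paris: GCounter = {"paris": 2}
tokyo: GCounter = {"tokyo": 3}
merged = merge(paris, tokyo)
```

Here paris and tokyo are concurrent, so a store may have to keep both until they are merged; the merged version includes each of them, and the merge gives the same result in either order.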
Marc.Shapiro =at= acm.org