BigSets: Scaling CRDTs to large sizes in Riak
Bigsets is a Basho project that brings scale and performance to CRDTs in the database. Inspired by the delta-CRDT work in SyncFree, bigsets decomposes a delta-CRDT Set datatype across many keys and stores them sorted on disk.
Taking advantage of CRDTs, but actually designing them into the system rather than using a library, Bigsets provide vast write-throughput performance improvements (O(1) vs. O(n)), and allows the creation of sets with millions of elements, which was previously impossible with the object-per-crdt model in the literature and the previous version of Riak.
Bigsets also address the fundamental problem of Action-At-A-Distance. A database client is not a replica: as the database acts for the client, causality, consistency, and semantics must be correct from the clients' point of view, but without an explosion in metadata. For this purpose, Bigsets use logical clocks inspired by Dotted Version Vectors.
Basho is working to put Bigsets into Riak, and will follow on with BigMaps. Bigsets’ advances include: a streamimg subset CRDT merge for quorum reads and range queries; a one-way full state merge that only needs the sending set to be read from disk; a growing and shrinking logical set-tombstone that allows element removal via LevelDB compaction.
Further reading
- White paper (PDF)
- Big(ger) sets: decomposed delta CRDT sets in Riak: Russell Brown, Zeehan Makhani, Paul Place; PaPoC, London UK, April 2016.
- Video presentation at Erlang Users' Conference 2016