BigSets: Scaling CRDTs to large sizes in Riak

Bigsets is a Basho project that brings scale and performance to CRDTs in the database. Inspired by the delta-CRDT work in SyncFree, bigsets decomposes a delta-CRDT Set datatype across many keys and stores them sorted on disk. 

Taking advantage of CRDTs, but actually designing them into the system rather than using a library, Bigsets provide vast write-throughput performance improvements (O(1) vs. O(n)), and allows the creation of sets with millions of elements, which was previously impossible with the object-per-crdt model in the literature and the previous version of Riak.

Bigsets also address the fundamental problem of Action-At-A-Distance. A database client is not a replica: as the database acts for the client, causality, consistency, and semantics must be correct from the clients' point of view, but without an explosion in metadata.  For this purpose, Bigsets use logical clocks inspired by Dotted Version Vectors.

Basho is working to put Bigsets into Riak, and will follow on with BigMaps. Bigsets’ advances include: a streamimg subset CRDT merge for quorum reads and range queries; a one-way full state merge that only needs the sending set to be read from disk; a growing and shrinking logical set-tombstone that allows element removal via LevelDB compaction.

Further reading

            

 This project is funded by the European Union, 7th Research Framework Programme, ICT call 10, grant agreement n°609551. 

Contact (Project Coordinator)

MARC dot SHAPIRO atsign ACM dot ORG