PhD topic: Bridging the CAP gap, all the way to the edge
Background
The CAP theorem points to an inherent conflict in the design of a geo-scalable applications, between consistency on the one hand, and availability and performance on the other [1, 5]. Application developers want their data consistent, yet synchronising it is a major obstacle to responsiveness and scalability [4]. Because of the CAP theorem, most existing cloud databases make a black-or-white choice between consistency and availability. We propose instead to bridge the CAP gap by tailoring consistency to application needs, avoiding synchronisation whenever it does not jeopardise application correctness.
Accordingly, in previous work we have designed CRDTs, data types that support concurrent updates and are guaranteed to converge to a correct value [9, 10].  We have also designed and implemented the planet-scale  cloud database Antidote [2] to guarantee Transactional Causal Consistency (TCC).  TCC is is the strongest consistency model that is compatible with availability, and is sufficient for many application invariants: for instance, a social network that requires that the friendship relation be symmetric, or that a user can post photos only if she is in the “approved for posting” set.
Beyond TCC, some invariants do require synchronised operations: for instance, ensuring that a shared counter remains positive requires to synchronise concurrent decrements (but not increments). We show how to improve the availability of synchronised operations by delaying and batching them [3]. In order to synchronise only when strictly necessary, developers need tools to navigate between too much synchronisation, which degrades performance and availability, and too little, which can corrupt data. To help with this delicate decision, we developed the CISE logic and tool that verify whether consistency is sufficient for the application, and if not,  help the developer to fix the problem (either by weakening the application or strengthening the synchronisation) [6, 7, 8].
We have demonstrated a prototype CISE tool.  Our Antidote prototype is fully functional, supports some real industrial applications, and has been shown experimentally to scale to hundreds of shards replicated across several data centres. However, the continued growth of cloud computing, and the demands of future social, 5G and IoT applications, create new challenges.
Problem statement and research directions
We now aim to scale Antidote to thousands of replicas at the network edge. As implementing TCC efficiently at scale involves complex trade-offs in the amount of parallelism vs. protocol overhead vs. generality, this raises a number of exciting research challenges:
 	- To scale the number of sites by three orders of magnitude, while keeping thrifty metadata, and without impacting the guarantees, availability and performance of TCC.
- In particular, provide partial replication: make use of locality in order to replicate data, metadata and operations only where needed, allowing the system to scale across a range of node types, from mobile phones to server farms.
- Deconstructing consistency models into orthogonal components, implemented in a modular fashion, that can be combined efficiently to provide guarantees tailored to application requirements. We aim to extend TCC to scale all the way to the edge, without breaking correctness, hopefully relaxing causality requirements that do not scale to thousands of servers.
- Improve and generalise the CISE logic and tools, leveraging modular consistency to remove unnecessary assumptions, and better automating the work of the application developer, and to minimise application requirements.
- Making the results of the research practical, by developing and experimenting with planet-scale applications, in co-operation with our industrial partners.
The LightKone project
The LightKone project, scheduled to start at the end of 2016, brings together five academic research partners and four application partners across Europe. It aims to develop a scientifically sound, and industrially validated, approach for programming edge networks.
Bibliography
 	- 1
- Daniel J. Abadi. Consistency tradeoffs in modern distributed database system design: {CAP} is only part of the story. IEEE Computer, 45(2):37--42, February 2012.
- 2
- Deepthi Devaki Akkoorath, Alejandro Z. Tomsic, Manuel Bravo, Zhongmiao Li, Tyler Crain, Annette Bieniusa, Nuno Preguiça, and Marc Shapiro. {C}ure: Strong semantics meets high availability and low latency. In Int. Conf. on Distributed Comp. Sys. (ICDCS), pages 405--414, Nara, Japan, June 2016.
- 3
- Valter Balegas, Diogo Serra, Sérgio Duarte, Carla Ferreira, Marc Shapiro, Rodrigo Rodrigues, and Nuno Preguiça. Extending eventually consistent cloud databases for enforcing numeric invariants. In Symp. on Reliable Dist. Sys. (SRDS), pages 31--36, Montréal, Canada, September 2015. IEEE Comp. Society, IEEE Comp. Society.
- 4
- Ken Birman, Gregory Chockler, and Robbert van Renesse. Toward a {C}loud {C}omputing research agenda. ACM SIGACT News, 40(2):68--80, June 2009.
- 5
- Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51--59, 2002.
- 6
- Alexey Gotsman, Hongseok Yang, Carla Ferreira, Mahsa Najafzadeh, and Marc Shapiro. '{C}ause {I'm strong enough: Reasoning about consistency choices in distributed systems}. In Symp. on Principles of Prog. Lang. (POPL), pages 371--384, St. Petersburg, FL, USA, 2016.
- 7
- Mahsa Najafzadeh and Marc Shapiro. Demo of the {CISE} tool, November 2015.
- 8
- Mahsa Najafzadeh, Alexey Gotsman, Hongseok Yang, Carla Ferreira, and Marc Shapiro. The {CISE} tool: Proving weakly-consistent applications correct. Rapport de Recherche RR-8870, Inria, Rocquencourt, France, February 2016.
- 9
- Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. Conflict-free replicated data types. In Int. Symp. on Stabilization, Safety, and Security of Dist. Sys. (SSS), LNCS 6976, pages 386--400, Grenoble, France, October 2011. {S}pringer-{V}erlag.
- 10
- Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. Convergent and commutative replicated data types. Bulletin of the European Association for Theoretical Computer Science (EATCS), (104):67--88, June 2011.
Marc.Shapiro =at= acm.org
Last modified: Mon Sep 24 14:43:27 CEST 2018