Denis Jeanneau - Homepage

Solving agreement problems using failure detectors in dynamic distributed systems

Traditionally, distributed computing considers distributed systems where the system membership is static and the communication graph is connected or fully connected. But in order to model modern networks such as wireless or peer-to-peer networks, it is necessary to consider another kind of distributed systems. Dynamic systems are distributed systems in which (1) processes can join or leave the system during the run, and (2) the communication graph evolves over time.

Agreement problems such as consensus are at the heart of distributed computing. In order to solve consensus, all correct processes in the system must agree on a same value. The k-set agreement problem is a generalization of consensus where processes can agree on up to k different values. Although they have been widely studied in static systems, solving agreement problems in dynamic systems is still a challenge. Existing solutions make strong assumptions either on the timeliness of communications or on the number of process failures. Instead, we choose to rely on failure detectors.

The failure detector abstraction was introduced as a way to circumvent the impossibility of solving consensus in asynchronous systems prone to crash failures. A failure detector is a local oracle that provides processes in the system with unreliable information on process failures. But a failure detector that is sufficient to solve a given problem in a static system is not necessarily sufficient to solve the same problem in a dynamic system. Therefore, it is necessary to redefine existing failure detectors and provide new algorithms.

The family of quorum failure detectors Σ_k has been proved necessary to solve k-set agreement in message passing systems. In a paper published in IEEE NCA15, we extend Σ_k to obtain the Σ_⊥,k failure detector, which is suited to dynamic networks. We then provide algorithms implementing Σ_⊥,k in asynchronous dynamic systems and an algorithm solving k-set agreement. But Σ_⊥,k is not sufficient to solve k-set agreement: as a result, our solution can only solve k-set agreement for some values of k.

Future work includes defining a failure detector stronger than Σ_⊥,k and capable of solving k-set agreement in asynchronous dynamic networks, along with an implementation of said failure detector and an algorithm solving k-set agreement for any k value.