PhD thesis (Funded)

"Elastic distributed algorithms for Clouds"

LIP6(UPMC/CNRS) - INRIA - Regal Team (Paris)

Contact

 Luciana.Arantes@lip6.fr, Pierre.Sens@lip6.fr, Julien.Sopena@lip6.fr

Context

Cloud computing is a model that provides computation, software, data access, and storage services without requiring end users to know the physical location or configuration of the system that delivers them. Cloud platforms are highly dynamic (“elastic”), since the virtual machines running client applications can migrate according to the load of the platform's machines, energy consumption, or the need to satisfy a quality-of-service contract, known as a Service Level Agreement (SLA).

PhD subject

The goal of this thesis proposal is to study distributed algorithms able to adapt themselves to the elasticity of Clouds. It is financed by the French ANR MyCloud project (http://mycloud.inrialpes.fr/). We particularly focus on failure detectors, which are known to be fundamental building blocks for distributed agreement algorithms. Unreliable failure detectors (FDs) were introduced in [1] to circumvent the impossibility result of Fischer, Lynch and Paterson, which shows that consensus cannot be solved deterministically in an asynchronous distributed system subject to even a single crash failure. There are many implementations of FDs [2,3]. Most of them are not suitable for Cloud environments since (1) they are not scalable, (2) they assume a known and fixed topology of the underlying network, and (3) they do not take SLA constraints into account. Failure detector algorithms therefore have to be rethought in the context of Cloud computing.

In this thesis we therefore propose to adapt failure detector algorithms to Cloud environments. In previous work, our group adapted distributed failure detectors to Grids [3]. These detectors take into account the physical topology of the grid in order to organize the exchange of detection messages (heartbeats). In the context of Clouds, the failure detection algorithm should be able to adapt itself to an unknown topology and to the transparent mobility of virtual machines, which might have an impact on the algorithm’s performance. It is thus necessary to provide self-organizing algorithms adapted to the dynamics of Cloud virtual machines. Note that in this case it is also necessary to dynamically detect virtual machine migrations. Such a task is also part of the thesis proposal. The proposed algorithms will be evaluated on Grid’5000/Aladdin and on some Cloud environments.
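As an illustration of the kind of building block discussed above, here is a minimal sketch of a heartbeat-based unreliable failure detector. It is not the algorithm of [3] nor a proposed solution: the class and method names, the exponentially weighted moving average used to estimate heartbeat inter-arrival times, and the default parameter values are all assumptions chosen for brevity. Peers are registered on their first heartbeat, so no membership or topology is assumed in advance, and a suspicion is revised as soon as a new heartbeat arrives.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PeerState:
    last_arrival: float   # time at which the latest heartbeat was received
    mean_interval: float  # running estimate of the heartbeat inter-arrival time


class HeartbeatDetector:
    """Hypothetical heartbeat failure detector with a per-peer adaptive timeout."""

    def __init__(self, initial_interval: float = 1.0,
                 safety_margin: float = 0.5, alpha: float = 0.2) -> None:
        self.initial_interval = initial_interval  # assumed sending period
        self.safety_margin = safety_margin        # slack added to the estimate
        self.alpha = alpha                        # weight of the newest sample
        self.peers: Dict[str, PeerState] = {}

    def heartbeat(self, peer: str, now: float) -> None:
        """Record a heartbeat from `peer` received at time `now`."""
        state = self.peers.get(peer)
        if state is None:
            # Unknown membership: a peer is discovered on its first heartbeat.
            self.peers[peer] = PeerState(now, self.initial_interval)
            return
        observed = now - state.last_arrival
        # Exponentially weighted moving average of the inter-arrival time.
        state.mean_interval = ((1 - self.alpha) * state.mean_interval
                               + self.alpha * observed)
        state.last_arrival = now

    def timeout(self, peer: str) -> float:
        """Current timeout for `peer`: estimated interval plus safety margin."""
        return self.peers[peer].mean_interval + self.safety_margin

    def suspected(self, now: float) -> List[str]:
        """Peers whose last heartbeat is older than their current timeout."""
        return sorted(p for p, s in self.peers.items()
                      if now - s.last_arrival > s.mean_interval + self.safety_margin)
```

Time is passed in explicitly so that the sketch stays deterministic; a real detector would read a clock and receive heartbeats from the network. A slower peer, for instance a virtual machine that is being migrated, sees its timeout grow with the observed inter-arrival times instead of being suspected permanently.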

Scientific challenges

The development of distributed algorithms for Clouds must cope with the following features, which are not present in traditional approaches:

  • The number of physical and virtual machines changes dynamically.
  • The number of virtual machines is much greater than the number of machines in Grids.
  • Virtual machines can migrate, so it is necessary to discover the logical topology of the application tasks.
  • Clients' Service Level Agreements must be satisfied as far as possible.

In conclusion, the design of existing failure detection algorithms must be reconsidered: algorithms for Clouds must adapt themselves to the dynamics of the environment while respecting quality-of-service constraints (SLA).

References

[1] Chandra, T., Toueg, S.: Unreliable failure detectors for reliable distributed systems. Journal of the ACM 43(2) (March 1996) 225–267

[2] Joshua B. Leners, Hao Wu, Wei-Lun Hung, Marcos K. Aguilera, Michael Walfish. Detecting failures in distributed systems with the FALCON spy network. 23rd ACM Symposium on Operating Systems Principles, October 2011.

[3] M. Bertier, O. Marin, P. Sens. Performance analysis of a hierarchical failure detector. Proc. of the International Conference on Dependable Systems and Networks (DSN'03), San Francisco, USA, June 2003 (IEEE Society Press)