Hadoop Data Center Systems

Hadoop has long been accepted as the de facto engine for Big Data processing. Strengths such as an active open source community, wide industry and vendor acceptance, and general scalability and fault tolerance make it an easy choice. However, open source Hadoop was not traditionally designed for multi-datacenter-grade applications. Quite a few variants, including those from Amazon Web Services, try to offer Hadoop (or Hadoop-compliant) environments in a geographically distributed manner.

In this study, made a few months back, I examine the key options available in the industrial landscape and evaluate how the Apache open source community for Hadoop is addressing this topic. The study is by no means exhaustive; it reflects purely my own views and understanding, which may have deficiencies.

Going forward, cloud infrastructure and Hadoop need to work hand in hand to deliver an effective data center solution built around Hadoop at its core. Applications could then be served from any of the geographically distributed data centers without having to worry about data locality, reliability, and performance. In the slides above I tried to summarize the various ways people have been trying to achieve this, and what is probably a way forward for the Hadoop community in this area. I feel this will remain a fluid topic over the coming year or two, and gradually only a few approaches will emerge as the standard ones.

