Predictive Analytics Times

What Happened to Hadoop? And Where Do We Go from Here?


Originally published by InsideBigData, September 4, 2019.

Apache Hadoop emerged on the IT scene in 2006 with the promise to provide organizations with the capability to store an unprecedented volume of data using cheap, commodity hardware. In a sense, Hadoop helped usher in the era of big data. Hopes were high and expectations were higher. In this brave new world, businesses could store as much data as they could get their hands on in Hadoop-based repositories known as data lakes and worry about the analysis later. These data lakes were accompanied by a number of independent open source compute engines, and on top of that, “open source” meant free! What could go wrong?

Monte Zweben, CEO of Splice Machine, has an interesting take on what happened to Hadoop, specifically three main reasons behind its downfall:

Schema-on-Read was a mistake

First, the so-called best feature of Hadoop turned out to be its Achilles heel. With the schema-on-write restriction lifted, terabytes of structured and unstructured data began to flow into the data lakes. Because Hadoop’s data governance framework and capabilities were still being defined, it became increasingly difficult for businesses to determine the lineage of their data. They lost trust in that data, and their data lakes turned into data swamps.
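
To make the distinction concrete, here is a minimal Python sketch of the two approaches. It is an illustration only and does not use Hadoop or any Hadoop API: the schema, the record layout, and every function name (ORDER_SCHEMA, conforms, write_with_schema, write_raw, read_with_schema) are hypothetical. With schema-on-write, malformed records are rejected at ingestion. With schema-on-read, everything is stored and the problems surface only when someone tries to read the data.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
ORDER_SCHEMA = {"order_id": int, "amount": float}


def conforms(record, schema):
    """Return True if record is a dict with exactly the schema's fields and types."""
    if not isinstance(record, dict):
        return False
    if set(record) != set(schema):
        return False
    return all(isinstance(record[name], kind) for name, kind in schema.items())


def write_with_schema(store, raw_line, schema):
    """Schema-on-write: validate at ingestion and refuse anything malformed."""
    record = json.loads(raw_line)  # raises ValueError (JSONDecodeError) on bad JSON
    if not conforms(record, schema):
        raise ValueError(f"rejected at write time: {raw_line}")
    store.append(record)


def write_raw(lake, raw_line):
    """Schema-on-read: store whatever arrives, with no checks at ingestion."""
    lake.append(raw_line)


def read_with_schema(lake, schema):
    """Schema-on-read: apply the schema only now, separating usable from unusable."""
    good, bad = [], []
    for raw_line in lake:
        try:
            record = json.loads(raw_line)
        except json.JSONDecodeError:
            bad.append(raw_line)
            continue
        if conforms(record, schema):
            good.append(record)
        else:
            bad.append(raw_line)
    return good, bad


# Made-up sample input: one well-formed record and three problem records.
incoming = [
    '{"order_id": 1, "amount": 9.5}',
    '{"order_id": "A-2", "amount": "9.50"}',
    '{"id": 3}',
    "not json at all",
]

warehouse, lake, rejected = [], [], 0
for line in incoming:
    write_raw(lake, line)  # the lake accepts everything
    try:
        write_with_schema(warehouse, line, ORDER_SCHEMA)
    except ValueError:
        rejected += 1  # the schema-on-write store refuses it up front

good, bad = read_with_schema(lake, ORDER_SCHEMA)
print(f"warehouse kept {len(warehouse)}, rejected {rejected}")
print(f"lake stored {len(lake)}, readable {len(good)}, unusable {len(bad)}")
```

In this toy setup both paths end up with the same usable record. The difference is when the bad data is discovered: the schema-on-write store turns it away at the door, while the lake holds it until read time, which is the kind of deferred cleanup the paragraph above describes.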

Hadoop complexity and duct-taped compute engines

Second, Hadoop distributions provided a number of open source compute engines like Apache Hive, Apache Spark and Apache Kafka, to name a few, but this turned out to be too much of a good thing. These compute engines were complex to operate, and duct-taping them together required specialized skills that were difficult to find in the market.

To continue reading this article click here.



