Machine Learning Times

This excerpt is from TechCrunch.

9 years ago
Spark And Hadoop Are Friends, Not Foes

 

June was an exciting month for Apache Spark. At Hadoop Summit San Jose, it was a frequent topic of conversation, as well as the subject of many session presentations. On June 15, IBM announced plans to make a massive investment in Spark-related technology.

This announcement helped kick off the Spark Summit in San Francisco, where one could witness the increasing number of engineers learning about Spark — and the increasing number of companies experimenting with and adopting Spark.

The virtuous cycle of Spark investment and adoption is rapidly advancing the maturity and capabilities of this important technology, to the benefit of the entire big data community. However, the growing attention directed toward Spark has also given rise to a strange and stubborn misconception: that Spark is somehow an alternative to Apache Hadoop, instead of a complement to it. This misconception can be seen in headlines like “Newer Software Aims to Crunch Hadoop’s Numbers” and “Companies Move On From Big Data Technology Hadoop.”

As a long-time big data practitioner, an early advocate for investment in Hadoop by Yahoo! and now CEO of a company that provides big data as a service for the enterprise, I’d like to bring some perspective and clarity to this conversation.

Spark and Hadoop work together.

Hadoop is increasingly the enterprise platform of choice for big data. Spark is an in-memory processing solution that runs on top of Hadoop. The largest users of Hadoop, including eBay and Yahoo!, run Spark inside their Hadoop clusters. Cloudera and Hortonworks ship Spark as part of their Hadoop distributions. And our own customers here at Altiscale have been using Spark on Hadoop since we launched.

To position Spark in opposition to Hadoop is like saying that your new electric car is so cool that you won’t need electricity anymore. If anything, electric cars will drive demand for more electricity.

Why the confusion? Modern-day Hadoop consists of two main components. The first is a large-scale storage system called the Hadoop Distributed File System (HDFS), which stores data in a low-cost, high-performance manner optimized for the volume, variety and velocity of big data. The second component is a computation engine called YARN, which can run massively parallel programs on top of the data stored in HDFS.

YARN can host any number of programming frameworks. The original such framework was MapReduce, invented at Google to help process massive web crawls. Spark is another such framework, as is a newer one called Tez. When people talk about Spark “crushing” Hadoop, what they really mean is that programmers now prefer using Spark to the older MapReduce framework.

However, MapReduce should not be equated with Hadoop. MapReduce is just one of many ways to process your data in a Hadoop cluster. Spark can be used as an alternative. Looking more broadly, business analysts — a growing base of big data practitioners — avoid both of these frameworks, which are low-level toolkits meant for programmers. Instead, they use high-level languages like SQL that make Hadoop more accessible.
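For readers who have not seen the MapReduce model in practice, the sketch below shows its three phases (map, shuffle, reduce) as a word count in plain Python. It is an illustration only: it runs on a single machine, it does not use Hadoop, YARN or Spark, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example. In a real cluster the framework would distribute these phases across many machines.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["spark and hadoop", "hadoop and yarn"]
    print(word_count(sample))
```

Even this toy version makes the low-level nature of the model visible: the programmer has to think in terms of key-value pairs and explicit phases, which is part of why analysts tend to reach for SQL instead.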

In the last four years, Hadoop-based big data technology has seen an unprecedented level of innovation. We’ve gone from batch SQL to interactive; from one framework (MapReduce) to multiple frameworks (e.g., MapReduce, Spark and many others).

