Spark And Hadoop Are Friends, Not Foes

This excerpt is from the Techcrunch. To view the whole article click here. ♦

Jul 18, 2015
Comments Off on Spark And Hadoop Are Friends, Not Foes
Industry News

Hadoop, spark
3553 Views

11 years ago
Spark And Hadoop Are Friends, Not Foes

By: Raymie Stata, Crunch Network Contributor, TechCrunch
Originally published at http://techcrunch.com

June was an exciting month for Apache Spark. At Hadoop Summit San Jose, it was a frequent topic of conversation, as well as the subject of many session presentations. On June 15, IBM announced plans to make a massive investment in Spark-related technology.

This announcement helped kick off the Spark Summit in San Francisco, where one could witness the increasing number of engineers learning about Spark — and the increasing number of companies experimenting with and adopting Spark.

The virtuous cycle of Spark investment and adoption is driving rapidly the maturity and capabilities of this important technology, to the benefit of the entire big data community. However, the growing attention directed toward Spark also has given rise to a strange and stubborn misconception: that Spark is somehow an alternative to Apache Hadoop, instead of a complement to it. This misconception can be seen in headlines like “Newer Software Aims to Crunch Hadoop’s Numbers” and “Companies Move On From Big Data Technology Hadoop.”

As a long-time big data practitioner, an early advocate for investment in Hadoop by Yahoo! and now CEO of a company that provides big data as a service for the enterprise, I’d like to bring some perspective and clarity to this conversation.

Spark and Hadoop work together.

Hadoop is increasingly the enterprise platform of choice for big data. Spark is an in-memory processing solution that runs on top of Hadoop. The largest users of Hadoop — including eBay and Yahoo! — both run Spark inside their Hadoop clusters. Cloudera and Hortonworks ship Spark as part of their Hadoop distributions. And our own customers here at Altiscale have been using Spark on Hadoop since we launched.

To position Spark in opposition to Hadoop is like saying that your new electric car is so cool that you won’t need electricity anymore. If anything, electric cars will drive demand for more electricity.

Why the confusion? Modern-day Hadoop consists of two main components. The first is a large-scale storage system called the Hadoop Distributed File System (HDFS), which stores data in a low-cost, high-performance manner optimized for the volume, variety and velocity of big data. The second component is a computation engine called YARN, which can run massively parallel programs on top of the data stored in HDFS.

YARN can host any number of programming frameworks. The original such framework was MapReduce, invented at Google to help process massive web crawls. Spark is another such framework, as is another new one called Tez. When people talk about Spark “crushing” Hadoop, what they really mean is that programmers now prefer using Spark to the older MapReduce framework.

However, MapReduce should not be equated with Hadoop. MapReduce is just one of many ways to process your data in a Hadoop cluster. Spark can be used as an alternative. Looking more broadly, business analysts — a growing base of big data practitioners — avoid both of these frameworks, which are low-level toolkits meant for programmers. Instead, they use high-level languages like SQL that make Hadoop more accessible.

In the last four years, Hadoop-based big data technology has seen an unprecedented level of innovation. We’ve gone from batch SQL to interactive; from one framework (MapReduce) to multiple frameworks (e.g., MapReduce, Spark and many others).

This excerpt is from the TechCrunch. To view the whole article click here.

EXCLUSIVE HIGHLIGHTS

This excerpt is from the Techcrunch. To view the whole article click here. ♦

Related

11 years ago
Spark And Hadoop Are Friends, Not Foes

Login

Industry News

Connect with Us

Subscription

ADVERTISEMENTS

Produced By:

Archives

The Machine Learning Times © 2026 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190
Produced by: Rising Media & Prediction Impact

EXCLUSIVE HIGHLIGHTS

This excerpt is from the Techcrunch. To view the whole article click here. ♦

Related

11 years agoSpark And Hadoop Are Friends, Not Foes

Recommended

Why AI hasn’t replaced software engineers, and won’t

A reality check on the AI jobs hysteria

Apocalypse No

There will be no AI jobpocalypse.

Login

Industry News

Connect with Us

Subscription

ADVERTISEMENTS

Produced By:

Archives

The Machine Learning Times © 2026 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190 Produced by: Rising Media & Prediction Impact

11 years ago
Spark And Hadoop Are Friends, Not Foes

The Machine Learning Times © 2026 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190
Produced by: Rising Media & Prediction Impact