October 29-November 2, 2017
New York
The premier machine learning conference

Instructor:
James Casaletto
Principal Solutions Architect
MapR Technologies



Wednesday, November 1, 2017 in New York
Full-day: 9:00am - 4:30pm

Spark on Hadoop for Machine Learning:
Hands-On Lab

Intended Audience:

  • Analysts, data engineers, and data scientists who build predictive models with machine learning and wish to explore using Spark and Hadoop for the same.

Requirements for this Workshop:

Attendees must choose one of the following two setup options before the Workshop starts:

  • Use a Hadoop cluster already running in the cloud. This option requires no setup time, but the environment you use during the Workshop cannot be saved.
  • Install a 1-node Hadoop cluster virtual machine on your laptop from this link: https://mapr.com/products/mapr-sandbox-hadoop/download/. Setup takes about an hour and must be completed before the Workshop begins. With this option, your environment can be saved.

Why Machine Learning Needs Spark and Hadoop

Standard machine learning platforms need to catch up. As data grows bigger, faster, more varied, and more widely distributed, storing, transforming, and analyzing it with traditional tools no longer scales. Instead, today's best practice is to maintain and even process data in its distributed form rather than centralizing it. Apache Hadoop and Apache Spark provide a powerful platform and mature ecosystem with which to both manage and analyze distributed data.

Machine learning projects can and must accommodate these challenges, i.e., the classic "3 V's" of big data: volume, variety, and velocity. In this hands-on workshop, leading big data educator and technology leader James Casaletto will show you how to:

  • Build and deploy models with Spark. Create predictive models over enterprise-scale big data using the modeling libraries built into the standard, open-source Spark platform.

  • Model both batch and streaming data. Implement predictive modeling using both batch and streaming data to gain insights in near real-time.

  • Do it yourself. Gain the power to extract signals from big data on your own, without relying on data engineers, DBAs, and Hadoop specialists for each and every request.

This training program answers these questions:
  • What are the particular challenges of big data for machine learning?
  • When does Hadoop provide the greatest value?
  • How can streaming data be processed in Hadoop?
  • How does one build machine learning models with Apache Spark?

Hands-on lab (afternoon session):

  • Access to an enterprise-scale Hadoop cluster running in the cloud
  • Access to real data sets, working code, and hands-on exercises in Python
  • Option to install a pre-configured, 1-node Hadoop cluster on your laptop


Schedule:
  • Workshop program starts at 9:00am
  • Morning Coffee Break at 10:30 - 11:00am
  • Lunch provided at 12:30 - 1:15pm
  • Afternoon Coffee Break at 2:30 - 3:00pm
  • End of the Workshop: 4:30pm


James Casaletto, Principal Solutions Architect, MapR Technologies

James Casaletto is a principal solutions architect at MapR Technologies, where he designs, implements, and deploys complete solution frameworks for big data. He has written and delivered courses on MapReduce programming, data engineering, and data science on Hadoop. He also teaches a graduate course on these topics for the computer science department at San Jose State University.
