Instructor:
James Casaletto
Senior Solutions Architect
MapR Technologies

Workshop

Monday, June 20, 2016 in Chicago
Full-day: 9:00am - 4:30pm

Room: Salon A2

Hadoop for Predictive Analytics:
Hands-On Lab

Intended Audience:

Analysts, data engineers, and data scientists who build predictive models and want to explore doing so with Hadoop.

Why Predictive Analytics Needs Hadoop

Standard predictive analytics platforms need to catch up. As data grows bigger, faster, more varied, and more widely distributed, storing, transforming, and analyzing it with traditional tools no longer scales. Instead, today’s best practice is to maintain and even process data in its distributed form rather than centralizing it. Apache Hadoop provides a powerful platform and mature ecosystem with which to both manage and analyze distributed data.

Predictive analytics projects can and must accommodate these challenges: the classic "3 V's" of big data (volume, variety, and velocity) as well as its distributed nature. In this hands-on workshop, leading Hadoop educator and technology leader James Casaletto will show you how to:

Predict with Hadoop. Create predictive models over enterprise-scale big data using the modeling libraries built into the standard, open-source Hadoop ecosystem.

Model both batch and streaming data. Implement predictive modeling using both batch and streaming data and gain insights in near real time.

Model in a distributed fashion. Adapt predictive modeling projects to the distributed nature of data in order to benefit from parallel computation, while avoiding the often unnecessary and prohibitively inefficient process of merging and centralizing widely distributed data sources.

Do it yourself. Gain the power to extract signals from big data on your own, without relying on data engineers and Hadoop specialists for each and every request.
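
As a rough illustration of the "Model in a distributed fashion" point above, here is a minimal sketch in plain Python. It is not taken from the workshop materials; the partition data and the names Stats, local_stats, merge, and fit are hypothetical. Each partition is summarized where it lives, and only the small summaries are combined to fit a simple linear regression, so the raw rows never need to be centralized.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Sufficient statistics for simple linear regression y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0


def local_stats(rows):
    """Summarize one partition in place; only this summary is shipped."""
    return Stats(
        n=len(rows),
        sx=sum(x for x, _ in rows),
        sy=sum(y for _, y in rows),
        sxx=sum(x * x for x, _ in rows),
        sxy=sum(x * y for x, y in rows),
    )


def merge(a, b):
    """Combine two summaries; addition is associative, so order is free."""
    return Stats(a.n + b.n, a.sx + b.sx, a.sy + b.sy,
                 a.sxx + b.sxx, a.sxy + b.sxy)


def fit(stats):
    """Ordinary least squares from the merged summary: (intercept, slope)."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return intercept, slope


# Hypothetical partitions, e.g. (x, y) rows held at three separate sites.
partitions = [
    [(1.0, 3.1), (2.0, 4.9)],
    [(3.0, 7.2), (4.0, 8.8), (5.0, 11.1)],
    [(6.0, 13.0)],
]

combined = reduce(merge, (local_stats(p) for p in partitions), Stats())
intercept, slope = fit(combined)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

The same summarize-then-merge shape is what distributed frameworks parallelize across a cluster; this sketch only mimics it on one machine.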

This training program answers these questions:

What are the particular challenges of big data for predictive analytics?
When does Hadoop provide the greatest value?
How can streaming data be processed in Hadoop?
How does one build predictive models with Apache Spark?

Hands-on lab (afternoon session):

Access to an enterprise-scale Hadoop cluster running in the cloud
Access to real data sets, working code, and hands-on exercises in Python
Option to install a pre-configured, 1-node Hadoop cluster on your laptop

Schedule

Workshop program starts at 9:00am
Morning Coffee Break: 10:30 - 11:00am
Lunch (provided): 12:30 - 1:15pm
Afternoon Coffee Break: 2:30 - 3:00pm
End of the Workshop: 4:30pm

Instructor

James Casaletto, Senior Solutions Architect, MapR Technologies

James Casaletto is a senior solutions architect at MapR Technologies, where he designs, implements, and deploys complete solution frameworks for big data. He has written and delivered courses on MapReduce programming, data engineering, and data science on Hadoop. He currently also teaches a graduate course on these topics for the computer science department at San Jose State University.