Predictive Analytics World for Business New York 2017

Oct 29-Nov 2, 2017 – Jacob Javits Convention Center


Workshops - Sunday, October 29th, 2017

8:00 am
Full-day Workshop

"Big Data" is everywhere. The topic is impacting every industry and institution. Big excitement about big data comes from the intersection of dramatic increases in computing power and data storage with growing streams of data coming from almost every person and process on Earth. The pressing question is, how do we best make value of all this data - what should we do with it?

Session description
Instructor
Vladimir Barash, Chief Scientist, Graphika
Morning Session Workshop

This morning workshop launches your tenure as a user of R, the well-known open-source platform for data science and machine learning. The workshop stands alone as the perfect way to get started with R, or may serve to prepare for the more advanced full-day hands-on workshop, “R for Predictive Modeling”.

Session description
Instructor
Max Kuhn, Software Engineer, RStudio
12:00 pm
Full-day Workshop

This one-day session provides a hands-on introduction to R, the well-known open-source platform for data analysis. Real examples are employed in order to methodically expose attendees to best practices driving R and its rich set of predictive modeling (machine learning) packages, providing hands-on experience and know-how. R is compared to other data analysis platforms, and common pitfalls in using R are addressed.

Session description
Instructor
Max Kuhn, Software Engineer, RStudio
4:30 pm
End of Full-day Workshop with Vladimir Barash
7:30 pm
End

Predictive Analytics World for Business - New York - Day 1 - Monday, October 30th, 2017

(PAW Financial & PAW Healthcare run in parallel on this day - dual registration required)
8:00 am
Registration & Networking Breakfast
8:45 am
Conference Chair Welcome
Eric Siegel, Conference Founder, Machine Learning Week
8:50 am
Room: 1E10
Keynote:

Welcome to the Analytics Explosion! Despite speculation that the need for analytics would begin to level off, evidence suggests it continues to be at an all-time high. Trends show the establishment of more and more in-house analytics teams, allowing the luxury of predictive and prescriptive analytics to be applied across all levels of an organization. However, many factors should be considered when evaluating an analytics undertaking, such as the complexity of the problem, the precision necessary in the solution, and the timeliness required for the response. With so many variables, how do you choose the right analytics tool for the job? What else is required for an analytics effort to be successful? Leveraging a dynamic analytical approach will achieve the greatest value for your business.

Session description
Speaker
Anne G. Robinson, Chief Strategy Officer, Kinaxis
9:40 am
Diamond Sponsor Presentation:
The Session Description will be available shortly.
Session description
Sponsored by
DataRobot
10:00 am
Exhibits & Morning Coffee Break
10:30 am
Track 1—BUSINESS: Analytics strategy & operationalization
Crisis response; analytics management
Lessons from: NYC Mayor's Office

Predictive analytics has proven to be a highly useful tool in the public sector, but what happens when an emergency strikes and we have to build an entire analytics infrastructure from scratch? In this case study, the NYC Mayor's Office of Data Analytics (MODA) will walk you through how the City of New York built a system to collect, monitor, and predict the presence of potentially disease-carrying cooling towers (Legionnaires' Disease) among New York's one million-plus buildings in less than a week.

Session description
Speaker
Simon Rimmele, Associate, Analytics, NYC Mayor's Office of Data Analytics
Track 2—TECH: Predictive modeling & machine learning methods
Hand-labeled training data
Case Study: Bloomberg L.P.

Machine learning models often depend on large amounts of training data for supervised learning tasks. This data may be expensive to collect, especially if it requires human labeling (as in document or image classification), and it raises some particular quality issues. For example, how do we ensure that human agreement is high, and what do we do in the event that it is not? Also, when your data is expensive to tag, how do you ensure that you have the smallest set possible that is representative of all your features? This talk will address these and other issues associated with gathering crowd-sourced, hand-coded data sets for supervised machine learning models.

Session description
Speaker
Leslie Barrett, Senior Software Engineer, Bloomberg LP
Track 3—MARKETING: Marketing & market research analytics
Churn modeling
Case Study: Paychex

Small businesses are susceptible to even minor changes in economic conditions. When strategically allocating retention efforts across half a million businesses, we need to account for said changes as well as maximize our resource allocations. Traditional modeling techniques can fail over time in the presence of concept drift. We devised an innovative method to account for unknown changes by using a seasonal model and a trend model to probabilistically assign retention efforts. Additionally, we built in functionality that allows new variables to be considered for development as they become relevant. This new methodology removes the necessity for annual retools and stabilizes performance.
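
The abstract does not give implementation details. Purely as an illustration of blending a seasonal score with a trend score to probabilistically assign retention effort, here is a minimal sketch on hypothetical data (not Paychex's models or numbers):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-account churn scores from two separate models (stand-ins only):
# a seasonal model capturing time-of-year patterns and a trend model capturing drift.
seasonal_score = rng.uniform(0, 1, size=10)
trend_score = rng.uniform(0, 1, size=10)

# Blend the two scores; the weight could be tuned on a holdout window.
w = 0.5
blended_risk = w * seasonal_score + (1 - w) * trend_score

# Probabilistically assign retention effort in proportion to blended risk,
# rather than with a hard cutoff, so the allocation adapts as the scores drift.
assign_retention = rng.random(10) < blended_risk
print(np.column_stack([blended_risk.round(2), assign_retention]))
```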

Session description
Speaker
Rob Rolleston, Manager, Data Science, Paychex
11:20 am
Track 1—BUSINESS: Analytics strategy & operationalization
Education and team building
Lessons from: LinkedIn

LinkedIn Learning's mission includes training the enormous number of people who need data science and business analytics skills. But what's the best way to assess market demand and develop tools for validation in stack-ranking skills coverage? How do you go from a handful of data science courses to over 100 in less than a year, and make them as effective as possible? How do you find the best instructors? How and why does this approach contrast with standard classroom education and with alternative learning formats like MOOCs? LinkedIn, a data company, uses analytics to answer these questions and to guide our strategy.

Session description
Speaker
Steve Weiss, Content Manager, Data Science and Business Analytics, LinkedIn
Track 2—TECH: Predictive modeling & machine learning methods
Time series modeling

The problem of predicting, for each type of crime, the crime frequency in a specific area on a specific day can be framed as a regression problem on crime frequencies and Twitter data: given (1) the last 31 days of Twitter activity geo-tagged in the immediate area, (2) the last 31 days of Twitter activity in the general area, and (3) the historical crime frequencies of the general area for the past year, predict the crime frequency for the next day in that location. Inherent in this problem description is the following hypothesis: the taxonomy used in tweets from around the area (such as a large fraction of restaurants with low ratings, or lots of tweets about how unsafe people feel in that area) contains information that could be used to build predictors of future crime frequencies for different types of crime. Assuming that the time series can be modeled as a deviation from a periodic function, and incorporating this assumption into the model, may produce better crime frequency estimates than directly predicting crime frequencies. The proposed research has implications for decision makers concerned with geographic spaces occupied by Twitter users. This session will cover these analytical results, which were produced by an extended group of graduate students and researchers at New York University.
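
As a rough illustration of the "deviation from a periodic function" assumption, the sketch below fits a day-of-week baseline to synthetic crime counts and regresses the residual on stand-in Twitter-activity features. It is only a simplified illustration, not the NYU team's actual model or data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_days = 365
day = np.arange(n_days)

# Hypothetical daily crime counts for one area (stand-in data).
crime = 20 + 5 * np.sin(2 * np.pi * day / 7) + rng.normal(0, 2, n_days)

# Hypothetical Twitter-activity features: tweet volume in the immediate and general area.
tweets_local = rng.poisson(50, n_days).astype(float)
tweets_area = rng.poisson(500, n_days).astype(float)

# Step 1: fit a periodic (day-of-week) baseline, per the periodic-function assumption.
dow = day % 7
baseline = np.array([crime[dow == d].mean() for d in range(7)])[dow]

# Step 2: regress the deviation from the baseline on the Twitter features.
X = np.column_stack([tweets_local, tweets_area])
resid_model = LinearRegression().fit(X, crime - baseline)

# Next-day forecast = periodic baseline + predicted deviation.
next_dow = n_days % 7
next_baseline = crime[dow == next_dow].mean()
next_features = np.array([[tweets_local[-1], tweets_area[-1]]])
forecast = next_baseline + resid_model.predict(next_features)[0]
print(round(forecast, 1))
```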

Session description
Speakers
Anasse Bari Ph.D., Professor of Computer Science - Director of the AI and Predictive Analytics Lab, New York University
Chuan-Heng Lin, Machine-Learning Engineer, Pienso
Aaron McKinstry, Computer Scientist, Courant Institute of Mathematical Sciences of New York University
Gen Xiang, Software Engineer, Trinnacle Capital Management
Track 3—MARKETING: Marketing & market research analytics
Churn modeling
Case Study: Atlassian

Measuring customer churn is a key aspect of marketing data science regardless of the type of product a company is selling. In fact, identifying the most predictive features is just as important as identifying the users at risk of churning, because it can help marketing and sales teams alike adopt the most appropriate strategy to retain their customers, improve their product, and identify new opportunities. This presentation will describe a decision tree-based methodology to compute churn likelihood, and will discuss which attributes, behavioral features, or character traits are the most beneficial to our model.

Session description
Speaker
Jennifer Prendki, VP of Machine Learning, Figure Eight
11:40 am
Track 3—MARKETING: Marketing & market research analytics
Market research and analytics
Case Study: Verizon Wireless

Using robust data anonymization, safeguarding, and security, Verizon linked thousands of brand health survey participants to their actual customer database records to see how behaviors like increased data usage or phone upgrades could predict changes in surveyed ratings of the brand. This presentation will discuss the business value of analytics around the brand then delve into the analytic methods and selected results.

Session description
Speakers
Michael E. Gooch-Breault, Director, Consumer and Marketplace Insights, Verizon Wireless
Jade Xi, Consultant, Predictive/Prescriptive Analytics, Verizon
12:05 pm
Lunch in Exhibit Hall
12:25 pm
Lunch & Learn
Sponsored by
DataRobot
1:15 pm
Lunch in Exhibit Hall
1:30 pm
Keynote

In the context of building predictive models, predictability is usually considered a blessing. After all - that is the goal... to build the model that has the highest predictive performance. The rise of 'big data' has, in fact, vastly improved our ability to predict human behavior, thanks to the introduction of much more informative features.

However, in practice, the target variable is often more differentiated than accounted for in the data. For example, some customers churn (from a telecom provider) because they are moving, others because they got a better offer in the mail, and still others because their home is in a location with terrible reception. These are all positives for a model that learns to predict churn, but the predicted outcome has occurred for very different reasons. In many applications, such mixed scenarios mean the model will automatically gravitate to the scenario that is easiest to predict at the expense of the others. This holds even if the predictable scenario is far less common or relevant. In the worst case, predictive models can introduce biases NOT even present in the training data.

In this talk, we will cover a number of applications where this takes place: clicks on ads being performed 'intentionally' vs. 'accidentally', consumers visiting store locations vs. their phones pretending to be there, and finally customers filling out online forms vs. bots defrauding the advertising industry. In conclusion, the combination of different and highly informative features can have a significantly negative impact on the usefulness and ethics of predictive modeling.

Session description
Speaker
Claudia Perlich, Chief Scientist, Dstillery
2:15 pm
Diamond Sponsor Presentation
The Session Description will be available shortly.
Session description
Sponsored by
diwo
2:40 pm
Track 1—BUSINESS: Analytics strategy & operationalization
Analytics strategy
Lessons from: The Clorox Company

The need to be more consumer (data) centric, the availability of disparate sources of granular data, and rapidly advancing technology, techniques, and skills make it both necessary and feasible for consumer packaged goods (CPG) companies to embed data science into their analytics strategy to further drive growth and innovation. In this session, we will discuss factors that make data science (in CPG) difficult, the organizational maturity curve of CPG analytics, data and techniques that can be used for driving the consumer journey, some techniques and use cases to get started, some in-house examples at Clorox, and some key learnings in the journey so far.

Session description
Speaker
Payel Chowdhury, Associate Director - Data Science, The Clorox Company
Track 2—TECH: Predictive modeling & machine learning methods
Analytical methods

The use of machine learning is a common theme in organizations today, yet most people still struggle with its definition given its many different levels. In this session, we attempt to eliminate this confusion by exploring a number of machine learning algorithms ranging from the simple to the more complex. We observe the use of these algorithms across a variety of industries as well as different behaviours such as customer response and customer risk. Alongside the comparison of machine learning algorithms, we also look at the impact of the data and how feature engineering impacts a given solution.

Session description
Speaker
Richard Boire, President, Boire Analytics
Track 3—MARKETING: Marketing & market research analytics
Marketing applications

Lookalike Audience is a way to reach new people who are likely to be interested in your business because they're similar to your best existing customers. By implementing Facebook Big Data best practices you will be able to create value-based Lookalike Audiences to help you reach more people who resemble your current high-value customers and to showcase products they are most likely to purchase.

Prospecting strategies based on recent case studies with Intelligent Blends, Lenny Lemons, Daily Fast Deal, Gearvilla and The Gadget Mole have already shown a 3x-5x CTR increase, 69% lower cost per click, and a 4.5x positive return on ad spend, along with conversion increases across other ad campaigns.

Session description
Speaker
Kristina Pototska, Growth Product Manager, RetargetApp
3:05 pm
Track 3—MARKETING: Marketing & market research analytics
Case Study: Becker College

Predictive modeling has gained popularity in studying college enrollment due to fierce competition in higher education. To make informed decisions and allocate limited resources to improve enrollment, predictive modeling has been applied to challenge and change the traditional recruitment process. This session is intended for two learning outcomes: Participants who are not familiar with predictive modeling will learn how to lay out a plan to collect and build a comprehensive data infrastructure and conduct predictive modeling. Participants who have run predictive modeling will learn how to critically examine the quality of their predictive analyses.

Session description
Speaker
Feyzi Bagirov, Senior Machine Learning Engineer, Booz Allen Hamilton
3:25 pm
Exhibits & Afternoon Coffee Break
3:55 pm
Track 1—BUSINESS: Analytics strategy & operationalization
Analytics strategy
Lessons from: Prudential Financial

Data analytics has been one of the hottest areas of investment for companies betting on their future, hoping to keep up with and improve their competitive standing in the industry. Yet many C-level executives have doubts about the return and the value these investments will bring. In this talk, we will first review the challenges companies face in planning and executing analytics projects. Then we will examine various types of analytics innovations and the different value outcomes these innovations may bring.

Session description
Speaker
Wayne Huang, Director of Analytics, Prudential Financial Inc.
Track 2—TECH: Predictive modeling & machine learning methods
Analytical methods
Case Study: Citigroup

The model variable selection process is a key component of predictive analytics. Whereas logistic regression depends on feature selection/discovery being done beforehand, tree-based machine learning approaches such as Random Forests and GBM intrinsically perform feature selection. However, the default options for Random Forests and GBM are biased toward continuous variables and less favorable to categorical and binary variables, and unbiased solutions are very computationally intensive. We propose a new approach that uses the intrinsic feature selection of Random Forests and GBM to detect non-linear relationships, enact scaling, and find interactions, and then integrates these non-linear transformations into a traditional approach. A 7-10% lift is observed.
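
The abstract describes combining tree-ensemble feature selection with a traditional model. The sketch below is a minimal, generic illustration of that pattern on synthetic scikit-learn data; the actual Citigroup approach, transformations, and data are not public:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in data.
X, y = make_classification(n_samples=5000, n_features=40, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: let a tree ensemble do intrinsic feature selection / non-linearity detection.
gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
top = np.argsort(gbm.feature_importances_)[::-1][:10]   # keep the 10 most important inputs

# Step 2: feed the selected (and, in practice, transformed) features into a traditional
# logistic regression, the "integration" step described in the abstract.
logit = LogisticRegression(max_iter=1000).fit(X_tr[:, top], y_tr)

print("GBM AUC:   ", round(roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1]), 3))
print("Hybrid AUC:", round(roc_auc_score(y_te, logit.predict_proba(X_te[:, top])[:, 1]), 3))
```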

Session description
Speaker
Yulin Ning, Senior Director in Global Decision Management, Citigroup
Track 3—MARKETING: Marketing & market research analytics
Churn modeling; uplift modeling
Case Study: The Co-operators

A wide variety of models exist for predicting customer retention, each with different data requirements and model outputs. These diverse modeling options make it difficult to identify the approach with the highest potential predictive power for increasing retention. This case study explains how a survival analysis approach to predicting household retention was replaced by a more complicated but more precise "true-lift" model. This model was used for targeted marketing campaigns undertaken to increase household retention for a Canadian insurer, with performance monitored over one year. The "true-lift" model performed better than the retention model.
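
One common way to build a "true-lift" (uplift) model is the two-model approach: score each household under treatment and under control, and target by the difference. The sketch below is a minimal illustration of that approach on synthetic data; it is not necessarily the method used in this case study, and all column names and figures are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 10000

# Hypothetical campaign data: 'treated' marks households that received the retention
# offer, 'retained' marks whether they renewed (stand-in data).
df = pd.DataFrame({
    "tenure_years": rng.integers(0, 20, n),
    "num_policies": rng.integers(1, 5, n),
    "treated": rng.integers(0, 2, n),
})
base = 0.6 + 0.01 * df.tenure_years
df["retained"] = (rng.random(n) < base + 0.05 * df.treated).astype(int)

features = ["tenure_years", "num_policies"]

# Two-model uplift approach: one model on treated households, one on controls.
m_t = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    df.loc[df.treated == 1, features], df.loc[df.treated == 1, "retained"])
m_c = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    df.loc[df.treated == 0, features], df.loc[df.treated == 0, "retained"])

# Uplift = predicted retention if contacted minus predicted retention if left alone;
# households are targeted in descending uplift order.
df["uplift"] = m_t.predict_proba(df[features])[:, 1] - m_c.predict_proba(df[features])[:, 1]
print(df.nlargest(5, "uplift")[["tenure_years", "num_policies", "uplift"]])
```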

Session description
Speaker
Emilie Lavoie-Charland, Research & Innovation Analyst, The Co-operators
4:45 pm
Track 1—BUSINESS: Analytics strategy & operationalization
Building Data Science Teams
Lessons from: Comcast

As more and more companies strive to recruit and build data science teams, they must also identify how to structure those teams within the organization to drive innovative data science applications that will propel their business. This session will cover key considerations in structuring a data science organization capable of generating the big analytical ideas that will have the biggest impact on pushing the business forward.

Session description
Speaker
Bob Bress, Head of Data Science, Freewheel, A Comcast Company
Track 2—TECH: Predictive modeling & machine learning methods
Forecasting analytical methods
Case Study: Micron Technology

Accurate forecasting of customer demand for products is critical to success in the semiconductor industry. A diverse product portfolio, detailed customer mappings, dynamic market conditions, and extended production cycle times all sharpen the need for a reliable and responsive automated forecasting solution. I will describe a custom forecasting model recently developed by and now in use at Micron Technology Inc. that combines machine learning algorithms, established time series modeling techniques, and human expertise in leveraging existing forecasting infrastructure to achieve significant improvements in forecast accuracy across multiple levels of the product and customer hierarchy.

Session description
Speaker
Colin Ard, Senior Enterprise Data Scientist, Micron Technology, Inc.
Track 3—MARKETING: Marketing & market research analytics
Optimizing outreach; uplift modeling

In the current environment, media consumption is fragmenting, cord cutters are an increasingly large segment of the population, and "digital" is no longer a ubiquitous, single medium. As such, large companies and other organizations looking to do outreach at scale to change individuals' behavior have an overwhelming number of choices for how to deploy their outreach resources. In this talk, Daniel Porter, co-founder and Chief Analytics Officer of BlueLabs, will discuss how current tools that combine uplift models with state-of-the-art allocation algorithms make it possible for organizations ranging from Fortune 100 companies to presidential campaigns to large government agencies to optimize these decisions at the individual level, ensuring delivery of the right message to the right person at the right time, through the media channels where each individual is most likely to engage positively with the content.

Session description
Speaker
Daniel Porter, Co-Founder, BlueLabs
5:30 pm
Networking Reception
Sponsored by
DataRobot

Predictive Analytics World for Business - New York - Day 2 - Tuesday, October 31st, 2017

(PAW Financial & PAW Healthcare run in parallel on this day - dual registration required)
8:00 am
Registration & Networking over Coffee
8:35 am
Eric Siegel, Conference Founder, Machine Learning Week
8:40 am
Special Plenary Session

Every analytics challenge reduces, at its technical core, to optimizing a metric. Product recommendation engines push items to maximize a customer's purchases; fraud detection algorithms flag transactions to minimize losses; and so forth. As modeling and classification (optimization) algorithms improve over time, one could imagine obtaining a solution merely by defining the guiding metric. But are our tools that good? More importantly, are we aiming them in the right direction? I think, too often, the answer is no. I'll argue for clear thinking about what exactly it is we ask our computer assistant to do for us, and recount some illustrative war stories. (Analytic heresy guaranteed.)

Session description
Speaker
John Elder Ph.D., Founder & Chair, Elder Research
9:25 am
Plenary Session

In the spring of 2017, over a thousand analytic professionals from around the world participated in the 8th Rexer Analytics Data Miner Survey. In this PAW session, Karl Rexer will unveil the highlights of this year's survey results. Highlights will include:

  • key algorithms
  • challenges of Big Data Analytics, and steps being taken to overcome them
  • trends in analytic computing environments & tools
  • analytic project deployment
  • job satisfaction
Session description
Speaker
Karl Rexer, President, Rexer Analytics
9:40 am
Diamond Sponsor Presentation

In this session, we will discuss how using data and analytics within a Machine Learning environment has become the latest trend in using analytics for sales & marketing. This approach is set to become one of the most widely accepted means of improving campaign effectiveness through use of propensity and response predictive modeling.

Session description
Sponsored by
Dun & Bradstreet
Speaker
Kelley Gazdak, Global Vice President Data & Analytic Solutions, Dun & Bradstreet
10:00 am
Track 1—BUSINESS: Analytics strategy & operationalization
Getting it deployed
Lessons from: Honeywell

How often do analytic projects fail to drive value or monetization? How often does great analytic work go to waste? The primary reason for failure to realize value from analytic investment is the lack of ability to deploy to market. Advances in Big Data technology have enabled companies to deploy analytics in a consistent and expedient way to begin realizing the value of their investment.

So what does all this change really mean to you as a data scientist? How do you not only master the shifting environment - but use it to thrive?

In this talk, Bill Groves, Chief Data Scientist & Analytics Officer at Honeywell International, will explore how to operationalize analytics.

  • Why the "last mile" is critical and so often forgotten
  • Organizing to optimize return on analytics
  • Leveraging emerging technologies in data and analytics to increase speed to market and deployment

There's never been a better time to be at the forefront of data science. Ride the wave of change today - and into tomorrow.

Session description
Speaker
William Groves, Chief Data & Analytic Officer, Honeywell International
Track 2—TECH: Predictive modeling & machine learning methods
Data quality

Three Steps for Improving Data Quality for Predictive Analytics

Bad data is the enemy of predictive analytics. From confusion about units of measure, to missing definitions, to data values that are just plain wrong, bad data gets in the way or, worse, leads predictions awry. Unfortunately, there is no "silver bullet" solution for all the problems that can arise.

Fortunately, practitioners can take steps to alleviate the bad data problems:

  1. Understand quality levels and evaluate whether the data are fit-for-use in their analyses.
  2. Use "rinse, wash, and scrub" routines to clean some bad data, and
  3. Address the root causes of poor quality data longer-term.

This presentation briefly summarizes the issues, puts the steps above in context, and shows how they contribute to better predictive analytics.
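
As a minimal illustration of steps 1 and 2 on hypothetical data, the snippet below profiles a small table for missing, duplicate, and out-of-range values and applies simple scrubbing. It is only a sketch of the kinds of checks the talk describes, not the speaker's material:

```python
import numpy as np
import pandas as pd

# Hypothetical input table with typical quality problems (stand-in data).
df = pd.DataFrame({
    "customer_id": [101, 102, 102, 104],
    "age": [34, -5, 34, np.nan],          # -5 is "just plain wrong"; NaN is missing
    "revenue_usd": [1200.0, 900.0, 900.0, 15.0],
})

# Step 1: understand quality levels before modeling.
print("Missing rate per column:\n", df.isna().mean())
print("Duplicate rows:", df.duplicated().sum())
print("Out-of-range ages:", (df.age < 0).sum())

# Step 2: simple "rinse, wash, and scrub" fixes for the problems found above.
clean = df.drop_duplicates().copy()
clean.loc[clean.age < 0, "age"] = np.nan          # flag impossible values as missing
clean["age"] = clean.age.fillna(clean.age.median())
print(clean)
```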

Session description
Speaker
Tom Redman, Data Quality Solutions
Track 3—MORE CASE STUDIES: Varied business applications
Data storytelling

Everybody Lies author Seth Stephens-Davidowitz will discuss how to use Google searches to get new insights into people. He will discuss public tools available for researchers, how to find important insights, and potential pitfalls in using search data. His talk will address a range of topics, including marketing, psychology, and economics.

Session description
Speaker
Seth Stephens-Davidowitz, Author and NYTimes Opinion Writer
10:45 am
Exhibits & Morning Coffee Break
Book Signing with Seth Stephens-Davidowitz, Author, Everybody Lies and former Google data scientist
11:15 am
Track 1—BUSINESS: Analytics strategy & operationalization
Workforce analytics
Lessons from: Intel

As Intel is transforming to become a data-center company and entering new fields such as AI and autonomous cars, the need to integrate HR data into decision making has become more relevant than ever. In this session, we will review new methods and capabilities Intel's Talent intelligence organization developed in order to help its business leverage its internal talent and attract external candidates to fill critical positions. We will show real case studies of how these new analytical capabilities are adopted by Intel leaders to win the right talent in the marketplace.

Session description
Speaker
Hai Harari, Director, Talent Intelligence and Analytics, Intel
Track 2—TECH: Predictive modeling & machine learning methods
Best practices

Preeminent consultant, author and instructor Dean Abbott, along with Rexer Analytics president Karl Rexer, field questions from an audience of predictive analytics practitioners about their work, best practices, and other tips and pointers.

Session description
Speakers
Dean Abbott, Chief Data Scientist, Abbott Analytics
Karl Rexer, President, Rexer Analytics
Track 3—MORE CASE STUDIES: Varied business applications
Legal applications

Is the market for legal services a lemons market? Consumers have no means of determining ex ante whether their lawyer is good or mediocre. Legal services consumption often defaults to guesswork or over-reliance on referrals. Equally, the pricing of legal services bears little relationship with quality - mediocre lawyers often charge as much as good lawyers, and clients end up paying very high fees for lawyers with losing records. On the flip side, good lawyers might be underpaid relative to poor ones, because the market prices all services at an average quality level due to a lack of transparency and trust. In this milieu, analysis of win-loss records, citation practices of particular judges, biases in favour of plaintiffs or defendants, etc. has the potential to bring much-needed evidence-based decision-making into the legal system. Analytics and machine learning also offer the potential to automate highly important legal decisions such as parole - where idiosyncratic and opaque methods have yielded wildly varying outcomes in similar circumstances, imperiling society. For governments, analysis of productivity and efficiency offers the potential to tailor justice spending in more optimal ways to address the justice gap.

Session description
Speaker
Sandeep Gopalan, Pro Vice-Chancellor (Academic Innovation), Deakin University, Melbourne, Australia
11:40 am
Track 3—MORE CASE STUDIES: Varied business applications
Industry-leading case studies

What if you could accurately assess the value - and risk - of a potential customer, right from the very first interaction? Detailed customer data is now more widely available, across all the touchpoints that prospects and customers have with your organization. By creating an Experience Platform, you can use advanced analytics to tie together the whole journey, and each customer's individual pathway can now be optimized.

In this session, you'll learn:

  • Analytic techniques to better understand the customer experience
  • The data, technology, and processes that will help you transform customer experience
  • How to get started in using advanced analytics to improve your organization's customer experience.
Session description
Speaker
Steven Ramirez, CEO, Beyond the Arc
12:00 pm
Lunch in Exhibit Hall
12:10 pm
Lunch & Learn

All markets are large organizational elements made up of smaller elements called products. The session begins by showing that the only markets that express the law of supply and demand are those for scarce, highly valued products such as gold, silver, and platinum. All other markets demonstrate action in the dual states of value and demand across four mathematical axes, creating a 4D position, modulated by other forces, as the buyers deem appropriate. Adding another variable, time, to a 4D state yields a 5D state. This session continues with an examination of the nature of 4D and 5D states in MEE4D™ software, showing users how to find over- and underpriced products already in the market; additionally, it displays what markets want, do not have, and can afford. Working with common data, it provides actionable, statistically significant business intelligence to buyers, sellers and new market entrants alike.

Session description
Speaker
Doug Howarth, CEO, MEE Inc
1:00 pm
Lunch in Exhibit Hall
1:10 pm
Keynote

Using advanced analytics, UPS was able to reduce miles driven by 185 million annually. The latest tool, ORION (On Road Integrated Optimization and Navigation), completed deployment in 2016 and accounts for $300M to $400M in cost reduction annually.

Session description
Speaker
Jack Levis, Formerly UPS (retired), now Chief Product Strategist, ESP Logistics Technology
1:55 pm
The Session Description will be available shortly.
Session description
Sponsored by
The Trade Desk Inc.
Speaker
Mark Davenport, Senior Director of Analytics, The Trade Desk
2:15 pm
Expert Panel

Across fields of science and engineering, the track record of contributions made by women continues to grow - a fact that helps pave the way for future female scientists. Predictive analytics and data science are no exception. In this session, our expert panelists will address questions such as:

  • How to increase the number of women on your analytics team
  • Differences from other science and engineering fields in terms of being male dominated
  • How to "survive" as a woman in analytics
  • The next generation - encouraging girls and newcomers in STEM (science, technology, engineering, and mathematics)
  • Balancing work and personal life

See also this Predictive Analytics Times article for related reading on the topic.

Session description
Moderator
Anne G. Robinson, Chief Strategy Officer, Kinaxis
Panelists
Tracie Coker Kambies, Principal | Retail Technology and Analytics, Deloitte
Julia Minkowski, Product Lead, Walmart Global Tech
Pallavi Yerramilli, Senior Product Manager, The Trade Desk
3:00 pm
Exhibits & Afternoon Coffee Break
3:30 pm
Track 1—BUSINESS: Analytics strategy & operationalization
Analytics management
Lessons from: Vanguard

As data scientists enter the enterprise work environment at a rapid pace, delivering immediate business impact is often a challenge for new teams. This session will provide an overview of best practices in project management that can be applied to maximize value from your data science team, including Agile/Scrum methodology, team communication, and client expectation management, in order to quickly grow the data-driven practice at your organization and deliver ongoing business value.

Session description
Speaker
Wanda Wang, Data Scientist - Investment Management Fintech Strategies, Vanguard
Track 2—TECH: Predictive modeling & machine learning methods
Model interpretation
Case Study: SmarterHQ

For some of us, predictive accuracy is paramount when we assess our models: PCC, ROC AUC, Type I and Type II errors, etc. However, in other applications, the interpretation of predictive models is paramount so we understand why the model behaves the way it behaves. For this reason, many practitioners end up building models that are easier to interpret rather than models that are more accurate: regression or decision trees in particular. Neural Networks and Model Ensembles fall out of favor in these applications because they are perceived to be "black boxes".

This talk will describe an approach to determine the relative influence of each input variable in any predictive model using input shuffling, no matter how simple or complex the model. Interpretation of linear regression, logistic regression, neural networks, and Random Forest models will be compared and contrasted.
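
The input-shuffling idea described above is commonly known as permutation importance: shuffle one input at a time and measure how much model performance drops. A minimal sketch on synthetic data (not the speaker's code) looks like this:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the technique itself is model-agnostic.
X, y = make_classification(n_samples=3000, n_features=8, n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
baseline = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

rng = np.random.default_rng(0)
for j in range(X_te.shape[1]):
    X_shuf = X_te.copy()
    X_shuf[:, j] = rng.permutation(X_shuf[:, j])   # break the link between feature j and the target
    drop = baseline - roc_auc_score(y_te, model.predict_proba(X_shuf)[:, 1])
    print(f"feature {j}: AUC drop = {drop:.3f}")   # bigger drop = more influential input
```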

Session description
Speaker
Dean Abbott, Chief Data Scientist, Abbott Analytics
Track 3—MORE CASE STUDIES: Varied business applications
PA adoption in a new industry
Case Study: RightShip

Over 90% of world trade is carried by sea, and RightShip's highly regarded online vetting system provides data and risk evaluations on over 75,000 vessels in the world fleet. In this unique, ground-breaking case study of predictive model deployment in the maritime industry, Bryan Guenther will cover:

  • Model development - the right team and expertise
  • CHAID model - pros and cons
  • How the model highlighted a huge problem in applying a fair rating across different types of vessels
  • Issues with "flip-flopping" near the cutoff between two ratings
  • Industry socialization and training
  • Industry acceptance (or not)
  • How much do you show the customer without causing confusion?
  • How do you train up your own internal staff to answer customer questions?
  • When to retrain the model - How to handle the fallout of changes when updating
  • The impact of this endeavor on the shipping industry
Session description
Speaker
Bryan Guenther, Qi Program Manager, RightShip
3:55 pm
Track 3—MORE CASE STUDIES: Varied business applications
Agriculture analytics
Case Study: Circle A Farms

In this session we'll review a case study of smart hydroponics - how we created a connected farm, the data we collected, and the analysis performed to improve yields and make a better product.

Session description
Speaker
Steve Fowler, CEO, Jivoo
4:15 pm
Track 1—BUSINESS: Analytics strategy & operationalization
Model deployment
Lessons from: John Hancock

A model is only as valuable as its adoption. Speed to value, repeatability, and low-cost solutions can dramatically reduce software and services budgets and free up valuable dollars for other activities. Open-source tools such as Shiny (R) and Flask (Python) have made the creation and deployment of data science web applications convenient and manageable. At John Hancock, the Advanced Analytics function routinely wraps sophisticated modeling code into such web-based, point-and-click solutions. In this session you will see and learn from real-life examples of how one can rapidly operationalize both model building and maintenance.
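
As a minimal illustration of the Flask pattern mentioned above, the sketch below wraps a stand-in scikit-learn model in a prediction endpoint. The payload format and model are hypothetical, not John Hancock's application:

```python
from flask import Flask, request, jsonify
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Stand-in model fit at startup; in practice you would load a serialized model instead.
X, y = make_regression(n_samples=200, n_features=3, random_state=0)
model = LinearRegression().fit(X, y)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                      # e.g. {"features": [0.1, 2.3, -1.0]}
    pred = model.predict([payload["features"]])[0]
    return jsonify({"prediction": float(pred)})

if __name__ == "__main__":
    app.run(port=5000)                                # POST JSON to http://localhost:5000/predict
```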

Session description
Speaker
Shatrunjai Singh, Senior Data Scientist, John Hancock
Track 2—TECH: Predictive modeling & machine learning methods
Data policy

Software and analytics have been eating the world for a long time, and law and government are next. Businesses are increasingly transcending physical boundaries into new, unregulated virtual domains, forcing companies, developers and regulators to take a hard look at how data is collected, stored, and used. And new laws around the world are beginning to force the technology industry to rethink how it approaches the law. This talk will explain how and why the worlds of law and technology are colliding, and what this means for data-driven companies, the technology industry, and governments and citizens around the world.

Session description
Speaker
Andrew Burt, Chief Privacy Officer & Legal Engineer, Immuta
Track 3—MORE CASE STUDIES: Varied business applications
Logistics analytics
Case Study: Cargonexx

Automated pricing is delicate when prices change dynamically and there is no real current market price. We present challenges and approaches in this pricing area, from the development of a pricing engine for a logistics platform to deriving the current price for each transport request in real time (< 15 ms). The solution is implemented by a platform acting as an intermediary between transport contractors and freight carriers. To solve this stochastic problem, we use "fuzzyfication" and machine learning to build probability distributions for price acceptance. In this session, general steps are presented that apply to broader cases with an underlying stochastic problem, where distributions must be extracted from data to derive optimal control actions under uncertainty.
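
Purely as an illustration of pricing against an acceptance-probability curve, the sketch below fits a logistic acceptance model to hypothetical offer data and picks the price that maximizes expected margin. It is not Cargonexx's actual engine, features, or figures:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Hypothetical historical offers: price quoted for a transport and whether the carrier
# accepted (stand-in data only).
price = rng.uniform(400, 1200, 2000)
accepted = (rng.random(2000) < 1 / (1 + np.exp((price - 800) / 80))).astype(int)

# Acceptance probability as a function of price, one building block of the distributions
# the abstract describes.
model = LogisticRegression().fit(price.reshape(-1, 1), accepted)

# Choose the price that maximizes expected margin = (price - cost) * P(accept | price).
cost = 600.0
grid = np.linspace(cost, 1200, 200)
p_accept = model.predict_proba(grid.reshape(-1, 1))[:, 1]
best = grid[np.argmax((grid - cost) * p_accept)]
print(f"price maximizing expected margin: {best:.0f}")
```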

Session description
Speaker
Alwin Haensel, Founder and Managing Director, Haensel AMS
5:00 pm
End of Day 2

Post-Conference Workshops: Wednesday, November 1, 2017

Full-day Workshop

As crucial as it is, data preparation is perhaps the most under-taught part of the predictive analytics (machine learning) process, even though we spend 60%, 70%, even up to 90% of our time doing data preparation steps. This workshop will cover the most important aspects of data preparation. Each of these topics will be described and connected to specific modeling algorithms that benefit from the data preparation step, including: 

  • Data cleaning: outlier detection and "fixing", and which algorithms care about outliers
  • Missing value imputation: the simple approaches and more complex and complete methods (a brief sketch follows this list)
  • Feature creation: why we do it, which algorithms are helped most by which kinds of features, and how to automate building different kinds of continuous-valued and categorical features
  • Feature selection: why it's important to many algorithms
  • Sampling: what kind of sampling we should do, how large the samples should be, should we (ever) stratify samples, and how to sample small data sets to improve model robustness
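
A minimal sketch of the missing-value step on hypothetical data, combining simple median imputation with missingness indicator columns (only an illustration of one of the workshop topics, not its actual exercises):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical modeling table with missing values (stand-in data).
df = pd.DataFrame({
    "income": [52000, np.nan, 61000, np.nan, 48000],
    "age": [34, 29, np.nan, 45, 52],
})

# Simple approach: median imputation, plus missingness indicators so algorithms that
# can exploit "was missing" information (e.g. tree ensembles) still see it.
indicators = df.isna().add_suffix("_was_missing").astype(int)
imputed = pd.DataFrame(
    SimpleImputer(strategy="median").fit_transform(df),
    columns=df.columns,
)
prepared = pd.concat([imputed, indicators], axis=1)
print(prepared)
```
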
Session description
Instructor
Dean Abbott, Chief Data Scientist, Abbott Analytics
Full-day Workshop

Predictive analytics has proven capable of enormous returns across industries – but, with so many core methods for predictive modeling (machine learning), there are some tough questions that need answering:

  • How do you pick the right one to deliver the greatest impact for your business, as applied over your data?
  • What are the best practices along the way?
  • And how do you avoid the most treacherous pitfalls?
Session description
Instructor
John Elder Ph.D., Founder & Chair, Elder Research
Full-day Workshop

Why Machine Learning Needs Spark and Hadoop

Standard machine learning platforms need to catch up. As data grows bigger, faster, more varied, and more widely distributed, storing, transforming, and analyzing it doesn't scale using traditional tools. Instead, today's best practice is to maintain and even process data in its distributed form rather than centralizing it. Apache Hadoop and Apache Spark provide a powerful platform and mature ecosystem with which to both manage and analyze distributed data.

Machine learning projects can and must accommodate these challenges, i.e., the classic "3 V's" of big data: volume, variety, and velocity. In this hands-on workshop, leading big data educator and technology leader James Casaletto will show you how to:

  • Build and deploy models with Spark. Create predictive models over enterprise-scale big data using the modeling libraries built into the standard, open-source Spark platform (a minimal PySpark sketch follows this list).
  • Model both batch and streaming data. Implement predictive modeling using both batch and streaming data to gain insights in near real-time.
  • Do it yourself. Gain the power to extract signals from big data on your own, without relying on data engineers, DBAs, and Hadoop specialists for each and every request.
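
As a minimal PySpark sketch of the first bullet, the example below fits a model with Spark's built-in ML Pipeline on tiny stand-in data; the column names and data are hypothetical, not the workshop's exercises:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("paw-sketch").getOrCreate()

# Stand-in training data; in a real setting this would be a distributed dataset
# read from HDFS or object storage, e.g. spark.read.parquet(...).
df = spark.createDataFrame(
    [(0.0, 1.2, 0.0), (1.5, 0.3, 1.0), (0.2, 2.1, 0.0), (2.2, 0.1, 1.0)],
    ["feature_a", "feature_b", "label"],
)

# Assemble raw columns into a feature vector and fit a model inside a Pipeline,
# so the same fitted object can later be applied to batch or streaming data.
assembler = VectorAssembler(inputCols=["feature_a", "feature_b"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
model = Pipeline(stages=[assembler, lr]).fit(df)

model.transform(df).select("feature_a", "feature_b", "probability", "prediction").show()
```
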
Session description
Instructor
James Casaletto, PhD Candidate, UC Santa Cruz Genomics Institute and former Senior Solutions Architect, MapR

Post-Conference Workshop: Thursday, November 2, 2017

Full-day Workshop

Once you know the basics of predictive analytics and machine learning—including data exploration, data preparation, model building, and model evaluation—what can be done to improve model accuracy? One key technique is the use of model ensembles, which combine several or even thousands of models into a single, new model score. It turns out that model ensembles are usually more accurate than any single model, and they are typically more fault tolerant than single models.

Are model ensembles an algorithm or an approach? How can one understand the influence of key variables in the ensembles? Which options affect the ensembles most? This workshop dives into the key ensemble approaches, including Bagging, Random Forests, and Stochastic Gradient Boosting. Attendees will learn best practices, with attention paid to the influence various options have on ensemble models, so that they gain a deeper understanding of how the algorithms work qualitatively and how to interpret the resulting models. Attendees will also learn how to automate the building of ensembles by changing key parameters.
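
As a small illustration of why ensembles help, the sketch below compares a single decision tree with bagging, a random forest, and gradient boosting on synthetic data. It is only a generic scikit-learn example, not the workshop's material:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data to compare a single model against the ensemble approaches
# named above (bagging, random forests, gradient boosting).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=6, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, m in models.items():
    acc = cross_val_score(m, X, y, cv=5).mean()
    print(f"{name:18s} accuracy = {acc:.3f}")
```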

Session description
Instructor
Dean Abbott, Chief Data Scientist, Abbott Analytics