Predictive Analytics World for Business 2022
June 19-24, 2022 | Caesars Palace, Las Vegas
TRACK TOPICS – The three tracks of the main two-day conference cover these topics:
Analytics operationalization & leadership
Advanced ML methods & MLops
Cross-Industry Business Applications of ML
Blue circle sessions are for All Levels
Red triangle sessions are Expert/Practitioner Level
Workshops - Sunday, June 19th, 2022
Full-day: 8:30am – 4:30pm.
Python leads as a top machine learning solution – thanks largely to its extensive battery of powerful open source machine learning libraries. It’s also one of the most important, powerful programming languages in general.
Workshops - Monday, June 20th, 2022
Full-Day 8:30 am - 4:30pm
This one-day session surveys standard and advanced methods for predictive modeling (aka machine learning).
Full-Day 8:30 am - 4:30pm
Machine learning improves operations only when its predictive models are deployed, integrated and acted upon – that is, only when you operationalize it.
Full-Day 8:30 am - 4:30pm
This one-day introductory workshop dives deep. You will explore deep neural classification, LSTM time series analysis, convolutional image classification, advanced data clustering, bandit algorithms, and reinforcement learning.
Predictive Analytics World for Business - Las Vegas - Day 1 - Tuesday, June 21st, 2022
Nvidia's Siddha Ganju has gained a unique perspective on machine learning's cross-sector deployment. In her current role, she works on a range of applications, from self-driving vehicles to healthcare, and she previously led NASA's Long-Period Comets team, applying ML to develop meteor detectors. Deep learning impacts the masses, so it demands mass, interdisciplinary collaboration. In this keynote session, Siddha will describe the very particular interdisciplinary effort -- driven by established joint directives -- required to successfully deploy deep learning across a variety of domains, including climate, planetary defense, healthcare, and self-driving cars. The format of this session will be a "fireside chat," with PAW Founder Eric Siegel interviewing Siddha in order to dig deep into the lessons she's learned.
Machine learning and robotics are dramatically shifting our industrial capabilities and are opening new doors to our functional understanding and ways to support the natural world. Together, these advances can enable something far beyond simply limiting our damage to the planet -- they create the possibility of building a new relationship to nature wherein our industrial footprint can be radically reduced and nature's capability to support itself and all life on Earth (including us!) can be amplified.
Machine learning is hot. Companies are investing in larger, more complex and more expensive machine learning models. But is this investment delivering better results? The evidence is mixed at best, with high rates of project failure and low rates of real-world business adoption. Results are especially poor for large, established companies. The solution is to deliver artificial intelligence solutions where machine learning does less because it is combined with the rules, constraints and logic that also determine how a decision is made. James will show how you can develop composite "decisioning" systems that combine multiple, simpler models with the knowledge of human users and other forms of analytics to deliver more value, more quickly.
We all know that feature creation is a crucial step in building predictive models. Many features, such as z-scores, min-max normalization, log transforms, and Box-Cox transforms, can be automated easily. Multi-dimensional features, such as interactions, are often game-changers for model accuracy. However, finding which interaction terms to include in a model can be tedious: it is often done by trying combinations manually, by trying all combinations, or by hoping that a tree ensemble or neural network will find them on its own (hint: maybe, but they often do not).
This session will describe how and why these features are critical for building accurate models and how one can streamline the process of building features without the costs of the combinatoric explosion we often battle when dealing with a high-dimensional data space.
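As an illustration only (not the speaker's method), the Python sketch below shows one cheap way to screen pairwise interactions: center each numeric column, form every pairwise product, and rank the products by absolute Pearson correlation with the target. The function names and toy data are hypothetical. Note that it still enumerates all pairs; it avoids fitting a model per combination rather than removing the combinatorics altogether.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def center(values):
    """Subtract the mean so a product term echoes its main effects less."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def screen_interactions(columns, target, top_k=3):
    """Rank centered pairwise products by |correlation| with the target.

    columns: dict mapping feature name -> list of floats (hypothetical input shape).
    Returns the top_k (name, score) pairs, highest score first.
    """
    centered = {name: center(vals) for name, vals in columns.items()}
    scored = []
    for (a, xa), (b, xb) in combinations(centered.items(), 2):
        product = [u * v for u, v in zip(xa, xb)]
        scored.append((f"{a}*{b}", abs(pearson(product, target))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data: the target depends only on the x1*x2 interaction; x3 is pure noise.
rng = random.Random(0)
n = 200
cols = {name: [rng.uniform(-1, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [a * b for a, b in zip(cols["x1"], cols["x2"])]
best = screen_interactions(cols, y, top_k=1)
```

A screen like this is only a first filter; surviving candidates would still be validated inside the actual model.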
A foundational pillar of the Little Caesars brand promise is convenience. Customers can come in during peak lunch and dinner hours, get a Hot N Ready pizza at a tremendous value within minutes, and get on with their busy lives. This promise has made customers happy and delivered exceptional growth for Little Caesars restaurants for nearly 2 decades. To deliver on this promise with little to no food waste, we need to predict what kinds of pizzas, in what quantities, customers will want during these peak periods, every day of the week, across 5,500+ stores. We also need to account for other conditions in the store, such as staffing, digital and delivery orders, and time-restricted offers. Naturally, this means the prediction must be dynamic and near real time. So, while the goal – keep our brand promise – is clear, the choice of which prediction model best suits it is not. In this talk, Drew Smith, VP of Data and Analytics, and Sage Murakishi, Director of Data Science, will cover the importance of keeping the focus on the business goal and the challenges of selecting a model that supports that goal.
In this talk, we will explain the approach we took to modeling user growth on Twitter in the aftermath of the pandemic. We took a structural causal approach to modeling user growth, which has since helped us produce explainable forecasts for our topline user metric, perform attribution analysis for teams focused on the top of the funnel, and project the contribution of our causal improvements over the next few quarters.
As predictive analytics grows more complex, our need for effective communication increases exponentially. Many of us rely heavily on our left brain. We are so highly trained that our managers, clients, and stakeholders find it difficult to understand what we do. Given this reality, we must be master communicators -- especially when running predictive analytics projects. In this talk, seasoned analytics consultant Olivia Parr-Rud will focus on the underlying science of communication. You will learn powerful techniques for building rapport as well as how to leverage a "quantum field" to build trust with your clients so that you can clearly communicate how your work makes a positive difference to their bottom line.
Organizations often need to analyze large numbers of text documents, e.g., legislation, tax addenda, legal opinions, research papers, and grants. Manually investigating these documents is highly time-consuming, even for individuals with a high degree of expertise. Traditional Natural Language Processing/Understanding (NLP/U) methods are already in place to help expedite the analysis, but there is now a need to advance these methods. Recent advances in the field include the development of domain-specific (e.g., legal, medical) models to help interpret and analyze text data. We’ll discuss industry use cases demonstrating the advantages and application of domain-specific models aimed at analyzing recent legislation and its impacts across government sectors, all developed on CortextAI for Government, Deloitte’s collection of AI solutions for specific mission challenges.
The global grocery industry is worth a massive $5.7T, of which only 10% is currently online. As online grocery accelerated starting in 2020, Instacart search, which supports one of the largest catalogs of grocery items in the world, began facing new challenges. This talk focuses on these unique challenges and on how we improved our machine learning models to significantly improve search relevance and, as a result, our business metrics.
Topics of discussion will include:
* Using transformer-based NLP models to understand user intent at retailers from non-grocery verticals that are new to our platform.
* Using ML models that leverage a knowledge graph to improve the diversity of recalled results and increase basket size.
* Limitations we faced with keyword-based search, and a deep dive into embedding-based retrieval techniques to capture latent user intent.
* Building a multi-objective autocomplete ranking model to help users explore the full breadth of content on Instacart.
* ML models to control the quality of search results to avoid showing irrelevant and embarrassing products especially for tail queries or retailers that suffer from the cold-start problem.
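To make the idea of embedding-based retrieval concrete, here is a minimal brute-force sketch in Python. The three-dimensional vectors are hand-made, hypothetical stand-ins for embeddings that a trained text encoder would produce, and the function names are illustrative; a production system at catalog scale would typically use an approximate nearest-neighbor index instead of scoring every item. The point is that a query can match items without sharing any keywords with them.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, catalog, k=2):
    """Return the k catalog items whose embeddings are most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), item) for item, vec in catalog.items()),
        reverse=True,
    )
    return [item for _, item in scored[:k]]


# Hypothetical embeddings; real ones would come from a trained text encoder.
catalog = {
    "oat milk": [0.90, 0.10, 0.00],
    "almond milk": [0.85, 0.15, 0.05],
    "paper towels": [0.00, 0.10, 0.95],
}
query = "plant-based drink"  # shares no keywords with either milk item
query_vec = [0.80, 0.20, 0.00]  # hypothetical embedding of the query above
results = retrieve(query_vec, catalog, k=2)
```

A keyword matcher would return nothing for this query, while the vector comparison recalls both milk alternatives.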
Making AI work in real applications involves a lot more work below the surface than the AI hype would have you believe. The weaknesses of AI and machine learning algorithms have been compensated for by the availability of a lot more data. For organizations to succeed in the new digital economy of interactions, data (structured and unstructured) has to be turned into an asset that is easy to collect, manage and utilize. We cover some of the lessons learned in making AI work in the context of case studies at Fortune 500 companies.
A story of lessons learned and a guide for business leaders considering hiring for and deploying their first machine learning project. In this session, WeVideo Director of Analytics Ryan Withop will explore what went right and wrong, the specific tools used, and how to avoid the top 5 pitfalls. You will hear about successes and missteps and come out of the session armed with the knowledge needed to bring ML into your organization.
2:45pm - 3:05pm
Pull requests (PRs) have become the de facto standard for the code review and merge process in development teams using SCM tools like GitHub and Bitbucket. PRs provide a rich source of information about developers and reviewers. They can give us considerable insight into developers' coding styles and logical skills, since every single line of code is reviewed and "bad smells" are highlighted by the reviewer. The comments and suggestions a reviewer gives help in understanding the reviewer's proficiency. We have developed a set of PR Analytics by applying Transformer-based NLP, decision trees, and statistical analysis to PR data.
PR Analytics can be used to perform skill assessment and identify areas of improvement for the development team in a quantitative manner. PR Analytics can also help Scrum Masters and project managers better plan their deliverables, since they now know the strengths and weaknesses of the development team and can allocate the right developers to the right types of tasks.
In this talk, I will present some of the analytics we have developed using data from Bitbucket and how we are using them to improve the efficiency of our development teams.
3:10pm - 3:30pm
In this session, Google Cloud AI Leader Chanchal Chatterjee will present a template in Python for the entire machine learning journey from concept to production. The template consists of the following parts:
- Interactive notebooks to build ML components
- Deploying these components in a Kubeflow pipeline with orchestration for training and prediction
- Deploying the tested pipeline to production
At the end of this presentation, you will have an overview of building a model from concept to a final production-ready ML pipeline. We will also provide GitHub links to code for doing so.
2:45pm - 3:05pm
Johnson Controls began as a thermometer manufacturer in 1883 and now operates as a global leader in building controls and as an equipment manufacturer and service provider. In the middle of a digital transformation, the firm strives to use the best practices of predictive analytics and machine learning to improve operational performance in its Global Services business.
This presentation will discuss the journey of the company's Global Services business, from its analytics greenfield origin through the present, detailing all aspects of building the infrastructure necessary to solve problems using machine learning and predictive analytics. Follow along as we discuss the real-world, complicated steps necessary to predict customer churn using legacy industrial data from disparate, non-curated systems.
3:10pm - 3:30pm
Paychex is an American provider of human resource, payroll, and benefits outsourcing services for primarily small- to medium-sized businesses. We envision delivering curated, concise labor force analytics to ~700,000 customers representing tens of millions of workers. Although we have been successful in deploying analytics on-prem for 15 years, our existing systems have been unable to meet intensifying demand to serve larger numbers of users. We would like to harness cloud scalability while minimizing disruption to our work processes and security policies. Therefore, we are exploring a hybrid approach involving on-premises application hosting joined with cloud-based data engineering and model building. This talk describes advantages and tradeoffs we've experienced while migrating models to the Azure cloud. The elasticity of the cloud makes explainable AI achievable; we see up to a 1,000-fold increase in computational speed. In addition, distributed processing has enabled us to overcome data issues and implement automated tests. Challenges have included networking infrastructure and access limitations. We've had to change ourselves to be cloud-ready, while the cloud has adapted to meet our requirements.
Delivering an effective data-driven presentation to a nontechnical live audience isn’t the same as discussing technical details with peers or delivering a written document. You must be purposeful and diligent if you want to develop a presentation that conveys a compelling story while simultaneously avoiding myriad traps that undercut your credibility and limit your impact.
Based on the new book Winning The Room, this session will provide concrete strategies and practical tips to clarify, simplify, and refine data-driven presentations in a way that maximizes comprehensibility without sacrificing accuracy. It will also utilize instructive and memorable visuals that illustrate how you can drive your points home and help your audience understand and retain your message.
By following the advice discussed in this session, you’ll get better at creating and delivering data-driven presentations that provide information in a manner enabling it to be received, understood, and embraced by your audience.
Learn why digital transformation is an important revenue opportunity, what obstacles stand in the way of operationalizing ML workflows, and how to achieve stability and scalability in pipelines. This session will cover model lineage, versioning of libraries and Docker images, and dependencies based on roles such as data engineer and data scientist.
4:00pm - 4:25pm
The cruise industry is an ideal crucible for enterprise applications of data science and machine learning. Its ships are veritable mobile cities on the water, powered by full-blown industrial operations. Pricing and revenue management are driven by complex stochastic optimizations, demand forecasting, and price-response predictions. Marketing & eCommerce activities target guests globally with mixes of message, promotions and recommendations. And a portfolio of hotels and global supply chain must be managed to provide millions of guests all that they need locally on the ship. This session will cover case studies for how we leverage math and data to make this work, and in particular how we restarted from 18 months of essentially zero data (aka the pandemic).
4:25pm - 4:45pm
How does an enterprise company find new business opportunities? How can marketing and go-to-market strategies align their objectives and processes to drive new business growth? Understanding ideal customer profiles (ICPs) and the customer decision journey are important steps to inform the optimal strategy for success. At Zendesk, our research Center of Excellence (COE) team has conducted ICP and customer journey analyses through in-depth customer interviews as well as advanced analytics, mapping the customer decision journey from initial awareness through engagement and purchase to expansion. In this session, Zendesk's Weiwei Hu will share key learnings and best practices for applying both qualitative and quantitative research to inform customer segmentation and targeting and go-to-market support and planning, as well as to optimize brand positioning, messaging, and content creation and delivery to better meet customer needs. These learnings and insights will enable enterprise leaders to leverage data to understand customers' interests, activities, and touch points leading up to new and expansion deals, and to engage customers at the right time with the right messaging, content, and channels, creating more relevance and stronger interest and increasing loyalty among their customers.
As analytics becomes pervasive in every organization, how can you be sure the results of your analytics are driving the best decisions? Join Elder Research's Director of Commercial Analytics, Dr. Jennifer Schaff, as she walks you through the steps on how to validate and trust your models and drive more confidence in your decisions.
Artificial intelligence (AI) has quickly become a main focus topic for retail organisations worldwide. What started in small R&D environments in the "big data" revolution a few years ago has now grown into a mature practice where data scientists and data engineers work together towards common business goals, such as demand and supply chain forecasting, customer recommendations, and fraud detection. This growth also comes with challenges: machine learning models cannot live on their own and have to be incorporated into production environments. To that end, programming frameworks, tools and infrastructure are evolving at an enormous pace, and new architectures and design patterns have arrived to work with these new technologies. One important field of research is MLOps, which has evolved into a way of working and a set of best practices to deploy, test, manage, and monitor machine learning models in production. In this session, we'll explore this relatively new subject. Bas will explain the need for MLOps, dive into the tools and techniques, and give some examples of real-world retail solutions.
4:55pm - 5:15pm
In-App ratings can be used to train machine learning models. For instance, these ratings can serve as a valuable input to models that can generate a churn score. This score can be used to run targeted campaigns to retain users. Additionally, the users who provide a higher rating can be shown promotions to upsell premium offerings or establish loyalty programs. With such a wide range of benefits available from In-App ratings, product managers should definitely leverage them to better understand the user base, make impactful changes to the app and run personalized campaigns.
5:20pm - 5:40pm
Small, local organizations and non-profits often have a great need for data, both for funding purposes and for program development and course correction. One challenge organizers face is aligning these organizations and their stakeholders around shared metrics and measurement. In this session, we will show how we convened an array of organizations, from hospitals to universities to regional non-profits, around an interactive dashboard we built with hyperlocal insights into the state's 21 recommended SDOH (social determinants of health) indicators. Machine learning enriches this dashboard: for example, each of these 21 indicators is predictively modeled to assess the likelihood of meeting 2030 goals. By making our dashboard publicly available, equity-focused, intuitive to use, comprehensive, and fully customizable by our users, we were able to support organizing around shared goals for the year 2030.
Predictive Analytics World for Business - Las Vegas - Day 2 - Wednesday, June 22nd, 2022
The UPS Smart Logistics network is a framework that continually incorporates the latest technology trends to serve customers better and more efficiently. Today it connects all the components of the transportation value chain by integrating Operations, Technology, Data and Optimization. Among many technological innovations, machine learning plays a critical role in the planning and execution of our integrated transportation network. In this talk, we will give an overview of applying predictive analytics to different phases of network planning. The self-learning Demand Management model will be spotlighted with technical details and business impact. As for the connection to day-to-day operations, we will share our experience deploying machine learning models to automate key planning and execution decisions. Finally, we will share our vision of transforming our self-learning network into a smarter, self-healing network.
Many stories have appeared in the media about AI algorithms resulting in predictions and decisions that are biased, unfair, or otherwise harmful. Sometimes the harm is intended, but more often problems occur when algorithms cause harm that is a surprise to the data science team in charge. This talk will discuss several examples, and present a framework for "Responsible Data Science." The Framework is an extension of current industry standards for technical and business best practices, and provides processes and procedures by which data science practitioners and project managers can reduce the chances of unintended harm.
Value chains have been evaluated for decades. In this era of digital transformation, understanding the Data Science Value Chain is critical, but it is seldom examined as a system, nor are its component parts subject to systematic study. In this session, it will be shown how ML exists as a component within the value chain along with data acquisition, cleansing, formatting and accessibility. A conceptual case study aggregated from several non-specific sources shows the path from good data to the benefits of ML over traditional methods such as Designed Experiments. An overview of the 4.0 culture is integrated for a broader view of the benefit of ML within the value chain. Digital transformation's effects on the value chain are also integrated within the 4.0 culture. This presentation will challenge the myth that "everything data belongs to IT" by showing management and non-IT professionals the need for more knowledge about the data science value chain and where they fit within its constructs. Proposing a collaborative activity among the non-IT parts of the organization and an analytics maturity model expands where and how ML benefits decision making throughout the organization. The collaborative process also enhances communication of ML results to help management see beyond IT as a sole resource.
Today Stochastic Gradient Boosted Trees is a workhorse algorithm that is widely used in the Data Science community in implementations such as XGBoost, but in the early part of the first decade of this century it was a cutting-edge technique that had yet to be widely adopted. In 2004 Dr. Paslaski led the creation of Capital One’s Analytic Testing Lab and was responsible for it until he left Capital One in 2007. The highlight of his time there was the introduction of TreeNet, an implementation of Stochastic Gradient Boosted Trees, to the Capital One statistical community and its subsequent adoption.
Dr. Paslaski will share the story of how Capital One became an early adopter of Stochastic Gradient Boosted Trees and some of the lessons he learned about how to drive change through Data Science along the way. These lessons can be applied in a general business setting and include:
- If you are trying to drive meaningful change in a large company, expect failure. Success = tenacity + empathy + innovation
- It's not about finding the best idea; it’s about finding the best idea your business partners will accept
- Understand who the end user is and what they need to use your product
- Be part of a team: If you are stuck get help
Thought leaders in machine learning, Dean Karl and Steven, field questions from the audience about strategies for machine learning projects, best practices, and tips, drawing from their decades of experience as consultants and company executives.
Closely following the latest in machine learning techniques, Condé Nast will explain how it leverages a broad set of first-party behavioral data signals from its diverse content across touch points to better align audiences with intent. This session will cover how the media company taps into this highly effective methodology to drive more relevant advertising experiences and high-performing advertising campaigns.
I introduce a new concept and propose a way to estimate it: How much model search power can a given dataset endure before its confessions are spurious? I’ll explain “complexity capacity” with some simple controlled experiments and explore it during the search for a working investment timing strategy. By measuring the search power of an algorithm and the complementary search capacity of a dataset, we can avoid mismatches -- the disappointment of under-fit or under-search, yes, but mostly the disaster of over-search, where training results look great but out-of-sample predictions are worthless.
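The talk's estimator for "complexity capacity" is not described in this abstract, so the Python sketch below only illustrates the over-search failure mode it targets. Every candidate "strategy" and both targets are independent Gaussian noise (an assumption chosen so that any apparent skill is spurious by construction); the sample size, seed, and function names are hypothetical. As more candidates are searched, the best in-sample correlation climbs, while the chosen candidate's correlation on fresh data shows no such trend.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def search_noise(n_candidates, n_obs=50, seed=0):
    """Pick the noise candidate with the highest training correlation.

    Returns (best in-sample correlation, that candidate's out-of-sample correlation).
    With a fixed seed, a larger search always includes the smaller search's
    candidates, so the in-sample best can only stay the same or rise.
    """
    rng = random.Random(seed)
    target_train = [rng.gauss(0, 1) for _ in range(n_obs)]
    target_test = [rng.gauss(0, 1) for _ in range(n_obs)]
    best_in, best_out = float("-inf"), 0.0
    for _ in range(n_candidates):
        signal_train = [rng.gauss(0, 1) for _ in range(n_obs)]
        signal_test = [rng.gauss(0, 1) for _ in range(n_obs)]
        r_in = pearson(signal_train, target_train)
        if r_in > best_in:
            best_in, best_out = r_in, pearson(signal_test, target_test)
    return best_in, best_out


for searched in (1, 10, 100, 1000):
    r_in, r_out = search_noise(searched)
    print(f"{searched:>5} candidates: in-sample r = {r_in:+.2f}, out-of-sample r = {r_out:+.2f}")
```

With only 50 observations, a search over 1,000 noise candidates will almost surely find one that "works" in sample, which is exactly the mismatch between search power and dataset capacity the abstract warns about.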
The latest poll reconfirms today's dire industry buzz: Very few machine learning models actually get deployed. This pervasive failure of ML projects comes from a lack of prudent leadership as well as various technical challenges. In this panel session, industry experts will weigh in on which factors and practices have the greatest impact on ensuring successful machine learning deployment. What are the most important organizational and technological ingredients? Come to this session to find out!
Deploying a model in production is not enough. Successful machine learning models aren't just successfully deployed, they are measurably impacting the bottom line. Investment in a machine learning team comes with big ROI expectations and plenty of hype. The challenges to delivering that ROI are everywhere: from picking the right problems to managing stakeholder expectations to knowing what to monitor once your model is deployed in production. Join us to learn from successes and failures in making machine learning deliver on its promise.
Identifying anomalous observations has important business impacts across all industries, and nowhere more than in fraud detection, where some observations are intentionally trying to hide. This makes fraud detection different from most rare-event problems in modeling. This talk will highlight some modern approaches to anomaly detection: local outlier factors, isolation forests, and classifier-adjusted density estimation (CADE). All of these techniques have foundations in areas other than anomaly detection. The talk will also demonstrate these approaches using open source software.
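Of the three techniques named, local outlier factor (LOF) is the easiest to show from scratch. The Python sketch below compares each point's local reachability density with that of its k nearest neighbors: scores near 1 indicate typical density, and scores well above 1 indicate an outlier. It is a simplified illustration, not the open source tooling used in the talk: it uses exactly k neighbors (standard LOF includes every point tied at the k-distance), assumes no duplicate points, and needs Python 3.8+ for `math.dist`. The toy points are hypothetical.

```python
from math import dist


def k_nearest(points, i, k):
    """The k nearest neighbors of points[i] as (distance, index) pairs, nearest first."""
    others = ((dist(points[i], points[j]), j) for j in range(len(points)) if j != i)
    return sorted(others)[:k]


def lof_scores(points, k=3):
    """Local outlier factor of every point (simplified: exactly k neighbors each)."""
    neighbors = [k_nearest(points, i, k) for i in range(len(points))]
    k_distance = [nbrs[-1][0] for nbrs in neighbors]  # distance to the k-th neighbor

    def reach_dist(i, j):
        # Reachability distance of point i from neighbor j.
        return max(k_distance[j], dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for _, j in nbrs) for i, nbrs in enumerate(neighbors)]
    # LOF: mean density of the neighbors divided by the point's own density.
    return [sum(lrd[j] for _, j in nbrs) / (k * lrd[i]) for i, nbrs in enumerate(neighbors)]


# A tight cluster of five points plus one isolated point at (10, 10).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = lof_scores(points, k=3)
```

Because LOF is relative to local density, the isolated point stands out even though no global distance threshold was set.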
Offering well-designed rewards is essential for increasing user engagement. Boosting user activity in the desired products helps achieve economies of scale and increases the efficiency of the platform. However, rewards incur significant costs. Therefore, designing an efficient learning algorithm to optimize marketing campaigns can significantly increase the profitability of these platforms.
In this talk, we consider the following problem: given a set of different promotion structures, assign each user to a promotion structure from the set, or nothing at all. The goal is to maximize user retention while respecting the budget/keeping the cost low. I propose a novel methodology to maximize the treatment effect in a budget constraint setting. Furthermore, we use Boltzmann-exploration to balance exploration and exploitation. This enables us to efficiently collect data and update the model regularly. Finally, I show that our approach outperforms the other alternatives including R-linear and generalized random forest.
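The talk's own methodology is not spelled out in the abstract, so the Python sketch below shows only the two generic ingredients it names, under stated assumptions: a budget-aware assignment in which each option is valued as estimated uplift minus a cost penalty (a Lagrangian-style trade-off assumed here, with a hypothetical penalty `lam`), and Boltzmann (softmax) exploration over those values. The uplift estimates, costs, and function names are hypothetical, and a zero-cost "none" option is assumed so that every user can always be assigned without exceeding the budget.

```python
import math
import random


def boltzmann_index(values, temperature, rng):
    """Sample an index with probability proportional to exp(value / temperature)."""
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    return rng.choices(range(len(values)), weights=weights, k=1)[0]


def assign_promotions(uplift, cost, budget, lam=0.05, temperature=0.1, seed=0):
    """Assign each user one affordable option via Boltzmann exploration.

    uplift: {user: {option: estimated retention uplift}} (hypothetical model output)
    cost:   {option: cost}; must include a zero-cost "none" option
    Returns (assignment dict, unspent budget).
    """
    if cost.get("none") != 0:
        raise ValueError('a zero-cost "none" option is required')
    if budget < 0:
        raise ValueError("budget must be non-negative")
    rng = random.Random(seed)
    remaining = budget
    assignment = {}
    for user, option_uplift in uplift.items():
        # Only options that still fit in the budget are eligible; "none" always fits.
        options = [o for o in cost if cost[o] <= remaining]
        values = [option_uplift.get(o, 0.0) - lam * cost[o] for o in options]
        choice = options[boltzmann_index(values, temperature, rng)]
        assignment[user] = choice
        remaining -= cost[choice]
    return assignment, remaining


uplift = {
    "u1": {"none": 0.0, "promo_a": 0.30, "promo_b": 0.50},
    "u2": {"none": 0.0, "promo_a": 0.20, "promo_b": 0.25},
}
cost = {"none": 0.0, "promo_a": 1.0, "promo_b": 3.0}
assignment, unspent = assign_promotions(uplift, cost, budget=3.0, temperature=0.05)
```

The temperature controls the exploration trade-off: near zero the policy is greedy on current estimates, while larger values spread assignments so the uplift model keeps receiving data on less-favored options.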
Competition for top new analytics talent is fierce. While tech and other corporate giants are indeed vacuuming up new grads from top schools, not all great students can or want to go that route. The challenge is to put yourself in a position to attract and land them. It can be done, even if you're not a well-known brand. In this session, you will learn what works from a leader of the analytics program ranked #2 in the world by QS for the past three years. Even if you are from a giant firm, you're still competing for talent. You will come away with ideas to help you gain an advantage!
As more and more cloud providers develop and offer AI services, enterprises and software service providers are relying on integrating with them rather than building models themselves. Google, Amazon, IBM, and Microsoft are leading in this regard. In contact centers, there are multiple such offerings for Automatic Speech Recognition (ASR), Text-To-Speech (TTS), and Natural Language Understanding (NLU). Building a global contact center cloud solution on these third-party services is a challenge, as no single vendor is mature and performs well across all geographies, languages, and use cases. How do we offer customers the choice of vendors for their use case? How can we help customers consistently and continuously benchmark multiple services on a common set of criteria and choose the right product for their use case? We are building a universal harness that allows customers to mix and match vendors. In addition, we are building a benchmarking platform to help customers compare and test multiple vendors for the same use case. In the presentation, we will discuss the metrics, techniques, automation, and learnings for this benchmarking solution.
Human In The Loop (HITL) is a process in which, as part of the ML workflow, experts are asked their opinion about predictions made by an ML model in order to tune and improve the model. In this talk we’ll explain how we collaborated with and integrated engineers as a core part of our machine learning process, in order to create a mechanism to automatically predict the best security policies for our customers. We’ll go through the different stages of the project, discuss the challenges we faced along the way and how we overcame them, and show how you can use a similar process for any heuristic/ML project you have.
Workshops - Thursday, June 23rd, 2022
Full-Day 8:30 am - 4:30pm
This one-day session reveals the subtle mistakes analytics practitioners often make when facing a new challenge (the “deadly dozen”), and clearly explains the advanced methods seasoned experts use to avoid those pitfalls and build accurate and reliable models.
Full-Day 8:30 am - 4:30pm
This one day workshop reviews major big data success stories that have transformed businesses and created new markets.
Full-Day 8:30 am - 4:30pm
This workshop dives into the key ensemble approaches, including Bagging, Random Forests, and Stochastic Gradient Boosting.
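Of the ensemble approaches listed, stochastic gradient boosting is perhaps the least obvious from its name, so here is a compact Python sketch for squared-error regression on a single numeric feature. Each round fits a one-split regression tree (a stump) to the current residuals using a random subsample of rows, then adds a shrunken copy of it to the ensemble. The learning rate, subsample fraction, round count, and function names are illustrative assumptions, not recommendations from the workshop; production libraries add deeper trees, many features, and other loss functions.

```python
import random


def fit_stump(xs, rs):
    """Best single-split regression stump on 1-D inputs xs for targets rs (squared error).

    Returns (threshold, left_value, right_value); predict left_value when x <= threshold.
    """
    pairs = sorted(zip(xs, rs))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    left_sum, best = 0.0, None
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left_mean, right_mean = left_sum / i, (total - left_sum) / (n - i)
        # Maximizing this quantity minimizes the post-split sum of squared errors.
        gain = i * left_mean ** 2 + (n - i) * right_mean ** 2
        if best is None or gain > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (gain, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: predict the mean everywhere
        mean = total / n
        return float("inf"), mean, mean
    return best[1:]


def stochastic_gbm(xs, ys, n_rounds=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Fit a boosted ensemble of stumps; returns a predict(x) function."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of squared loss
        idx = rng.sample(range(len(xs)), max(2, int(subsample * len(xs))))
        t, lv, rv = fit_stump([xs[i] for i in idx], [residuals[i] for i in idx])
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]

    def predict(x):
        return base + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)

    return predict


# Toy data: a step from 0 to 1 at x = 2.5.
xs = [i / 10 for i in range(50)]
ys = [0.0 if x < 2.5 else 1.0 for x in xs]
model = stochastic_gbm(xs, ys, n_rounds=200)
```

The random row subsample in each round is what makes the method "stochastic"; bagging and random forests instead average many independently grown trees rather than building them sequentially on residuals.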
3 hour workshop: 5:30-8:30pm
This 3 hour workshop launches your tenure as a user of R, the well-known open-source platform for data analysis.
Workshops - Friday, June 24th, 2022
Full-Day 8:30 am - 4:30pm
Gain experience driving R for predictive modeling across real examples and data sets. Survey the pertinent modeling packages.