Full Machine Learning Week 7-Track Agenda 2022 – Detailed Session Descriptions
Predictive Analytics World
June 19-24, 2022 | Caesars Palace, Las Vegas
See the full 7-track agenda for the six co-located conferences at Machine Learning Week. A Machine Learning Week Ticket is required for full access. To view the agenda for one individual conference, click here: PAW Business, PAW Financial, PAW Industry 4.0, PAW Climate, PAW Healthcare, or Deep Learning World.
Session Levels:
Blue circle sessions are for All Levels
Red triangle sessions are Expert/Practitioner Level
Machine Learning Week - Las Vegas - Day 1 - Tuesday, June 21st, 2022
Nvidia's Siddha Ganju has gained a unique perspective on machine learning's cross-sector deployment. In her current role, she works on a range of applications, from self-driving vehicles to healthcare, and she previously led NASA's Long-Period Comets team, applying ML to develop meteor detectors. Deep learning impacts the masses, so it demands mass, interdisciplinary collaboration. In this keynote session, Siddha will describe the very particular interdisciplinary effort -- driven by established joint directives -- required to successfully deploy deep learning across a variety of domains, including climate, planetary defense, healthcare, and self-driving cars. The format of this session will be a "fireside chat," with PAW Founder Eric Siegel interviewing Siddha in order to dig deep into the lessons she's learned.
Machine learning and robotics are dramatically shifting our industrial capabilities and are opening new doors to our functional understanding and ways to support the natural world. Together, these advances can enable something far beyond simply limiting our damage to the planet -- they create the possibility of building a new relationship to nature wherein our industrial footprint can be radically reduced and nature's capability to support itself and all life on Earth (including us!) can be amplified.
As the world of Machine Learning (ML) has advanced, the biggest challenge that still faces data science organizations is the need for insightful, valuable, predictive attributes, aka “features” that can be applied to ML models. The process of building features is so tedious and costly that the “feature store” was invented to make re-building features a thing of the past.
The problem is that traditional means of building features to feed feature stores have been manual, labor-intensive efforts that involve data engineers, subject matter experts, data scientists, and your IT department. But what if there was a faster and more scalable way? Join dotData's VP of Data Science, Dr. Aaron Cheng, as he presents the concept of the automated Feature Factory and see how your organization can compress a process that today takes months into just a few days.
Machine learning is hot. Companies are investing in larger, more complex and more expensive machine learning models. But is this investment delivering better results? The evidence is mixed at best with high rates of project failure and low rates of real-world business adoption. Results are especially poor for large, established companies. The solution is to deliver artificial intelligence solutions where machine learning does less because it is combined with the rules, constraints and logic that also determine how a decision is made. James will show how you can develop composite "decisioning" systems that combine multiple, simpler models with the knowledge of human users and other forms of analytics to deliver more value, more quickly.
We all know that feature creation is a crucial step in building predictive models. Many features can be automated easily, such as z-scores, min-max normalization, log transforms, and Box-Cox transforms. Multi-dimensional features, such as interactions, are often game-changers for model accuracy. However, finding which interaction terms to include in models can be tedious, often achieved by trying combinations manually, trying all combinations, or hoping that the tree ensemble or neural network will find them on its own (hint: maybe, but they often do not).
This session will describe how and why these features are critical for building accurate models and how one can streamline the process of building features without the costs of the combinatoric explosion we often battle when dealing with a high-dimensional data space.
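To make the idea concrete, here is a minimal sketch (our own illustration, not the presenter's tooling) that screens pairwise interaction terms by how much cross-validated AUC each adds over a simple baseline; the DataFrame `X`, target `y`, and column names are hypothetical placeholders.

```python
# Minimal sketch: screening pairwise interaction features for a binary target.
from itertools import combinations
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def rank_interactions(X: pd.DataFrame, y: pd.Series, top_k: int = 10):
    """Score each pairwise product term by the cross-validated AUC it adds
    to a simple baseline model, rather than trying combinations by hand."""
    base_auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                               scoring="roc_auc", cv=5).mean()
    results = []
    for a, b in combinations(X.columns, 2):
        X_plus = X.assign(**{f"{a}_x_{b}": X[a] * X[b]})
        auc = cross_val_score(LogisticRegression(max_iter=1000), X_plus, y,
                              scoring="roc_auc", cv=5).mean()
        results.append((f"{a}_x_{b}", auc - base_auc))
    return sorted(results, key=lambda r: r[1], reverse=True)[:top_k]

# Usage (hypothetical data): rank_interactions(X, y, top_k=5)
```

This brute-force screen still grows quadratically with the number of columns; the session's point is precisely about smarter ways to avoid that explosion.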
A foundational pillar of the Little Caesars brand promise is convenience. Customers can come in during peak lunch and dinner hours and get a Hot N Ready pizza at a tremendous value within minutes and get on with their busy lives. This promise has made customers happy and delivered exceptional growth for Little Caesars restaurants for nearly two decades. To deliver on this promise, with little to no food waste, we need to predict what kind of pizzas in what quantities customers will want during these peak periods, every day of the week across 5500+ stores. We need to account for other conditions in the store like staffing, digital and delivery orders, as well as time-restricted offers. Naturally, this means the prediction must be dynamic and near real time. So, while the goal – keep our brand promise – is clear, the choice of which prediction model best suits is not. In this talk, Drew Smith, VP of Data and Analytics and Sage Murakishi, Director of Data Science, will cover the importance of keeping the focus on the business goal and the challenges of selecting a model that supports that goal.
In today's hyper-digital world, the data contained in documents often represent significant business value. The application of machine learning to extracting information from these sources is becoming big business. However, each use case represents different challenges in data extraction. This talk examines the application of intelligent document processing in the health insurance space. BCBS Tennessee Director of Data Science & AI Brandon Cosley will discuss how their Data Science Center of Excellence deployed an ensemble of machine learning techniques (e.g., deep learning, open-source, and NLP libraries) to extract information from documents in different business contexts. He will highlight the successes and challenges of each implementation while focusing on key findings associated with business success.
Blood, platelets and other transferable fluids are critical for patient health. At PAW-2020 we described OneBlood’s use of analytics to optimize blood donor recruitment, to forecast hospital needs and to manage the blood supply chain during the Covid pandemic. Now we provide an update with a focus on platelets. We built and deployed three predictive models. Marketing campaigns use these models, and dashboards enable campaign tracking and iterative improvements. Inventory monitoring and hospital demand forecasting are the remaining solution components. With all of these components working together, we have dramatically increased the number of platelet donors, stabilized inventory to match demand, and dramatically increased platelet availability in Florida hospitals.
Predictive maintenance (PdM) has made significant strides in recent years and represents the strongest solution to the persistent manufacturing challenge of unplanned downtime. While many manufacturers understand the benefits that an IoT-based PdM solution can provide, the majority of them are still struggling to successfully implement these solutions. Markus Larsson, head of Predictive Maintenance at PARC, talks about how manufacturers can successfully deploy and manage PdM solutions and put themselves on the path to zero unplanned downtime.
Recommender Systems are an integral part of Walmart's online e-Commerce platform, driving customer engagement and revenue. In this session, we'll present an approach to use causal inference to learn user-attribute affinities through temporal contexts. We'll also talk about how we leverage the temporal structure in users' repeat purchase behavior to recommend relevant categories of products to users.
Leveraging the power of the Metapath2Vec algorithm on raw representations, we form a heterogeneous interaction graph from the e-Commerce purchase/transaction data and traverse it using random walks of different lengths to get the context vectors for different nodes (users, baskets, categories, etc.). We obtain the Metapath2Vec embeddings by implementing the heterogeneous skip-gram model in TensorFlow and use TensorFlow Probability/Edward to define large-scale probabilistic models and do black-box variational inference optimization in an OOPS framework.
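As a rough illustration of the random-walk step described above (a simplified stand-in, not Walmart's production pipeline), the sketch below generates metapath-guided walks over a tiny heterogeneous graph and feeds them to a skip-gram model; the node types, metapath, and graph contents are assumed for the example.

```python
# Minimal sketch of metapath-guided random walks feeding skip-gram embeddings.
import random
import networkx as nx
from gensim.models import Word2Vec

# Toy heterogeneous graph: users connect to baskets, baskets to categories.
G = nx.Graph()
for n, t in [("u1", "user"), ("u2", "user"), ("b1", "basket"), ("b2", "basket"),
             ("c1", "category"), ("c2", "category")]:
    G.add_node(n, ntype=t)
G.add_edges_from([("u1", "b1"), ("u2", "b2"), ("b1", "c1"), ("b1", "c2"), ("b2", "c2")])

def metapath_walk(G, start, metapath, length):
    """Random walk that only steps to neighbors whose node type matches the
    next type prescribed by the (cyclic) metapath."""
    walk, node = [start], start
    for i in range(length - 1):
        want = metapath[(i + 1) % len(metapath)]
        nbrs = [n for n in G.neighbors(node) if G.nodes[n]["ntype"] == want]
        if not nbrs:
            break
        node = random.choice(nbrs)
        walk.append(node)
    return walk

# Generate walks for every user and train a skip-gram (sg=1) model on them.
walks = [metapath_walk(G, u, ["user", "basket", "category", "basket"], 8)
         for u in ["u1", "u2"] for _ in range(20)]
emb = Word2Vec(walks, vector_size=16, window=3, min_count=1, sg=1, negative=5).wv
print(emb["c2"][:4])  # embedding vector for one category node
```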
As predictive analytics grows more complex, our need for effective communication increases exponentially. Many of us rely heavily on our left brain. We are so highly trained that our managers, clients, and stakeholders find it difficult to understand what we do. Given this reality, we must be master communicators -- especially when running predictive analytics projects. In this talk, seasoned analytics consultant Olivia Parr-Rud will focus on the underlying science of communication. You will learn powerful techniques for building rapport as well as how to leverage a "quantum field" to build trust with your clients so that you can clearly communicate how your work makes a positive difference to their bottom line.
Organizations are often faced with analyzing large amounts of text documents – e.g., legislation, tax addenda, legal opinions, research papers, and grants. It is often highly time-consuming to manually investigate these documents, even for individuals with a high degree of expertise. There are traditional Natural Language Processing/Understanding (NLP/U) methods already in place to help expedite the analysis, but now there is a need to advance these methods. Recent advances in the field include the development of domain-specific (e.g., legal, medical) models to help interpret and analyze the text data. We'll discuss industry use cases demonstrating the advantages and application of domain-specific models aimed at analysis of recent legislation and its impacts across Government sectors, all developed on CortextAI for Government, Deloitte's collection of AI solutions for specific mission challenges.
The global grocery industry is worth a massive $5.7T, of which only 10% is currently online. As online grocery business accelerated from 2020, Instacart search, which supports one of the largest catalogs of grocery items in the world, started facing new challenges. This talk focuses on these unique challenges and how we improved the performance of our machine learning models to significantly improve search relevance and, as a result, our business metrics.
Topics of discussion will include:
* Using transformer-based NLP models to understand a user's intent for retailers from non-grocery verticals that are new to our platform.
* Using ML models that leverage a knowledge graph to improve the diversity of the recalled results and improve basket size.
* Limitations we faced with keyword-based search, and a deep dive into embedding-based retrieval techniques to capture latent user intent (see the sketch after this list).
* Building a multi-objective autocomplete ranking model to help users explore the full breadth of content in Instacart.
* ML models to control the quality of search results to avoid showing irrelevant and embarrassing products, especially for tail queries or retailers that suffer from the cold-start problem.
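As a small illustration of the embedding-based retrieval idea mentioned above (not Instacart's actual models), the sketch below encodes a handful of hypothetical product titles with an off-the-shelf sentence encoder and ranks them against a query by cosine similarity, which can surface relevant items even when no keywords overlap.

```python
# Minimal sketch of embedding-based retrieval for query/product matching.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # generic encoder, not Instacart's

products = ["organic whole milk", "almond milk unsweetened", "greek yogurt plain"]
product_vecs = model.encode(products, normalize_embeddings=True)

def search(query: str, top_k: int = 2):
    """Embed the query and rank products by cosine similarity."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = product_vecs @ q
    order = np.argsort(-scores)[:top_k]
    return [(products[i], float(scores[i])) for i in order]

print(search("dairy free milk"))  # matches "almond milk" despite no shared keyword intent
```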
Business partners are inundated with a non-stop barrage of messages about how they need to capitalize on data and its uses in order to stay relevant. As such, data scientists and business partners spend ample time discussing the value of predictive modeling, use cases, and potential ROI. Engaging and aligned as those early conversations can be, it is often a single conversation once the model is built and ready for deployment that can be the most problematic. In short: just how accepting are business partners of the deployment changes necessary to a process for a predictive model to deliver the promised quantifiable value? Resistance to these changes ultimately turns into a self-fulfilling prophecy, causing the appearance of failure in many modeling efforts. That is why ensuring model deployment acceptance conversations occur throughout a model delivery life cycle is critical to overcoming the distrust, disinformation, and general dislike of change a model deployment can create. We will discuss several examples from a Fortune 500 financial services company's call center predictive modeling efforts, where model deployment acceptance impacted process integration, model development, and benefit realization, and the areas of opportunity that could have driven a different result.
In January 2020, just after the first case of COVID-19 was discovered in the US, NYU Computer Science Prof. Anasse Bari, together with infectious disease physician Prof. Megan Coffee, led a multidisciplinary team of AI and medicine experts in the US and China to develop the first COVID-19 Clinical Severity Predictive Tool. The tool aimed to help medical doctors triage and provide care effectively during the incoming surges of cases by using algorithms that can predict which mildly ill patients were likely to become severely ill. In July 2021, the team developed another tool named COVID-19 Early-alerts Signals, built on a digital epidemiology framework that analyzes alternative data sources to discover predictors of the pandemic curve, which could supplement traditional predictive models and inform early warning systems and public health policies. The research finds that online Google searches can predict major regional increases and decreases in COVID-19 cases. After the vaccine rollout, Profs. Bari and Coffee led a team that developed a Vaccine Hesitancy Analytics Tool, a real-time big data analytics cloud application to track misinformation and extract themes and topics related to vaccine hesitancy. The platform was based on natural language processing and sentiment analysis predictive algorithms. The tool was deployed using Amazon Web Services.
In this talk Prof. Bari will outline the experimental research results from the three tools his team developed: (1) COVID-19 Clinical Severity Predictor, (2) Pandemics Early-alert Signals Tool based on alternative data, and (3) Vaccine Hesitancy Analytics Tool. This talk will also highlight the analytics lessons learned and how we can better prepare for future pandemics using predictive analytics and algorithms.
* Prof. Anasse Bari led these projects and teams with medical doctor Prof. Megan Coffee, Dr. Matthias Heymann and other researchers from the NYU Courant Institute of Mathematical Sciences, the NYU Computer Science Department and the NYU Grossman School of Medicine.
Additive manufacturing, aka 3D printing, is fast becoming a viable option for final part manufacturing as material choices grow along with advancements in core printing technology. One of the key challenges is the ability to produce high-quality parts with repeatability, and to address this challenge there is a need for an automated part quality monitoring and prediction system.
In this session we present a system we developed that is designed to aid process engineers from designing a process, conducting Design of Experiments (DOE), and verifying the process, to deploying and monitoring the process in real/near real time.
Real/near real time monitoring of a process is based on the concept of a process digital twin. The process digital twin is a collection of machine learning models (supervised and unsupervised) working in unison to detect anomalies and generate alerts during the manufacturing process. Each of these models that make up the digital twin are developed based on the data generated (process parameters, telemetry and metrology) during the DOE and process verification phases and may be targeted at a subsystem. The models are continuously updated based on new data collected during the production phases. The system has been deployed internally for testing purposes.
Extracting key fields from a variety of document types remains a challenging problem. Services such as AWS, Google Cloud, and open-source alternatives provide text extraction to "digitize" images or PDFs to return phrases, words, and characters. Processing these outputs is unscalable and error-prone as varied documents require different heuristics, rules, or models, and new types are uploaded daily. In addition, a performance ceiling exists as downstream models depend on good yet imperfect OCR algorithms.
We propose an end-to-end solution utilizing computer-vision-based deep learning to automatically extract important text fields from documents of various templates and sources. These produce state-of-the-art classification accuracy and generalizability through training on millions of images. We compared our in-house model's accuracy, processing time, and cost with 3rd-party services and found favorable results for automatically extracting important fields from documents.
Bill.com is working to build a paperless future. We process millions of documents a year, ranging from invoices, contracts, and receipts to a variety of other types. Understanding those documents is critical to building intelligent products for our users.
Making AI work in real applications involves a lot more work below the surface than the AI hype would have you believe. The weaknesses of AI and machine learning algorithms have been compensated for by the availability of a lot more data. For organizations to succeed in the new digital economy of interactions, data (structured and unstructured) has to be turned into an asset that is easy to collect, manage and utilize. We cover some of the lessons learned in making AI work in the context of case studies at Fortune 500 companies.
When dealing with fraud in real-time payments, the reaction needs to be fast. The cost of mistakes in fraud analytics is very high, yet it is important to preserve a good customer experience and to reduce user friction. This presentation focuses on best practices to establish analytics as a meaningful business resource and how to make it more effective and pervasive. Learn how to communicate with quants, how to obtain buy-in from decision-makers and technology partners, and how to select models based on the impact on business metrics and revenues.
New advances in natural language processing have recently started moving from research to real-world production implementations. The session reviews recent case studies in several of the USA's largest healthcare systems and pharmaceuticals that applied novel research in deep learning and transfer learning to better answer medical questions, enable real-world data, predict patient outcomes and population risk, and anonymize data at scale. This session is intended for people looking to understand what is possible right now - and what are the lessons learned from the early adopters.
Sales forecasting is a key process in defining a realistic revenue target for public and private companies and is a backbone of short-term and long-term strategic planning. Despite the rapid growth of AI application across many business processes, sales forecasting to a large extent has still been driven by human intelligence, a time-consuming effort with high likelihood of human error and significant inaccuracy. Many prior efforts in developing large-scale sales forecasting engines have not been successful, mainly due to the lack of clear definition of what machines can tackle vs. humans. This presentation demonstrates the strategy and process that led to the development of a large-scale, AI-driven sales forecasting engine in practice that impacted many business processes including Revenue Recognition, Commercial Planning, Product Marketing, Supply Chain, and Strategic Planning.
At One Concern, we develop models for digital twins and their resilience by employing machine learning and advanced statistical methods to build a platform where organizations, communities, and private and public sectors understand, forecast, and mitigate climate risk. This session will cover how One Concern develops digital twins and resilience models by applying machine learning algorithms.
It takes 10-13 years to design, manufacture and deliver a new aerospace product, which can inhibit growth for the industry as a whole. With a strained global supply chain, it's more important than ever to make design and manufacturing processes more efficient in order to keep pace with forecasted demand in commercial travel. In this session, Joakim Soederberg, Head of Data at Acubed, the Silicon Valley Innovation Center of Airbus, will discuss how to apply model-based engineering and digital technologies to manufacturing processes in order to reduce lead times and production costs and improve workflows dynamically.
Join Sharada Narayanan, dotData Data Scientist, as she presents on the challenges of building a predictive analytics practice in small organizations. What do you do when you don't have an established data science team but want to benefit from AI and Machine Learning? Sharada will walk through the problems of adopting an ML practice and how new techniques and tools can help accelerate the adoption process.
Crystal Quota is a predictive machine learning framework that enables Google Cloud Quota operations to automatically grant or deny Quota Increase Requests for 100 million Cloud users. This framework improves the rate of quota approvals, reduces manual toil, and proactively provides a robust defense pillar against scaled abuse on the platform.
As healthcare produces more clinical and research data than ever before, there is a need for AI to efficiently use and reuse the data and to reduce the workload on our practitioners. Yet, AI development in healthcare has not seen widespread adoption, and one of the biggest bottlenecks is that the data is not AI ready. Join Ajun Prakash, Snorkel AI’s Director of Solutions, to learn how Snorkel is unblocking the data challenge, and helping payors, providers, and pharma accelerate their adoption of AI.
The majority of organizations have AI as a top company initiative, yet only 1% of models created today have their desired business impact. In this session, we’ll unpack the 3 most common roadblocks that cause AI projects to stall or fail completely. Then, of course, discuss the best way to overcome them by leveraging resources you already have.
We’ll share how a global sustainability-focused paper manufacturer found a secret weapon to help scale the impact of a small team of only 3 data scientists to more than 200 non-coders completing advanced AI projects in less than 6 months.
RapidMiner was started by PhD data scientists who understood that the power of AI shouldn’t be reserved for… PhD data scientists. RapidMiner is a no-code data science platform that can enable anyone in your organization to complete AI/ML projects—from making sense of your data to building models and AI-powered apps to drive better decision-making.
A story of lessons learned and a guide for business leaders considering hiring for and deploying their first machine learning project. In this session, WeVideo Director of Analytics Ryan Withop will explore what went right/wrong, the specific tools used, and how to avoid the top 5 pitfalls. You will hear successes and missteps and come out of the session armed with the knowledge needed to bring ML into your organization.
2:45pm - 3:05pm
Johnson Controls began as a thermometer manufacturer in 1883 and now operates as a global leader in building controls, equipment manufacturing, and services. In the middle of a digital transformation, the firm strives to use the best practices of Predictive Analytics and Machine Learning to improve operational performance in its Global Services business.
This presentation will discuss the journey of the company's Global Services business, from its analytics greenfield origin through the present, detailing all aspects of building the infrastructure necessary to solve problems using Machine Learning and Predictive Analytics. Follow along as we discuss the real-world, complicated steps necessary to predict customer churn using legacy industrial data from disparate, non-curated systems.
The size of the credit market in US Dollars is in the tens of trillions, providing credit to people, business, and governments. Traditional credit scoring has been the primary tool for assessing the risk of default and is an indispensable tool for lenders. Due to the decentralized, pseudonymous nature of cryptocurrency, the same credit scoring models used by traditional lenders aren't useful. There is, however, a wealth of data available, and an opportunity to leverage that data to better assess the risk of borrowers. This talk will explore that data, the challenges, and the opportunities for both lenders and borrowers.
Building and deploying predictive models for the COVID-19 pandemic was challenging and most of the models have not performed as well as hoped. I cover five lessons learned from analyzing data by the Pandemic Response Commons, a not-for-profit that collects, analyzes and shares COVID-19 related data in the Chicago region. I also look at the challenges of understanding COVID-19 health disparities and present the results of models showing the unequal impact of COVID-19 on different populations. We conclude by discussing how regions can prepare for the future by putting in place persistent infrastructure for regional data collection, analysis and sharing.
AI is everywhere -- and you can use it today across a variety of industries and scenarios. Join this session where the Shell team talks about its Data Science Platform and Edge Platform. We will provide a behind-the-scenes look at the technology and the Shell.ai Platform helping to accelerate the development of AI products, and where it's taking this diverse company.
With Python, it is easier than ever to retrieve very targeted data from massive document repositories, apply NLP to create curated datasets, and then mine that text data for domain-specific insights. This case study will discuss a simple 5-step process for extracting information from a U.S. government database for regulatory compliance. The business goal is to identify the questions that regulators ask relative to certain operating conditions, and how peer companies in the industry have responded. These methods would also be useful for other use cases such as analyzing work orders, maintenance logs, and other text data sources relating to plant operations.
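To make the shape of such a pipeline concrete, here is a deliberately generic sketch of five steps (collect, filter, clean, extract, aggregate); the folder path, keywords, and question-detection heuristic are placeholders, not the presenter's actual workflow.

```python
# Generic sketch of a 5-step text-mining pipeline over downloaded filings.
import re
from pathlib import Path
from collections import Counter

KEYWORDS = {"emissions", "maintenance", "compliance"}   # assumed terms of interest

def load_documents(folder: str):
    """Steps 1-2: collect plain-text filings and keep only those mentioning a keyword."""
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(errors="ignore")
        if any(k in text.lower() for k in KEYWORDS):
            yield path.name, text

def extract_questions(text: str):
    """Steps 3-4: light cleanup, then pull sentences that read like regulator questions."""
    text = re.sub(r"\s+", " ", text)
    sentences = re.split(r"(?<=[.?!])\s+", text)
    return [s for s in sentences if s.endswith("?") or s.lower().startswith("please describe")]

def summarize(folder: str):
    """Step 5: aggregate which topics the extracted questions touch most often."""
    counts = Counter()
    for name, text in load_documents(folder):
        for q in extract_questions(text):
            counts.update(k for k in KEYWORDS if k in q.lower())
    return counts.most_common()

# summarize("filings/")  # point at a folder of downloaded filings (hypothetical path)
```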
Delivering an effective data-driven presentation to a nontechnical live audience isn’t the same as discussing technical details with peers or delivering a written document. You must be purposeful and diligent if you want to develop a presentation that conveys a compelling story while simultaneously avoiding myriad traps that undercut your credibility and limit your impact.
Based on the new book Winning The Room, this session will provide concrete strategies and practical tips to clarify, simplify, and refine data-driven presentations in a way that maximizes comprehensibility without sacrificing accuracy. It will also utilize instructive and memorable visuals that illustrate how you can drive your points home and help your audience understand and retain your message.
By following the advice discussed in this session, you’ll get better at creating and delivering data-driven presentations that provide information in a manner enabling it to be received, understood, and embraced by your audience.
In this talk, we will explain the approach we took to modeling user growth at Twitter in the aftermath of the pandemic. We took a structurally causal approach to modeling user growth, which has now helped us in producing explainable forecasts for our topline user metric, in performing attribution analysis for teams focused on the top of the funnel, and in projecting the contribution of our causal improvements for the next few quarters.
How does an enterprise company look to find new business opportunities? How can marketing and go-to-market strategies align their objectives and processes to drive new business growth? Understanding ideal customer profiles (ICPs) and the customer decision journey are important steps to inform the optimal strategy for success. At Zendesk, our research Center of Excellence (COE) team has conducted ICP and customer journey analysis through in-depth interviews of our customers as well as advanced analytics to map the customer decision journey from initial awareness, engagement, and purchase to expansion. In this session, Zendesk's Weiwei Hu will share the key learnings and best practices on applying both qualitative and quantitative research to impact customer segmentation and targeting, go-to-market support and planning, as well as optimize brand positioning, messaging, content creation and delivery to better meet customer needs. These key learnings and insights will enable enterprise leaders to best leverage data and insights to understand customers' interests, activities, and touch points leading up to new and expansion deals, as well as to engage customers at the right time with the right messaging, content, and channels to create more relevance, stronger interest, and increased loyalty among their customers.
In this session, an overview of Statistics and Machine Learning Algorithms with Supervised Learning (Logistic Regression, GLM Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Neural Networks) and Unsupervised Learning (Z-Score, IQR, DBSCAN Clustering, Principal Component Analysis, etc.) will be provided. Then, we will have a holistic comparison of each method at main analytics stages via two use cases. An insurance predictive analytics use case will be employed for the supervised machine learning comparison and a banking outlier detection analytics use case will be employed for the unsupervised machine learning comparison. Finally, the analytics result validation, implementation, and interpretability will be discussed and compared. Sample Python code will be shared.
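As a taste of the kind of sample code the session promises, here is a minimal sketch (our own illustration, not the speaker's materials) of two of the unsupervised outlier checks named above, z-score and IQR, applied to a made-up column of transaction amounts.

```python
# Minimal sketch of z-score and IQR outlier detection on toy transaction data.
import numpy as np

amounts = np.array([120.0, 95.0, 110.0, 105.0, 4_800.0, 99.0, 130.0])

def zscore_outliers(x, threshold=2.0):
    """Flag points more than `threshold` standard deviations from the mean
    (3.0 is a common default; 2.0 is used here because the toy sample is tiny)."""
    z = (x - x.mean()) / x.std(ddof=0)
    return np.where(np.abs(z) > threshold)[0]

def iqr_outliers(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return np.where((x < q1 - k * iqr) | (x > q3 + k * iqr))[0]

print(zscore_outliers(amounts), iqr_outliers(amounts))  # both flag index 4 (the $4,800 charge)
```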
Timely capacity planning of cardiothoracic ICU relies on early doctor’s recommendations on the type of bed a patient will need after surgery. A predictive model might be used to potentially help doctors make more accurate recommendations. Using the data of around 1500 cardiothoracic surgeries we built a gradient boosting model which predicts whether a patient will need a PACU (i.e. fast track trajectory) or an ICU bed with AUC=0.8 (precision=0.59, recall=0.84). In a hybrid scenario where the recommendation for PACU trajectory is made combining doctor’s recommendations and model’s predictions, the number of patients misclassified to PACU would be 1-in-10, which would be a significant improvement over the current 1-in-5.
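For orientation, a minimal sketch of the modeling step described above follows, using scikit-learn's gradient boosting on synthetic stand-in data rather than the actual surgical records; the printed AUC applies only to the toy data.

```python
# Minimal sketch: gradient boosting classifier for a PACU-vs-ICU style prediction.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Stand-in for ~1500 surgeries with a handful of clinical features.
X, y = make_classification(n_samples=1500, n_features=12, weights=[0.7, 0.3],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]
print("AUC:", round(roc_auc_score(y_test, proba), 3))

# A hybrid rule like the one described would route a patient to PACU only when
# both the doctor's recommendation (a hypothetical flag) and a low model
# probability agree, trading some recall for fewer PACU misclassifications.
```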
Is your team facing challenges getting ML models into deployment? Bring your obstacles and roadblocks and in this session, we'll help you bust through them. Our panel of experts will share practical advice and actionable takeaways. Is your organization crushing it when it comes to deploying models? Join the discussion and help your peers who may be a little less fortunate.
Asset prices are primarily affected by fundamental, technical, and market sentiment factors. The last factor refers to the public perception of the market and individual instruments traded within. While this perception is not solely based on information published in the media, online news sources and social media websites have a high impact on market sentiment and, consequently, asset prices. A journalist with an investigation disclosing a publicly traded company's unethical or illegal business practices will more than likely cause a change in market sentiment, which in turn would result in a stock price change. Due to the impact of online media on the financial markets, sentiment-driven trading strategies have been and still remain an integral part of hedge fund portfolios. In this session, Vladyslav Ivanov, a Quantitative Researcher for Outremont Technologies digital assets manager, will cover the current state-of-the-art transformer-based deep learning models for sentiment analysis. He will discuss use of these models in the context of forecasting technology sector asset prices and will perform an empirical assessment of model performance. The subjects of model application in latency-sensitive environments, as well as integration into signal generation processes, will also be addressed.
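As a small, generic illustration of transformer-based sentiment scoring on financial text (using an off-the-shelf pipeline, not the models evaluated in the talk), the following sketch assigns each headline a signed score that a strategy could aggregate.

```python
# Minimal sketch of transformer-based sentiment scoring on headlines.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # downloads a default English model

headlines = [
    "Chipmaker beats earnings expectations and raises guidance",
    "Regulator opens probe into data practices at software giant",
]

for h in headlines:
    result = sentiment(h)[0]          # e.g. {'label': 'POSITIVE', 'score': 0.99}
    signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
    print(f"{signed:+.2f}  {h}")      # a simple signed score a strategy could aggregate
```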
As analytics becomes pervasive in every organization, how can you be sure the results of your analytics are driving the best decisions? Join Elder Research's Director of Commercial Analytics, Dr. Jennifer Schaff, as she walks you through the steps on how to validate and trust your models and drive more confidence in your decisions.
Artificial intelligence (AI) has quickly become a main focus topic for retail organisations worldwide. What started in small R&D environments in the "big data" revolution a few years ago has now grown into a mature practice where data scientists and data engineers work together towards common business goals, such as demand and supply chain forecasting, customer recommendations, and fraud detection. This growth also comes with challenges; machine learning models cannot live on their own and have to be incorporated into production environments. To that end, programming frameworks, tools and infrastructure are evolving at an enormous pace. New architectures and design patterns have arrived to work with these new technologies. One important field of research is MLOps, which has evolved into a way of working and set of best practices to deploy, test, manage, and monitor machine learning models in production. In this session, we'll explore this relatively new subject. Bas will explain the need for MLOps, dive into the tools and techniques, and give some examples of real-world retail solutions.
4:55pm - 5:15pm
In-App ratings can be used to train machine learning models. For instance, these ratings can serve as a valuable input to models that can generate a churn score. This score can be used to run targeted campaigns to retain users. Additionally, the users who provide a higher rating can be shown promotions to upsell premium offerings or establish loyalty programs. With such a wide range of benefits available from In-App ratings, product managers should definitely leverage them to better understand the user base, make impactful changes to the app and run personalized campaigns.
TRACK 3: CASE STUDIES - Cross-industry business applications of machine learning
5:20pm - 5:40pm
Small, local organizations and non-profits often have a great need for data, both for funding purposes and program development/course correction. One challenge organizers face is aligning these stakeholders around shared metrics and measurement. In this session, we will show how we were able to convene an array of organizations, from hospitals to universities to regional non-profits, around an interactive dashboard we built with hyperlocal insights into the state's recommended 21 SDOH (social determinants of health) indicators. Machine learning enriches this dashboard -- for example, each of these 21 indicators is predictively modeled in order to assess the likelihood of meeting 2030 goals. By making our dashboard publicly available, equity-focused, intuitive to use, comprehensive, and fully customizable by our users, we were able to support organizing around shared goals for the year 2030.
Certain types of insurance that are legally required for consumers, such as auto and home insurance, typically face a high amount of regulatory scrutiny. Insurers in the United States often must obtain state regulatory approval of their pricing models before being able to sell new insurance products or change pricing of existing products. Beyond that, model users must feel assured that the variables and the relationships between variables used in the models are logical and intuitive, both to themselves, as well as to stakeholders affected by the model results. In this session, we’ll discuss best practices and case studies for analyzing and building regulator and user understanding and trust in machine learning models developed for insurance applications. We will also discuss how the data science industry is leveraging the rating and advisory organization model to obtain streamlined regulatory review and approval of complex models for use by insurance carriers.
It is difficult to estimate whether any individual patient will refill their prescription, even if we provide them a scheduled reminder. Inferring whether properties of that reminder are important, such as when it is delivered, is more difficult. Bayesian multi-factor models (also known as hierarchical or random effects models) offer a robust way to model this problem, accounting for differences across patients and explicitly accounting for model uncertainties. Fitting robust Bayesian models can be difficult in practice, but in the last few years work has been devoted to "Bayesian workflows," which offer principled approaches to building such models. In this talk, we will use one recent project, in which we built a model to understand the impact of reminder-refill timing, to discuss key elements of the workflow and how Bayesian techniques provide us with robust models that are able to quantify the uncertainties inherent in statistical inference.
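A minimal sketch of a hierarchical (random-effects) logistic model for refill behavior is shown below, written with PyMC on synthetic stand-in data; the variable names, priors, and timing covariate are illustrative assumptions, not the project's actual model.

```python
# Minimal sketch of a Bayesian multi-factor (hierarchical) refill model.
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n_patients, n_obs = 50, 500
patient_idx = rng.integers(0, n_patients, n_obs)
hours_before = rng.uniform(1, 72, n_obs)    # when the reminder was sent (stand-in)
y = rng.integers(0, 2, n_obs)               # stand-in refill outcomes (1 = refilled)

with pm.Model() as model:
    # Population-level effects
    intercept = pm.Normal("intercept", 0.0, 1.0)
    beta_timing = pm.Normal("beta_timing", 0.0, 1.0)
    # Patient-level random intercepts with a shared, learned spread
    sigma_p = pm.HalfNormal("sigma_p", 1.0)
    patient_effect = pm.Normal("patient_effect", 0.0, sigma_p, shape=n_patients)

    logit_p = intercept + beta_timing * (hours_before / 24.0) + patient_effect[patient_idx]
    pm.Bernoulli("refill", logit_p=logit_p, observed=y)

    # Full posterior, so timing effects come with explicit uncertainty.
    idata = pm.sample(1000, tune=1000, target_accept=0.9)
```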
Long short-term memory neural networks (LSTMs) originally rose to prominence in natural language processing, but have also shown value in time series. One drawback of LSTMs, and neural networks more generally, is that it can be difficult to derive effective confidence intervals. Two primary methods for calculating confidence intervals in LSTMs have been proposed: dropout-based simulation and estimating probability distributions.
In this case study we look at dropout-based simulation, which we found to be effective and more flexible than distributional estimation. However, added flexibility came at the cost of elevated computational burden.
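The following is a compact sketch of the dropout-based simulation approach on toy data: a small LSTM forecaster is run many times with dropout left active at prediction time, and the interval is read off the empirical quantiles of the simulated predictions. Hyperparameters are illustrative, not those from the case study.

```python
# Minimal sketch of MC-dropout intervals for an LSTM forecaster on a toy series.
import numpy as np
import tensorflow as tf

# Toy sequences: predict the next value of a noisy sine wave.
t = np.arange(0, 200, 0.1, dtype="float32")
series = np.sin(t) + 0.1 * np.random.randn(len(t)).astype("float32")
window = 30
X = np.stack([series[i:i + window] for i in range(len(series) - window)])[..., None]
y = series[window:]

inputs = tf.keras.Input(shape=(window, 1))
h = tf.keras.layers.LSTM(32)(inputs)
h = tf.keras.layers.Dropout(0.3)(h)          # kept active at prediction time below
outputs = tf.keras.layers.Dense(1)(h)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=3, batch_size=64, verbose=0)

def predict_with_interval(x, n_sims=100, alpha=0.1):
    """Run many stochastic forward passes (training=True keeps dropout on) and
    read the interval off the empirical quantiles of the simulated predictions."""
    sims = np.stack([model(x, training=True).numpy().ravel() for _ in range(n_sims)])
    lower, upper = np.quantile(sims, [alpha / 2, 1 - alpha / 2], axis=0)
    return sims.mean(axis=0), lower, upper

mean, lo, hi = predict_with_interval(X[-5:])
print(np.round(np.c_[mean, lo, hi], 3))
```

The repeated forward passes are exactly where the elevated computational cost mentioned above comes from: each prediction is simulated many times instead of computed once.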
Machine Learning Week - Las Vegas - Day 2 - Wednesday, June 22nd, 2022
The UPS Smart Logistics network is a framework that continually incorporates the latest technology trends to serve customers better and more efficiently. Today it connects all the components of the transportation value chain by integrating Operations, Technology, Data and Optimization. Among many technological innovations, machine learning plays a critical role in the planning and execution of our integrated transportation network. In this talk, we will give an overview on applying predictive analytics to different phases of the network planning. The self-learning Demand Management model will be spotlighted with technical details and business impact. As for the connection to day-to-day operation, we will share our experience on deploying machine learning models to automate the key planning and execution decisions. At the end, we will also share our vision of transforming our self-learning network to a smarter self-healing network.
Financial institutions generate a significant volume of data that is complex and varied. Such datasets originate independently in separate business units for various reasons including regulatory requirements and business needs. As a result, data sharing between business units as well as outside the organization (e.g. to the research community) is constrained. Furthermore, data containing personal information must be protected. Accordingly, it is difficult to develop and test new algorithms on original data. One solution is to synthesize financial datasets that follow the same properties of the real data while respecting the need for privacy of the parties involved. In this presentation, J.P. Morgan's Tucker Balch will review the firm's approach to synthetic data. He will highlight three main areas: 1) Generating realistic synthetic datasets. 2) Measuring the similarities between real and generated datasets. 3) Ensuring the generative process satisfies any privacy constraints.
In order for predictive analytics to have the most impact possible on patient care, we need to be able to focus on the specific population that is being treated. However, to prevent every organization from solving the same problem, we also need to create predictive analytics that generalize well. This tension drives many decisions in the model development process, from how we gather and analyze data, to how we support healthcare organizations that are deploying predictive models. We will discuss our approach to developing and deploying machine learning solutions with these two goals in mind.
Training a search relevance model using grocery shopping engagement data is challenging since user behavior is different compared to other e-commerce applications.
People add multiple items to their cart and buy previously purchased items regardless of search relevance, so the user engagement data becomes noisy.
In this talk, we show how to train a transformer-based embedding model using this noisy data. We propose various data cleaning/augmentation techniques, self-adversarial negative sampling, and cascade training to efficiently utilize the data. The trained model outperforms the existing baseline on human-annotated data.
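For readers unfamiliar with self-adversarial negative sampling (introduced in the RotatE knowledge-graph work), here is a tiny numpy illustration of the weighting idea with made-up embeddings; it is not the production training code and the temperature is an assumed value.

```python
# Minimal sketch of self-adversarial negative sampling weights for one query.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
query = rng.normal(size=16)                     # embedding of one search query
positive = query + 0.1 * rng.normal(size=16)    # engaged item (noisy positive)
negatives = rng.normal(size=(5, 16))            # sampled non-engaged items

pos_score = query @ positive
neg_scores = negatives @ query

# Self-adversarial weighting: negatives the current model scores highly
# ("hard" negatives) contribute more to the loss than easy ones.
alpha = 1.0                                      # sampling temperature (assumed)
weights = softmax(alpha * neg_scores)

loss = -np.log(sigmoid(pos_score)) - np.sum(weights * np.log(sigmoid(-neg_scores)))
print("weights:", np.round(weights, 3), "loss:", round(float(loss), 3))
```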
Our latest research shows most models fail to deploy. Machine learning's operationalization -- the model's change to existing processes in order to improve them -- takes a lot more planning, socialization, and change-management efforts than most data scientists ever begin to realize. The problem is more in leadership than in technology; no technical solution such as MLOps addresses the fundamental root of the problem. Without deployment, ML does not achieve value. This industrywide crisis stems from a lack of proper ML leadership. The great potential of ML is intact -- the value proposition is solid and the core tech, research, and analytical results are legit. And it isn't a flop -- many ML projects succeed, even if only a minority. In this talk, Machine Learning Week founder and bestselling author of "Predictive Analytics" Eric Siegel will outline the required ML leadership practice. It ain't rocket science, but it's rarely well understood and thoroughly executed.
As Financial Services increasingly embrace digitization, AI presents many opportunities for efficiency gains and automation across the entirety of a bank’s operations. However, a lot of these efforts to develop and operate AI applications have been bottlenecked by the data not being AI ready. Join Ajun Prakash, Snorkel AI’s Director of Solutions, to learn how Snorkel helps Financial Services companies solve their data challenges, and discuss a few case studies of operational efficiencies this has unlocked.
This unique expert panel will provide a balanced and international perspective on where data science is heading within the healthcare industry.
In this session, Google Research Scientist Hamza Farooq will cover a data-driven, NLP-based framework for understanding Google customer feedback across any text-based data sources. He will cover a framework named Stardust that uses an ensemble of machine learning/deep learning models to parse and score topics to measure sentiment and context pertaining to trust and customer experience.
Many stories have appeared in the media about AI algorithms resulting in predictions and decisions that are biased, unfair, or otherwise harmful. Sometimes the harm is intended, but more often problems occur when algorithms cause harm that is a surprise to the data science team in charge. This talk will discuss several examples, and present a framework for "Responsible Data Science." The Framework is an extension of current industry standards for technical and business best practices, and provides processes and procedures by which data science practitioners and project managers can reduce the chances of unintended harm.
Value chains have been evaluated for decades. In this era of digital transformation, understanding the Data Science Value Chain is critical, but seldom is it examined as a system, nor are its component parts subject to systematic study. In this session, it will be shown how ML exists as a component within the value chain along with data acquisition, cleansing, formatting and accessibility. A conceptual case study aggregated from several non-specific sources shows the path from good data to the benefits of ML over traditional methods such as Designed Experiments. An overview of the 4.0 culture is integrated for a broader view of the benefit of ML within the value chain. Digital transformation's effects on the value chain are also integrated within the 4.0 culture. This presentation will highlight the myth that "everything data belongs to IT" by showing management and non-IT professionals the need for more knowledge about the data science value chain and where they fit within its constructs. Proposing a collaborative activity among the non-IT parts of the organization and an analytics maturity model expands where and how ML benefits decision making throughout the organization. The collaborative process also enhances communication of ML results to assist management in seeing beyond IT as a sole resource.
Today, Stochastic Gradient Boosted Trees is a workhorse algorithm widely used in the data science community in the form of implementations such as XGBoost, but in the early part of the first decade of this century it was a cutting-edge technique that had yet to be widely adopted. In 2004, Dr. Paslaski led the creation of Capital One’s Analytic Testing Lab and was responsible for it until he left Capital One in 2007. The highlight of his time there was the introduction of TreeNet, an implementation of Stochastic Gradient Boosted Trees, to the Capital One statistical community and its subsequent adoption.
Dr. Paslaski will share the story of how Capital One became an early adopter of Stochastic Gradient Boosted Trees, along with some of the lessons he learned about how to drive change through data science. These lessons apply in a general business setting and include:
- If you are trying to drive meaningful change in a large company, expect failure. Success = tenacity + empathy + innovation
- It's not about finding the best idea; it’s about finding the best idea your business partners will accept
- Understand who the end user is and what they need to use your product
- Be part of a team: If you are stuck get help
Can AutoML be used to forecast the price of S&P 500 futures? Can technical analysis indicators such as moving averages, exponential moving averages, Bollinger bands, and the relative strength index be used to forecast the next day's S&P futures price? In this session, Jiwani will share the results of a case study that used various AutoML tools, including H2O AutoML, H2O Driverless AI, RapidMiner, and other AutoML packages. He will examine and compare the performance of each one, and reveal that some of them overfit.
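To make the kind of experiment described concrete, here is a minimal Python sketch using the H2O AutoML library; the indicator set, column names, and time-based split are illustrative assumptions, not the presenter's actual configuration. Comparing in-sample and out-of-sample error at the end is one simple way the overfitting the talk mentions would show up.

# Illustrative sketch only -- not the presenter's actual pipeline.
# Assumes a pandas DataFrame `prices` with a daily 'close' column for S&P 500 futures.
import h2o
import pandas as pd
from h2o.automl import H2OAutoML

def add_indicators(prices: pd.DataFrame) -> pd.DataFrame:
    df = prices.copy()
    df["sma_20"] = df["close"].rolling(20).mean()            # simple moving average
    df["ema_20"] = df["close"].ewm(span=20).mean()           # exponential moving average
    std_20 = df["close"].rolling(20).std()
    df["bb_upper"] = df["sma_20"] + 2 * std_20                # Bollinger bands
    df["bb_lower"] = df["sma_20"] - 2 * std_20
    delta = df["close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    df["rsi_14"] = 100 - 100 / (1 + gain / loss)              # relative strength index
    df["target"] = df["close"].shift(-1)                      # next-day price to forecast
    return df.dropna()

h2o.init()
data = h2o.H2OFrame(add_indicators(prices))
split_row = int(data.nrows * 0.8)                             # time-ordered split, no shuffling
train, test = data[:split_row, :], data[split_row:, :]
features = ["sma_20", "ema_20", "bb_upper", "bb_lower", "rsi_14"]

aml = H2OAutoML(max_runtime_secs=600, seed=1)
aml.train(x=features, y="target", training_frame=train)

# A large gap between these two numbers is the overfitting signature.
print("train RMSE:", aml.leader.model_performance(train).rmse())
print("test RMSE: ", aml.leader.model_performance(test).rmse())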
Long-term health consequences of COVID-19 are symptoms that continue weeks or months after the initial diagnosis. Symptoms span respiratory, neurological, psychological, and cardiac problems and range from mild to debilitating. Little is known about the risk factors contributing to long COVID, whether vaccines play a role, or the best treatment options. UnitedHealth Group data represents millions of COVID-19 patients, some fully recovered and others suffering continued health consequences. Machine learning provides the opportunity to characterize these risk factors and to predict the probability that future disease will occur.
Historically, edge machines have been viewed as dumb devices expected simply to perform tasks on command. With recent advancements in deep learning, can we change this norm? Yes, with adaptive intelligence, though the definition of this intelligence varies by form factor and associated experiences.
This session gives an inside view of inferencing on edge devices at Qualcomm. We discuss two classes of use cases: high performance and ultra low power. Our high performance use cases span multiple verticals that leverage computer vision applications (streaming, gaming and auto). Our ultra low power use cases are Always on Vision, Always on Audio and sensing in the Mobile and IoT markets.
Thought leaders in machine learning, Dean, Karl, and Steven field questions from the audience about strategies for machine learning projects, best practices, and tips, drawing on their decades of experience as consultants and company executives.
Paychex is an American provider of human resource, payroll, and benefits outsourcing services for primarily small- to medium-sized businesses. We envision delivering curated, concise labor force analytics to ~700,000 customers representing tens of millions of workers. Although we have been successful in deploying analytics on-prem for 15 years, our existing systems have been unable to meet intensifying demand to serve larger numbers of users. We would like to harness cloud scalability while minimizing disruption to our work processes and security policies. Therefore, we are exploring a hybrid approach involving on-premises application hosting joined with cloud-based data engineering and model building. This talk describes advantages and tradeoffs we've experienced while migrating models to the Azure cloud. The elasticity of the cloud makes explainable AI achievable; we see up to a 1,000-fold increase in computational speed. In addition, distributed processing has enabled us to overcome data issues and implement automated tests. Challenges have included networking infrastructure and access limitations. We've had to change ourselves to be cloud-ready, while the cloud has adapted to meet our requirements.
Closely following the latest in machine learning techniques, Conde Nast will uncover how it leverages a broad set of first-party data behavioral signals from its diverse content across touch points to better align audiences with intent. This session will cover how the media company taps into this highly effective methodology to drive more relevant advertising experiences and highly performant advertising campaigns.
Safety National Casualty Corporation is the leader in Excess Workers' Compensation Insurance. Premium audit of Excess Workers Comp policies requires considerable resources and time at the end of each policy period, especially when the audit is conducted physically. To optimize the premium audit process, in collaboration with the audit and underwriting department, our data analytics team developed a set of predictive models, which leverages historical audit data and account information to predict future premium audit results. The prediction results have been applied to optimize the ordering of audits to collect more premium faster, selectively waive audits based on expected additional premium, and more efficiently allocate premium audit resources.
A key to meaningfully and sustainably accelerating patient flow, improving quality, and saving caregiver time is the ability to spot situations and risks early, so caregivers and expediters can intervene in the moment. Real-time, contextual information that is simple to digest and easy to access is the currency with which to make this happen. By combining clinical expertise, real-time and predictive analytics, and pre-defined action sets, Humber River Hospital in Toronto, Canada, is unlocking capacity, improving protocol compliance, and reducing patients sent to the ICU, while at the same time improving care team communication and reducing caregiver stress.
Hitachi and Arviem are delivering insights from 10+ years of data in the marine cargo industry. Traditionally, businesses have had limited visibility into the condition of their shipments while they are transported. By using ML and IoT data, the team helps reduce losses and identify root causes of outcomes. Methods range from clustering to weak supervision. We answer questions such as: Is fragile cargo likely damaged? What kind of packaging would have prevented damage? Is a particular shipment likely to develop mold? In addition to manufacturers and shippers, insurers can also fine-tune underwriting models and improve claims processing.
Building an ML solution leveraging deep learning is not just about building a core model. A solution may entail building and maintaining multiple models. In addition, several enabling technologies and tools are also required. Can an organization build each of these technologies in-house? How does this impact overall costs and time to market? With several open source and commercial off-the-shelf offerings becoming available, when should an organization build these technologies and when should it buy them? In this session, we will discuss how to make these build vs buy decisions, including benefits, decision criteria and critical success factors.
How much model search power can a given dataset endure before its confessions are spurious? I’ll describe how to measure the complexity of a model by how it behaves rather than how it appears -- and how that complexity depends on the extent of the algorithm’s structural search as well as its fitting capability. Think of datasets as having a “complexity capacity.” By matching that capacity to the search and fitting powers of an algorithm, we can avoid the disappointment of mis-fitting, but above all the disaster of over-search, where training results look much better than out-of-sample predictions.
I introduce a new concept and propose a way to estimate it: how much model search power can a given dataset endure before its confessions are spurious? I’ll explain “complexity capacity” with some simple controlled experiments and explore it during the search for a working investment timing strategy. By measuring the search power of an algorithm and the complementary search capacity of a dataset, we can avoid mismatches: the disappointment of under-fit or under-search, yes, but above all the disaster of over-search, where training results look great but out-of-sample predictions are worthless.
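The over-search phenomenon is easy to reproduce in a toy controlled experiment of the kind the talk alludes to. The sketch below (an illustration, not the speaker's methodology) fits a target that is pure noise; a wide enough search over tree depths and random seeds still finds a model whose in-sample fit looks impressive while its out-of-sample fit is worthless.

# Toy illustration of "over-search": the target is pure noise, yet a wide
# enough search over model configurations finds one that looks good in-sample.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 10)), rng.normal(size=200)   # noise only
X_test, y_test = rng.normal(size=(200, 10)), rng.normal(size=200)

best_train, matching_test = -np.inf, None
for depth in range(1, 15):                      # the "search power" applied to the data
    for seed in range(50):
        model = DecisionTreeRegressor(max_depth=depth, random_state=seed).fit(X_train, y_train)
        r2 = model.score(X_train, y_train)
        if r2 > best_train:
            best_train, matching_test = r2, model.score(X_test, y_test)

print(f"best in-sample R^2: {best_train:.2f}")        # looks encouraging
print(f"its out-of-sample R^2: {matching_test:.2f}")  # roughly zero or negative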
OSF Healthcare’s Advanced Analytics team is focused on helping our organization find the best-fit data science solutions to serve our patients and mission partners. Best-fit solutions may be internally developed, supplied from an existing platform, come through one of our innovation or research partnerships, or be purchased from a traditional analytics vendor. As part of our intake governance, we leverage close partnerships with multiple other areas of the business to ensure we select initial solutions that best meet our needs. We also work to stay up to date on new offerings so we can actively re-evaluate the performance of existing solutions, adjusting our approach as needed. In this talk, Dongsul will provide a high-level description of Advanced Analytics’ intake governance approach and then focus on three use cases: partnering with a researcher to implement a mortality model in production; a model performance comparison resulting in a recommendation to continue with an internally developed solution; and a model comparison resulting in a recommendation to use a platform-supplied solution.
Water is fundamental to an effective society. Individuals and organizations throughout society, including drinking water and wastewater systems, possess multifaceted relationships to water. Broadly, society seeks safe and abundant supplies of water while avoiding modern challenges of aging infrastructure, emerging contaminants, cybersecurity threats, legal compliance, customer satisfaction, climate change, ESG commitments, among other matters. Machine learning (ML) applications continue to offer meaningful solutions for drinking water and wastewater systems and other organizations with a relationship to water. The panel articulates the state of ML and its future implications in meaningfully addressing the challenges facing the water sector.
In this informal but informative presentation, Dr. James McCaffrey from Microsoft Research will explain advanced neural architectures including variational autoencoders, generative adversarial networks, Siamese networks, and Transformer encoders. You’ll see running demos of each type of network. No math equations with Greek letters. Well, maybe one or two. You’ll leave this presentation with a solid understanding of what type of problem each network can (and cannot) solve, and with all the knowledge you need to communicate with subject matter experts. No experience with neural networks is necessary to understand and benefit from this presentation.
One of the biggest challenges to making AI practical for the enterprise is keeping the AI application relevant (and therefore valuable) in the face of ever-changing input data and evolving business objectives. Join Arjun Prakash, Snorkel AI’s Director of Solutions, to learn how Snorkel is empowering practitioners to combine the best of many standards in one data-centric approach to keep their applications resilient and value-creating.
OSF Healthcare has been building and deploying predictive models into operational workflows for more than 10 years. This talk will provide a brief overview of the various methods successfully used to move models into production.
The latest poll reconfirms today's dire industry buzz: very few machine learning models actually get deployed. This pervasive failure of ML projects stems from a lack of prudent leadership as well as various technical challenges. In this panel session, industry experts will weigh in on which factors and practices contribute most to successful machine learning deployment. What are the most important organizational and technological ingredients? Come to this session to find out!
Effective capacity management for surgical departments includes models for predicting surgery duration, ICU bed type, and length of stay. We used de-identified data from 3,000 cardio-thoracic surgeries to develop and validate predictive models of surgery duration based on random forest, extreme gradient boosting, and linear regression. The ensembled predictive model of acute cardio-thoracic surgery duration reduced the share of surgeries running behind schedule by 28 percentage points (from 60% to 32%) and boosted on-time surgeries by 15 percentage points (from 30% to 45%). Surgery planners can use the predictive models to create an optimized surgery schedule as a prerequisite to effective capacity management and improved patient and staff experience.
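A minimal sketch of an ensemble over the three model families named (with scikit-learn's gradient boosting standing in for extreme gradient boosting) might look like the following; the data variables and hyperparameters are assumptions for illustration, not the hospital's actual setup.

# Minimal sketch: ensemble of random forest, gradient boosting, and linear regression.
# `X` (case features) and `y` (surgery duration in minutes) are assumed to exist.
from sklearn.ensemble import VotingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

ensemble = VotingRegressor([
    ("rf", RandomForestRegressor(n_estimators=300, random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),   # stand-in for extreme gradient boosting
    ("lr", LinearRegression()),
])
ensemble.fit(X_train, y_train)
print("MAE (minutes):", mean_absolute_error(y_test, ensemble.predict(X_test)))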
This case study presents how the biggest building materials company worldwide increased sales and reduced inventory by predicting 80% of its stockouts a week ahead using ML. The company faced high stockout levels in Brazil due to limited warehouse space combined with high safety-stock requirements affected by COVID; no historical data explained the new pandemic-influenced demand pattern. The company gained operational flexibility by creating a reaction process that uses stockout forecasts to mitigate out-of-stock risks in a pragmatic, simple-to-run, and highly comprehensible approach. A scalable data structure and pipeline were set up to give full operational visibility over the previous two years: daily stock levels, transit stocks, production plans, schedules, demand forecasts and actuals, committed order books, stockout history, and more. The team trained a gradient-boosting algorithm that uses tree-based learning for the supervised problem of classifying whether a given warehouse-product combination will stock out in the following week. To get consistent results across regions and products, the algorithm was trained over different time horizons, comparing accuracy and identifying new variables to explain deviations. After two months, the initiative released an algorithm with 80% accuracy and an F1 score over 0.7 for the top 5 warehouses by volume and 87 products.
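Schematically, the supervised setup described reduces to a tree-based gradient-boosting classifier scored with accuracy and F1. The sketch below uses LightGBM as one such implementation; the dataframe layout, column names, and split date are hypothetical, not the company's actual code.

# Schematic sketch of the classification setup described.
# `df` is assumed to hold one row per warehouse-product-week with engineered features
# (stock levels, transit stock, demand forecast, order book, ...) and a binary
# label `stockout_next_week`, plus a `week` column used for a time-based split.
from lightgbm import LGBMClassifier            # any tree-based gradient boosting works here
from sklearn.metrics import f1_score, accuracy_score

feature_cols = [c for c in df.columns if c not in ("stockout_next_week", "week")]
train = df[df["week"] < "2021-10-01"]           # train on earlier weeks
test = df[df["week"] >= "2021-10-01"]           # evaluate on later weeks

model = LGBMClassifier(n_estimators=500, learning_rate=0.05)
model.fit(train[feature_cols], train["stockout_next_week"])

pred = model.predict(test[feature_cols])
print("accuracy:", accuracy_score(test["stockout_next_week"], pred))
print("F1:", f1_score(test["stockout_next_week"], pred))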
Pull requests (PRs) have become the de facto standard for the code review and merge process for development teams using SCM tools such as GitHub and Bitbucket. A PR is a rich source of information about developers and reviewers: because every line of code is reviewed and bad smells are flagged, PRs offer insight into the coding style and logical skills of the developers, while the reviewer's comments and suggestions help gauge the reviewer's own proficiency. We have developed a set of PR Analytics by applying Transformer-based NLP, decision trees, and statistical analysis to PR data.
PR Analytics can be used to perform skill assessment, quantifying areas of improvement for the development team. It can also help Scrum masters and project managers plan deliverables better: knowing the strengths and weaknesses of the development team, they can allocate the right developers to the right types of tasks.
In this talk I will present some of the analytics we have developed using data from Bitbucket and how we are using them to improve the efficiency of our development teams.
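As a toy illustration of the Transformer-based piece of such a system, review comments can be scored with a pretrained sentiment model from the Hugging Face transformers library; the comment strings below are invented, and the real PR Analytics described combine this with decision trees and statistical analysis.

# Toy sketch: scoring PR review comments with a pretrained Transformer sentiment model.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")   # downloads a default pretrained model

review_comments = [
    "Nice refactor, much easier to follow now.",
    "This loop re-reads the file on every iteration; please cache it.",
]
for comment, result in zip(review_comments, classifier(review_comments)):
    print(result["label"], round(result["score"], 2), "-", comment)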
Deploying a model in production is not enough. Successful machine learning models aren't just successfully deployed, they are measurably impacting the bottom line. Investment in a machine learning team comes with big ROI expectations and plenty of hype. The challenges to delivering that ROI are everywhere: from picking the right problems to managing stakeholder expectations to knowing what to monitor once your model is deployed in production. Join us to learn from successes and failures in making machine learning deliver on its promise.
Identifying anomalous observations has important business impacts across all industries, and nowhere more than in fraud detection, where some observations are intentionally trying to hide, a situation different from most rare-event modeling problems. This talk will highlight some modern approaches to anomaly detection: local outlier factors, isolation forests, and classifier-adjusted density estimation (CADE). All of these techniques have foundations in areas that were not originally anomaly detection. The talk will explain these approaches and demonstrate them using open source software.
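For orientation, here is a compact Python sketch of the three approaches mentioned using open source tools; `X` is an assumed numeric feature matrix, and the CADE step follows the usual recipe of training a classifier to separate real data from uniform synthetic data.

# Sketch of the three anomaly-detection approaches named in the session.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import IsolationForest, RandomForestClassifier

# Local outlier factor: density relative to each point's neighbors (higher score = more anomalous).
lof_scores = -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_

# Isolation forest: anomalies are easier to isolate with random splits.
iso_scores = -IsolationForest(random_state=0).fit(X).score_samples(X)

# Classifier-adjusted density estimation (CADE): mix real data with uniform synthetic
# data, train a classifier to tell them apart, and treat a high predicted probability
# of "synthetic" as evidence that a point sits in a low-density (anomalous) region.
rng = np.random.default_rng(0)
synthetic = rng.uniform(X.min(axis=0), X.max(axis=0), size=X.shape)
Z = np.vstack([X, synthetic])
labels = np.r_[np.zeros(len(X)), np.ones(len(synthetic))]
clf = RandomForestClassifier(random_state=0).fit(Z, labels)
cade_scores = clf.predict_proba(X)[:, 1]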
Offering well-designed rewards is essential for increasing user engagement. Boosting user activity in the desired products creates economies of scale and increases the efficiency of the platform. However, rewards incur significant cost, so designing an efficient learning algorithm to optimize marketing campaigns can significantly increase the profitability of these platforms.
In this talk, we consider the following problem: given a set of different promotion structures, assign each user to one promotion structure from the set, or to none at all. The goal is to maximize user retention while respecting the budget and keeping costs low. I propose a novel methodology to maximize the treatment effect in a budget-constrained setting. Furthermore, we use Boltzmann exploration to balance exploration and exploitation, which enables us to collect data efficiently and update the model regularly. Finally, I show that our approach outperforms alternatives including R-linear and generalized random forests.
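As a rough sketch of what Boltzmann (softmax) exploration over promotion structures can look like in code, the function below samples one promotion per user from a softmax over estimated treatment effects and falls back to "no promotion" once a budget is exhausted; the effect estimates, cost vector, and budget handling are assumptions for illustration, not the speaker's methodology.

# Hedged sketch of Boltzmann (softmax) exploration over promotion structures.
# `effect[u, k]` is an assumed model estimate of incremental retention for user u under
# promotion k (column 0 = no promotion); `cost[k]` is the cost of promotion k (cost[0] = 0).
import numpy as np

def boltzmann_assign(effect: np.ndarray, cost: np.ndarray, budget: float,
                     temperature: float = 0.1,
                     rng: np.random.Generator = np.random.default_rng(0)) -> np.ndarray:
    """Sample one promotion per user from a softmax over estimated effects,
    falling back to "no promotion" once the running cost exceeds the budget."""
    assignments = np.zeros(len(effect), dtype=int)
    spent = 0.0
    for u in range(len(effect)):
        logits = (effect[u] - effect[u].max()) / temperature   # stabilized softmax
        probs = np.exp(logits)
        probs /= probs.sum()
        choice = rng.choice(len(probs), p=probs)
        if spent + cost[choice] > budget:
            choice = 0                                          # budget exhausted
        spent += cost[choice]
        assignments[u] = choice
    return assignments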
Transaction data has immense potential to go beyond traditional data aggregation by banks, to connecting the dots and providing valuable customer insights across industries. By acquiring financial data and then cleansing and enriching it, organizations can derive insights into customer needs and behavior to provide more meaningful interactions, identify lending opportunities, uncover current risks, provide competitive analysis, improve marketing efforts of a retail giant, and identify growth opportunities for clients. In this session, we will explore the importance of utilizing transaction data and applying machine learning algorithms to datasets to clarify and categorize the transactional data. Institutions can leverage this customer data to provide personal experience and advice.
We explored the feasibility of deep learning algorithms to improve the accuracy of predicting daily emergency hospital visits by tracking their spatiotemporal association with particulate matter (PM) concentrations. We compared the predictive accuracy of models based on PM data from a single, more accurate air monitoring station in each district versus multiple, less accurate monitoring sites within a district in Seoul, South Korea. We used an MLP (multilayer perceptron) to integrate PM data from multiple locations and then LSTM (long short-term memory) models to incorporate the intrinsic temporal PM trends into the learning process. The results show that predictive accuracy improves from 1.67 to 0.79 RMSE when spatial variations of air pollutants from multi-point stations are incorporated into the algorithm over a 9-day time window. The findings suggest guidelines for how environmental and health policymakers can allocate limited resources for emergency care and design ambient air monitoring and prevention strategies.
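One plausible rendering of the architecture described, a per-day MLP over multi-station readings feeding an LSTM over the 9-day window, is sketched below in Keras; layer sizes, the station count, and the training call are illustrative assumptions, not the study's actual model.

# Rough sketch of the architecture described: a per-timestep MLP integrates PM readings
# from multiple stations, then an LSTM models the temporal trend over a 9-day window.
import tensorflow as tf

n_days, n_stations = 9, 25          # 9-day time window, multiple monitoring sites per district

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_days, n_stations)),
    # MLP applied to each day's vector of station readings
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(32, activation="relu")),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(16, activation="relu")),
    # LSTM over the resulting 9-step sequence
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),       # predicted daily emergency hospital visits
])
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50)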
The rate of adoption of AutoML and MLOps solutions is incredible. Despite all of the great work being done to operationalize ML across industry, two areas still require custom work: feature engineering and product integration. The AutoML we run is only as good as the data it has to learn from. We'll discuss a Spark-based approach to automating the feature engineering portion of any MLOps pipeline, resulting in an abstracted, extensible component you can drop into your AutoML or MLOps solution.
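To give a flavor of what an abstracted, configuration-driven feature engineering step on Spark can look like, here is a small PySpark sketch; the table paths, entity key, columns, and aggregation spec are hypothetical, not the presenters' actual solution.

# Sketch of a configuration-driven Spark aggregation step for automated feature engineering.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("auto-feature-engineering").getOrCreate()

# Declarative spec: which entity to aggregate to, which columns, which functions.
spec = {
    "entity_key": "customer_id",
    "aggregations": {"txn_amount": ["sum", "avg", "max"], "txn_count": ["sum"]},
}

def build_features(df, spec):
    exprs = [
        getattr(F, fn)(col).alias(f"{col}_{fn}")
        for col, fns in spec["aggregations"].items()
        for fn in fns
    ]
    return df.groupBy(spec["entity_key"]).agg(*exprs)

events = spark.read.parquet("s3://bucket/events/")        # hypothetical source path
features = build_features(events, spec)
features.write.mode("overwrite").parquet("s3://bucket/features/")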
Competition for top new analytics talent is fierce. While tech and other corporate giants are indeed vacuuming up new grads from top schools, not all great students can or want to go that route. The challenge is to put yourself in a position to attract and land them. It can be done, even if you're not a well-known brand. In this session, you will learn what works from a leader of the analytics program ranked #2 in the world by QS for the past three years. Even if you are from a giant firm, you're still competing for talent. You will come away with ideas to help you gain an advantage!
As more and more cloud providers develop and offer AI services, enterprises and software service providers are relying on integrating with them rather than building models themselves. Google, Amazon, IBM, and Microsoft lead in this regard. In contact centers, there are multiple such offerings for Automatic Speech Recognition (ASR), Text-To-Speech (TTS), and Natural Language Understanding (NLU). Building a global contact center cloud solution using these third-party services is a challenge, as no single vendor is mature and performs well across all geographies, languages, and use cases. How do we offer customers a choice of vendors for their use case? How can we help customers consistently and continuously benchmark multiple services on a common set of criteria and choose the right product for their use case? We are building a universal harness that allows customers to mix and match vendors, along with a benchmarking platform to help customers compare and test multiple vendors for the same use case. In the presentation, we will discuss the metrics, techniques, automation, and learnings from this benchmarking solution.
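As one example of a common benchmarking criterion for the ASR piece, word error rate on a shared test set can be computed per vendor; the vendor names and transcripts below are placeholders, and the actual benchmarking platform covers many more metrics.

# Minimal sketch: comparing ASR vendors on word error rate with the jiwer library.
from jiwer import wer

reference = "please transfer five hundred dollars to my savings account"
vendor_transcripts = {
    "vendor_a": "please transfer five hundred dollars to my savings account",
    "vendor_b": "please transfer five hundred dollars to my saving count",
}
for vendor, hypothesis in vendor_transcripts.items():
    print(vendor, "WER:", round(wer(reference, hypothesis), 3))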
Human In The Loop (HITL) is a process in which, as part of the ML workflow, experts are asked their opinion about predictions made by an ML model in order to tune and improve the model. In this talk we’ll explain how we collaborated with and integrated engineers as a core part of our machine learning process, in order to create a mechanism to automatically predict the best security policies for our customers. We’ll go through the different stages of the project, discuss the challenges we faced along the way and how we overcame them, and show how you can use a similar process for any heuristic/ML project you have.
In today’s social environment, where responsibility to justice and fairness is being reconsidered vigorously, the insurance industry finds itself in the middle of the debate. For those who say insurers should do more to help balance society’s scales, the industry’s reply has been an insistence that actuarial science is colorblind: underwriting and pricing are built only on socially appropriate factors that are predictive of loss, and factors such as race, ethnicity, religious practice, sexual preference, or national origin are directly excluded from consideration. This industry position has been challenged for many years, as the use of territory and credit has been directly criticized as a proxy for race and/or income. Factors such as credit-based insurance score, gender, occupation, and education, all very predictive of loss, have likewise seen growing challenges and restrictions. Adding to the intensity of today’s debate is a growing insistence by many that being “colorblind” on social justice issues is no longer enough. Some critics insist that insurers, along with the rest of society, must be actively involved in promoting justice and fairness.
We have reached a point in this debate where the status quo will no longer suffice without significant support. A new law in Colorado will require insurance companies to demonstrate that their prices and practices are not unfairly discriminatory, and more regulatory action is expected in other states. Given this new requirement, the discussion has begun to move from whether we should do this to how we do it. This presentation will discuss various definitions of bias and discrimination in rating and insurance practices, methods and techniques developed in data science to uncover unintentional bias, and how these techniques can be applied to insurance industry practices.
The ultrasound examination is one of the most common medical imaging techniques. FOLLISCAN is an analytical and predictive system based on interpretable deep learning algorithms that supports healthcare practitioners in ultrasound ovarian diagnostics, specifically antral follicle examination. Counting follicles is of major importance; for example, it helps estimate the ovarian reserve, and a large number of antral follicles indicates polycystic ovarian morphology.
FOLLISCAN is designed for fertility clinics, diagnostic centers, and hospitals, and responds to their well-identified need for time and cost savings through wider access to ultrasound diagnostics, enabling better and more informed clinical decisions. The results of the project address the need to perform ultrasound examinations in a faster, easier, more objective, and more accurate manner.
In this talk, we will describe the process of building this solution and show the most important issues for creating a solution based on deep learning.
FOLLISCAN is currently being tested and implemented in all INVICTA fertility clinics across Poland.
Ask our “Rockstars” anything about predictive analytics! Curious about machine learning tips, tricks, best practices and more? This is your opportunity to hear advice directly from the experts. The hardest part of machine learning can be getting models deployed into production. Our panel has done it, and is willing to spill the tea on how to make it happen. You’ll want to stick around for this ultimate session of the conference.
The complexity and computational requirement for deep learning are higher than with traditional machine learning methods such as logistic regression and decision trees. For some problems, especially those with few inputs, deep learning is overkill -- you might as well just use traditional methods. However, with continually-growing computational power and with the introduction of new modeling techniques such as transfer learning, will deep learning eventually become the de facto method for all applications and problem areas? Attend this session to hear the expert panelists weigh in and address questions from the audience.