Machine Learning Times

Wise Practitioner – Predictive Analytics Interview Series: David Talby, PhD of John Snow Labs


In anticipation of his upcoming keynote presentation at Predictive Analytics World Healthcare, Las Vegas, June 19-24, 2022, we asked David Talby, PhD, Chief Technology Officer at John Snow Labs, a few questions about their deployment of predictive analytics. Catch a glimpse of his keynote presentation, New Frontiers in Applied Natural Language Processing in Healthcare, and see what’s in store at the Predictive Analytics World Healthcare conference.

Q: In your work with predictive analytics, what behavior or outcome do your models predict?

A: At John Snow Labs, our Spark NLP library offers state-of-the-art clinical and biomedical Natural Language Processing (NLP) models that can be used to predict and optimize healthcare outcomes. Some of these use cases include accelerating clinical trials, predicting disease progression, providing clinical decision support, analyzing real-world evidence, detecting and preventing adverse drug events (ADE), and creating a more comprehensive view of each patient’s journey.

Q: How does predictive analytics deliver value at your organization – what is one specific way in which it actively drives decisions or operations?

A: The biggest advantage of NLP models in the Spark NLP library is that they enable healthcare organizations to see a more holistic view of their patients, providers, operations, and costs. The technology achieves this by bridging the gap between structured data — claims, electronic medical records — and unstructured data — free-text clinical notes, pathology and radiology reports, lab results, research papers, clinical trial documents, social media posts, etc. Serving as the connective tissue, Spark NLP can accurately understand information from all of these unstructured formats and systems to create a clearer, more accurate picture. This enables data scientists or domain experts, in this case clinicians, to make better decisions, whether it be a form of recommended treatment or how to appropriately staff an emergency room.

Q: Can you describe a quantitative result, such as the predictive lift of your model or the ROI of an analytics initiative?

A: Used by 33% of all enterprise AI teams and 59% of healthcare AI teams, Spark NLP has become the most widely used NLP library in the enterprise in just 6 years on the market. Additionally, over the last year Spark NLP for Healthcare has established new state-of-the-art accuracy on public academic benchmarks, as reported in peer-reviewed papers. This includes the most accurate models published to date for medical named entity recognition, relation extraction, assertion status detection, and ADE detection. Customers include half of the world’s top 10 pharmaceutical companies, the 5 largest US payers, and 5 of the largest US healthcare systems, among others.

A few examples of what organizations have been able to achieve working with John Snow Labs include:

  • Kaiser Permanente used John Snow Labs’ AI Platform (for model training, deployment, and monitoring) and Spark NLP for Healthcare (to extract key features from EMR notes) to optimize hospital patient flow models. The solution enabled real-time decision-making and strategic planning, by predicting bed demand, safe staffing levels, and hospital gridlock.
  • Mount Sinai used Spark NLP for Healthcare to predict the aggression level of psychiatric patients based on a combination of free-text clinical notes and structured data. It is also among the many organizations applying the software to extract key tumor characteristics (staging, grading, location, and histology) from pathology reports.
  • For The Khuluma Project, John Snow Labs delivered the data analysis and data science aspects of the initiative carried out to enhance positive mental health amongst HIV-positive adolescents in South Africa, with the aim of reducing and ultimately preventing the spread of HIV and AIDS.

Q: What surprising discovery or insight have you unearthed in your data?

A: As the field of AI matures, we now have cases in which algorithms are surpassing human accuracy in certain tasks. One example is de-identification, or anonymizing data automatically. But how do you know when this is happening? All you can really do is compare what the algorithm says to what the human expert says and analyze the differences between them. As such, when the model disagrees with your validated data set, you need to go through the disagreements one by one to see which side is right. This is not uncommon: some of the best-known academic datasets contain material, well-documented mistakes.
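The one-by-one review described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_disagreements`, the label values, and the toy sentences are hypothetical and do not come from Spark NLP or from any real dataset.

```python
def find_disagreements(texts, gold_labels, model_labels):
    """Return (index, text, gold, predicted) wherever annotator and model differ."""
    if not (len(texts) == len(gold_labels) == len(model_labels)):
        raise ValueError("texts, gold_labels and model_labels must have the same length")
    return [
        (i, text, gold, pred)
        for i, (text, gold, pred) in enumerate(zip(texts, gold_labels, model_labels))
        if gold != pred
    ]


# Hypothetical toy data for a de-identification task:
# "PHI" means the snippet contains protected health information.
texts = [
    "Patient John Smith was admitted.",
    "Blood pressure is stable.",
    "Seen by Dr. Lee on 3/4.",
    "MRN 123456 on file.",
]
gold = ["PHI", "NONE", "NONE", "PHI"]   # human annotation (may contain mistakes)
model = ["PHI", "NONE", "PHI", "PHI"]   # model output

# Every disagreement goes to a human reviewer, who decides which side is right.
review_queue = find_disagreements(texts, gold, model)
for index, text, gold_label, predicted in review_queue:
    print(f"[{index}] {text!r}: annotator={gold_label}, model={predicted}")
```

In this toy case the only disagreement is the third snippet, where the model flags a name and a date that the annotator missed, so the review would correct the label rather than the model.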

When the state-of-the-art accuracy of an NLP model is 92%, and it is estimated that 4% of the labels are incorrect, then mislabeled examples could account for roughly half of the remaining 8% error. It also means that current state-of-the-art models have very likely learned many incorrect labels already. This is not necessarily a problem, but it should be taken into consideration when setting expectations, and it is a reminder that, at least for now, AI is an imperfect science.
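The arithmetic behind the 92% and 4% figures can be written out as a short Python sketch. It makes a simplifying assumption that is not stated in the interview: every mislabeled example is counted as a model error, so the result is an upper bound on how much of the measured error label noise can explain. The function name is hypothetical.

```python
def label_noise_share_of_error(accuracy, label_error_rate):
    """Upper bound on the fraction of measured error attributable to wrong labels.

    Assumes every mislabeled example is scored as a model mistake.
    """
    measured_error = 1.0 - accuracy
    if measured_error <= 0:
        raise ValueError("accuracy must be below 1.0 for any error to exist")
    return min(label_error_rate / measured_error, 1.0)


share = label_noise_share_of_error(accuracy=0.92, label_error_rate=0.04)
print(f"Share of measured error that label noise could explain: {share:.0%}")
```

With the figures quoted above, the measured error is 8% and the estimated label noise is 4%, which gives the "roughly half" mentioned in the answer.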

Q: Sneak preview: Please tell us a take-away that you will provide during your talk at Predictive Analytics World.

A: This session will help attendees understand what is possible right now for NLP in healthcare, as told by early adopters in the industry. As new advances in NLP have started to move from research to real-world production implementations, practitioners have learned important lessons along the way. From better answering medical questions and enabling real-world data, to predicting patient outcomes and population health, we can start learning from how some of the largest healthcare systems and pharmaceutical companies in the US are putting NLP to good use.

Don’t miss David’s keynote presentation, New Frontiers in Applied Natural Language Processing in Healthcare, Tuesday, June 21, 2022, from 1:30 pm to 2:15 pm.
