By: Jeff Deal, Conference Chair, Predictive Analytics World for Healthcare
In anticipation of his upcoming conference presentation at Predictive Analytics World for Healthcare Las Vegas, June 16-20, 2019, we asked Jeff Heaton, VP, Data Scientist at Reinsurance Group of America, a few questions about their deployment of predictive analytics. Catch a glimpse of his presentation, How Much Data is Enough for Deep Learning, and see what’s in store at the PAW Healthcare conference in Las Vegas.
Q: In your work with predictive analytics, what area of healthcare are you focused on?
A: I have focused primarily on building predictive models that analyze Electronic Health Record (EHR) data to predict the mortality and morbidity risks of insurance applicants. In life insurance, mortality risk is the likelihood that an individual will die earlier than expected during their coverage period. In health insurance, morbidity risk is the likelihood that an individual will become unhealthy and require medical care.
Q: What outcomes do your models predict?
A: Fluid-less underwriting is a new trend in the insurance industry that uses predictive models to decrease the need for invasive lab tests that require the collection of blood and urine. By using existing data sources, such as medical history, our models typically provide a risk score that is used to help automate life insurance underwriting. The less risky applicants can go through a quicker underwriting process. This is often referred to as accelerated underwriting.
Q: How does predictive analytics deliver value at your organization? What is one specific way in which it actively drives decisions or impacts operations?
A: Our models can be trained on underwriting data and learn to offer valuable insights that can become part of the underwriting process flow. In accelerated underwriting, a model determines the lowest and highest risk applicants and leaves the middle to human underwriters. As our models improve, the middle will shrink.
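The triage flow described above can be sketched in a few lines. The thresholds, score scale, and band names here are illustrative assumptions, not RGA's actual underwriting rules:

```python
# Hypothetical triage sketch: route applicants by model risk score (0-1 scale).
# The cutoffs `low` and `high` are illustrative, not actual underwriting values.

def triage(risk_score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Assign an applicant to an underwriting path based on model risk score."""
    if risk_score <= low:
        return "accelerated"          # lowest risk: eligible for fluid-less path
    if risk_score >= high:
        return "decline/refer"        # highest risk: flagged by the model
    return "manual underwriting"      # the middle band goes to human underwriters

applicants = [0.05, 0.5, 0.93]
print([triage(s) for s in applicants])
# → ['accelerated', 'manual underwriting', 'decline/refer']
```

As the models improve, widening the gap between `low` and `high` shrinks the middle band that humans must review.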
Q: Can you describe a successful result, such as the predictive lift of your model or the ROI of an analytics initiative?
A: At one point we were able to achieve a +3.1% accuracy boost by including features that were engineered using deep denoising autoencoders. The ability to perform some automatic feature engineering is one of deep learning’s greatest strengths. However, it still cannot replace engineered features created by a data scientist or subject matter expert with advanced domain knowledge.
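The idea of using an autoencoder's learned representation as engineered features can be sketched as follows. This is a minimal, shallow illustration using scikit-learn on synthetic data, not the deep architecture or data described above:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for tabular applicant data (assumption: real EHR-derived
# features would replace this).
rng = np.random.default_rng(0)
X = rng.random((500, 20))
X_noisy = X + rng.normal(0, 0.1, X.shape)  # corrupt the inputs

# Denoising autoencoder sketch: learn to reconstruct the clean X from the
# noisy version, forcing the hidden layer to capture robust structure.
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="relu",
                  max_iter=500, random_state=0)
ae.fit(X_noisy, X)

def encode(model, data):
    """Forward pass through the first (hidden) layer: ReLU(data @ W + b)."""
    return np.maximum(0, data @ model.coefs_[0] + model.intercepts_[0])

# The 8 hidden-layer activations become engineered features that can be
# appended to the downstream predictive model's inputs.
features = encode(ae, X)
print(features.shape)  # → (500, 8)
```

In practice the bottleneck features are concatenated with the hand-engineered features; as noted above, the learned features supplement rather than replace domain expertise.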
Q: What surprising discovery have you unearthed in your data?
A: Lately, there has been much media attention comparing lack of physical activity to smoking, with a number of articles describing sitting as "the new smoking." Our team analyzed data from the National Health and Nutrition Examination Survey (NHANES) III to compare the mortality risk of smoking against that of physical inactivity. The evidence to date points to one conclusion: exercise is not a better predictor of mortality outcomes than tobacco use, even though exercise improves mortality experience and physical activity becomes more important as we age. A person cannot exercise away the damaging effects of smoking, but they can smoke away the benefits of exercise.
Q: What areas of healthcare do you think have seen the greatest advances or ROI from the use of predictive analytics?
A: Healthcare is providing vast new datasets for predictive modeling. Epigenetics is the study of heritable phenotype changes that do not involve alterations in the DNA sequence. The name epigenetics implies features that are “on top of” or “in addition to” the traditional genetic basis for inheritance. I believe that epigenetics data will provide valuable insights into the health characteristics of individuals. Also, because epigenetic characteristics do not involve changes to the DNA sequence, they are typically more actionable than insights from traditional genetics.
Q: Sneak preview: Please tell us a take-away that you will provide during your talk at Predictive Analytics World.
A: Electronic Health Records (EHR) are a rich data source made up of many different coding standards, such as RxNorm, National Drug Code (NDC), ICD-10, Logical Observation Identifiers Names and Codes (LOINC), SNOMED and others. This data is often longitudinal, which can make it difficult to determine whether your data is varied enough to cover the situations that you seek to model and predict. How much EHR data is enough to build an effective predictive model? During my talk I will discuss how to assess your EHR data and determine how well it fits the data on which your model will ultimately be used.
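One simple way to start assessing that fit is to measure how well the codes in your training data cover the codes appearing in the target population. The ICD-10 codes below are real codes, but the populations and the coverage metric are a hypothetical illustration, not the method from the talk:

```python
from collections import Counter

# Hypothetical populations: ICD-10 codes observed in training EHR data
# versus codes observed where the model will actually be applied.
train_codes = ["E11.9", "I10", "I10", "J45.909", "E78.5"]
target_codes = ["I10", "E11.9", "N18.3", "I10", "E78.5", "F32.9"]

train_set = set(train_codes)
target_counts = Counter(target_codes)

# Share of target records whose diagnosis code was ever seen in training.
covered = sum(n for code, n in target_counts.items() if code in train_set)
coverage = covered / sum(target_counts.values())
print(f"code coverage: {coverage:.0%}")  # → code coverage: 67%
```

A low coverage figure is a warning that the model will face codes (and, implicitly, conditions) it never saw during training; the same check can be repeated per coding standard (RxNorm, LOINC, SNOMED) and per time window.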