
5 years ago
Why Overfitting is More Dangerous than Just Poor Accuracy, Part II

 In part one, I described one problem with overfitting: estimates of the target variable in regions without any training data can be unstable, whether those regions require the model to interpolate or extrapolate. Poor accuracy is part of the problem, but more precisely, the errors in interpolation and extrapolation are not revealed by any accuracy metric and only surface when new data points are encountered after the model is deployed. This month, I turn to a second problem with overfitting: model interpretation. Predictive modeling algorithms find variables that associate or correlate with the target variable. When models are overfit, the
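The part-one point about regions without training data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the original posts: the synthetic straight-line target, the noise level, the gap between x = 0.4 and x = 0.6, the polynomial degrees, and the helper name `fit_and_score`. Because the data is synthetic, the true target is known in the unseen regions, so the error there can be measured; with real data it could not, which is exactly why accuracy metrics miss the problem.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Hypothetical training data: a noisy straight line with no observations
# between x = 0.4 and x = 0.6 (the gap) or beyond x = 1.0.
x_train = np.concatenate([np.linspace(0.0, 0.4, 10), np.linspace(0.6, 1.0, 10)])
y_train = 2.0 * x_train + rng.normal(scale=0.2, size=x_train.size)

# Regions the model never saw: inside the gap (interpolation) and past the
# right edge of the data (extrapolation).
x_gap = np.linspace(0.45, 0.55, 5)
x_beyond = np.linspace(1.1, 1.5, 5)


def fit_and_score(degree):
    """Fit a least-squares polynomial and report mean squared error by region."""
    model = Polynomial.fit(x_train, y_train, degree)

    def mse(x, y):
        return float(np.mean((model(x) - y) ** 2))

    return {
        "train": mse(x_train, y_train),
        # The noise-free target is known here only because the data is synthetic.
        "gap": mse(x_gap, 2.0 * x_gap),
        "beyond": mse(x_beyond, 2.0 * x_beyond),
    }


simple = fit_and_score(1)     # matches the true linear form
flexible = fit_and_score(10)  # far more flexible than the data warrants

for name, scores in [("degree 1", simple), ("degree 10", flexible)]:
    print(name, {region: round(err, 4) for region, err in scores.items()})
```

The training error of the degree-10 fit cannot exceed that of the degree-1 fit, since the simpler model is a special case of the more flexible one. The rows worth comparing are "gap" and "beyond", which no metric computed on the training data would expose.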

