In part one, I described one problem with overfitting the data: estimates of the target variable in regions without any training data can be unstable, whether those regions require the model to interpolate or extrapolate. This is partly a problem of accuracy, but more precisely, the problems with interpolation and extrapolation are not revealed by any accuracy metric and arise only when new data points are encountered after the model is deployed. This month, I turn to a second problem with overfitting: model interpretation. Predictive modeling algorithms find variables that associate or correlate with the target variable. When models are overfit, the