(Adapted from Chapter 13 of the Handbook of Statistical Analysis and Data Mining Applications)

After a first pass of training and evaluating a model, you may find you need to improve its results. Here is a checklist of ten practical actions that I’ve found usually help:

1. Transform real-valued inputs to be approximately Normally distributed. Regression, for instance, behaves better if the inputs are Gaussian; extreme values have too much influence on squared error. For variables that are typically log-normally distributed, like income, this means transforming the variable with a logarithm or the more general Box-Cox function.

2. Remove outliers. Note the
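
As a rough illustration of the first checklist item (transforming a skewed, real-valued input toward Normality), the sketch below applies the Box-Cox function, whose special case at lambda = 0 is the plain logarithm, to simulated income-like data and compares skewness before and after. The data, the seed, the log-normal parameters, and the helper names `box_cox` and `skewness` are all assumptions made for this example; they do not come from the Handbook chapter.

```python
import math
import random
import statistics


def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        # The lambda -> 0 limit of (x**lam - 1) / lam is the natural log.
        return math.log(x)
    return (x ** lam - 1.0) / lam


def skewness(values):
    """Skewness as the mean cubed z-score (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical income-like data: log-normal draws with a fixed seed.
rng = random.Random(0)
incomes = [rng.lognormvariate(10.5, 0.8) for _ in range(5000)]

# lambda = 0 reduces Box-Cox to the log transform.
log_incomes = [box_cox(v, 0) for v in incomes]

raw_skew = skewness(incomes)
log_skew = skewness(log_incomes)
print(f"skewness before: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

On real data, lambda is usually estimated rather than fixed at 0, and a variable containing zeros or negative values needs a shift or a different transform before Box-Cox can be applied.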