(Part 2 of 11 of the Top 10 Data Mining Mistakes, drawn largely from Chapter 20 of the Handbook of Statistical Analysis and Data Mining Applications) Only out-of-sample results matter; otherwise, a lookup table would always be the best model. Almost two decades ago, researchers at the MD Anderson medical center in Houston used neural networks to detect cancer. Their out-of-sample results were reasonably good, though worse than their training results, which is typical. They supposed that training the network longer would improve it – after all, that’s the way it works with doctors – and were astonished to find that
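
The lookup-table remark that opens this article can be made concrete with a small sketch. Everything here is an illustrative assumption rather than anything from the MD Anderson study: the synthetic data, the 10% label noise, and the hypothetical `LookupTable` class, which simply memorizes every training case and falls back to the majority class on cases it has never seen.

```python
import random
from collections import Counter


def make_data(n, rng):
    """Hypothetical synthetic data: label is 1 when x0 + x1 > 1, flipped 10% of the time."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] + x[1] > 1.0)
        if rng.random() < 0.1:  # label noise (assumed rate)
            y = 1 - y
        rows.append((x, y))
    return rows


class LookupTable:
    """Memorizes training cases; guesses the majority class for anything unseen."""

    def fit(self, rows):
        self.table = {x: y for x, y in rows}
        self.default = Counter(y for _, y in rows).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.table.get(x, self.default)


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model.predict(x) == y for x, y in rows) / len(rows)


rng = random.Random(0)
train = make_data(500, rng)
test = make_data(500, rng)  # fresh cases the model never saw

model = LookupTable().fit(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)

print(f"in-sample accuracy:     {train_acc:.2f}")
print(f"out-of-sample accuracy: {test_acc:.2f}")
```

Judged in-sample, the lookup table looks perfect because it is only reciting what it was shown. On fresh cases it has nothing to recite and can do no better than guess the majority class, which is why only the out-of-sample figure says anything about the model.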