By: Dr. John Elder, CEO and Founder, Elder Research, Inc.

(Part 2 of 11 of the Top 10 Data Mining Mistakes, drawn largely from Chapter 20 of the Handbook of Statistical Analysis and Data Mining Applications)

Only out-of-sample results matter; otherwise, a lookup table that simply memorized the training data would always be the best model. Researchers at the MD Anderson medical center in Houston (almost two decades ago) used neural networks to detect cancer. Their out-of-sample results were reasonably good, though worse than their results on the training data, which is typical. They supposed that training the network longer would improve it – after all, that’s the way it works with doctors – and were astonished to

