(Part 2 of 11 of the Top 10 Data Mining Mistakes, drawn largely from Chapter 20 of the Handbook of Statistical Analysis and Data Mining Applications) Only out-of-sample results matter; otherwise, a lookup table would always be the best model. Almost two decades ago, researchers at the MD Anderson medical center in Houston used neural networks to detect cancer. Their out-of-sample results were reasonably good, though worse than their results on the training data, which is typical. They supposed that training the network longer would improve it – after all, that’s the way it works with doctors – and were astonished to
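The lookup-table remark can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than anything from the MD Anderson study: the synthetic data, the noisy labeling rule, and the hypothetical names `make_data`, `LookupTable`, and `accuracy`. It memorizes every training row and then scores the same "model" on rows it has never seen.

```python
import random
from collections import Counter


def make_data(n, rng):
    """Synthetic rows (an assumption for illustration): two random
    features and a label drawn from a noisy linear rule."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] + x[1] + rng.gauss(0, 0.2) > 1.0)
        rows.append((x, y))
    return rows


class LookupTable:
    """Memorizes each training row; for an unseen input it can only
    fall back to the most common training label."""

    def fit(self, rows):
        self.table = {x: y for x, y in rows}
        self.default = Counter(y for _, y in rows).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.table.get(x, self.default)


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model.predict(x) == y for x, y in rows) / len(rows)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(500, rng)
test = make_data(500, rng)

model = LookupTable().fit(train)
in_sample = accuracy(model, train)
out_of_sample = accuracy(model, test)

print(f"in-sample accuracy:     {in_sample:.3f}")
print(f"out-of-sample accuracy: {out_of_sample:.3f}")
```

Under these assumptions the table scores perfectly on the rows it memorized, while on fresh rows it can do no better than guessing the majority class. Judged by in-sample accuracy alone, it would look like the best possible model.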