(Part 8 (of 11) of the Top 10 Data Mining Mistakes, drawn from the Handbook of Statistical Analysis and Data Mining Applications)

Modeling “connects the dots” between known cases to build up a plausible estimate of what will happen in related, but unseen, locations in data space. Obviously, models, and especially nonlinear ones, are very unreliable outside the bounds of any known data. (Boundary checks are the very minimum protection against “over-answering”, as discussed in the next installment.) But there are other types of extrapolation that are equally dangerous. We tend to learn too much from