Predictive Analytics is one of the hottest careers on the tech scene these days, and with good reason: data is being collected in ever greater amounts, and this data, it turns out, is useful for improving organizations' decisions and efficiency. No surprises here for this audience.
My last article for the PA Times argued that we don’t need to know details of the mathematics in order to be good predictive modelers. However, we predictive modelers must become better at understanding how the models are going to be used and what the predictions mean for the business; there’s more to being a good predictive modeler than building predictive models.
Predictive modelers and data scientists must move past our fascination with the numbers, statistics, and algorithms (as fascinating as they are…to me anyway) and understand the decisions they are intended to change and improve. How will the model be used? What will a consumer of the model do with it, whether the consumer be an investigator, a call center operator, or an automated system triggered by the prediction? What return on investment (ROI) will the models generate? What ROI do we expect from the model based solely on the distribution of model scores we see on validation data?
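That last question can be made concrete. The sketch below is a hypothetical illustration, not a prescription: it estimates the profit a model would have generated on validation data by treating every record above a score threshold as "acted on." All the names, costs, and response values here are invented assumptions for the example.

```python
# Hypothetical sketch: estimating expected ROI from the distribution of
# model scores on validation data. The costs, values, and toy scores
# below are made-up assumptions for illustration only.

def expected_roi(scored_validation, cost_per_contact, value_per_response, threshold):
    """Treat every validation record scoring at or above `threshold` as
    'contacted' and compute the profit the model would have generated."""
    contacted = [(score, outcome) for score, outcome in scored_validation
                 if score >= threshold]
    cost = cost_per_contact * len(contacted)
    revenue = value_per_response * sum(outcome for _, outcome in contacted)
    return revenue - cost

# Toy validation set: (model score, actual outcome: 1 = responded, 0 = didn't)
validation = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1),
              (0.4, 0), (0.3, 0), (0.2, 0), (0.1, 0)]

profit = expected_roi(validation, cost_per_contact=2.0,
                      value_per_response=10.0, threshold=0.5)
# 4 contacted, 3 responders: revenue 30.0, cost 8.0, profit 22.0
print(profit)
```

Sweeping the threshold over the full range of scores turns this into the familiar profit curve, which is often a far more persuasive artifact for the business than a lift chart.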
Models have limitations, and we have to explain succinctly what they can and can't do for the company. What they can't do is just as important as what they can do; we must rein in unrealistic expectations of predictive analytics.
I was teaching a course recently, and one of the participants told me the reason he was there is that his boss was sold on predictive analytics. So far, so good, right? But when he explained what they wanted from predictive analytics, my smile turned to a frown. Predictive Analytics (PA) cannot predict what we should do unless we have examples of decisions made in the past. PA couldn't tell us in 2001 if the Segway would become the hit that many predicted it would be. Why not? Because there were no examples in history of Segways or Segway-like inventions with a subsequent "success" or "no success" flag attached.
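The point is a property of supervised learning itself: a model is fit from labeled historical examples, and with no comparable labeled history it has nothing to learn from. The toy sketch below illustrates this; the product features, prices, and labels are entirely invented for the example.

```python
# Hypothetical sketch: supervised prediction requires labeled history.
# The product features and success labels below are invented.

past_products = [
    # (features: (price, novelty 0-1), label: 1 = commercial success)
    ((199, 0.2), 1),
    ((499, 0.4), 0),
    ((99,  0.1), 1),
]

def success_rate_under_price(examples, price_cut):
    """Toy 'model': historical success rate among products at or under a
    price cut. With no comparable labeled examples, no prediction exists."""
    labels = [label for (price, _), label in examples if price <= price_cut]
    if not labels:
        raise ValueError("no comparable labeled examples in history")
    return sum(labels) / len(labels)

# Works when labeled history exists:
print(success_rate_under_price(past_products, price_cut=250))  # 1.0

# The Segway-in-2001 situation: an empty (or incomparable) labeled history.
# success_rate_under_price([], price_cut=250)  # raises ValueError
```

However sophisticated the algorithm, the same constraint holds: no labeled precedent, no prediction.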
In some ways, looking past the numbers, the statistics, and the algorithms is much like what Cypher described himself doing in "The Matrix". In one scene, Neo was staring at the endless stream of numbers pouring over Cypher's computer screen and asked, "Do you always look at it encoded?" Cypher replied, "Have to. … there's way too much information to decode the Matrix. You get used to it, though. Your brain does the translating. I don't even see the code. All I see is blonde, brunette, and redhead."
I’m not suggesting that when we are building fraud detection models and we see a confidence interval that we really see a brunette perpetrator getting cuffed. But understanding that our models have to be used by real people who have to make sense of what we are telling them to do (based on model predictions) should change the way we view our models, and ultimately improve our final product.