Building predictive models has come to seem like one of the more glamorous jobs in business intelligence and analytics. Successful, high-profile predictive modelers like Nate Silver, founder of the ESPN blog FiveThirtyEight, and Rayid Ghani, chief data scientist for President Obama’s 2012 re-election campaign, can achieve rock star status. But looking beneath the surface, it becomes clear that most of the actual work involved in successful predictive analytics projects isn’t so glamorous.
“Everyone wants to build a model or do the business sell job” to help drive organizational decision making, said Jens Meyer, managing director of credit risk, data and portfolio management at The First Marblehead Corp., a student loan provider based in Medford, Mass. The problem is that most people don’t think enough about the process of deploying and maintaining analytical models, Meyer added during a presentation at TDWI’s The Analytics Experience conference in Boston last week.
For Meyer, building predictive models is only a small portion of the work involved. It’s often said that about 80% of analytics is collecting and cleaning data. But Meyer said even that ignores the substantial amount of work required after the data has been prepared and a model built. In order for the model to have a meaningful business impact on an ongoing basis, it needs constant tending after being put into production.
Meyer noted that predictive models don’t stay relevant forever. In fact, they often have very short lifespans, at least as originally designed. There are a number of reasons for this. For one thing, specific calculations about groups of customers or other populations naturally “drift” over time as new data is collected and analyzed, he said. What was average when you first developed a model may soon become atypical. If you aren’t constantly testing and verifying the model, you won’t catch that, and its predictive value will plummet.
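One way to picture the kind of drift check Meyer describes is the sketch below. It is an illustration only, not a method he presented: it assumes the Population Stability Index (PSI), a common drift measure in credit scoring, and it uses synthetic data. The function name, the bin count and the sample values are all hypothetical.

```python
import math
import random


def population_stability_index(baseline, recent, bins=10):
    """PSI between two samples of one numeric feature or model score.

    Bin edges are taken from the baseline sample (an assumption of this
    sketch); a small floor on each share avoids taking log(0).
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the end bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), 1e-6) for c in counts]

    expected, actual = shares(baseline), shares(recent)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Synthetic example: scores at model-build time vs. two later samples.
rng = random.Random(0)
baseline = [rng.gauss(50, 10) for _ in range(5000)]
same = [rng.gauss(50, 10) for _ in range(5000)]     # population unchanged
shifted = [rng.gauss(60, 10) for _ in range(5000)]  # the "average" has moved

psi_same = population_stability_index(baseline, same)
psi_shifted = population_stability_index(baseline, shifted)
```

In this sketch `psi_shifted` comes out far larger than `psi_same`. A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a shift worth investigating, though any threshold would need to be set for the model in question.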
External events can also diminish the effectiveness of predictive models if the models aren’t modified to match changing conditions. For example, Meyer said the 2009 housing-market crash threw many of his team’s models at First Marblehead into disarray because the metrics they were using to assess household wealth changed abruptly for many people as home values plunged. His team had to adjust those models to account for the new reality.
For these reasons, Meyer puts an expiration date on his models, meaning they can’t be used after a certain date unless they’re revalidated. He recommended continuous A/B testing to ensure that models in production are still working as intended, and fixing them quickly if the testing reveals flaws. “Continuously evaluate the model and see if you can do better,” he said. “Your work never stops in that respect.”
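The expiration-date idea and the "see if you can do better" check might look something like the sketch below. It is not First Marblehead's implementation. The class, the function names, the model name, the dates and the 2-point accuracy margin are all hypothetical. The comparison scores both models on the same labeled outcomes, a simplification standing in for a real A/B test, which would randomly split live cases and apply a significance test.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DeployedModel:
    """Hypothetical registry entry for a model in production."""
    name: str
    expires_on: date  # last day the model may be used without revalidation

    def is_usable(self, today: date) -> bool:
        return today <= self.expires_on

    def revalidate(self, new_expiry: date) -> None:
        # Called only after the model has passed a fresh validation run.
        self.expires_on = new_expiry


def accuracy(predictions, outcomes):
    """Share of predictions that match the observed outcomes."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)


def pick_winner(champion_preds, challenger_preds, outcomes, min_gain=0.02):
    """Return 'challenger' only if it beats the champion by at least min_gain."""
    gain = accuracy(challenger_preds, outcomes) - accuracy(champion_preds, outcomes)
    return "challenger" if gain >= min_gain else "champion"


# Illustrative usage with made-up values.
model = DeployedModel("credit_risk_v1", expires_on=date(2015, 6, 30))
usable_on_expiry = model.is_usable(date(2015, 6, 30))
usable_day_after = model.is_usable(date(2015, 7, 1))

outcomes = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
champion = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]    # 7 of 10 correct
challenger = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # 9 of 10 correct
winner = pick_winner(champion, challenger, outcomes)
```

Making the expiry check a hard gate, rather than a reminder, forces the revalidation step to happen before the model can keep scoring new cases.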
Ed Burns is site editor of SearchBusinessAnalytics. Follow him on Twitter: @EdBurnsTT.