The development and application of predictive analytics solutions for organizations is not new, particularly in the area of consumer behavior. Models have been built since the end of the Second World War to predict consumer behavior, with the first models being used to predict credit card loss. Direct marketers embraced these same principles, and many organizations today have a variety of predictive analytics tools which can do the following:
With all the discussion raging around AI (i.e., deep learning) and its use in many business disciplines, seasoned data science practitioners are often asked whether AI should be the predictive analytics or machine learning tool used to build the above solutions. But different disciplines have different ways of evaluating success. The disciplines where AI has had the greatest impact are image recognition and text recognition. These are the areas yielding the groundbreaking improvements in AI.
In my early days as a data scientist in the early 1990s, data mining publications (all of the printed variety at that time) continued to publish research studies on the accuracy rates of various AI techniques. Many of these studies involved the use of AI to improve the existing accuracy rates of image recognition, which were in the neighbourhood of 45%. At that time, I realized that more advanced mathematics was being utilized than the typical mathematical tools I was using to predict marketing response and/or credit risk. However, I did understand that much work was being done in feature engineering and in the use of neural nets or deep learning in an effort to improve the image recognition rates.
Now fast forward to circa 2012. Big data, and our ability to process it at incomprehensibly faster speeds than the tortoise-like data processing speeds of the 1990s, allowed advanced analysts and data scientists to consume significantly larger volumes of data. Much like a car needs gas to run, AI and deep learning need data, and lots of it, in order for their algorithms to build more optimal solutions. Data was the missing ingredient in the early studies of the 1990s. With this newfound fuel (DATA), the accuracy rates of AI techniques or deep learning in image recognition rose from the 45% noted above to well over 90%.
But as data scientists, we must treat objectivity as the reality of our discipline. In other words, does AI or deep learning apply in all business scenarios? Can these incredible improvements in accuracy rates in image recognition translate to the world of predicting consumer behaviour, from both a marketing standpoint and a risk standpoint?
We continue to develop models using traditional techniques, which our clients understand in terms of both performance and business benefit. Equally important, these techniques make it easy to explain the key drivers of a model. Let’s examine this more closely as we look at both explainability and performance in assessing whether or not to implement a given model.
On the first issue of explainability, using the more traditional techniques we could produce the following type of report:
This report for a marketing response model can be clearly understood by non-technical people as it highlights three key facts:
Furthermore, the business stakeholder in this example could understand the profile of a responder and identify the strong drivers versus the weak drivers of the model.
In the example of deep learning, our input variables enter the neural net through the input layer, and the hidden layers represent a form of feature engineering in that the weights applied to each input variable are optimized. Optimization routines vary, but the more common ones use versions of gradient descent, which essentially uses the rate of change of the error with respect to the weights to converge to a solution that minimizes the difference between the predicted output and the observed output. Convergence occurs when the change in error between two iterations is at a minimum or close to zero in the solution’s attempt to minimize the overall error rate. Each hidden layer in a way represents an improved set of features until we arrive at the final output layer, which represents the final solution or algorithm. But that final solution or algorithm is a composite picture of input weights for each variable within each hidden layer. The final equation within the overall algorithm is an equation built from the hidden layers. Given this scenario, it is easy to see how difficult it is to gain a good understanding of the importance and impact of each variable on the desired modeled behavior.
Now let’s look at the second measure, model performance.
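Before moving on, the gradient descent routine described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it fits a single-layer logistic model (no hidden layers) rather than a deep network, the six data points are made up, and the names `fit_gradient_descent`, `learning_rate`, and `tol` are hypothetical choices for this example. It shows the two ideas from the text: each iteration moves the weights against the rate of change of the error, and the loop stops once the change in error between two iterations is close to zero.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_loss(weights, bias, rows, labels):
    """Average cross-entropy between predicted and observed outputs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(rows)


def fit_gradient_descent(rows, labels, learning_rate=0.1, tol=1e-6, max_iter=10000):
    """Batch gradient descent that stops when the loss change between
    two consecutive iterations falls below tol (or max_iter is reached)."""
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    previous = log_loss(weights, bias, rows, labels)
    history = [previous]
    for _ in range(max_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # predicted minus observed
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the average gradient (the "rate of change" of the error).
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
        current = log_loss(weights, bias, rows, labels)
        history.append(current)
        if abs(previous - current) < tol:  # convergence check
            break
        previous = current
    return weights, bias, history


# Made-up, non-separable toy data: one input variable, binary response.
rows = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
labels = [0, 0, 1, 0, 1, 1]

weights, bias, history = fit_gradient_descent(rows, labels)
print("iterations run:", len(history) - 1)
print("starting loss:", round(history[0], 4), "final loss:", round(history[-1], 4))
print("weight:", round(weights[0], 4), "bias:", round(bias, 4))
```

With one weight per input variable, the fitted model can be read off directly. In a deep network the same update is applied to every weight in every hidden layer, which is where the explainability difficulty discussed above comes from.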
In much of the literature on AI and improved model performance, the discussion focuses on error rates between the predicted output and the observed output. Yet in the world of consumer behavior, at both the prospect and customer level, decile or gains tables are a key business tool for assessing overall model performance. The chart (Lorenz curve) below depicts the modeled output of a response model when the model is applied against a validation sample.
The y-axis represents the actual or observed response rate while the x-axis represents deciles where records are sorted by descending model score into 10 groups with decile 1 having the highest predicted response scores and decile 10 having the lowest predicted response scores. Model performance is then determined by the steepness or slope of the blue line. A model with no performance yields the flat red line as seen above which means that the model is delivering a random solution. Our objective in building models is to maximize the steepness of this blue line presuming that we have dealt with any potential overfitting issues.
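The decile construction described above can be sketched as follows. This is a minimal illustration, not the exact procedure behind the chart: the function name `gains_table`, the column names, and the synthetic validation sample (5,000 simulated records whose response probability rises with the score) are all assumptions made for the example.

```python
import random


def gains_table(scores, outcomes, n_groups=10):
    """Sort records by descending model score, split them into n_groups
    equal-sized groups, and report the observed response rate per group.
    Assumes there are at least n_groups records."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(outcomes) / n  # the flat "random" baseline
    table = []
    for g in range(n_groups):
        start = g * n // n_groups
        end = (g + 1) * n // n_groups
        group = ranked[start:end]
        responders = sum(y for _, y in group)
        rate = responders / len(group)
        table.append({
            "decile": g + 1,  # decile 1 holds the highest scores
            "records": len(group),
            "responders": responders,
            "response_rate": rate,
            "lift": rate / overall_rate if overall_rate else float("nan"),
        })
    return table


# Synthetic validation sample, for illustration only.
random.seed(42)
scores = [random.random() for _ in range(5000)]
outcomes = [1 if random.random() < 0.2 * s else 0 for s in scores]

table = gains_table(scores, outcomes)
for row in table:
    print(row["decile"], row["records"], row["responders"],
          round(row["response_rate"], 4), round(row["lift"], 2))
```

Plotting `response_rate` against `decile` gives the equivalent of the blue line, while the overall response rate plotted across all deciles gives the flat baseline of a model with no performance.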
In evaluating AI as a modelling option, we want to observe whether or not AI can actually improve the slope of the model over and above traditional modelling techniques. We did some work in this area by looking at many different models where we compared traditional techniques to deep learning or AI models. Most of our research indicated virtually no improvement in model performance: the steepness of the slope was the same for traditional models and deep learning. Rather than list all the models we utilized in this research, I have included one example (a marketing response model) below which is representative of much of our findings in this area.
In the above example, the purple and green lines have virtually identical slopes, indicating virtually no difference between AI (deep learning) and logistic regression in predicting marketing response.
Yet our findings were not 100% conclusive, as seen in the chart below, which depicts a model’s ability to predict property claim loss.
In this example, the neural net or deep learning model clearly outperforms the traditional stepwise logistic regression and would deliver better business value by enabling better decisions regarding the pricing of property policies. But why did we see improvement? First, we had access to much larger volumes of data than in our typical modelling situation. Second, we also believed there was less random error in predicting a property claim loss than in trying to predict a human being’s likelihood to respond. A strong signal-to-noise ratio is another critical factor in deep learning being able to deliver superior model performance.
Nevertheless, the ability to explain the neural net algorithm still looms as a challenge for many businesses seeking to embrace these solutions as part of their business processes. Neural nets and deep learning are here to stay despite the explainability challenges, but it is incumbent on the research community to devote more effort to how data science practitioners can explain these types of solutions to their stakeholders. Much research is already being done in this area, alongside efforts to utilize AI in a non-big-data environment. As always, the data science community looks forward to the results of these efforts.
About the Author
Richard Boire, B.Sc. (McGill), MBA (Concordia), is the founding partner at the Boire Filler Group and a nationally recognized expert in the database and data analytics industry. He is among the top experts in this field in Canada, with unique expertise and background experience. Boire Filler Group was recently acquired by Environics Analytics, where he served as senior vice-president. Richard recently launched Boire Analytics.
Mr. Boire’s mathematical and technical expertise is complemented by experience working at and with clients in B2C and B2B environments. He previously worked at and with clients such as Reader’s Digest, American Express, Loyalty Group, and Petro-Canada, among many others, to establish his top-notch credentials.
After 12 years of progressive data mining and analytical experience, Mr. Boire established his own consulting company, Boire Direct Marketing, in 1994. He writes numerous articles for industry publications, is a sought-after speaker on data mining, and works closely with the Canadian Marketing Association in a number of areas, including Education and the Database and Technology councils.