Economic disruption is not new; it has been developing gradually over the last several decades, and artificial intelligence (AI) has simply accelerated the process. Virtually every industry has been affected by AI, and data science is certainly no exception. Yet we may also ask how machine learning fits within this overall discussion. The explosion of literature on these topics over the last several years is a testament to their popularity. As a practitioner of data science over the last thirty years, it is easy to become cynical, especially when many of these articles convey to the average lay reader the perception that these are new phenomena.
But let’s discuss both concepts from a practical data science perspective, starting with the broader concept of machine learning. Machine learning and its accompanying methodologies have been used by practitioners for decades. In my early days as a data scientist in the late eighties and early nineties, techniques such as multiple regression, logistic regression, and decision trees were used to predict both the buying and risk behavior of consumers. Data scientists using these techniques focused their effort on creating the right analytical file, or input file, for these algorithms. The data scientist then let the machine determine the optimal algorithm given the data inputs, any assumptions about the distribution of the data, and the statistical confidence levels required of the input variables.
This is machine learning in action: the machine learns iteratively by presenting a solution and then comparing its predicted output against the observed output to see if the solution can be improved. The solution is deemed optimal once the error between predicted and observed output is minimized for the technique being used and, most important of all, once it has been confirmed on both a training sample and a validation sample. With deep learning and AI, this concept of validation extends to three data sets (training, testing, and validation).
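This iterate-and-validate loop can be sketched in a few lines. The illustration below is a minimal sketch with entirely hypothetical data, learning rate, and iteration count: a simple linear model is fitted by gradient descent on a training sample, while a held-out validation sample decides which iteration’s solution to keep.

```python
import random

random.seed(42)

# Hypothetical data: y = 2x + 1 plus random noise.
xs = [i / 10.0 for i in range(100)]
data = [(x, 2.0 * x + 1.0 + random.gauss(0.0, 0.5)) for x in xs]
random.shuffle(data)
train, valid = data[:70], data[70:]   # hold out a validation sample

def mse(w, b, sample):
    """Mean squared error of the line y = w*x + b on a sample."""
    return sum((w * x + b - y) ** 2 for x, y in sample) / len(sample)

w, b, lr = 0.0, 0.0, 0.01
best = (float("inf"), w, b)
for step in range(2000):
    # Gradient of mean squared error on the training sample.
    gw = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
    gb = sum(2 * (w * x + b - y) for x, y in train) / len(train)
    w, b = w - lr * gw, b - lr * gb
    # Keep the iteration whose solution generalizes best to held-out data.
    err = mse(w, b, valid)
    if err < best[0]:
        best = (err, w, b)

print(f"fitted slope={best[1]:.2f}, intercept={best[2]:.2f}")
```

The machine presents successive solutions, and the validation sample, not the training error, decides which one is retained, which is the essence of the train/validate discipline described above.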
Artificial Intelligence (AI)
Neural nets and deep learning, the fundamental mathematics behind AI, were also an option even several decades ago, but our inability to explain a given solution to business people was a severe limitation, as was our inability to process extremely large volumes of data. Note that in today’s environment, explainability is still an issue, while the data-processing limitation has been eliminated through the adoption of parallel data processing techniques.
AI essentially burst onto the scene through much of the R&D work in image recognition, where accuracy rates were in the neighborhood of 40%-50% in the early 1990s. But as indicated in previous articles, this accuracy rate improved drastically to well over 90% in the mid-2000s with the advent of our ability to process extremely large volumes of data, which is the “oil” of AI.
With these dramatic improvements, data scientists needed to consider these options more closely than in the past. Sparsity, in the sense of the dependent variable or objective function having too few of the desired outcomes, was a limitation when developing predictive models on smaller data sets; fraud models represent one such scenario. In today’s larger-volume environments, AI can leverage very large volumes of data to produce robust fraud models that would not have been possible in a small data environment. But let’s explore these concepts of machine learning and AI more fully as they relate to consumer/customer behavior, the area with the deepest history of application.
Rather than simply accept AI as the optimal technique for predicting consumer behavior given its mathematical superiority, we have attempted to compare the performance of traditional machine learning models against AI models. Performance is determined by examining how well the solution rank-orders the observed desired behavior. On the x-axis, we have model-ranked deciles, where decile 1 represents the top 10% of predicted names and the bottom decile (decile 10) represents the lowest 10%. The y-axis represents the actual observed (not predicted) behavior that the model is trying to optimize. A machine learning solution becomes more optimal with a steeper descending line, which depicts the strength of the model’s rank-ordering capability. See below.
From the above example, one can observe that solution B is superior given its stronger rank-ordering capability.
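The decile rank-ordering exercise described above is simple to compute. The sketch below uses an entirely hypothetical scored file (both scores and responses are simulated) to show the mechanics: sort by descending model score, cut the file into ten equal deciles, and report the observed response rate in each.

```python
import random

random.seed(1)

# Hypothetical scored file: each record carries a model score and an
# observed response (responders are more likely at higher scores).
records = []
for _ in range(10000):
    score = random.random()
    responded = 1 if random.random() < score * 0.2 else 0
    records.append((score, responded))

def decile_response_rates(records):
    """Sort by descending score, split into 10 equal deciles, and return
    the observed response rate per decile (decile 1 = top 10% of names)."""
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    size = len(ranked) // 10
    rates = []
    for d in range(10):
        chunk = ranked[d * size:(d + 1) * size]
        rates.append(sum(y for _, y in chunk) / len(chunk))
    return rates

rates = decile_response_rates(records)
# A well rank-ordering model shows a steadily descending response rate
# from decile 1 down to decile 10.
```

Plotting `rates` against decile number produces exactly the descending line discussed above; the steeper the descent, the stronger the model’s rank-ordering capability.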
In our research, we have examined between 20 and 30 models. These models encompass a wide variety of predicted behaviors, such as retention, response, and risk, and they also vary in the number of records used in each solution. For each model scenario, our goal was first to determine the best traditional machine learning model and then compare it to the best AI solution. In determining the best AI solution for a given scenario, we examined many candidate solutions, varying the number of nodes alongside the number of hidden layers, and used stochastic gradient descent as our optimization technique. We have summarized the findings from all these model scenarios under two main sections: Big Data (over 1 million records) and Small Data (under 1 million records).
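As a rough illustration of this kind of search, the sketch below builds a tiny multilayer perceptron from scratch with NumPy and trains each (hidden layers, nodes) configuration with stochastic gradient descent, keeping whichever configuration scores best on a held-out sample. The data, grid values, learning rate, and epoch counts are all hypothetical stand-ins, not those of our study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a non-linear (XOR-like) pattern that a linear
# model cannot capture but a small network can.
X = rng.uniform(-1.0, 1.0, size=(1000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

def train_mlp(X_tr, y_tr, hidden_layers, nodes, lr=0.3, epochs=300, batch=32):
    """Train a tiny tanh MLP via stochastic gradient descent on
    mini-batches; returns a prediction function."""
    sizes = [X_tr.shape[1]] + [nodes] * hidden_layers + [1]
    Ws = [rng.normal(0.0, 1.0 / np.sqrt(a), (a, b)) for a, b in zip(sizes, sizes[1:])]
    bs = [np.zeros(b) for b in sizes[1:]]
    for _ in range(epochs):
        order = rng.permutation(len(X_tr))
        for start in range(0, len(X_tr), batch):
            xb = X_tr[order[start:start + batch]]
            yb = y_tr[order[start:start + batch]]
            # Forward pass: tanh hidden layers, sigmoid output.
            acts = [xb]
            for W, b in zip(Ws[:-1], bs[:-1]):
                acts.append(np.tanh(acts[-1] @ W + b))
            p = 1.0 / (1.0 + np.exp(-(acts[-1] @ Ws[-1] + bs[-1])[:, 0]))
            # Backward pass: gradient of mean cross-entropy loss.
            delta = ((p - yb) / len(xb))[:, None]
            for i in range(len(Ws) - 1, -1, -1):
                gW, gb = acts[i].T @ delta, delta.sum(axis=0)
                if i > 0:
                    delta = (delta @ Ws[i].T) * (1.0 - acts[i] ** 2)
                Ws[i] -= lr * gW
                bs[i] -= lr * gb
    def predict(Xn):
        a = Xn
        for W, b in zip(Ws[:-1], bs[:-1]):
            a = np.tanh(a @ W + b)
        return ((a @ Ws[-1] + bs[-1])[:, 0] > 0).astype(float)
    return predict

# Grid over hidden layers and nodes, scored on a held-out sample.
X_tr, y_tr, X_va, y_va = X[:700], y[:700], X[700:], y[700:]
best = None
for layers in (1, 2):
    for nodes in (4, 8):
        acc = (train_mlp(X_tr, y_tr, layers, nodes)(X_va) == y_va).mean()
        if best is None or acc > best[0]:
            best = (acc, layers, nodes)
print(f"best validation accuracy {best[0]:.2f} "
      f"with {best[1]} hidden layer(s) of {best[2]} nodes")
```

The point is the structure of the search: the architecture (layers and nodes) is treated as just another choice to be evaluated on held-out data, exactly as we did across our model scenarios.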
Under 1 million records, we observed minimal improvement between traditional machine learning and AI. Both the purple line (AI) and the green line (traditional machine learning) have identical slopes in predicting consumer response behavior.
Over 1 million records, we did observe additional performance lift from the AI model, as demonstrated by the steeper slope (purple line) for a property claim risk model. Keep in mind that this was not universal: in many scenarios with over 1 million records, traditional machine learning solutions and AI solutions yielded equal levels of performance.
One question to ask is why we observed this result for this model. Our current hypothesis is that AI needs both large volumes of records and a relatively high signal-to-noise ratio (i.e., strong patterns within the data that correlate with the desired behavior). In our experience as data science practitioners, we have always found much randomness when trying to explain consumer behavior, in other words a relatively low signal-to-noise ratio. In this type of model, we rely less on consumer behavior and more on the state or quality of the property to predict claim loss, which may indeed exhibit less randomness and a stronger signal-to-noise ratio.
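The signal-to-noise intuition is easy to demonstrate. In the hypothetical simulation below, the same simple least-squares model is fitted to data generated with a small amount of noise (a stand-in for property-quality-type predictors) and with heavy noise (a stand-in for consumer behavior), and the resulting R-squared values are compared.

```python
import random

random.seed(7)

def r_squared(noise_sd, n=5000):
    """Generate y = x + noise, fit a one-variable least-squares line,
    and return the R-squared of the fit."""
    xs = [random.random() for _ in range(n)]
    ys = [x + random.gauss(0.0, noise_sd) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

strong = r_squared(noise_sd=0.1)  # strong signal: most variation explained
weak = r_squared(noise_sd=1.0)    # weak signal: mostly unexplained randomness
```

The identical model explains far more of the variation when the noise is small, which is why we suspect a property-driven claim model gives AI more signal to exploit than a consumer-behavior model of the same size.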
So, where does this leave us in terms of conclusions? For now, small-data consumer behavior problems are best served by traditional machine learning methods, given their performance equals that of AI solutions. This is reinforced by our ability to explain traditional machine learning solutions far more easily than AI solutions. Nevertheless, as volumes increase and the data exhibits stronger patterns with the objective function (stronger signal to noise), we do observe stronger performance from AI solutions. The challenge in using such a solution, though, is our inability to easily explain its workings to business stakeholders.
Although the above represents our research to date, we do recognize that much work is being done to provide mechanisms and approaches for explaining AI solutions. At the same time, academics are investigating the potential of using AI to improve performance within a small data environment. The exceptional changes occurring in our industry within the last several years suggest that hard and fast conclusions about AI are transitory at best and could very well change over the next few years.
But the main point is that data science practitioners need to maintain their discipline of objectively evaluating new technologies and approaches with regard to how their solutions will be applied. As a discipline, we have always been cognizant of this need, and it is incumbent upon the experienced practitioner to mentor the bright new wave of young data scientists. This is even more relevant given today’s relentless pace of advances in technology and automation.
About the Author
Richard Boire, B.Sc. (McGill), MBA (Concordia), is the founding partner of the Boire Filler Group and a nationally recognized expert in the database and data analytics industry, ranking among the top experts in this field in Canada with unique expertise and background experience. The Boire Filler Group was acquired by Environics Analytics, where he served as senior vice-president. Richard recently launched Boire Analytics.
Mr. Boire’s mathematical and technical expertise is complemented by experience working at and with clients in both B2C and B2B environments, including Reader’s Digest, American Express, Loyalty Group, and Petro-Canada, among many others, which established his top-notch credentials.
After 12 years of progressive data mining and analytical experience, Mr. Boire established his own consulting company, Boire Direct Marketing, in 1994. He writes numerous articles for industry publications, is a sought-after speaker on data mining, and works closely with the Canadian Marketing Association in a number of areas, including education and the Database and Technology councils.