In the midst of a recent engagement, an executive suddenly asked, “Are we using Machine Learning?” This caught us off guard; having worked in the field for many years, we use the “learning sciences” virtually every day to solve hard problems. Machine Learning (ML), Data Science (DS), and Artificial Intelligence (AI) are exciting and very powerful; still, we’re happy to use conventional techniques whenever they’re the best choice to solve the client’s challenge. But I can understand where the question came from, given the hype surrounding ML, DS, and AI. (Sometimes, interest in inner details exceeds interest in more important outer results. I know the feeling; as an aviation buff, I can be more attuned to the make, model, and performance of an aircraft than to whether or not I get to my destination on time!) From a business perspective, though, ML is only one tool in a big toolbox that we can use to help agencies and businesses leverage data to improve their day-to-day decisions.
So, what is Machine Learning (ML)?
Let’s go back to a time before we had computers, when we had to make sense of data through a lot of algebra. To make the calculations practical, the relationship between the known data and what we wanted to predict was assumed to be linear. (Hence the term “Linear Regression”.)
A linear equation can be as simple as y = mx + b, where y is the thing we want to predict, m is the slope of the line (or plane in higher dimensions), x is our input data or independent attribute(s), and b is the intercept, or starting position, of the line. Using linear algebra, m and b can be calculated to minimize the (squared) error across all of our data. See Figure 1 – Example Linear Regression, for an example with two input dimensions. The blue dots are our data and the plane represents our prediction. Our computation positions the plane so the sum of squared errors between it and the blue dots is minimized across all of the data.
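The calculation of m and b can be sketched in a few lines of Python. This is a minimal illustration, not the method behind Figure 1: it uses a single input x rather than two, the data points are made up, and the helper names fit_line and sum_squared_error are our own.

```python
def fit_line(xs, ys):
    """Return the slope m and intercept b that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single input variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def sum_squared_error(xs, ys, m, b):
    """Total squared distance between the data and the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, b = fit_line(xs, ys)
best_error = sum_squared_error(xs, ys, m, b)
print(f"m = {m:.2f}, b = {b:.2f}, squared error = {best_error:.3f}")
```

Nudging m or b away from the fitted values can only increase the squared error, which is what "minimized across all of the data" means in practice.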
Now let’s fast-forward to the present, when we have powerful computers with vast memory storage and extremely fast computation. We also have exponentially more data at our disposal. The key strength of Machine Learning (ML) is that the computer discovers the relationship between the data and the target, y, rather than forcing it to be linear (see Figure 2 – Example ML Structure). The predictions from an ML model are often more accurate and can account for subtleties in the data that were overly generalized by the linear assumption. In a nutshell, we use the power of the computer to search a vast set of possibilities to find a more accurate relationship between our data and the thing we are trying to predict or better understand.
Naturally, there are some dangers that come with the use of ML. Most importantly, the structure can be over-fit to the historical data we used to train the model. When new and slightly different data is fed to the prediction model, an overly fit structure may produce poor results. Thoughtful application and rigorous testing of an ML model are required to protect against this danger.
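The over-fitting danger can be shown with a deliberately extreme sketch. Here a "memorizer" (a stand-in for an overly flexible model, not a technique we would recommend) simply returns the outcome of the closest training example, and is compared with a plain straight-line fit. The data are made up: the true relationship is assumed to be y = 2x, and the training outcomes carry alternating noise of +1 and -1.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def mean_squared_error(predict, xs, ys):
    """Average squared gap between predictions and actual outcomes."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Historical (training) data: y = 2x with alternating noise of +1 / -1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0]

# New, slightly different data the models have never seen (noise-free y = 2x).
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [0.8, 2.8, 4.8, 6.8, 8.8]


def memorizer(x):
    """Over-fit model: repeat the outcome of the nearest training example."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


m, b = fit_line(train_x, train_y)


def line(x):
    """Simpler model: the fitted straight line."""
    return m * x + b


for name, model in [("memorizer", memorizer), ("line", line)]:
    train_err = mean_squared_error(model, train_x, train_y)
    test_err = mean_squared_error(model, test_x, test_y)
    print(f"{name}: train error = {train_err:.3f}, new-data error = {test_err:.3f}")
```

The memorizer is perfect on the historical data because it has learned the noise along with the signal, and it pays for that on the new data. Holding out data the model never saw during training is the simplest form of the rigorous testing described above.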
The key benefit is that the relationships found are not easily (or ever) uncovered manually, or through traditional analysis techniques. Because the relationships are often complex, they can be difficult for human experts to interpret, and may appear to run counter to intuition and the status quo. Yet best practices in testing can show the models to work well in new, related situations, which is the most important criterion.
Some examples of business problems that are currently seeing the valuable application of ML are:
…though the applications are vast, and many are described in detail in Elder Research case studies.
Although the terms are often used interchangeably outside of our field, ML and Artificial Intelligence (AI) are not the same, and their origins are quite different. AI is generally thought of as tasks that can be accomplished using a computer that were previously assumed to be exclusively in the human domain. Interestingly, many tasks that were classified as “AI” in the 1980s are routine now, and are no longer considered “AI”. The concepts of ML actually predate the term AI. ML refers to a set of techniques used to inductively discover relationships in data and was created by pioneers in statistical analysis and optimization, whereas AI was born out of the field of computer science. So AI needs reasons, hypotheses, or rules, whereas ML needs data, examples, or labeled cases; that is, known outcomes, such as “these actions lead to higher profit and these other actions lead to lower profit”.
ML is best understood through its application, and by comparison to traditional methods. Fraud mitigation is a great example application that shows how ML is enabling organizations to perform important work in new ways.
Fraud Mitigation without ML
To attack fraud problems, a human analyst can gather intelligence and evidence from two main sources: 1) domain experts, who have worked in the field for many years, and 2) the transaction data itself. AI relies primarily on extracting and automating the former (human expertise), and ML relies primarily on sifting and modeling the latter (historical data). Both can be valid ways to find patterns separating the rare examples of known, fraudulent transactions from the vastly greater numbers of (presumed) valid transactions.
Without ML, the fraud schemes must be identified explicitly, so that, for instance, a rules engine can be set up that passes or fails transactions as they happen, in real-time. Rules engines are easy to get started with, yet hard to build upon. They are “brittle” in that they are limited only to patterns of fraud that human analysts have identified. They are also difficult to maintain; analysts and other stakeholders must meet and agree on rules by consensus, based on evidence. Ultimately, fraudsters usually find gaps in the rules and figure out new schemes to exploit those gaps.
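A rules engine of this kind can be sketched very simply. The rules, thresholds, and transaction fields below (amount, foreign, hour) are hypothetical, chosen only to illustrate the idea; a production engine would be far richer.

```python
# Each rule is a (name, test) pair agreed on by analysts.
# A transaction is failed if any test returns True.
RULES = [
    ("large_amount", lambda t: t["amount"] >= 10_000),
    ("foreign_at_night", lambda t: t["foreign"] and t["hour"] < 5),
]


def review(transaction):
    """Return the names of the rules a transaction trips (empty list means pass)."""
    return [name for name, tripped in RULES if tripped(transaction)]


# A transaction matching a known scheme is caught...
flagged = review({"amount": 12_500, "foreign": False, "hour": 14})

# ...but one that stays just under the threshold slips through.
evasive = review({"amount": 9_990, "foreign": False, "hour": 14})

print("flagged:", flagged)
print("evasive:", evasive)
```

The second transaction shows the brittleness: the engine only knows the patterns that were written down, so a fraudster who learns the threshold can stay just beneath it.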
In this scenario, after well-known fraud scheme patterns are coded up, analysts can often spend thousands of hours reviewing hundreds of remaining cases trying to find a few more patterns. Unfortunately, our brains are not wired to do this well, and therefore many patterns are missed altogether. And the problem becomes vastly bigger when there are millions of records and cases, and hundreds of possible variables or features to consider. Most humans can only see patterns within one or two attributes at a time. This leads to simple rules that generate lots of false alarms. More subtly, such work wastes analyst time on low-value, repetitive tasks with a high error rate instead of using the advantage our brains provide: developing questions or hypotheses about how unusual discovered patterns may be connected to crime.
Fraud Mitigation with ML
With ML, a computer algorithm is built (or in the language of data science, trained) on historical data of fraudulent vs. non-fraudulent transactions. The key here is that the algorithm has a reliable set of known fraud examples and non-fraud examples in the data. A case that has been investigated and determined to be fraud is an easy example of known fraud. (What many sometimes miss is that a case that has not been investigated cannot necessarily be labeled non-fraud. More accurately, it would be labeled as unknown.) A key point is that for an ML algorithm to discover the relationship between the data and outcomes it needs accurately labeled outcomes.
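The labeling distinction can be made concrete with a small sketch. The case records and field names (investigated, confirmed_fraud) are hypothetical; the point is only that an uninvestigated case is labeled unknown and kept out of the training data rather than being treated as non-fraud.

```python
def label_case(case):
    """Assign a training label based on what is actually known about a case."""
    if not case["investigated"]:
        # Never investigated: we cannot claim it is clean.
        return "unknown"
    return "fraud" if case["confirmed_fraud"] else "non-fraud"


# Hypothetical case records for illustration only.
cases = [
    {"id": 101, "investigated": True, "confirmed_fraud": True},
    {"id": 102, "investigated": True, "confirmed_fraud": False},
    {"id": 103, "investigated": False, "confirmed_fraud": False},
]

labeled = [(case["id"], label_case(case)) for case in cases]

# Only cases with a known outcome are used to train the model.
training_set = [(case_id, label) for case_id, label in labeled if label != "unknown"]

print(labeled)
print(training_set)
```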
To supplement human definition of a set of rules by consensus, a mathematical model is constructed based on all of the available data. The primary difference is that the ML algorithm is able to investigate millions of possibilities that would be impossible for a human or collection of humans to consider. But then we have to be open to the new findings that the algorithm surfaces!
An additional powerful benefit emerges from the model scoring all the cases: a prioritized list of likely fraud cases. Organizations can use this new information to proactively plan resource allocation to optimize case investigations. Executives may learn that there is a mismatch between their allocation of resources in a particular geographic region and the proportion of cases within that region.
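Turning model scores into a prioritized work list takes very little code. In this sketch the scores, regions, and the investigator capacity of three cases are all made up; in practice the scores would come from the trained model.

```python
from collections import Counter


def prioritize(scored_cases, capacity):
    """Return the highest-scoring cases, up to the number investigators can handle."""
    ranked = sorted(scored_cases, key=lambda case: case["score"], reverse=True)
    return ranked[:capacity]


# Hypothetical model output: one fraud-likelihood score per case.
scored_cases = [
    {"id": "A", "region": "East", "score": 0.12},
    {"id": "B", "region": "West", "score": 0.91},
    {"id": "C", "region": "East", "score": 0.47},
    {"id": "D", "region": "West", "score": 0.78},
    {"id": "E", "region": "East", "score": 0.05},
]

queue = prioritize(scored_cases, capacity=3)
queue_ids = [case["id"] for case in queue]

# Where the likely fraud sits, for comparison with where the investigators are.
region_counts = Counter(case["region"] for case in queue)

print("investigate first:", queue_ids)
print("top cases by region:", dict(region_counts))
```

Comparing the regional counts of top-scored cases with the current staffing in each region is one simple way to surface the kind of resource mismatch mentioned above.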
As new and different fraud schemes emerge, the historical data grows and the ML models are updated, so they can discover new and anomalous patterns in the data that differ from normal transactions. Therefore, when the ML system has a feedback mechanism to incorporate new known outcomes into the data and the model, it can adapt to a dynamic world in a way that a non-ML system can’t. Even an AI model cannot do this!
In the machine learning scenario, computers sort millions of cases quickly to find interesting patterns that are related to crime. ML methods can consider many attributes at a time, leading to more robust pattern discovery. Now analyst time is spent on the very high-value tasks of developing hypotheses about how the new and unusual patterns may be connected to crime and ultimately helping the ML algorithms to improve.
If you have read this far, then perhaps you would agree that a better question than “are we using machine learning?” is “are we using our valuable human capital and our vast computing capital in a way that best takes advantage of their relative strengths?”
About the Author
Gerhard enjoys predictive analytics and data mining, especially related to the areas of Fraud Detection, Financial Risk Management, and Health Care outcomes using various analytical methods, working with people, leading change, and timely management of complex projects. His work experience spans both private and government sectors including international experience.
Gerhard teaches at Georgetown University as an adjunct faculty member in the Math and Statistics Master’s degree program. He is also an instructor for the three-day SAS Business Knowledge Series course “Data Mining: Principles and Best Practices” and has been invited to teach at international conferences. Gerhard currently serves on the Institute for Advanced Analytics Advisory Board and George Washington University Masters in Science in Business Analytics Advisory Board.
Gerhard has extensive industry experience in the government oversight, financial, construction, and telecommunication industries, both as a business owner and as an executive. He is a recognized expert in three-dimensional roadway modeling and automated machine guidance using Global Positioning Satellite systems and has presented to various agencies including the Transportation Research Board. In his role as Chief Technology Officer and VP of Engineering for Pulse Communications, Gerhard directed the design of early digital subscriber line systems (internet over the telephone line) and was a member of the international forum defining the standards for DSL implementation. Prior to Pulse Communications, he was Director of Operations for Bell Northern Research, leading the design and delivery of hardware and software for large-scale telephony switching and fiber optic systems.