APQC Chair Carla O’Dell interviews Predictive Analytics Times Executive Editor and Predictive Analytics World Founder Eric Siegel about predictive analytics and machine learning’s application to process management. Dr. Siegel will be speaking at APQC’s Process & Performance Management Conference this October.
Note: This is a transcript of an audio interview.
Carla O’Dell: Welcome Eric! Let’s jump in and do some of the basics first. We know that big data and predictive analytics are essential for advanced technologies like machine learning and artificial intelligence. Can you talk about why and how these things fit together and give us some examples?
Eric Siegel: Sure. Machine learning, also known as predictive analytics, is when a computer learns from data how to make predictions. Data is really experience; it's not a bunch of dry facts or boring ones and zeros. It's a long list of things that have happened in the past. It encodes the collective, aggregate experience of your organization, and machine learning methods can learn from that data. What gets learned is called a predictive model, which can make predictions on a case-by-case basis.
These technologies learn from the data to make decisions per individual consumer, corporate client, or transaction, at that level of detail, in order to drive operations more effectively: targeting marketing by predicting who is going to buy or who is going to cancel; targeting fraud detection by predicting which transactions will turn out to be fraudulent; managing financial credit risk by predicting which individual credit card holder, for example, will turn out to be a good or bad risk. All kinds of operations across sectors can be rendered more effective with these methods.
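The per-individual learning Siegel describes can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not any vendor's product: the "model" is simply the historical cancel rate per customer segment, and all segments and records below are invented.

```python
# Learn from historical outcomes, then score new individuals one at a time.
# All data here is made up for the sketch.

def train(history):
    """Learn the cancel probability per segment from past outcomes."""
    counts = {}  # segment -> (cancels, total)
    for segment, canceled in history:
        cancels, total = counts.get(segment, (0, 0))
        counts[segment] = (cancels + int(canceled), total + 1)
    return {seg: cancels / total for seg, (cancels, total) in counts.items()}

def predict(model, segment, default=0.0):
    """Score a new individual: the probability their segment cancels."""
    return model.get(segment, default)

# Historical examples: (segment, did this customer cancel?)
history = [
    ("monthly", True), ("monthly", True), ("monthly", False),
    ("annual", False), ("annual", False), ("annual", True), ("annual", False),
]
model = train(history)
print(predict(model, "monthly"))  # ~0.67: monthly subscribers cancel more often
print(predict(model, "annual"))   # 0.25
```

A real predictive model uses many more inputs and a far richer learning method, but the shape is the same: historical cases in, a per-individual probability out.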
People these days often refer to machine learning with the term "artificial intelligence." That phrase is generally used in a broader sense, so it's not exactly a synonym. In fact, "artificial intelligence" is quite subjective: it presumes a concept of human intelligence that is itself subjective, so it's not really a scientific concept. The term AI tends to be a little overblown; it has a terrible case of vagueness.
When you use terms like machine learning and predictive analytics, you are referring to something well defined: specific methods and technology whose value has been demonstrated in operational deployments that improve all these types of processes.
O'Dell: You make a great point. When Hollywood got ahold of AI thirty years ago, we got these visions of some kind of general AI that emulates what humans do: HAL in 2001: A Space Odyssey, Scarlett Johansson in Her, the robots in Ex Machina. Whereas in reality I'm just lucky if the GPS works on my phone.
Siegel: You forgot to mention The Terminator. The problem is that all of these fictional movies, which I love in their fictional context, are anthropomorphizing the machines. It's a ghost story or fairy tale, just like cartoons that anthropomorphize animals. It's a psychological trigger, and for those of us not deeply familiar with the core technology, it leads to some misleading implications about what's possible.
I have to say that what machine learning is capable of doing is amazing enough without any fairy tales. A machine can generalize from historical examples: in these cases, people who did or didn't buy, did or didn't cancel, or employees who did or did not perform or quit their jobs. For whatever outcome you might want to learn to predict, there may be plentiful data: hundreds of thousands or millions of examples in the data the machine is learning from.
Even if there are millions of examples, that's still a relatively small number compared to the total number of cases that could possibly exist. One way or another, it's a finite amount of data from which the machine learns to draw generalizations that hold and apply even in new circumstances that have never before been seen. So now you're looking at a new prospective customer and asking: what are the chances this customer will respond to this marketing offer? What are the chances this transaction will turn out to be fraudulent? Whatever the prediction, the machine has the ability to learn successfully, deriving rules and patterns, known as predictive models, that generalize. This is the Holy Grail for making all mass-scale operations, the main things we do as organizations, more effective. It's prediction that drives the decisions. We tip the odds of this numbers game we're playing in business, with all of these mass-scale operations, in our favor, and prediction is the way to do that.
O'Dell: That's right. The caveat, which I think you mentioned before, is that predicting the future is not an exact science because, as you just said, you don't have the entire universe of data, the complete set of everything that has ever happened or could ever happen in the future, so you can't be perfect. There's some irreducible uncertainty even if you have a lot of data. So it's not perfect, but it's much, much better than guessing.
Siegel: It’s much better than guessing. For example, you might find the pocket of customers who are three times more likely to cancel than average and that’s going to be very helpful for more effectively targeting whatever expensive customer retention offers are in your marketing.
There are certain machine learning applications where you can achieve high accuracy. Take image processing with deep learning (a type of machine learning) trying to identify, "Is this a picture of a cat or a picture of a dog?" It turns out that, just like humans, computers can do that very well, given the right training data and the right machine learning methods.
There are also cases where, no matter how advanced the machine or how intelligent the human, neither can accurately predict exactly which customer is going to cancel. But what you can do is identify the trends and assign probabilities. That's the job of the predictive model: to assign probabilities of who is more or less likely to exhibit whatever outcome or behavior you're trying to predict.
So you determine what would be helpful to predict, and then you find out, “I can’t predict accurately, but wow, I can predict a lot better than guessing.” Probably, in many cases, better than any human could because of all of this data at the computer’s disposal. And now that’s going to translate into real value in terms of improving the effectiveness of mass scale operations.
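Siegel's "better than guessing" point can be made concrete with a toy example: instead of contacting customers at random, contact the ones a model scores as most likely to cancel, and compare hit rates. The scores and outcomes below are invented for illustration.

```python
# Targeting the highest-scored customers beats the random-guess baseline.
# (customer_id, model_score, actually_canceled) tuples are hypothetical.

customers = [
    ("a", 0.92, True), ("b", 0.85, True), ("c", 0.40, False),
    ("d", 0.35, True), ("e", 0.20, False), ("f", 0.15, False),
    ("g", 0.10, False), ("h", 0.05, False),
]

# Baseline: cancel rate across everyone (what random targeting achieves).
base_rate = sum(c[2] for c in customers) / len(customers)  # 3/8 = 0.375

# Model-guided: target only the top 2 by score; here both actually cancel.
top2 = sorted(customers, key=lambda c: c[1], reverse=True)[:2]
hit_rate = sum(c[2] for c in top2) / len(top2)

print(f"baseline {base_rate:.3f}, targeted {hit_rate:.3f}, "
      f"lift {hit_rate / base_rate:.1f}x")
```

The ratio of the targeted hit rate to the baseline is the "lift," the same quantity behind Siegel's earlier example of a pocket of customers three times more likely to cancel than average.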
O'Dell: You raised a number of interesting points there. One question is for the human beings setting the strategic direction: what is the purpose or intent? What would be helpful to predict, especially in marketing, retention, fraud, all of the things machine learning is actually quite good at?
One of the things I've always found really helpful, and I'd like for you to speak to this, is the question of pattern recognition: the things that we don't know that we don't know. When you have large data sets, brute-force computing can help with pattern recognition. How does that relate? Is that a learning question again, like recognizing a staircase versus an escalator? Talk about that a little bit. I think this pattern recognition applies to performance and process improvement.
Siegel: In everything I've just said, you could use "pattern recognition" as a way to define machine learning. Whether you call it predictive modeling, predictive analytics, or machine learning, what it's doing is extracting generalizations from the training data. Those generalizations, the things it has learned, make up a predictive model. After the learning process is done, that model has encoded within it certain patterns or rules or formulas. It can then be applied, one individual at a time, to calculate the probability of an outcome, whether that's a clinical healthcare outcome, a consumer buying behavior in marketing, or whatever it is. What makes the difference is really what data you give the machine, as far as what it's predicting.
What's encoded in the model depends on which method you use, but one way or another you can always use the word "pattern," and in many cases it's exactly what you'd think of as a pattern. For example: all customers who live in a rural area, are within a certain age range, and have purchased from these types of home consumer catalog mailings enough times. So you have, "If the customer fulfills these requirements, then they are four times more likely to buy or to respond to this offer."
The “if” part—the list of requirements—that’s the pattern. You could also think of it as a business rule. That’s a relatively simple form of predictive model, and it is oftentimes very effective, enough to generate a great deal of value.
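A rule of that "if these requirements, then higher response rate" form is easy to write down and check against data. The field names, thresholds, and records below are all hypothetical, chosen only to mirror the rural-area catalog example.

```python
# A business rule as a predicate, plus the lift it yields on (invented) data.

def matches_rule(customer):
    """The 'if' part: the list of requirements that defines the pocket."""
    return (customer["area"] == "rural"
            and 35 <= customer["age"] <= 55
            and customer["catalog_purchases"] >= 3)

customers = [
    {"area": "rural", "age": 40, "catalog_purchases": 5, "responded": True},
    {"area": "rural", "age": 50, "catalog_purchases": 4, "responded": True},
    {"area": "urban", "age": 40, "catalog_purchases": 5, "responded": False},
    {"area": "rural", "age": 30, "catalog_purchases": 6, "responded": False},
    {"area": "urban", "age": 60, "catalog_purchases": 0, "responded": False},
    {"area": "rural", "age": 45, "catalog_purchases": 3, "responded": False},
]

overall = sum(c["responded"] for c in customers) / len(customers)
pocket = [c for c in customers if matches_rule(c)]
pocket_rate = sum(c["responded"] for c in pocket) / len(pocket)

print(f"overall {overall:.2f}, rule pocket {pocket_rate:.2f}, "
      f"lift {pocket_rate / overall:.1f}x")
```

Rule-induction methods such as decision trees effectively search for predicates like `matches_rule` automatically, keeping the ones whose pockets show the strongest lift.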
Now, if you want to improve your predictive performance, to tweak it up, you sometimes use more advanced analytical methods. Those methods produce predictive models that are more opaque: more difficult to decipher, more like a black box. But the word "pattern," I guess, can still apply even in that case.
O'Dell: That makes a lot of sense. A question that arises is, "What if I'm trying to make predictions about something where I don't have massive amounts of data?" Any rules of thumb? Might I have data that I don't know about? I think there are a lot of digital breadcrumbs lying around in an organization. What are your thoughts on that?
Siegel: The type of machine learning we have already discussed is a little more specifically called “supervised machine learning,” and that word “supervised” refers to the idea that you have historical cases where you already know the outcome. You already know these are pictures of cats and these are pictures of dogs. You already know these customers did or did not cancel. You don’t have to predict; they’re just examples from which to learn. That’s what makes it “supervised.”
To apply that technology successfully, you do need "enough data." How much is enough is not a strict science, but there are some general rules of thumb. You generally need at least several thousand examples, covering both the positive and negative cases of whatever you're trying to label or predict, although in some situations a few hundred may do.
All the hype these days about big data is because there is a lot of data; in many, many cases, plenty of it. Of course, as far as attaining business value, the lowest-hanging fruit is going to be where the operations are the most massive. If I'm doing mass marketing, or evaluating a large number of transactions for possible fraud, then by definition I have a large amount of data to analyze, because that transactional data has been tracked and stored. That's the real magic of big data: not the bigness itself, but the fact that data encodes experience from which the machine can learn how to predict.
O’Dell: Point of capture, because what you said is really important: a lot of the value—the low hanging fruit—is going to come from addressing an organization’s large-scale operations. That’s where the value lies.
Siegel: All of this data is being collected. What’s big—the reason that it’s exciting and it’s valuable—is that it’s predictive. You can learn from it. A computer can learn from it how to predict subsequent outcomes.
The data wasn't amassed for the purpose of machine learning. It's just a side effect of doing business as usual, a transactional residue that accumulated. And, lo and behold, it turns out this stuff is really valuable, because you can learn from it: you can derive patterns, whatever the machine learning method, to help the very transactional processes that have been accumulating the data in the first place.
O'Dell: It's a lovely virtuous cycle. It's like, "Wow. We have all of this data, and before it was just a problem to have it all, but now it's valuable because we can learn from it." I guess the catch with that kind of data is that, because it was not collected to be a basis for machine learning, it can be dirtier than one might wish, not as clean or organized as would be ideal. Is that often the case? Do people complain about their data?
Siegel: Absolutely. Let me clarify that. The biggest problem is that the data may be disorganized, disparate, spread around, disjointed, siloed. That's the real problem; the fact that it contains noise or incorrect values is not necessarily an issue. Most of the methods developed in the field of machine learning, like the rules method I mentioned before, usually called "decision trees," turn out to be basically robust to noise. Remember that we're typically not talking about perfectly accurate predictions; we're talking about predicting better than guessing. With image classification, you can imagine some noise on the image, but it's still possible to say, "That's a cat; that's a dog," and indeed the machine can do that.
So it's robust against noise; some noise in the data won't necessarily decrease predictive performance that much. What's important is that the level, or frequency, of incorrect values in the data remains roughly consistent. After you've done the learning process and you've got the predictive model, with patterns you're now going to apply moving forward, the question becomes, "What do you provide as input?" I'm looking at an individual customer today and trying to predict, "Are they going to cancel their subscription?" The information I have about that customer had better not be much noisier than it was across the dataset the machine learned from in the first place.
So it’s consistency that’s important. By contrast, the sort of messiness and disparate, spread out nature of data is an engineering problem, because you need to pull it together in order to do the original analysis.
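The noise-robustness point can be checked with a deliberately crude experiment: flip every tenth historical label (a consistent noise rate) and see whether the learned segment ranking survives. The segments, rates, and flipping scheme below are all invented for the sketch.

```python
# Inject a consistent 10% label-noise rate and verify that the learned
# ordering of segments (which group is riskier) is unchanged.

def segment_rates(history):
    """Cancel rate per segment, learned from (segment, canceled) pairs."""
    counts = {}
    for seg, canceled in history:
        c, t = counts.get(seg, (0, 0))
        counts[seg] = (c + int(canceled), t + 1)
    return {seg: c / t for seg, (c, t) in counts.items()}

# 100 "high risk" customers (70% cancel) and 100 "low risk" (20% cancel).
clean = [("high", i % 10 < 7) for i in range(100)] + \
        [("low", i % 10 < 2) for i in range(100)]

# Flip every 10th label: a roughly consistent error rate across the data.
noisy = [(seg, (not y) if i % 10 == 0 else y)
         for i, (seg, y) in enumerate(clean)]

r_clean, r_noisy = segment_rates(clean), segment_rates(noisy)
print(r_clean["high"] > r_clean["low"], r_noisy["high"] > r_noisy["low"])
```

The measured rates shift under noise, but "high" remains riskier than "low," so decisions driven by the ranking are unaffected, which is Siegel's point about consistency mattering more than cleanliness.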
O'Dell: That's a good distinction between noise, which statistically speaking the learning process can tolerate, and the disparate sources of data. All of these different departments and silos have collected this data for their own reasons. So first you have to discover it if you're looking for it, and second you have to convince everybody that they ought to let you use it for some third purpose. Can you speak to that change management question for a minute, Eric?
Siegel: Now, you’re asking me to talk about humans instead of computers and that’s not really my field of expertise. (Just kidding.) There’s no cure-all for this. It takes a certain kind of socialization and convincing and the right PowerPoint. Don’t use bullets; use a nice graph that shows a forecasted process improvement.
One way or another, the carrot at the end of the stick for this stuff is tangible, is concrete, and just needs to be conveyed and communicated in a clear way so that you can work past the organizational inertia and all of those kinds of hurdles. It takes some patience, it takes some meetings, and it takes some doubling back. With some persistence, you’re riding a wave that’s inevitable, which is that this stuff is valuable. Learning from data to improve these large-scale operations is one of the last remaining points of differentiation when you go from one large corporation to another as far as improving effectiveness and streamlining operations.
O'Dell: Well said. We know that people see the value. In fact, one of the points you made in your interview with APQC a couple of years ago was about how to make a value proposition in a business case.
We did another study recently on automation and advanced technologies among our members, hundreds of them, and 53 percent of them plan on investing in advanced analytics over the next 12 months. That says that either they heard that it works or they tested some of the early concepts themselves. What advice would you give to those organizations who are planning on investing in advanced analytics?
Siegel: First, let me address the 47 percent of organizations that said they're not planning to invest. They're the ones who really need to be talked to. I don't want to pressure anyone, but "everybody is doing it," and for good reasons. Assessing the potential gains from improving a large-scale operation with a predictive model, with machine learning, is not very expensive. You can do back-of-the-napkin arithmetic and some very light initial data-based inquiries to get a sense of the potential. That's what I'd recommend for the other 47 percent.
For the 53 percent who are moving forward, the first thing is to decide what you're doing with it. Don't just buy "machine learning," and definitely beware of vendors who are mostly branding it with the term "AI." Now, a lot of real hardcore techies who are extremely advanced in the underlying technology do like to call it AI, and there are cases where people use "AI" simply as a synonym for "machine learning." But often it's front-end packaging: the branding of an analytics solution sold as "an AI solution." So I would definitely be careful to know precisely what it is you're buying.
More generally, don't buy an analytics solution as an off-the-shelf package you just plug in. Start the other way around. Look at your organization's processes and ask which ones would benefit the most from an incremental, or maybe a little more than incremental, improvement in effectiveness. Could you decrease the number of mass mailings that go out and don't lead to a sale? Could you reduce the amount of passive churn in your customer base or your workforce? How many fraudulent transactions is your organization essentially eating? Look at these things. Do the back-of-the-napkin arithmetic, or a little bit more on an Excel spreadsheet, just to get a sense of the costs and benefits.
If I can improve this much, if I can convince four percent of the outgoing customers to stay by spending this many additional dollars on a retention campaign that I know is targeted effectively because it's guided by a predictive model, what does that translate to in revenue? Do some of that math and figure out where the lowest-hanging fruit is, the biggest opportunities for a win with this type of process improvement. That's the carrot at the end of the stick.
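That back-of-the-napkin arithmetic fits in a few lines. Every number below (churner count, save rate, customer value, campaign cost) is made up purely to show the shape of the calculation.

```python
# Back-of-the-napkin value of a model-targeted retention campaign.
# All figures are hypothetical placeholders; substitute your own.

churners = 10_000          # customers expected to cancel this year
saved_fraction = 0.04      # the targeted campaign retains 4% of them
annual_value = 600.0       # revenue per retained customer, per year
campaign_cost = 50_000.0   # additional spend on the retention offers

retained = churners * saved_fraction   # customers saved
gross = retained * annual_value        # retained revenue
net = gross - campaign_cost            # net gain after campaign spend

print(f"retained {retained:.0f} customers, net gain ${net:,.0f}")
```

If the net comes out clearly positive under conservative assumptions, that process is a candidate for the "lowest-hanging fruit"; if not, move on to the next operation on the list.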
Then work backwards. To achieve that outcome, what do I need to predict? And to predict this, what kind of data do I need? As I mentioned earlier, you need both positive and negative examples of the thing you’re trying to predict.
Now, you can start asking, “What software do I need to do it?” By the time you get to that phase, again, the process or project doesn’t necessarily need to be driven by a choice of solutions. It can be driven more by your human resources. Who are the analytics experts or external analytics firms you’re going to engage? Those individuals might help determine the choice of software based on what they’re already deeply familiar with or which one they would recommend given the particular data set that you have on hand.
So that would be my main first advice. Figure out what the value proposition is for your company and work backwards from that carrot at the end of the stick.
O'Dell: I like it. "What processes are going to benefit?" "What kinds of data do I need to make those kinds of predictions?" and then the sub-questions, like "Where is that data?" Finally, "What kind of expertise am I going to need to do this work?" and "Can those experts help me decide what tools they'll need to support them?" That makes so much sense. I know that people coming to our Process and Performance Management Conference in October are going to be very interested in digging into the details and hearing more of your thoughts on this.
Those of you who want to can learn more from the blog that Holly Lyke-Ho-Gland wrote for APQC, Building A Data Driven Culture, and from our other article based on an interview with Eric, Business Cases And Avoiding The Pitfalls In Building A Data Driven Culture. The kind of thinking and work you've been doing recently is really going to help inform people, Eric, about the technology and how to apply it to the processes in their own organizations. Thank you for joining us.
Siegel: My pleasure.