Machine Learning Times

Thriving in a Big Data World


Three recent books offer managers expert perspectives on the increasing power and importance of analytics.

U.S. President Barack Obama’s 2012 campaign owed much of its success to quantitative analysis, with staffers able to identify, for example, which people would likely be swayed to vote for him after receiving a flyer, phone call or home visit, thus tipping the balance in the fight for crucial swing states. Wal-Mart has learned that before a hurricane strikes an area, not only does the demand for flashlights increase but also that for Pop-Tarts. Even the world of sports has become enamored of quant power, as famously popularized in the best-selling book Moneyball. But what exactly are these new quantitative techniques, and how can businesses best deploy them to their advantage?

Executives can find some answers to such questions in three recent books: Big Data: A Revolution That Will Transform How We Live, Work, and Think (Houghton Mifflin Harcourt, 2013) by Viktor Mayer-Schönberger, a professor of Internet governance and regulation at Oxford University, and Kenneth Cukier, data editor of The Economist; Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (John Wiley & Sons, 2013) by Eric Siegel, founder of Predictive Analytics World and a former assistant professor at Columbia University; and Keeping Up with the Quants: Your Guide to Understanding and Using Analytics (Harvard Business School Publishing, 2013) by Thomas H. Davenport, the President’s Distinguished Professor of Information Technology & Management at Babson College, and Jinho Kim, a professor of business and statistics at the Korea National Defense University. The first two books primarily focus on the power of big data and quantitative analytics, and the third advises how companies can tap into that power. Together, the combination of description and advice provides a good primer for executives seeking a better understanding of this emerging era of sophisticated number-crunching.

Understanding “Datafication”

According to Eric Siegel’s estimate, we are adding 2.5 quintillion bytes of data every single day. Words have become data; the physical states of our machinery have become data; our physical locations have become data; and even our interactions with each other have become data. “Data can frequently be collected passively, without much effort or even awareness on the part of those being recorded. And because the cost of storage has fallen so much, it is easier to justify keeping data than discarding it,” observe Viktor Mayer-Schönberger and Kenneth Cukier. The authors refer to this new phenomenon as the “datafication” of everything. Indeed, we are awash in information, but what does it all mean?

Certainly, companies that have become adept at selective data-crunching have uncovered all kinds of valuable correlations. Some are not entirely surprising. For instance, Siegel reports, people who buy small felt pads that adhere to the bottom of chair legs (to protect the floor) are more likely than others to be good credit risks. Other results are quite unexpected. Smokers in some workplaces tend to suffer less from carpal tunnel syndrome (perhaps because they generally take more work breaks), and vegetarians tend to miss fewer flights (maybe because they pre-order a special meal and are thus more committed to making their flight).

To gain such insights, however, executives need to adopt a mind-set completely different from the “small data” perspective of the past. In their engaging and informative book, Mayer-Schönberger and Cukier explain three new imperatives:

1. Use all the data, not just a sample. In the past, businesses did not have the economical means to capture, store and analyze all the data from their operations, so they had to settle for a sample of it. But now a company like Amazon can economically capture and store data from every single customer transaction.

2. Accept messiness. Inaccuracies in measurements are less harmful than they once were because they can often be smoothed over by the sheer quantity of data. In the authors’ words, “more trumps better.”

3. Embrace correlation. For many purposes, correlation is sufficient and people don’t need to know causality. Mayer-Schönberger and Cukier report that one analysis of used cars found that orange vehicles are about half as likely as others to have defects. That correlation between orange and defects may be valuable information even if the underlying cause is unknown. (Perhaps owners of orange cars are more likely to be passionate about their vehicles and thus take better care of them?)

Another important lesson of big data is that many applications can arise far from the purposes for which the data was collected. Take, for instance, location information that cellphone companies collect so that they can efficiently route calls. The same data can be used to identify where people tend to gather on weekend nights — information that could be useful in predicting real estate prices. Indeed, Mayer-Schönberger and Cukier contend that “Much of the value of data will come from its secondary uses, its option value, not simply its primary use.” In fact, the authors predict, “Every single dataset is likely to have some intrinsic, hidden, not-yet-unearthed value, and the race is on to discover and capture all of it.” That said, many potential applications could skim along the edges of what might be ethical, moral or even legal. A person’s social network, for example, might be used to determine his or her credit risk. If a person, for example, has a close circle of friends who are credit deadbeats, then, applying a “birds of a feather” assumption, might he or she not also be more likely to default on a loan?

Quantifying the likelihood that a particular person will do something — whether it is defaulting on a loan, upgrading to a higher level of cable service or seeking another job — is at the heart of Siegel’s Predictive Analytics. The author describes how quantitative techniques can be deployed to find valuable patterns in data, enabling companies to predict the likely behavior of customers, employees and others. FedEx can reportedly identify (with 65% to 90% accuracy) which customers are likely to defect to a competitor. Citizens Bank was able to curtail losses from check fraud by 20% thanks to more sophisticated quantitative analyses. And Hewlett-Packard has relied on predictive analytics to identify which employees are most likely to leave, allowing managers time to implement measures to retain those individuals or prepare for their departures. (Interestingly, in one HP division, employees who had received a promotion were actually more likely to leave unless they had also received a significant salary increase.)

Of course, each human being is unique, and the possibility of “black swan” events must never be discounted. But, as a whole, people do tend to be creatures of habit, and that regularity enables companies to predict the likelihood of certain behaviors. Moreover, Siegel makes a clear distinction between forecasting and predictive analytics: “Whereas forecasting estimates the total number of ice cream cones to be purchased next month in Nebraska, predictive technology tells you which individual Nebraskans are most likely to be seen with cone in hand.”

A bit talky at times (one long chapter focuses solely on IBM’s Watson computer and its success on “Jeopardy!”), Predictive Analytics nevertheless contains enough pithy insights to make the book at least worth skimming. One of those insights is what Siegel calls “The Prediction Effect.” To wit: Even a modest increase in the accuracy of predictions can often result in substantial savings. For example, according to Siegel, an insurance business has been able to save almost $50 million a year by using predictive analytics to shave just half a percentage point off its loss ratio (the total amount paid in claims divided by the total amount collected in premiums).
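The arithmetic behind the Prediction Effect is simple enough to sketch. The premium volume below is a made-up assumption (the book gives only the savings figure, not the insurer’s revenue), chosen so the half-point improvement works out to roughly $50 million:

```python
# Back-of-the-envelope sketch of Siegel's "Prediction Effect" for an insurer.
# The $10B premium volume is a hypothetical assumption, not a figure from the book.

def loss_ratio(claims_paid: float, premiums_collected: float) -> float:
    """Loss ratio = total claims paid / total premiums collected."""
    return claims_paid / premiums_collected

annual_premiums = 10_000_000_000   # hypothetical $10B collected in premiums
ratio_before = 0.700               # loss ratio before predictive analytics
ratio_after = 0.695                # half a percentage point lower

savings = (ratio_before - ratio_after) * annual_premiums
print(f"Annual savings: ${savings:,.0f}")  # Annual savings: $50,000,000
```

The point of the exercise is the leverage: because the base is so large, a prediction improvement that sounds trivial (0.5 points) translates into eight-figure savings.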

Harnessing Quant Power

Understanding that predictive analytics can save a company $50 million a year is one thing; tapping into that power is quite another. Indeed, executives must go far beyond the “gee whiz” fascination with big data and quantitative techniques to learn how their businesses can profit best from this new era of computational sophistication. For that journey, Keeping Up with the Quants is a basic guide. As its title suggests, the book is geared toward executives who are not themselves analytics experts but whose jobs increasingly require them to understand and deal with those who have such expertise, both inside and outside their organizations.

In their book, authors Davenport and Kim provide a logical approach for thinking more like a quantitative analyst. The framework consists of three major steps, which the authors describe as “framing the problem,” “solving the problem” and “communicating and acting on results.”

1. Framing the problem. This step might at first seem simple and straightforward, but it is often neither. Take, for example, the company that wants to learn the success rate of its direct mail campaign, so it asks, How many people will buy the product after receiving the mailing? Instead, the question it should ask is this: How many people who wouldn’t have bought the product will now buy it after receiving the mailing? (That is, in this instance causality is important. The company wants to know how effective the mailing is.)
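The reframed question is a causal one, and the standard way to answer it is to compare recipients against a randomly held-out control group. Here is a minimal sketch of that comparison; all of the counts and the function name are hypothetical, invented for illustration:

```python
# Sketch of the reframed direct-mail question: not "how many recipients bought?"
# but "how many bought *because* of the mailing?" Estimated by comparing the
# mailed (treatment) group against a random held-out control group.
# All counts below are hypothetical.

def incremental_buyers(treated_buyers: int, treated_total: int,
                       control_buyers: int, control_total: int) -> float:
    """Estimate purchases caused by the mailing (the incremental lift)."""
    treated_rate = treated_buyers / treated_total
    control_rate = control_buyers / control_total
    return (treated_rate - control_rate) * treated_total

# 100,000 mailed, 5,000 bought; 100,000 held out, 4,000 bought anyway.
lift = incremental_buyers(5_000, 100_000, 4_000, 100_000)
print(f"Purchases attributable to the mailing: {lift:,.0f}")  # 1,000
```

Note how the naive framing would credit the campaign with all 5,000 purchases, while the control group reveals that 4,000 of those customers would have bought anyway.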

This new era of computational prowess does not obviate the need for intuition and creativity, and that is especially true in the important first step of framing a problem.

In framing a problem, executives must involve all the stakeholders, not just to get their perspective but also to get a sense of whether they will buy into the results after the analysis is complete. A key question to ask is: What actions will be taken based on the analysis? Davenport and Kim recount the story of a restaurant chain that wanted to investigate the profitability of each item on its menu. When the executives were asked what they intended to do with the results of that analysis, one replied that they should consider whether to remove the unprofitable items, but another executive countered that the company had not removed a single item from its menu over the past 20 years. After further discussion, the executives decided to focus the study on pricing and not profitability.

2. Solving the problem. This step consists of modeling, data collection and data analysis. Here the authors emphasize how valuable a new source of information can be — and that more and better data will often trump a better algorithm for analyzing that information. Case in point: the insurance company Progressive, which gained a competitive edge over rivals by using FICO credit scores and other data to assess the likelihood that a particular person would be involved in a car accident in the future. And, thanks to tools like Hadoop and MapReduce, companies can consider not only structured data (such as a person’s age and income) but also unstructured information (such as text and images).

3. Communicating and acting on results. Many quantitative analysts make the mistake of assuming that “the results speak for themselves.” Well, they don’t. “The clearer the results presentation, the more likely that the quantitative analysis will lead to decisions and actions — which are, after all, usually the point of doing the analysis in the first place,” write Davenport and Kim. And sometimes it’s not enough just to be clear; the results also have to be presented in an engaging, user-friendly format. For example, for Delta Air Lines, Deloitte Consulting developed an iPad app that enables executives to quickly query the airline’s operations. Different colors indicate the performance at particular airports, and touching an airport on a map brings up additional data about the location’s operations. Executives can then drill further down to obtain granular information on staffing, customer service levels and problems.

An important point made in Keeping Up with the Quants is that this new era of computational prowess does not obviate the need for intuition and creativity, and that is especially true in the important first step of framing a problem. “Half the battle in problem solving and decision making is framing the problem or decision in a creative way so that it can be addressed effectively,” assert Davenport and Kim. For example, a clever researcher — Junxiang Lu — figured out a way to predict customer lifetime value in the telecom industry by creatively reframing the problem in terms of “survival analysis,” a biological statistical technique used to determine the proportion of a population of living organisms that will survive past a certain time.
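To make the survival-analysis reframing concrete, here is a toy Kaplan-Meier estimator — the classic survival-analysis technique — applied to customer tenure. The customer data are invented for illustration; a still-active customer is treated as a "censored" observation, exactly as a still-living organism would be:

```python
# Toy illustration of the "survival analysis" reframing: estimating the
# probability that a customer "survives" (stays subscribed) past each month,
# using a Kaplan-Meier product-limit estimate. Data are invented.

def kaplan_meier(durations, churned):
    """Return (month, survival probability) pairs.
    durations: months each customer was observed.
    churned: True if the customer left; False if still active (censored)."""
    survival, points = 1.0, []
    for t in sorted(set(durations)):
        events = sum(1 for d, c in zip(durations, churned) if d == t and c)
        at_risk = sum(1 for d in durations if d >= t)
        if events:  # survival only drops at months where someone churned
            survival *= (at_risk - events) / at_risk
            points.append((t, survival))
    return points

# Six hypothetical customers: tenure in months, and whether they churned.
months = [2, 3, 3, 5, 8, 8]
left = [True, True, False, True, False, False]
for t, s in kaplan_meier(months, left):
    print(f"month {t}: {s:.2f} still subscribed")
```

The output of such a curve feeds directly into customer-lifetime-value estimates: the area under it is the expected tenure, which can be multiplied by margin per month.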

Unresolved Issues

To be sure, the use of big data and predictive analytics raises a number of difficult issues. One very hot topic is privacy concerns. In 2012, Target ignited a media firestorm after consumers learned that the company was using its quantitative methods to predict which customers were pregnant. (Siegel discusses the controversy in Predictive Analytics.) And, as is the case with many new tools, the technology often outpaces the laws and regulations governing its deployment. According to Mayer-Schönberger and Cukier, “Society has built up a body of rules to protect personal information. But in an age of big data, those laws constitute a largely useless Maginot Line.”

Another prickly issue is figuring out what all these data are worth in monetary terms. In the past, companies have struggled with trying to assess the value of their brands, patents, trade secrets and other intellectual property. Data should now be part of that discussion — but exactly what’s the value of all the “likes” that Facebook has amassed? And what about the value of all that Google search information? Moreover, do consumers have any right to some of that value, especially if information is used to reap profits in ways other than the purpose it was originally collected for?

Such thorny issues aside, one thing is certain: The emerging era of big data and quantitative analytics has only just begun. “Seeing the world as information, as oceans of data that can be explored at ever greater breadth and depth, offers us a perspective on reality that we did not have before,” write Mayer-Schönberger and Cukier. Those companies that grasp this new reality will likely outperform those that don’t — and that’s a view of the future business landscape that predictive analytics itself might well have foreseen.

By: Alden M. Hayashi, freelance business writer and editor; former senior editor of MIT Sloan Management Review and Harvard Business Review.
Originally published elsewhere (membership required).
