May 27th 2014
March 5th 2014
This article will make you feel better. And you do need to feel better, if you are one of the many of us who practice analytics—or who must consume and rely on analytics—and find ourselves carrying tension in our shoulders or sometimes losing sleep.
The fear stems from a well-known warning of tragic mishap: "If you torture the data long enough, it will confess," as stated by University of Chicago economics professor Ronald Coase. There is a general sense that math could be wrong and that analytics is an art.
As John Elder of Elder Research put it, "It's always possible to get lucky (or unlucky). When you mine data and find something, is it real, or chance?" How can we confidently trust what a computer claims to have learned? How do we avert the dire declension, "Lies, damned lies, and statistics"?
There is a simple, elegant solution from Elder—but first, let me further magnify your fear: Even the very simplest predictive model risks utter failure. Mistaken, misleading conclusions are in fact terribly easy to come by.
A conclusion drawn about one single variable—even without the use of a common multivariate model (such as log-linear regression)—can go awry. In fact, one of the more famous such analytical insights, "an orange used car is least likely to be a lemon," has recently been debunked by Elder and his colleague Ben Bullard at Elder Research, Inc.
Big data, with all its pomp and circumstance, can actually mean big risk. More data can present more opportunities to inadvertently discover untrue patterns that appear misleadingly strong within your dataset but, in fact, do not hold true in general. To be more specific, "bigger" data could mean longer data (a longer list of examples, which generally helps avert spurious conclusions), but also could mean wider data (more columns, that is, more variables/factors per example). So, even if you are only considering one variable at a time, such as the color of each car, the more variables you search across, the more likely you are to come across one that just happens to look predictive in your data by sheer chance alone. This peril that arises when searching across many variables has been dubbed "vast search" by John Elder.
Dr. Elder puts it this way: "Modern predictive analytic algorithms are hypothesis-generating machines, capable of testing millions of 'ideas.' The best result stumbled upon in its vast search has a much greater chance of being spurious… The problem is so widespread that it is the chief reason for a crisis in experimental science, where most journal results have been discovered to resist replication; that is, to be wrong!"
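To see how easily a vast search stumbles on a pattern, here is a minimal simulation sketch. It is not Elder's method, only an illustration under stated assumptions: the target and every candidate variable are pure random noise, the dataset is short (11 rows, an arbitrary choice), and the function names are hypothetical. The only thing that changes between runs is how many candidate columns are searched.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_r_squared(n_examples, n_variables, seed=0):
    """Search n_variables columns of pure noise for the one that best
    'explains' a pure-noise target, and return its R-squared.

    Assumption for illustration: nothing here is related to anything else,
    so any apparent predictive strength is chance alone.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_examples)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_examples)]
        best = max(best, pearson_r(candidate, target) ** 2)
    return best


if __name__ == "__main__":
    # Same short dataset length each time; only the width of the search grows.
    for width in (1, 10, 100, 1000):
        share = best_spurious_r_squared(n_examples=11, n_variables=width)
        print(f"{width:>5} variables searched -> best R^2 = {share:.2f}")
```

Because the seed is fixed, each wider search includes the narrower one, so the best score found can only stay the same or rise as more columns are examined. None of these "discoveries" would be expected to hold up on fresh data.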
A few years ago, Berkeley Professor David Leinweber made waves with his discovery that the annual closing price of the S&P 500 stock market index could have been predicted from 1983 to 1993 by the rate of butter production in Bangladesh. Bangladesh's butter production mathematically explains 75 percent of the index's variation over that time. Urgent calls were placed to the Credibility Police, since it certainly cannot be believed that Bangladesh's butter is closely tied to the U.S. stock market. If its butter production boomed or went bust in any given year, how could it be reasonable to assume that U.S. stocks would follow suit? This stirred up the greatest fears of PA skeptics, and vindicated nonbelievers. Eyebrows were raised so vigorously, they catapulted Professor Leinweber onto national television.
Crackpot or legitimate educator? …
February 2nd 2014
I was honored to have my book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, covered in this article in GARP.org. Here is an excerpt from the article.
Predictive Analytics' New Wave
By Katherine Heires, Risk News (published by the Global Association of Risk Professionals)
Fascination with the future is part of human nature. In a commercial or financial context, accurate predictions have major business implications; in finance, many quantitative processes and innovations are designed to deliver such insights for competitive advantage.
One doesn't need a crystal ball — or sophisticated software, for that matter — to measure the demand for the latest in predictive-analytics tools or to understand how and why it is accelerating. Predictive analytics can be seen everywhere from the micro level, as in credit scoring on loan applications, a technique banks have been employing since the 1940s; to the macro analyses that regulatory bodies are developing to assess systemic risks. Financial and nonfinancial enterprises alike aspire to mine Big Data for patterns and insights that can foretell, and create opportunities out of, market trends or customer behaviors, while also taking emerging risks into account.
An ever-widening array of entities — in advertising, health care, insurance, energy, and even government agencies like the Securities and Exchange Commission — have embraced predictive analytics in some form…
A former Columbia University professor of computer science, Eric Siegel is a consultant and author of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie or Die, one of a number of titles indicating a growing level of popular and business interest in the field. Founder of the Predictive Analytics World conferences, he notes that in 2009 he hosted two conferences, and 10 are on the calendar for 2014. "The two new predictive analytics verticals we will focus on this year will be manufacturing and health care," he says.
January 28th 2014
Personalization Is Back: How to Drive Influence by Crunching Numbers
Standard predictive analytics does not directly address the greatest challenge faced by marketing and healthcare: deciding, across large numbers of individuals, whom to treat in a certain way.
Yes, you heard me correctly. Predictive analytics still needs a certain tweak before it’s designed to optimize organizational activities.
Let’s take a step back. The world is run by organizations, which serve us as individuals by deciding, for each one, the best action to take, i.e., the proper outgoing treatment:
TREATMENTS: Marketing outreach, sales outreach, personalized pricing, political campaign outreach, medication, surgery, etc.
That is, organizations strive to analytically decide whom to investigate, incarcerate, set up on a date, or medicate.
Organizations will be more successful, saving more lives or making more profit—and the world will be a better place—if treatment decisions are driven to maximize the probability of positive outcomes, such as consumer actions or healthcare patient results:
OUTCOMES: Purchase, stay (retained), donate, vote, live/thrive, etc.
In fact, the title of my book itself includes a list of such outcomes: Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.
But this book title—and predictive analytics as a field in general—may lead you astray by implying the best way to improve the probability of these actions (or, alternatively, the probability of averting them, in the case of the latter two, lie and die) is to predict them. However, predicting an outcome does not directly help an organization drive that outcome. Instead, treatment decisions are optimized when organizations predict something completely different from outcome or behavior:
WHAT TO PREDICT: Whether a certain treatment will result in the outcome.
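The distinction can be made concrete with a small sketch. It assumes something the excerpt does not spell out: that individuals were randomly split into a treated group and an untreated control group, so the difference in outcome rates estimates the effect of the treatment. The segment names, counts, and function names are made up for illustration, and this simple per-segment difference stands in for a real model.

```python
from collections import defaultdict


def make_records(segment, treated, n_positive, n_total):
    """Build hypothetical (segment, treated, outcome) rows for the example."""
    return [(segment, treated, i < n_positive) for i in range(n_total)]


def uplift_by_segment(records):
    """Estimate, per segment, how much the treatment changes the outcome rate.

    records: iterable of (segment, treated: bool, outcome: bool).
    Returns {segment: treated outcome rate minus control outcome rate}.
    Segments missing either group are skipped, since no comparison is possible.
    """
    # counts[segment][treated] = [positive outcomes, total individuals]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, outcome in records:
        cell = counts[segment][treated]
        cell[0] += int(outcome)
        cell[1] += 1

    uplift = {}
    for segment, cells in counts.items():
        (t_pos, t_n), (c_pos, c_n) = cells[True], cells[False]
        if t_n == 0 or c_n == 0:
            continue
        uplift[segment] = t_pos / t_n - c_pos / c_n
    return uplift


if __name__ == "__main__":
    records = (
        make_records("loyal", True, 8, 10)        # buys whether contacted...
        + make_records("loyal", False, 8, 10)     # ...or not
        + make_records("wavering", True, 5, 10)   # buys more often if contacted
        + make_records("wavering", False, 2, 10)
    )
    for segment, value in uplift_by_segment(records).items():
        print(f"{segment}: estimated uplift = {value:.2f}")
```

With these made-up counts, the loyal segment buys 80 percent of the time whether contacted or not, so its estimated uplift is zero, while the wavering segment moves from 20 to 50 percent. A model that only predicts the outcome would rank the loyal segment first; predicting the effect of the treatment points to the wavering segment instead.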
January 13th 2014
I was honored to have my book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die reviewed by Richard Boire (of the Boire Filler Group) in the Journal of Marketing Analytics. Here is an excerpt from the review.
This book is really the first book on data mining or predictive analytics that attempts to communicate the impact of predictive analytics to our society at large. Historically, the rationale for not reaching out to the general audience was that data mining and predictive analytics were specialized areas of expertise that would only be of interest to its practitioners and academics. There was no real sense of its tremendous significance within our everyday lives and more importantly, the benefits that were conferred by this discipline. The knowledge/information revolution has changed the paradigm and how we view this new discipline. This book does an excellent job in reinforcing the growing impact of this discipline as the author, Eric Siegel, in what is often referred to as a very dry topic, transforms it into a discipline with wide appeal and interest among all sectors of society.
Examples abound throughout the book in all sectors as the author explores the impact of predictive analytics on everyday facets all of us face during the course of our normal day…
This book is a must-read for the normal lay person, presuming there is interest in how society can best use information in our ever-growing Big Data world. At the same time, the seasoned practitioner will appreciate the real-world examples.
December 9th 2013
The Winning Formula to Being a Kaggle Data Scientist
Is there a formula to be a data science "guru"? If so, what does it include? Is the most significant factor education, experience or pure talent?
Software Advice, which researches and compares business intelligence software, tackled this question with a study to examine the top analysts within the world’s largest data scientist community, Kaggle.
Kaggle is the largest and leading host of predictive analytics competitions, offering companies the chance to tap into its community of more than 100,000 analysts in order to undertake various big data challenges. I wrote about Kaggle in Chapter 5 of my book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. The study analyzed the top 100 Kaggle users (as of October 2013) to learn more about what these data superstars have in common.
Interesting study results:
Education: Over 80 percent of the top 100 performers have a Master’s degree or higher, and 35 percent have a Ph.D. The top 21 performers all have an M.S. or higher: 9 have Ph.D.s and several have multiple degrees (one member even has two Ph.D.s).
Background/Disciplines: Analysts come from a broad variety of educational backgrounds, with computer science and mathematics as the top areas of study. While most of the areas of study centrally involve quantitative skills, a few surprising programs surfaced, such as philosophy and law.
Where in the World: These “data wizards” hail from all over the globe, with 29 countries represented in the top 100 performers group. The United States has the most members in this list (30), followed by Russia (nine) and India (six).
Sticktoitiveness: The number of contests entered also correlates with a higher chance of winning competitions and becoming a member of the top Kaggle prize-winners.
The Prize Winning Group
In the end, the study concludes that the skills necessary to be one of these elite Kaggle performers can be developed by growth in any one of multiple disciplines, with various levels of study. The name of the game is persistence and a high level of activity in the community.
December 2nd 2013
Announcing the inaugural PAW Healthcare
Attend Predictive Analytics World for Healthcare, coming to Boston, October 6-7, 2014, and witness today's rapidly emerging movement to fortify healthcare with big data's biggest win: the power to predict. The premier cross-vendor networking event, this conference assembles the industry's leaders to deliver case studies and expertise, revealing how predictive analytics:
- Improves patient care
- Reduces costs
- Brings greater efficiencies to the healthcare industry
Predictive analytics addresses today's pressing challenges in healthcare effectiveness and economics by improving operations across the spectrum of healthcare functions:
Personalized medicine. Per-patient prediction and analytically enhanced diagnosis drives individual clinical treatment decisions
Insurance. Predictively guided decisioning combats risk and renders insurance more equitable and profitable
Hospital administration. Analytics detects and recoups loss due to fraud and waste
Healthcare marketing. From medical suppliers to healthcare screening service providers, the performance of industry enterprises hinges on analytically targeted marketing
Drug development. Analytics advances pharmaceutical engineering, testing, and other processes
Much more. Other applications include predicting per-patient disease progression, mortality risk, availability of clinical trial participants, consumer prescription adherence, and more.
Who should come? PAW Healthcare provides unique learning and networking opportunities for physicians, medical researchers, administrators, marketers, and analytics professionals from:
- Major medical centers
- Information system companies
- Pharmaceutical organizations
- Medical device manufacturers
- Medical insurance providers
- Dental insurance providers
- Clinical laboratories
November 18th 2013
I was honored to have my book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die reviewed by Shakthi Poornima in Product Margins. Here is an excerpt from the review.
The power to predict who will click, buy, lie or die
Working in the field of Big Data means taking into consideration not hundreds or thousands, but millions, billions, or even more datapoints. And underneath all that data lies unparalleled potential. Just imagine being able to predict one’s location up to multiple years beforehand by using GPS data (Microsoft), or being able to predict one’s risk of death in surgery (Riskprediction.org.uk). That’s what the book, “Predictive Analytics: The power to predict who will click, buy, lie or die” is about. It covers building applications in marketing, health care, fraud, finance, human resources, etc., by a variety of parties: companies, banks, governments, even universities. Everyone has an interest in data.
…overall, the examples in the book are well-researched. What was interesting to me was the possibility of taking the predictions from various studies to build new products. For example, Orbitz found that Mac users book more expensive hotels. “Orbitz applies this insight, altering displayed options according to your operating system” (p.81). A different study found that one’s inclination to buy online varies by the time of day: 8pm for retail, late night for dating, 1pm for finance, and so on. Combining the insights from both studies can come in handy for marketing a new product, or starting an A/B test for that product. The potential for meshing various different types of data grows as different applications are developed around the same or similar datasets, and as these datasets grow in size.
November 11th 2013
I was honored to have my book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, reviewed by Patrick Tucker in The Futurist. Here is an excerpt from the review.
Expanding the Predictable Universe
Data scientist Eric Siegel explains the brave, new, and surprising world of predictive analytics.
Whenever you go to a major merchandise retailer and pull items off the shelf, you create a little piece of information that the retailer stores in a database. As more people pull items off those shelves, the retailer has the opportunity to learn something about all of you, in real time, and can use that information to predict what you might be interested in buying next. With the emergence of extremely large databases and ever-better transaction records, the relationship between what we buy, where we go, and what we might do next is becoming ever more clear.
In his new book, Predictive Analytics, researcher Eric Siegel refers to this computerized semi-clairvoyance as “the prediction effect.” Siegel achieved some small notoriety in 2012, when New York Times writer Charles Duhigg interviewed him on a story about predictive analytics (PA). Siegel recalls that Duhigg “asked for interesting discoveries that had come from PA. I rattled off a few that included pregnancy prediction.” Siegel directed him to a video from one of the many PA conferences that Siegel runs.
The video was a keynote presentation by data scientist Andrew Pole of Target, discussing how Target used data from its massive baby-registry service to predict pregnancy through consumer habits. For instance, many women, upon discovering that they are pregnant, may put unscented skin lotion on their registries, since pregnancy can dry out skin and scented lotion can have a negative effect on a developing fetus. The switch to unscented baby lotion can serve as one of many predictors of pregnancy—an issue of keen interest to Target, since expectant mothers can become much more profitable customers.
The Target model, in the words of Siegel, “identified 30 percent more customers for Target to contact with pregnancy-oriented marketing material—a significant marketing success story.”
November 4th 2013
I was honored to have my book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die reviewed by The Seattle Post-Intelligencer. Here is an excerpt from the review.
Review of Predictive Analytics in The Seattle Post-Intelligencer
Can computers learn? How can computers increase our predictive capacities? If you've always wondered about these questions, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die is for you!
We seem to be obsessed with prediction. We'd love to predict and know what will happen in our future. We go to palm readers, read our horoscopes daily or weekly, and feast upon fortune cookies to get some idea, however inaccurate, of what may happen to us in the future.
But is prediction of this sort accurate? Regardless, people are very interested in this type of prediction and will spend any money and effort to achieve it.
Most people don't really know what predictive analytics means and how anyone can be interested in such a mysterious discipline. But after reading Eric Siegel's book, readers will find this a mesmerizing and fascinating study. I know I did! And given my background in philosophy, I was entranced by the book.
Predictive analytics is intuitive, powerful, and awe-inspiring. A little bit of prediction can go a long way towards combatting financial risk, fortifying healthcare, conquering spam, toughening crime fighting, and boosting sales. It can even be used to predict when someone is going to die.