Preface from Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
Inside the Secret World of the Data Crunchers Who Helped Obama Win
Why Predictive Modelers Should be Suspicious of Statistical Tests

Predictive Analytics Times Newsletter:

Happy New Year! If 2012 was all about Big Data, then 2013 will be the year to Maximize Big Data. PA Times is here to help you navigate the universe of predictive analytics, and I can't think of a better way to maximize your data!

Eric Siegel, founder of Predictive Analytics World, has a 2013 goal of taking predictive analytics mainstream. His article offers five reasons his soon-to-be-released book on predictive analytics will matter to experts and managers alike (preorder the book now and get free online training from author Eric Siegel). Additionally, we're excited to offer two articles focused on predictive model deployment and analytics stories from the front lines.

I know you'll enjoy this month's issue.

Kind regards,

Adam Kahn
Publisher, Predictive Analytics Times


Five Reasons Siegel's Predictive Analytics Book Matters to Experts
Training Program in Predictive Analytics – April in New York City
Three Ways to Get Your Predictive Models Deployed
Online Course: Predictive Analytics Applied – On demand any time
Analytics Success Stories From the Front Lines
ANALYTICS SOFTWARE: Salford Systems – Predictive Modeler
On-Demand Webinar – How-To: Effectively Realize Data Visualization

My new book—Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (coming next month from publisher John Wiley; foreword by Thomas H. Davenport)—is a revealing, accessible primer positioned to appeal well outside our industry.

But, if you're already an expert, here are five reasons to read it nonetheless:

  1. New detailed case studies
  2. Advanced topics (ensembles, uplift, etc.)
  3. An in-depth, startling treatise on privacy
  4. A compendium of 147 mini-case studies
  5. A means to share your field with your family, friends, or supervisor

I took on a rewarding challenge: sharing with lay readers at large a complete picture of predictive analytics, from the way it delivers actionable value to organizations down to the inner workings of predictive modeling. It's high time the predictive power of data, and how to analytically tap it, be demystified to reveal its intuitive yet awe-inspiring nature. As you and I know, learning from data to predict human behavior is not arcane. Rather, it is a broadly applicable no-brainer. If we spread the word with an appropriately friendly overview, we'll readily earn broad buy-in, much to the benefit of our blossoming industry.

More than a string of anecdotes, this book delivers complete conceptual coverage of the field and places predictive analytics into a worldview perspective, defining its societal and even cultural context. Although packaged with catchy chapter titles and brand name stories, the conceptual outline is fundamental: 1) deployment, 2) civil liberties, 3) data, 4) core modeling, 5) ensembles, 6) IBM's Watson, and 7) uplift modeling (aka net lift or persuasion modeling).

Preorder the book now and get free online training from author Eric Siegel

Although this pop-science, math-free introduction is readable by everyone, you as an expert will also benefit from reading it. While some endorsers proclaim it "The Freakonomics of big data" that "reads like a thriller!", others speak to the practitioner:

"The definitive book of this industry has arrived. Dr. Siegel has achieved what few have even attempted: an accessible, captivating tome on predictive analytics that is a 'must read' for all interested in its potential–and peril."
—Mark Berry, VP, People Insights, ConAgra Foods

"Written in a lively language, full of great quotes, real-world examples, and case studies, it is a pleasure to read. The more technical audience will enjoy chapters on The Ensemble Effect and uplift modeling—both very hot trends. I highly recommend this book!"
—Gregory Piatetsky-Shapiro, Editor, KDnuggets; Founder, KDD Conferences

Here's a bit more on the five reasons this book matters to you:

  1. New case studies. Find detailed stories you have never before heard from Hewlett-Packard, Chase, and the Obama Campaign. And did you know that John Elder once invested all of his own money in a black-box stock market system of his own design? That's the opening story of Chapter 1.
  2. Advanced topics. Dive into ensemble models, crowdsourcing predictive analytics, uplift modeling (aka net lift or persuasion modeling), text analytics, and social media-based financial indicators. Plus, enjoy a fun yet fairly deep chapter on IBM's Jeopardy!-playing Watson computer.

  3. Privacy and other civil liberty concerns. This ethical realm is so intractable and fast-moving that, in a sense, no one is a true expert. My treatise on it, a chapter entitled "With Power Comes Responsibility," addresses these questions: In what ways does predictive analytics fuel the contentious flames surrounding data privacy, raising its already-high stakes? What civil liberty concerns arise beyond privacy per se? And what about predictive crime models that help decide who stays in prison?

  4. A cross-industry compendium of 147 cases. This comprehensive collection of mini-case studies illustrates just how far the field's reach extends. A color insert includes a table for each of these verticals: Personal Life, Marketing, Finance, Healthcare, Crime Fighting, Reliability Modeling, Government and Nonprofit, Human Language and Thought, and Human Resources. One PhD-level technical book reviewer complimented me by saying, "The tables alone are worth the price of admission."
  5. Share your field of expertise. Would you like your colleagues and manager to better understand the value and potential of your work? Would you enjoy seeing your loved ones not only learn what the heck it is you do and why it's so important, but enjoy it and get excited? Give this book to your family, friends, and boss.

For more information about the book Predictive Analytics, click here for a book overview, more endorsements, or to order the book.

 

Three Ways to Get Your Predictive Models Deployed

The reality is that there is much more to the transition from cool model to actual deployment than a nice slide deck and a paper accepted at one's favorite predictive analytics, data mining, or big data conference. In these venues, the winning models are those that are "accurate" (more on that later) and that used creative analysis techniques to find the solution; we don't submit a paper when we only had to press the "go" button and let the data mining software hand us a great solution!

For me, the gold standard is deployment. If the model gets used and improves the decisions an organization makes, I've succeeded. Three ways to increase the likelihood your models are deployed are:

1) Make sure the model stakeholder designs deployment into the project from the beginning

The model stakeholder is the individual, usually a manager, who advocates for predictive models to decision-makers. It is possible for a senior-level modeler to fill this role, but that person must be able to switch-hit: he or she must be able to speak the language of management and also talk technical detail with the analysts.

continued >>>

Predictive Analytics World Toronto 2013

This may require more than one trusted person: the manager, who is responsible for and makes the ultimate decisions about the models, and the lead modeler, who is responsible for the technical aspects of the models. It takes more than "talking the talk" and knowing the buzzwords in both realms; the person or persons must truly be "one of" both groups.

For those who have followed my blog posts and conference talks, you know I am a big advocate of the CRISP-DM process model (or any of the seemingly endless equivalent methodologies). I've referred to CRISP-DM often, including in posts on what data miners need to learn and on Defining the Target Variable, to name just two examples.

The stakeholder must not only understand the business objectives of the model (Business Understanding in CRISP-DM), but must also be present during the discussions about which models will be built. It is essential that reasonable expectations are set from the beginning, including what a good model will "look like" (accuracy and interpretability) and how the final model will be deployed.

I've seen far too many projects die or become inconsequential because either the wrong objectives were used in building the models, meaning the models were operationally useless, or because the deployment of the models was never considered, meaning again that the models were operationally useless.

As an example, on one project it was assumed the model could be run within a rules engine, but the models that were built were not rules at all; they were complex non-linear models that could not be translated into rules. The problem obviously could have been avoided had this disconnect been verbalized early in the modeling process.

2) Make sure modelers understand the purpose of the models

The modelers must know how the models will be used and which metrics will be used to judge model performance. A good summary of the typical error metrics used by modelers is found here. However, for most of the models I have deployed in customer acquisition, retention, and risk modeling, the treatment based on the model is never applied to the entire population (we don't mail everyone, just a subset). So the metrics that make the most sense are often ones like lift in the top decile, maximum cumulative net revenue, or the top 1,000 scores to be investigated. I've actually seen negative correlations between the ranking of models based on global metrics (like classification error or R^2) and the ranking based on subset metrics, such as the top 1,000 scores; very different models may be deployed depending on the metric used to assess them. If modelers aren't aware of the metric to be used, the wrong model can be selected, even one that does worse than the current approach.
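To make that distinction concrete, here is a minimal sketch (NumPy assumed; the data, scores, and function name are synthetic and purely illustrative) of a top-decile lift calculation, the kind of subset metric that can rank models very differently than a global error rate:

```python
import numpy as np

def lift_at_decile(y_true, scores, decile=0.10):
    """Lift of the top-scoring decile over the overall response rate."""
    top_n = max(1, int(len(y_true) * decile))
    order = np.argsort(scores)[::-1]          # highest scores first
    top_rate = y_true[order[:top_n]].mean()   # response rate in the top decile
    return top_rate / y_true.mean()           # lift vs. the base rate

# Synthetic holdout set with a 5% base response rate, scored by two
# hypothetical models; the agreed-upon metric decides which one "wins".
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.05, size=10_000)
score_a = 0.6 * y + rng.uniform(size=10_000)
score_b = 0.2 * y + rng.uniform(size=10_000)
print(lift_at_decile(y, score_a), lift_at_decile(y, score_b))
```

The model that wins on a global metric is not necessarily the one that wins in the top decile, which is exactly why the deployment metric must be agreed on before model selection, not after.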

Second, if the modelers don't understand how the models will be deployed operationally, they may find a fantastic model, one that maximizes the right metric, but is useless. The Netflix Prize is a great example: the final winning model was accurate but far too complex to be used, so Netflix instead extracted key pieces of the models to operationalize. I've had customers stipulate to me that "no more than 10 variables can be included in the final model." If modelers aren't aware of specific timelines or implementation constraints, a great but useless model can be the result.
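As a hedged illustration of honoring such a constraint (scikit-learn assumed; the synthetic dataset and the choice of univariate selection are mine, not any customer's requirement), one way a modeler might bake a ten-variable cap into the modeling pipeline from the start:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a wide modeling dataset (100 candidate inputs).
X, y = make_classification(n_samples=5_000, n_features=100, random_state=0)

model = make_pipeline(
    SelectKBest(f_classif, k=10),        # keep only the 10 strongest inputs
    LogisticRegression(max_iter=1000),   # simple model, easy to deploy
)
model.fit(X, y)
```

Any selection scheme would do; the point is that the deployment constraint is built into the pipeline up front rather than discovered after the "fantastic" hundred-variable model is finished.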

 

3) Make sure the model stakeholder understands what the models can and can't do

In the effort to get models deployed, I've seen models elevated to a status they don't deserve, most often by exaggerating their accuracy and expected performance once in operation. I understand why modelers may do this: they have a direct stake in what they did. But the manager must be more skeptical and conservative.


One of the most successful colleagues I've ever worked with used to assess model performance on held-out data using the metric we had been given (the maximum depth one could mail to and still achieve the pre-determined response rate). But he always backed off what he reported to his managers by about 10% to give some wiggle room. Why? Because even with our best efforts, there is still a danger that the data environment after the model is deployed will differ from the one used to build the models, reducing their effectiveness.
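In code the haircut itself is trivial; here is a sketch (the function name and the example depth are illustrative, only the roughly 10% back-off comes from the anecdote above):

```python
def reported_mail_depth(holdout_depth, haircut=0.10):
    """Back off the holdout estimate before reporting it upward.

    holdout_depth: deepest fraction of the file one could mail on held-out
    data while still hitting the pre-determined response rate (0.35 = top 35%).
    """
    return holdout_depth * (1.0 - haircut)

# If the holdout says you can mail to the top 35%, report roughly 31.5%,
# leaving wiggle room for drift between the modeling and deployment data.
print(reported_mail_depth(0.35))
```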

A second problem for the model stakeholder is communicating an interpretation of the models to decision-makers. I've had to do this exercise several times in the past few months, and it is always eye-opening to try to explain the patterns a model is finding when the model itself is complex. We can describe overall trends ("on average," more of X increases the model score) and we can describe specific patterns (when observable fields X and Y are both high, the model score is high). Both are needed to communicate what the models do, but both have to connect with what a decision-maker understands about the problem. If it doesn't make sense, the model won't be used. If it is too obvious, the model isn't worth using.

The ideal model for me is one where the decision-maker nods knowingly at the "on average" effects (these should usually be obvious). Then, once you throw in some specific patterns, they should scrunch their eyes, think a bit, and then smile as the implications dawn on them: the pattern really does make sense, but was previously not considered.
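As one illustrative way to surface both kinds of explanation from a scored dataset (pandas and NumPy assumed; the data, column names, and thresholds are all hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical scored dataset: two input fields plus a model score that
# contains both a main effect of x and an x-AND-y interaction.
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.normal(size=10_000), "y": rng.normal(size=10_000)})
df["score"] = (df["x"]
               + 2 * ((df["x"] > 1) & (df["y"] > 1))
               + rng.normal(scale=0.1, size=10_000))

# "On average" trend: mean score rises across quartiles of x.
print(df.groupby(pd.qcut(df["x"], 4), observed=True)["score"].mean())

# Specific pattern: the score jumps only when x and y are BOTH high.
print(df.groupby([df["x"] > 1, df["y"] > 1])["score"].mean())
```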

As predictive modelers, we know that absolutes are hard to come by, so even if these three principles are adhered to, other factors can still sabotage the deployment of a model. Nevertheless, these steps will generally increase the likelihood that models are deployed. In all three, communication is the key to ensuring that the model addresses the right business objective, is judged by the right scoring metric, and can be deployed operationally.

Dean Abbott is President of Abbott Analytics in San Diego, California. Mr. Abbott has over 21 years of experience applying advanced data mining, data preparation, and data visualization methods to real-world, data-intensive problems, including fraud detection, risk modeling, text mining, response modeling, survey analysis, planned giving, and predictive toxicology. In addition, Mr. Abbott serves as chief technology officer and mentor for start-up companies focused on applying advanced analytics in their consulting practices.

Analytics Success Stories From the Front Lines

$50 billion. That's about how much marketers are spending on Big Data and advanced analytics (according to a BMO Capital Markets report) in the hopes of improving marketing's impact on the business.

This commitment reflects a belief that big data and advanced analytics can transform business. While the reality has at times fallen short of the promise, some companies are already seeing significant value. Recent academic research found that companies that have incorporated data and analytics into their operations show productivity rates 5 to 6 percent higher than those of their peers. Now is the time to define a pragmatic approach to big data and advanced analytics that is rooted in performance and focused on impact (see "Making advanced analytics work for you").

Here are four stories "from the front lines" that illustrate how companies have used advanced analytics to deliver impact.

1. Asking the right questions
The more data-rich your business becomes, the more important it is to ask the right questions at the beginning of the analytical process. That's because the very scale of the data makes it easy to lose your way or become trapped in endless rounds of analysis. Good questions should identify the specific decisions that data and analytics will support to drive positive business impact. Asking two simple questions, for example, helped one well-known insurer find a way to grow its sales without increasing its marketing budget: First, how much should be invested in marketing, and second, to which channels, vehicles, and messages should that investment be allocated? These clear markers guided the company as it triangulated among three sources of data, helping it develop a proprietary model to optimize spending across channels at the zip code level. (For more on this, read "What you need to make Big Data work: The pencil").

2. Turning existing data into new businesses
One telecom company in emerging markets recognized that its data could solve a longstanding quandary faced by financial services companies: how to meet the need of millions of low-income individuals for revolving credit, similar to credit cards, without a credit-risk model. Executives at the telecom realized that payment histories on their mobile network could be used to solve that conundrum. Using this data, the company created an innovative risk model that could assess a potential customer's ability to repay loans. Now the company is exploring an entirely new line of business in emerging-market consumer finance that uses these analytics as a core asset.

3. Optimizing spend and impact across channels
Business is all about tradeoffs: price versus volume, the cost of inventory versus the chance of a stock-out. In the past, many such tradeoffs were made with a little data and a lot of gut instinct. Even now, in the age of cookies and click-throughs, it's not always easy to optimize spending allocations. Big Data and advanced analytics, particularly more real-time data, can eliminate much of the guesswork. One transnational communications company had spent heavily on traditional media to improve brand recognition, and it had invested in social media as well. However, its traditional marketing-mix models could not measure the sales impact of the social buzz.


Combining data from traditional media, sales, and customer use of key social-media sites yielded a model demonstrating that social media had a much higher impact than company strategists had assumed. More critically, company analysts found that the primary driver of social-media sentiment was not the company's television commercials but customer interaction with its call centers, and in fact that poor call handling was subtracting almost as much value as the TV spots were adding. By reallocating some media spending to improve call-center satisfaction, the company increased its customer base significantly and gained several million dollars in revenues. (Read more on this topic in "Getting beyond the buzz.")

4. Keeping it simple
Too much information is overwhelming. That's why it's important to keep reports simple or they won't be used. One large B2B manufacturer, for example, recognized that a large percentage of the company's sales flowed from a small proportion of its customer base, but sales growth with those big customers was sluggish. Managers wanted local sales representatives to find new customers, so the company created a central analytics team that gathered detailed data and built predictive models that identified the local markets with the highest new-customer sales potential.

See the full article and a related video at Forbes.com.

 
