Preface from Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
Inside the Secret World of the Data Crunchers Who Helped Obama Win
Why Predictive Modelers Should be Suspicious of Statistical Tests

Predictive Analytics Times Newsletter:

I was excited to read Dean Abbott's article, "What We Should Take Home From Predictive Analytics Conferences." Having worked in the conference and exhibits industry for the past 15+ years, I am always glad to see someone else articulate the value of attending conferences. Our industry-leading conference, Predictive Analytics World, is the benchmark for predictive analytics conferences. I hope to see you at one of our 2013 conferences.

I understand that your time and training dollars can be limited. Therefore, we are thrilled that you are reading Predictive Analytics Times and staying current in the ever-changing and maturing predictive analytics industry.

I hope you enjoy our February issue and that it delivers the insight you need in your daily work. As always, if you like what you see, please spread the word to your colleagues and peers.

Kind regards,

Adam Kahn
Publisher, Predictive Analytics Times


How the Obama Camp Analytically Persuaded Millions of Voters
What We Should Take Home From Predictive Analytics Conferences
On-Demand Webinar - How-To: Effectively Realize Data Visualization
Analytics Vendors Must Make Prediction Easier, Forrester Says
Training Program in Predictive Analytics – April in New York City
ANALYTICS SOFTWARE: Salford Systems - Predictive Modeler

This article from The Fiscal Times (headlined as "The Real Story Behind Obama's Election Victory") is an excerpt from Eric Siegel's book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (releasing today, February 19, from publisher John Wiley; foreword by Thomas H. Davenport). See also this Newsmax TV interview with the author, as well as his widely-distributed blog post, Team Obama Mastered the Science of Mass Persuasion - and Won.

Elections hang by a thinner thread than you think.

By now you probably know that Barack Obama's 2012 campaign for a second term "moneyballed" the election, employing a team of over 50 analytics experts.

You may also know that the huge volume of contentious and costly presidential campaign tactics – executed in the eleventh hour in pursuit of the world's most powerful job – ultimately served only to sway a thin slice of the electorate: swing voters within swing states.

RELATED: How You're Shaping the Future Through Big Data

But what most people don't realize is that presidential campaigns must focus even more narrowly than that, taking micro-targeting to a whole new level. The Obama campaign got this one right, breaking ground for election cycles to come by applying an advanced form of predictive analytics that pinpoints rare gems: truly persuadable voters.

This is the new microcosmic battleground.

We've heard a great deal about Nate Silver lately. Silver has soared past the ranks of sexy scientist to become the face of prediction itself. If mathematical "tomorrow vision" has a name, it's Nate. Even before his forecasts were vindicated by the election results, it was hard to find a talk show host who hadn't enjoyed a visit from Silver.

But an election poll does not constitute prognostic technology – it is plainly the act of voters explicitly telling you what they're going to do. It's a mini-election dry run. There's a craft to aggregating polls, as Silver has mastered so adeptly, but even he admits it's no miracle of clairvoyance. "It's not really that complicated," he told late-night talk show host Stephen Colbert the day before the election. "There are many things that are much more complicated than looking at the polls and taking an average... and counting to 270, right?"

Moving beyond poll analysis, true power comes in influencing the future rather than speculating on it. Nate Silver publicly competed to win election forecasting – while Obama's analytics team quietly competed to win the election itself.

This reflects the difference between forecasting and predictive analytics. Forecasting calculates an aggregate view for each U.S. state – but predictive analytics delivers action-oriented insight: predictions for each individual voter.

SWING VOTERS: A MYTH
The concept of swing voters is ill-defined and subjective. The Democratic National Committee (DNC), in one approach, labels as "not very partisan" those voters who self-reported as independent, or for whom their party is (for any reason) unknown.

Despite this information about them, many such voters have indeed made up their minds and are unswingable.

Instead of mythical swing voters – or unicorns, for that matter – what matters to a campaign is concrete and yet quite narrow: Who will be influenced to vote for our candidate by a call, door knock, flyer, or TV ad? That is, who is persuadable?
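The "who is persuadable?" question is what predictive modelers call uplift (or net lift) modeling: estimating, for each individual, how much a contact changes the probability of the desired outcome. As a purely illustrative aside – not a description of the Obama campaign's actual system – here is a minimal two-model sketch in Python with scikit-learn; every field name and number is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical results of a past contact experiment:
#   X         - voter features (demographics, turnout history, etc.)
#   contacted - 1 if the voter got a call/door knock/flyer, else 0
#   supported - 1 if the voter later expressed support for the candidate
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 8))
contacted = rng.integers(0, 2, size=n)
supported = (rng.random(n) < 0.40 + 0.05 * contacted).astype(int)

# Two-model uplift approach: model support separately for contacted
# and uncontacted voters, then score the difference.
treated_model = RandomForestClassifier(random_state=0).fit(
    X[contacted == 1], supported[contacted == 1])
control_model = RandomForestClassifier(random_state=0).fit(
    X[contacted == 0], supported[contacted == 0])

# Persuadability score: estimated lift in support probability from contact.
uplift = (treated_model.predict_proba(X)[:, 1]
          - control_model.predict_proba(X)[:, 1])

# A campaign would spend its calls, door knocks, and ads on the voters
# with the highest estimated uplift, not on loyalists or lost causes.
top_persuadable = np.argsort(uplift)[::-1][:500]
print(f"Top persuadable voters identified: {len(top_persuadable)}")
```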

Read the rest of this article.


Why should one go to a predictive analytics conference? What should one take home from a conference like PAW? There are lots of reasons conferences are valuable, including interacting with thought leaders and practitioners during sessions and informal conversations, seeing software and hardware tools in the exhibit hall, and learning principles of predictive analytics in workshops and case studies. It is this last item, learning from case studies, that will be addressed here.

Seeing how someone else solves a problem much like the one we face in our own organization is a powerful way to learn; there is no quicker way to upgrade our analysis than having someone who has "been there" tell us how they succeeded in developing and implementing predictive models.

When I go to conferences, this is at the top of my list of take-homes. Questions I love to ask include: Is there a different way of looking at the problem than I had considered before? How did the practitioner overcome his or her inevitable obstacles? How was the target variable defined? What data was used in building the models? How was the data prepared? What figure of merit was used to judge the model's effectiveness?

Almost all case studies we see at conferences are success stories; we all love winners. Cognitively, we all know that we learn from mistakes, and many case studies do enumerate mistakes. But success sells, and given the time limitations of a 20-50 minute talk, few mistakes and dead ends are described. And, as we used to say when I was doing government contracting, you do the research, and when the money runs out, you declare victory.

Putting a more positive spin on the process, we do as well as we can with the resources we have, and if the final solution improves the current system, we are indeed successful.

But once we observe the successful approach, what can we really take home with us? I argue that there are three reasons we should be skeptical about taking case studies and applying them directly to our own problems.

The first two reasons are straightforward. First, your data is different from the data used in the talk. Obviously. But it is likely to be different enough that one cannot take the exact same approach to data preparation or target variable creation that one sees at a conference.

Second, your business is different. The way the question was framed and the way predictions can be used are likely to differ in your organization.

If you are building models to predict Medicare fraud, the way a "suspicious" claim is processed, and how the reasons a claim was flagged as suspicious are recorded, will be handled differently by different health care providers.

The third reason is subtler. In a fascinating New Yorker article entitled "The Truth Wears Off: Is there something wrong with the scientific method?", author Jonah Lehrer describes an effect seen by many researchers over the past few decades in which statistically significant findings in major studies become harder and harder to replicate over time, both by the original researchers and by others. This is a huge problem because replicating results is what we do as predictive modelers: we assume that behavior seen in the past will be replicated in the future.

Continue >>>


In one example, Jonathan Schooler (then a graduate student at the University of Washington) "demonstrated that subjects shown a face and asked to describe it were much less likely to recognize the face when shown it later than those who had simply looked at it. Schooler called the phenomenon 'verbal overshadowing.' The study turned him into an academic star."

A few years later, he tried to replicate the study but didn't succeed. In fact, he tried many times over the years and never succeeded. The effect he found at first waned each time he tried to replicate the study with additional data. "This was profoundly frustrating. It was as if nature gave me this great result and then tried to take it back."

There have been a variety of potential explanations for the effect, including "regression to the mean." This may apply because even when case studies show statistically significant results, defined by a p value of less than 0.05, the result is not set in stone: there is still a chance, even with a small p value, that the effect found was not really there at all. Over thousands of studies, dozens will therefore report effects that aren't really there.

Let's assume we are building models and there is actually no significant difference between responders and non-responders (but we don't know that).

However, we work very hard to identify an effect, and eventually we find the effect on training and testing data. We publish. But the effect isn't there; we had a 1 in 20 chance of finding the effect when it wasn't there to begin with and just had bad luck. Even if the chance of finding the effect by chance is 1 in 100, or 1 in 1000, if we experiment enough and search through enough variables, we may happen upon a seemingly good effect eventually. This process is called "over searching" by Jensen and Cohen (see "Multiple Comparisons in Induction Algorithms").
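To see how easily this "over searching" finds phantom effects, here is a minimal simulation sketch (in Python with NumPy and SciPy; the sample size, number of candidate variables, and 0.05 threshold are illustrative choices, not taken from Jensen and Cohen). Every candidate predictor is pure noise, yet several still come out "statistically significant."

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

n_records = 1000        # observations (e.g., customers or claims)
n_candidates = 200      # candidate predictors, all pure noise

# The response has no real relationship to any of the candidates.
response = rng.normal(size=n_records)
candidates = rng.normal(size=(n_records, n_candidates))

# Test each candidate against the response and count "significant" hits.
false_positives = sum(
    stats.pearsonr(candidates[:, j], response)[1] < 0.05
    for j in range(n_candidates)
)

# With a 5% false-positive rate per test, roughly 0.05 * 200 = 10 noise
# variables are expected to clear the bar even though none is real.
print(f"Spurious 'significant' predictors: {false_positives} of {n_candidates}")
```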

Since our problems are different from those we see in a case study, and since we can't be sure that even the case study solution we see will stand the test of time, what should we do? We should take home the ideas, the principles, and the approaches. They should spur us to try ideas we either hadn't yet tried or hadn't even thought about before.

Dean Abbott is President of Abbott Analytics in San Diego, California. Mr. Abbott has over 21 years of experience applying advanced data mining, data preparation, and data visualization methods to real-world, data-intensive problems, including fraud detection, risk modeling, text mining, response modeling, survey analysis, planned giving, and predictive toxicology. In addition, Mr. Abbott serves as chief technology officer and mentor for start-up companies focused on applying advanced analytics in their consulting practices.

Analytics Vendors Must Make Prediction Easier, Forrester Says

There were few surprises in the rankings in a "Forrester Wave: Big Data Predictive Analytics" report released on Monday, but the analyst firm had urgent advice for SAS, IBM and eight other vendors evaluated in the report: make predictive analytics more accessible to business users.

The ten vendors in the report are, in Forrester's ranked order, SAS, IBM, SAP, Tibco, Oracle, StatSoft, KXEN, Angoss Software, Revolution Analytics and Salford Systems. The rankings are based on 51 evaluation criteria rolled up into 13 overall scores on current product offerings, company strategy and market presence. SAS has the largest market presence with more than 3,000 predictive analytics customers, versus more than 1,500 for IBM, but the product and strategy ratings are what drove the rankings, according to Forrester's report. (A free copy of the report, which was not sponsored by any single vendor, is available from SAS without registration requirements.)

SAS offers "a broad set of tools for predictive analytics, an architecture that supports multiple platforms, in-database analytics and in-memory analytics," the report notes.

To stay on top, the analyst firm says SAS needs to "provide more sophisticated solutions for real-time analytics, such as stream processing [and] offer predictive modeling tools that business analysts find more usable."

Forrester gave IBM high marks for its worldwide footprint and broad analytics portfolio, which includes SPSS and the PureData System for Analytics (formerly Netezza) in-database capabilities among other assets. Forrester encouraged the vendor to make its total portfolio "less confusing" and to create more solutions that "customers can use out of the box." Reading between the lines, that suggests less dependency on IBM consultants to integrate multiple products.

One surprise in the assessment, according to Forrester analyst and report author Mike Gualtieri, was that SAP scored as well as it did considering that the vendor is relatively new to advanced analytics. The company introduced an advanced analytics module three years ago based on SPSS software and then introduced a replacement, called Predictive Analysis, in 2012.

Continue >>>


SAP has a modeling tool (akin to SAS Enterprise Miner and SPSS Modeler) and an analytics library (akin to those offered by Teradata and IBM PureData for Analytics). Forrester's Wave report describes SAP's strategy of running analytics inside the Hana in-memory database as bold. Gualtieri told InformationWeek that Forrester's Q3 2012 review found that SAP's technologies were more scalable for big data use than those offered (at the time of the review) by Tibco, the next vendor in the ranking.

Another surprise in the rankings was that products focused on the open source R programming language, such as those from Oracle and Revolution Analytics, did not fare better. R has received a lot of attention and support from multiple vendors, but Gualtieri said that this powerful programming language is difficult to learn and only appropriate for direct use by data scientists and high-level analytics professionals.

"The idea of business types using R to develop predictive analytics would be like asking them to use Java to write BI reports – it's just not going to happen," he said.

In addition to providing abstracted, business-user-oriented interfaces, the path to eliminating complexity from predictive analytics -- whether from commercial products or open-source environments like R -- will be to take advantage of automation and machine-learning, Gualtieri said. The idea is to automatically test a variety of algorithms and implement the most effective ones without forcing users to go through complex, iterative testing.

"We're not there yet, but lots of vendors are working on this approach," he said.

The list of companies pursuing automation and/or machine learning includes analytics vendors Alpine Data Labs and KXEN, BI vendors Alteryx, Birst and Pentaho, and process-management vendors Pegasystems and Rage Frameworks.

Revolution Analytics' David Smith, VP of marketing, said he's glad to see a significant report focused on the hot predictive analytics arena. "It's a topic that has gone from being a side issue into the core focus of companies now that they've figured out the value of collecting data and applying predictive analytics to that data," he said.

Read the full article at informationweek.com.
