November 29th 2015
By Eric Siegel
This Newsweek article, originally published in Newsweek’s opinion section and excerpted here, resulted from the author’s research for a new extended sidebar on the topic that will appear in the forthcoming Revised and Updated paperback edition of Eric Siegel’s Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (coming January 6, 2016). Preorder today to delve more deeply into this topic.
I must disagree with my fellow liberals. The NSA bulk data shutdown scheduled for November 29 is unnecessary and significantly compromises intelligence capabilities. As recent tragic events in Paris and elsewhere turn up the contentious heat on both sides of this issue, I'm keenly aware that mine is not the usual opinion for an avid supporter of Bernie Sanders (who was my hometown mayor in Vermont).
But as a techie, a former Columbia University computer science professor, I’m compelled to break some news: Technology holds the power to discover terrorism suspects from data, and yet also to safeguard privacy, even with bulk telephone and email data intact. Specifically, stockpiling data about innocent people is essential for the state-of-the-art science that identifies new potential suspects.
I'm not talking about scanning to find perpetrators, the well-known practice of employing vigilant computers to trigger alerts on certain behavior. The system spots a potentially nefarious phone call and notifies a heroic agent—that's a standard occurrence in intelligence thrillers, and a common topic in casual speculation about what our government is doing. Everyone's familiar with this concept.
Rather, bulk data takes on a much more difficult, critical problem: precisely defining the alerts in the first place. The actual “intelligence” of an intelligence organization hinges on the patterns it matches against millions of cases—it must develop adept, intricate patterns that flag new potential suspects. Deriving these patterns from data automatically, the function of predictive analytics, is where the scientific rubber hits the road. (Once they’re established, matching the patterns and triggering alerts is relatively trivial, even when applied across millions of cases—that kind of mechanical process is simple for a computer.)
It may seem paradoxical, but data about the innocent civilian can serve to identify the criminal. Although the ACLU calls it “mass, suspicionless surveillance,” this data establishes a baseline for the behavior of normal civilians. That is to say, law enforcement needs your data in order to learn from you how non-criminals behave. The more such data available, the more effectively it can do so.
Here's how it works. Predictive analytics shrinks the unwieldy haystack throughout which law enforcement must hunt for needles—albeit by first analyzing the haystack in its entirety. The machine learns from the needles (i.e., known perpetrators, suspects, and persons of interest) as well as the hay (i.e., the vast majority that is non-criminal) using the same technology that drives financial credit scoring, Internet search, personalized medicine, spam filtering, targeted marketing, and movie, music, and book recommendations. This automatic process generates patterns that flag individuals more likely to be needles, thereby targeting investigation activities and more productively utilizing the precious bandwidth of officers and agents. Under the right conditions, this will unearth terrorists who would have otherwise gone undetected.
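For the technically curious, the haystack-shrinking idea can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the data is synthetic, the two numeric features are made up, and the "pattern" is a simple linear score (needle centroid minus hay centroid) standing in for whatever models a real deployment would use. All function names here are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible

def make_cases(n_hay, n_needles):
    """Synthetic (features, is_needle) pairs; both features are invented."""
    hay = [([random.gauss(0, 1), random.gauss(0, 1)], False) for _ in range(n_hay)]
    needles = [([random.gauss(3, 1), random.gauss(3, 1)], True) for _ in range(n_needles)]
    return hay + needles

def mean_vector(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def learn_pattern(cases):
    """Learn a scoring direction from both classes: needle mean minus hay mean."""
    needle_mean = mean_vector([x for x, y in cases if y])
    hay_mean = mean_vector([x for x, y in cases if not y])
    return [n - h for n, h in zip(needle_mean, hay_mean)]

def score(pattern, features):
    """Higher score means the case looks more like the known needles."""
    return sum(w * v for w, v in zip(pattern, features))

def flag_top(pattern, cases, k):
    """Return the k highest-scoring cases: the shrunken haystack."""
    return sorted(cases, key=lambda c: score(pattern, c[0]), reverse=True)[:k]

# Learn from labeled history (mostly hay, a few needles) ...
training = make_cases(n_hay=1000, n_needles=10)
pattern = learn_pattern(training)

# ... then apply the learned pattern to new cases and keep only the top few.
new_cases = make_cases(n_hay=1000, n_needles=10)
flagged = flag_top(pattern, new_cases, k=20)

base_rate = sum(y for _, y in new_cases) / len(new_cases)
flagged_rate = sum(y for _, y in flagged) / len(flagged)
print(f"needle rate overall: {base_rate:.3f}, among flagged: {flagged_rate:.3f}")
```

Note that the hay does real work in this sketch: without the large non-needle group there is no baseline to subtract, and so no direction along which to score new cases.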
This increasingly common practice also drives other crime fighting functions. Today's law enforcement organizations predictively investigate, monitor, audit, warn, patrol, parole, and sentence…
Eric Siegel, Ph.D. is the founder of the Predictive Analytics World conference series—which covers both business and government deployment—the author of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, Revised and Updated Edition (Wiley, January 2016), and a former computer science professor at Columbia University.