Machine Learning Times

10 years ago
Big predictions for Big Data for community banks


Department store conglomerate Target Brands Inc. knows when its female customers are pregnant long before they start buying cribs. Google Inc. was able to track the H1N1 flu in real time using Internet search queries when the government couldn’t. Parole and sentencing decisions in some states are influenced by software that predicts the likelihood of a repeat offense.

Welcome to the world of Big Data, one where ever-expanding pools of data can be exploited for new insights into consumer, economic, behavioral and other patterns at an unprecedented level of detail. From finding ways to make traffic flow more efficiently and improving the performance of cell phones and car parts to creating concerns over privacy and overconcentration of power as the largest companies hold huge troves of data, it’s shaping society in ways big and small.

IBM Corp. estimates that the world had about 800,000 petabytes stored in 2000 (one petabyte has been described as 20 million four-drawer filing cabinets filled with text). Today, Twitter Inc. generates 10 petabytes each day transmitting microblogs. By 2020, the world’s computers are expected to hold 35 zettabytes of data (one zettabyte is one million petabytes).
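
To put those units side by side, here is a small arithmetic sketch. It assumes decimal (SI) units, which is what the "one million petabytes" conversion above implies; the function names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes.
PETABYTE = 10**15
ZETTABYTE = 10**21


def petabytes_per_zettabyte() -> int:
    """Number of petabytes in one zettabyte under decimal units."""
    return ZETTABYTE // PETABYTE


def growth_factor(start_petabytes: int, end_zettabytes: int) -> float:
    """How many times larger the end figure is than the start figure."""
    end_petabytes = end_zettabytes * petabytes_per_zettabyte()
    return end_petabytes / start_petabytes


if __name__ == "__main__":
    # Figures quoted in the article: 800,000 PB in 2000, 35 ZB expected by 2020.
    print(petabytes_per_zettabyte())   # 1000000
    print(growth_factor(800_000, 35))  # 43.75
```

In other words, the 2020 projection is roughly 44 times the 2000 estimate.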

While scientists have been collecting data for ages, the hype around Big Data comes down to what IBM calls the three Vs: volume, variety and velocity. Some add a fourth V: veracity. Not only do we have epic amounts of data, often recorded in real-time, but the data takes more forms than ever before. Now traditional data in forms such as spreadsheets and databases are analyzed in tandem with unstructured data—things like photos, videos, emails, social media posts and recorded conversations. It’s a veritable smorgasbord—but only if it’s analyzed correctly.

“In reality, we found in the last few years that without some kind of data transformation—an effort to turn data into information—it doesn’t tell you anything more than it used to,” observes Jason Malo, research director at CEB TowerGroup, a business technology consulting firm in Boston. “You almost never see Big Data by itself. It’s always paired with analytics.”

Predicting the future

Big Data has given rise to predictive modeling: drawing on the collective experience of an organization to predict customer behavior and events at the individual level. Just as IBM’s super-fast computer Watson beat human champions on the game show Jeopardy! by studying past questions and answers to handle new ones, today’s predictive modeling technology relies on past data to predict what will happen in the future, says Eric Siegel, author of “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die” and founder of Predictive Analytics World.

“It’s literally learning [from past cases] for situations never seen before,” he says.

Consider Google’s efforts to track the flu. By asking doctors to report cases of the flu, the Centers for Disease Control and Prevention was able to publish information about the location and severity of the flu once a week with a two-week lag. Google, a company known for its data obsession, took a different approach. The company compared the 50 million most common search terms in America with CDC flu data from 2003 to 2008 to find correlations between search phrases and the spread of the flu, according to “Big Data: A Revolution That Will Transform How We Live, Work and Think” by Oxford University Professor Viktor Mayer-Schönberger and Kenneth Cukier, data editor at The Economist.

Google didn’t limit itself to obvious searches like “cough remedies” or “fever.” It used its huge computational power to study all 50 million searches—finding 45 search terms that when used in a mathematical model could reliably predict where the flu had spread in real time, the authors report.
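
The general idea of screening search terms against flu data can be sketched in a few lines. This is not Google's actual model, which the article does not describe in detail; it is a simplified stand-in that ranks candidate terms by their Pearson correlation with reported flu cases. The weekly numbers and term names below are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal
    return cov / sqrt(var_x * var_y)


def top_correlated_terms(term_counts, flu_cases, k):
    """Return the k (term, correlation) pairs that track flu cases most closely."""
    scored = [(term, pearson(counts, flu_cases))
              for term, counts in term_counts.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented weekly figures, for illustration only.
flu_cases = [10, 20, 40, 80, 40, 20]
term_counts = {
    "flu symptoms": [12, 22, 41, 79, 43, 19],
    "cough remedies": [30, 35, 50, 70, 55, 33],
    "basketball scores": [50, 48, 52, 49, 51, 50],
}

if __name__ == "__main__":
    for term, r in top_correlated_terms(term_counts, flu_cases, k=2):
        print(f"{term}: {r:.3f}")
```

With millions of candidate terms, some will correlate with flu data by pure chance, so a real system would also need to validate the selected terms on data held back from the screening step.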

It’s a huge feat, one that’s being emulated in many sectors, including financial services, the industry with the most stored data, according to the McKinsey Global Institute. Wall Street uses roadway traffic patterns around shopping centers to gauge the economy, while hedge funds use Twitter to predict the stock market’s performance. SWIFT, the Society for Worldwide Interbank Financial Telecommunication, offers GDP forecasts based on its funds transfer data, say Mayer-Schönberger and Cukier.

The goal is to pounce on opportunities by finding the connections between data sets—something possible even at community banks, says Siegel, especially in the areas of marketing, fraud and credit scoring. (See the January 2013 ICBA Independent Banker story “Guessing Game” by Katie Kuehner-Hebert on predictive modeling at

For instance, technology giant Hewlett-Packard Development Co. used its employee data to develop a model predicting which staffers are most likely to quit. The tool gives managers the opportunity to remedy the situation or prepare for a likely departure. By looking for commonalities among customers who left the bank and using those examples to identify which current customers are most likely to close their accounts as well, Siegel says, it’s possible for community banks to monitor and respond similarly with their customers.
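
As a rough sketch of the attrition idea Siegel describes, the snippet below measures how often past customers with a given attribute left, then scores current customers by averaging those historical rates. This is a deliberately naive scoring method chosen for readability, not the technique used by HP or any bank; the attribute names and records are hypothetical.

```python
from collections import defaultdict


def churn_rates_by_attribute(past_customers):
    """For each (attribute, value) pair, the share of past customers who left."""
    left = defaultdict(int)
    total = defaultdict(int)
    for customer in past_customers:
        for attribute, value in customer["attributes"].items():
            total[(attribute, value)] += 1
            if customer["left"]:
                left[(attribute, value)] += 1
    return {key: left[key] / total[key] for key in total}


def churn_score(attributes, rates, default_rate):
    """Average historical churn rate across a customer's attribute values."""
    values = [rates.get(item, default_rate) for item in attributes.items()]
    return sum(values) / len(values) if values else default_rate


def rank_by_risk(current_customers, rates, default_rate):
    """Current customers sorted from most to least likely to leave."""
    return sorted(
        current_customers,
        key=lambda c: churn_score(c["attributes"], rates, default_rate),
        reverse=True,
    )


# Hypothetical history: attributes and outcomes are invented.
past = [
    {"attributes": {"online_banking": "no", "products": "one"}, "left": True},
    {"attributes": {"online_banking": "no", "products": "one"}, "left": True},
    {"attributes": {"online_banking": "no", "products": "several"}, "left": False},
    {"attributes": {"online_banking": "yes", "products": "one"}, "left": False},
    {"attributes": {"online_banking": "yes", "products": "several"}, "left": False},
    {"attributes": {"online_banking": "yes", "products": "several"}, "left": False},
]
current = [
    {"name": "A", "attributes": {"online_banking": "yes", "products": "several"}},
    {"name": "B", "attributes": {"online_banking": "no", "products": "one"}},
]

if __name__ == "__main__":
    rates = churn_rates_by_attribute(past)
    overall = sum(c["left"] for c in past) / len(past)
    for customer in rank_by_risk(current, rates, overall):
        print(customer["name"], round(churn_score(customer["attributes"], rates, overall), 2))
```

A production model would more likely use a fitted classifier and far more history, but the workflow is the same: learn from customers who left, then rank the ones who remain.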

Mining for patterns

“It’s all about data discovery,” says Ray Wilson, program director at Teradata Corp., a data warehousing and analytics firm famous for making the surprising discovery that Wal-Mart customers stock up on strawberry Pop-Tarts before a hurricane hits. Banks can now identify what leads consumers to specific actions—from clicks on a website or comments made to an employee to specific transactions. “Once we can understand that, we can alert the system to future patterns demonstrated by various customers going down a similar path,” he says.

“If for each customer you have [his or her] payment history, credit limit, demographics and so many other pieces of information, each is potentially predictive and in concert they could be particularly predictive.”
—Eric Siegel, technology author and professor

To do this, community banks need a smart data governance plan that defines what information their systems are capturing, how frequently it’s captured and how it’s protected, says Malo. Going beyond an IT document, the plan must address the business needs of the bank—including what value specific data is likely to have to other operations within the bank. Community banks may need to take steps to connect their data, for instance allowing analytics to access both transaction history and emails, which may be in different locations. This may be complicated for banks with older systems, but this step is a factor to consider during system upgrades.

One area that the largest banks are exploiting to improve customer satisfaction is so-called unstructured data. Teradata’s Wilson shares the example of a megabank whose wealth management division kept getting hit with Consumer Financial Protection Bureau fines due to customer complaints. The bank had set aside $30 million for future fines.

To head off complaints, the same large bank reviewed one month of conversations across different channels—everything from its Web chat, email and blog to anything publicly available on social media. It looked for attitudes, comments or text strings that indicated a potential CFPB complaint. Once it found them, the bank configured its software to flag these, giving staff a chance to mitigate potential complaints. As a result, the bank cut its reserves for regulatory fines by 30 percent, Wilson says.
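
A minimal sketch of that kind of flagging, assuming simple pattern matching on message text, might look like the following. The phrases are hypothetical examples, not the bank's actual list, and the article does not say what software or matching method the bank used.

```python
import re

# Hypothetical phrases that might signal a brewing complaint.
COMPLAINT_PATTERNS = [
    r"\bfile a complaint\b",
    r"\bcfpb\b",
    r"\bhidden fees?\b",
    r"\bnever (?:told|informed)\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in COMPLAINT_PATTERNS]


def matched_patterns(text):
    """Return the patterns found in a message (empty list if none)."""
    return [p.pattern for p in _COMPILED if p.search(text)]


def messages_to_review(messages):
    """Keep only the messages that match at least one complaint pattern."""
    return [m for m in messages if matched_patterns(m["text"])]


# Invented sample messages from different channels.
sample = [
    {"id": 1, "channel": "email", "text": "I was never told about these hidden fees."},
    {"id": 2, "channel": "chat", "text": "Thanks for the quick help today!"},
    {"id": 3, "channel": "social", "text": "Taking this to the CFPB if nobody calls me back."},
]

if __name__ == "__main__":
    for message in messages_to_review(sample):
        print(message["id"], message["channel"], matched_patterns(message["text"]))
```

Fixed phrases are easy to audit but miss sarcasm and paraphrase, which is why detecting "attitudes," as the bank did, typically calls for more than keyword matching.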

Of course, community banks have been analyzing customer data for years, reporting on account sizes, credit risk or customer segments. But the difference between old-fashioned analysis and customer relationship management is the difference between having a few rules versus having a coded program that can make a decision, Malo says. It’s not putting each customer into a category—it’s making specific predictions for each customer.

For instance, in the past a bank’s fraud department might decline a debit transaction because a customer’s purchase was unusual or in a foreign country. It applies a black-and-white rule. In a Big Data world, a bank can make a far more complex decision based on a huge range of factors such as the customer, the device used and account numbers involved, Malo says. So while a big-screen TV may be an irregular purchase for a customer, his bank may decide not to decline the transaction because it’s taking place at a big box retailer five miles from the customer’s home.
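
The contrast Malo draws can be sketched as two decision functions: one black-and-white rule, and one score that weighs several factors together. The fields, weights and threshold here are all invented for illustration; a real fraud system would learn them from data rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float            # purchase amount in dollars
    typical_amount: float    # customer's usual purchase size
    miles_from_home: float   # distance between merchant and customer's home
    known_device: bool       # device or terminal seen before for this customer
    foreign: bool            # purchase made outside the home country


def rule_based_decline(tx):
    """Old-style rule: decline anything foreign or unusually large."""
    return tx.foreign or tx.amount > 3 * tx.typical_amount


def risk_score(tx):
    """Combine several factors into one score (hypothetical weights)."""
    score = 0.0
    if tx.amount > 3 * tx.typical_amount:
        score += 0.4
    if tx.foreign:
        score += 0.4
    if tx.miles_from_home > 50:
        score += 0.3
    if not tx.known_device:
        score += 0.2
    return score


def score_based_decline(tx, threshold=0.5):
    """Decline only when the combined evidence crosses the threshold."""
    return risk_score(tx) >= threshold


if __name__ == "__main__":
    # The big-screen TV bought at a big box retailer five miles from home.
    tv = Transaction(amount=1200.0, typical_amount=150.0,
                     miles_from_home=5.0, known_device=True, foreign=False)
    print(rule_based_decline(tv))   # True: the simple rule blocks it
    print(score_based_decline(tv))  # False: the fuller picture lets it through
```

The unusual amount alone is not enough to cross the threshold, so the purchase goes through because everything else about it looks normal.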

A data dilemma

But just because data is available doesn’t mean customers are comfortable with others using it, Malo points out. Location capture, based on usage of a mobile app or other identifiers, is increasingly used by large financial institutions, giving them the ability to tell whether a customer is in the same place as his or her credit or debit card. But like the recently revealed National Security Agency monitoring program that harvests vast data tracking Americans’ Internet usage, email and phone calls, it may be viewed as intrusive or creepy, which could trigger a consumer backlash. Then there are the revelations that Google’s Street View cars, meant to collect GPS information for maps, also collected email and other personal information from unsecured home networks. Now the courts are sorting these privacy issues out.

Then there’s the question of what these companies will do with the data they amass. Shoppers may not mind it when online retailer Amazon uses their purchase history to recommend a book they might like, but what about if Amazon combines its data with those at other large organizations, everything from corporate conglomerates and public utilities to social media providers and local governments?

In his best-selling book “Who Owns the Future?” computer scientist Jaron Lanier raises the concern of concentrated power in a data-rich world where large private companies exploit relatively small advantages with powerful computers and, in the process, come to control commercial marketplaces. “They are gathering huge amounts of data from the world [to combine] with statistics to create models of everyone and selling the results to parties that can benefit from predictions,” he says. What’s to stop someone, Lanier wonders, from developing a model for who is likely to get sick and selling it to insurance companies?

There are distinct advantages to having large pools of data, agrees Siegel. “If for each customer you have [his or her] payment history, credit limit, demographics and so many other pieces of information, each is potentially predictive and in concert they could be particularly predictive. Getting a few more key ingredients could make the difference,” he says. That edge could give a potentially dominant and controlling advantage to those with access to the best and largest amounts of data.

Seeking opportunities

The good news for community banks, says Wilson, is that they potentially have access to the same channels of data and transaction information as the biggest banks. Big banks face the same struggle of streamlining disparate data sources as they work to integrate even more complicated systems. While community banks need to be selective about storing data due to the cost, they also need to keep in mind the future value of data, suggest Mayer-Schönberger and Cukier in their book.

One missed opportunity for some community banks was failing to issue their own credit cards, the authors note. By turning their credit card portfolios over to large issuers, some community banks lost out on the opportunity to obtain spending patterns and other valuable data. Instead, large banks and card networks like Visa and MasterCard are controlling and heavily analyzing that data. For example, MasterCard Advisors, the division that collects and analyzes billions of worldwide transactions, has found that those who fill up their gas tanks around 4 p.m. are likely to spend between $35 and $50 at a grocery store or restaurant within the next hour, creating retail opportunities that community banks might share with their retail customers, Mayer-Schönberger and Cukier report.

Yet while Big Data will continue to influence everything from political campaigns to public policy, there will always be something to be said for the gut. While Google famously uses data to analyze every little decision, down to how best to display snacks so that employees will make healthy choices, visionaries like Steve Jobs developed the iPhone and other inventions by trusting their instincts.

“There’s a concept that because we have all this data that something magical will happen and we’ll get all this insight from it,” Malo adds.

But even when robust analytics are used to transform data into information, people will still be needed to understand and make use of the results. “You’re rarely going to remove people from the equation,” Malo says. “Someone sits at the bottom of the funnel to deal with it.”

Kelly Pike is a freelance writer in Annandale, Va.
Originally published at
