PayPal is using advanced predictive data analysis to help protect its users from fraud and preserve the service’s brand. Constant evolution is the key to PayPal’s success.
Scam artists continually find new ways to defraud unsuspecting people, and PayPal is fighting back with predictive data analysis.
By now most people are familiar with some variant of a common scam: An email arrives in your inbox from someone you don’t know promising you a share of a large pool of money. All you have to do is send an upfront investment, often through a payment processing service like PayPal.
Constant attacks from fraudsters make life difficult for Hui Wang, PayPal’s senior director of global risk sciences. She is tasked with stopping fraudulent transactions before they are processed. There’s a lot at stake here for PayPal. When the service is used for fraud, it erodes the public’s trust in the payment platform and damages the brand. Over time, this could shrink the user base.
To stop fraud and scams, Wang and her team have turned to predictive data analysis to identify potentially fraudulent transactions. Given the evolving nature of threats, Wang’s work never slows down. “For PayPal we are unique in the sense that our problems are very dynamic,” Wang said.
To spot potential cases of fraud, Wang’s team analyzes historic payment data for features that may indicate an attempted scam. Factors such as the type of device the requester is using, the country the request originates from and details from the user’s PayPal profile can all be correlated with fraud. The team uses this data to build machine learning algorithms that assess each transaction for potential signs of fraud. Over time the algorithm learns and sharpens its predictions.
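To make the idea concrete, here is a minimal sketch of that kind of scoring model. It is not PayPal’s system: the three feature names, the synthetic transaction history and the choice of a plain logistic regression are all assumptions made for illustration. It shows only the general pattern of learning weights from labelled past transactions and then scoring a new one.

```python
import math
import random

# Hypothetical binary features, loosely based on the signals the article
# mentions (device, country of origin, profile details).
FEATURES = ["new_device", "country_mismatch", "new_account"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_history(n, seed=0):
    """Generate made-up labelled transactions as (features, is_fraud) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [1.0 if rng.random() < 0.3 else 0.0 for _ in FEATURES]
        # Assumed ground truth: each risky flag raises the odds of fraud.
        p_fraud = sigmoid(-3.0 + 2.0 * x[0] + 2.5 * x[1] + 1.5 * x[2])
        rows.append((x, 1.0 if rng.random() < p_fraud else 0.0))
    return rows


def score(weights, bias, x):
    """Estimated probability that a transaction with features x is fraud."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(rows, epochs=50, lr=0.1):
    """Fit a logistic regression with stochastic gradient descent."""
    weights = [0.0] * len(FEATURES)
    bias = 0.0
    for _ in range(epochs):
        for x, y in rows:
            err = score(weights, bias, x) - y  # gradient of the log loss
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


history = make_synthetic_history(2000)
weights, bias = train(history)

risky = score(weights, bias, [1.0, 1.0, 1.0])  # all three flags raised
safe = score(weights, bias, [0.0, 0.0, 0.0])   # no flags raised
print(f"risky={risky:.2f} safe={safe:.2f}")
```

In this toy setup a transaction that raises all three flags should score higher than one that raises none. Retraining on newer history is what would let such a model keep pace with changing fraud patterns.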
Given the evolving nature of threats to PayPal users, this learning approach to predictive data analysis has always been the goal of Wang’s risk modeling team. But she said it’s only been possible to implement in the past five years or so. Before that time the computing power simply wasn’t available to run complex algorithms on such large volumes of historic data.
But over the past five years, a number of new technologies have emerged, particularly from the open source community, that have enabled Wang’s team to move beyond traditional risk modeling. The team uses products from Teradata and Oracle for data management and from SAS Institute for analytics, but it is becoming a bigger user of open source tools like Hadoop and Spark. Wang said the open source resources give the team a great deal of flexibility in the number of tools it supports, which enables data scientists to work with whatever they feel most comfortable with.
“Many times commercial software doesn’t meet our needs completely, so, in this case, open source really comes in handy,” she said. “We are able to take them and do all kinds of adjustments ourselves. That really unleashed the power of our data scientists.”