Machine Learning Times

This excerpt is from searchbusinessanalytics.techtarget.

Business Decisions Needed in Data Privacy Protection


Much of the digital economy relies on consumers giving up data for a service. This forces businesses to use their data privacy policies to build trust.

When Philip O’Brien wanted to start modeling employee churn at Paychex, he knew there was a lot of potential to reduce the number of employees who leave. But he also realized that if he wasn’t careful about data privacy protection, there was also potential to run afoul of employment laws.

Replacing employees who leave the company for new jobs is expensive, which makes it a ripe field for predictive modeling, O’Brien, MIS and portfolio manager at Paychex, said in a presentation at the Predictive Analytics World conference in Boston. It’s relatively simple to develop a model that predicts who is likely to leave and then develop an intervention to reduce the chances of that person looking for work elsewhere. But things get complicated when you take into consideration workplace anti-discrimination laws.

For this reason, O’Brien and his team had to be careful about what kind of variables they selected for their model at Paychex, which is a payroll services company in Rochester, N.Y. Traits like race, gender and age were disqualified right off the bat because federal laws prohibit employment decisions based on such factors.

But O’Brien said there were subtler decisions. For example, he said that a person’s zip code could be predictive of their likelihood of leaving, but zip code can also often be used as a proxy for race. The team eventually settled on a set of variables that generally relate to how long employees have been with the company, what types of clients they work with, how heavy their workload is and whether they work in an office or primarily work from home. O’Brien said he wanted to stay as far away as possible from anything that could be construed as discriminatory.
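The screening step O’Brien describes can be sketched as a simple pre-processing filter. The column names and blocklists below are hypothetical, not Paychex’s actual schema; the idea is just that protected attributes and known proxies such as zip code are dropped before any model sees them:

```python
# Hypothetical feature screen: drop protected attributes and known
# proxy variables before training a churn model.
PROTECTED = {"race", "gender", "age"}  # barred from employment decisions by law
PROXIES = {"zip_code"}                 # can stand in for a protected attribute

def screen_features(columns):
    """Return only the columns that are safe to feed the model."""
    blocked = PROTECTED | PROXIES
    return [c for c in columns if c not in blocked]

# Invented column names for illustration.
raw_columns = [
    "tenure_years", "client_type", "workload_hours",
    "works_remotely", "zip_code", "age", "gender",
]
safe = screen_features(raw_columns)
print(safe)  # tenure, client type, workload and remote status survive
```

A blocklist like this catches only known proxies; part of O’Brien’s point is that deciding what belongs on it is a judgment call, not something the model makes for you.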

“Your model doesn’t care if it discriminates, but the laws do,” O’Brien said. “You have to be really sure you aren’t including discriminatory variables.”

Data privacy is a matter of trust

The fundamental problem of data privacy protection comes down to how you use information about people, and it’s one that a growing number of businesses face as their reliance on big data grows. The more data you have about people, the better you can model their behavior. But doing so runs the risk of alienating them.


O’Brien said he was concerned about more than anti-discrimination laws. Once he and his team accounted for possible sources of bias in the model, they also had to think about how to use it. He said grading individuals on their propensity to leave could lead to bigger problems down the line. For that reason, they decided to anonymize individual scores: Paychex corporate managers can see only aggregates by branch office. This allows them to implement churn-reduction interventions at the office level, rather than singling out individuals.
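The aggregation step described above can be sketched in a few lines. The office names and scores here are invented for illustration; the point is only that individual scores are collapsed into per-office averages before anyone sees them:

```python
from collections import defaultdict

def office_averages(scores):
    """Collapse (office, score) pairs into per-office mean churn scores."""
    by_office = defaultdict(list)
    for office, score in scores:
        by_office[office].append(score)
    return {office: sum(v) / len(v) for office, v in by_office.items()}

# Hypothetical individual churn scores; managers never see these directly.
individual_scores = [
    ("Rochester", 0.12), ("Rochester", 0.45), ("Rochester", 0.30),
    ("Phoenix",   0.60), ("Phoenix",   0.20),
]
print(office_averages(individual_scores))
```

Reporting only the aggregate still supports office-level interventions while making it impossible to read an individual’s “grade” out of the report, which is exactly the perception O’Brien wanted to avoid.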

This step was necessary to develop a data privacy policy that avoids the perception of “creepy big data” and ensures that individuals don’t feel their managers are trying to influence them based on factors outside their control, O’Brien said.

“I don’t know about you, but I would not want to know there’s a model out there that’s giving me a D or F score,” O’Brien said.


By Ed Burns, Site Editor
