Pasha Roberts will present at PAW San Francisco on this case study.
A call center for a global financial firm had a problem – attrition. Before a new representative was permitted to speak with a single customer, he or she had to complete 12 weeks of expensive training and pass a federal “Series 7” exam. Many new hires dropped out during training, and in general 33 percent of candidates do not pass this six-hour exam.
Most call centers face a similar issue – thousands of dollars of training before a new hire can even begin to bring value to the firm. Does the candidate have the interest, attitude, and willingness to go the distance? The differentiator is something not seen on a resume or gleaned from an interview.
The managers of this financial call center took the initiative to use data science to answer the question, “Is it possible to quantify the mindset of the call center representative?” The quantified mindset can then be used to predict who is most likely to succeed at training, pass the exam, and graduate into the call center. Hiring managers then use this predictive model to hire better-fitting candidates, decrease attrition, and dramatically cut costs.
Data science has a long and mature history of measuring the human attributes of consumers to predict advertising responses, upsell patterns and other purchasing patterns. Employees are equally human, and have even more impact on the success or failure of a venture – making it worthwhile to use a data science approach here as well, to model and optimize teams and companies.
The underlying concept in this analysis is that humans have personal, intrinsic attributes, different from skills and training, that are strengths or weaknesses in different roles. The mindset has been called “raw talent” or aptitude, and consists of many factors that can, in fact, be measured.
Dozens of factors have been measured in different ways, but only a handful are relevant to the workplace. The 11 that we measure cover areas such as curiosity, problem-solving approach, degree of cooperation, service orientation, and focus on results, to name a few. Some factors are mutually exclusive – an aggressive, independent problem solver is not naturally cooperative. All of these factors are value-neutral – they are simply human building blocks.
Several raw talent statistics from the call center sample. Each has a tight, statistically significant value range.
The interesting part happens when we apply these human factors to the performance of a specific job role. In each role, certain combinations of traits do well and others don’t. Attention to detail helps an accountant, but can just slow a sales representative down. Service orientation is often useful for a service role, but might impair a collections agent. Different aggregated human characteristics are seen in top performers of different roles.
These models cannot be built from anecdotal, subjective judgments. The problem is too complex for the human brain to process fairly and consistently. To eliminate bias, modern data science methods need to be used to gather samples, then build, evaluate, and implement talent models.
For maximum accuracy and sensitivity, Talent Analytics builds models specifically for a given role at a specific company. Our research has shown that corporate culture strongly influences role requirements. For example, IBM and Google both hire software engineers, but successful engineers at the two companies will measure quite differently because of the vastly different work environments and cultures.
The first step was to agree on the scope of this experiment. Because different roles demand different profiles, we focused our sample on one specific support-oriented call center role, set in call centers across the country. The sample ultimately included 2,141 employees.
Our client defined success as passing the Series 7 exam. Employee attrition after this milestone was also expensive, but it was left as a separate problem. Measuring the larger problem of attrition would be complicated by survivorship bias and an open-ended time horizon. With a focused time horizon, this was a clean experiment with tangible results.
We focused on finding what it takes to pass the exam and begin to be useful to the call center. Removing the dead weight of attrition before this milestone had a large financial value to the client.
Second, we gathered a baseline measurement of attrition. Each terminated employee had a “reason code” describing why they left the job – voluntary, involuntary, background check, project completed, and so on. We created a meta-code called “job fit,” with three levels: “good hire,” “bad fit,” “neutral.” Each reason code was assigned to a job fit level. Voluntary or involuntary termination was a bad fit, even though we did not know the circumstances of each voluntary termination. Background checks were performed after the hiring date, so failure due to background was considered neutral. If the employee was converted to a full-time employee or let go at the end of the project, it was a good hire. If an employee could not pass Series 7, they were a bad fit.
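The reason-code-to-job-fit assignment described above amounts to a lookup table followed by a tally. A minimal sketch in Python follows; the reason-code strings and the helper names `job_fit` and `job_fit_shares` are hypothetical, since the client's actual codes are not given here, and the sample records are toy data.

```python
from collections import Counter

# Hypothetical reason-code labels. The categories come from the text above,
# but the exact strings stored in the client's HR system are assumptions.
JOB_FIT_BY_REASON = {
    "voluntary": "bad fit",
    "involuntary": "bad fit",
    "failed_series_7": "bad fit",
    "background_check": "neutral",
    "converted_full_time": "good hire",
    "project_completed": "good hire",
}

def job_fit(reason_code):
    """Map a reason code to its job-fit level."""
    return JOB_FIT_BY_REASON[reason_code]

def job_fit_shares(reason_codes):
    """Return the share of records at each job-fit level, in percent."""
    counts = Counter(job_fit(code) for code in reason_codes)
    total = sum(counts.values())
    return {level: 100.0 * n / total for level, n in counts.items()}

# Toy records for illustration only, not the client's data.
sample = ["voluntary", "project_completed", "background_check", "failed_series_7"]
print(job_fit_shares(sample))
```

Keeping the mapping in one table makes the judgment calls, such as treating every voluntary termination as a bad fit, explicit and easy to revisit.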
Across 1,366 records, this baseline showed that 47 percent were a good hire, 49 percent were a bad fit, and 4 percent were neutral. The results were slightly worse than a coin flip. The primary measure of experiment success was to significantly decrease bad fit and to increase good hire.
From existing employees in our pool, management identified a group of 122 top performers based on performance metrics, manager reviews, and customer feedback. These top performers were measured with an online raw talent survey, gathering 11 raw talent metrics that have been validated and calibrated across a wide range of adults. This was our training dataset.
The next step was to create a mathematical model for success in this role. Unfortunately, the call center was unable to provide individual performance scores for each agent, only that the agent was a member of the top group. The raw talent scores for these top performers were strongly clustered, showing a clear similarity in behavioral and ambition metrics. The top performers had a strong raw talent fingerprint separate from skills or training. As a result, we created a straightforward linear model to indicate a percentage match to the scores in this training set.
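The exact form of the linear model is not spelled out here. One simple way to express a “percentage match” to a benchmark, shown only as a sketch under stated assumptions, is to average each trait across the top performers and score a candidate by the mean absolute gap to that average. The 0 to 100 trait scale, the trait names, and the functions `build_benchmark` and `percent_match` are all assumptions made for illustration.

```python
from statistics import mean

SCALE_RANGE = 100.0  # assumption: every trait is scored on a 0-100 scale

def build_benchmark(top_performers):
    """Average each trait across the top-performer training set."""
    traits = top_performers[0].keys()
    return {t: mean(p[t] for p in top_performers) for t in traits}

def percent_match(candidate, benchmark):
    """Linear match: 100 minus the mean absolute gap to the benchmark,
    expressed as a percentage of the scale range."""
    gaps = [abs(candidate[t] - benchmark[t]) for t in benchmark]
    return 100.0 * (1.0 - mean(gaps) / SCALE_RANGE)

# Toy scores for two hypothetical traits, not data from the study.
top = [
    {"curiosity": 70, "cooperation": 80},
    {"curiosity": 90, "cooperation": 60},
]
benchmark = build_benchmark(top)
print(percent_match({"curiosity": 80, "cooperation": 70}, benchmark))
print(percent_match({"curiosity": 60, "cooperation": 90}, benchmark))
```

A model of this kind needs only membership in the top group, not individual performance scores, which is why it suits the data constraint described above.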
The model was then validated against an out-of-sample group of 38 agents, comprising 28 top performers and 10 lower performers. The model returned 79 percent true positives, 21 percent false negatives, 20 percent false positives, and 80 percent true negatives. This was deemed a good improvement, and the model was given a green light for production.
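The validation rates are taken within each actual group, so the two rates for the 28 top performers must sum to 100 percent, as must the two rates for the 10 lower performers. The sketch below shows the arithmetic. The counts of 22 and 8 correct classifications are inferred from the reported 79 and 80 percent figures, not stated in the study, and the function name `rates` is illustrative.

```python
def rates(true_pos, false_neg, true_neg, false_pos):
    """Return the four rates in whole percent, each relative to the size
    of its actual group (actual positives or actual negatives)."""
    actual_pos = true_pos + false_neg
    actual_neg = true_neg + false_pos
    return {
        "true positive": round(100 * true_pos / actual_pos),
        "false negative": round(100 * false_neg / actual_pos),
        "true negative": round(100 * true_neg / actual_neg),
        "false positive": round(100 * false_pos / actual_neg),
    }

# Assumed counts: 22 of 28 top performers and 8 of 10 lower performers
# classified correctly.
validation = rates(true_pos=22, false_neg=6, true_neg=8, false_pos=2)
print(validation)
```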
The call center began to use the match-to-benchmark number as part of its hiring procedure. The hiring managers were instructed not to let the match number account for more than 30 percent of their consideration of a candidate.
Over the course of eight months, 952 candidates in this production set were evaluated with this tool, in combination with other hiring factors. From these, 233 were hired, trained, and given the Series 7 exam. Some quit, some failed, some were terminated, and some transferred to other projects.
Results = Million Dollar Savings
The improvement was encouraging:
The Confusion Matrix is a vital tool to understand the impact of prediction from a model, and the trade-offs between models. It is a table that crosses actual outcomes against the outcomes that the model predicted; filled in with numbers from this study, it gives the table above.
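As a rough illustration of how such a table is tallied, the sketch below counts the four cells from paired lists of actual and predicted outcomes. The function name `confusion_matrix` and the toy labels are assumptions for illustration; they are not the study's records.

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive="good hire"):
    """Count the four cells by crossing actual outcomes with predictions."""
    cells = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            cells["true positive" if a == positive else "false positive"] += 1
        else:
            cells["false negative" if a == positive else "true negative"] += 1
    return cells

# Toy outcomes for five hypothetical hires.
actual = ["good hire", "bad fit", "good hire", "bad fit", "good hire"]
predicted = ["good hire", "good hire", "bad fit", "bad fit", "good hire"]
matrix = confusion_matrix(actual, predicted)
for cell in ("true positive", "false positive", "false negative", "true negative"):
    print(cell, matrix[cell])
```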
This simple model raised the true positive rate to 59 percent from 47 percent, a relative improvement of about 25 percent. The true positive and true negative indicate where the model works. We predicted a good hire and got a good hire. Or we expected a bad fit and got one. In this case, the true negative must be inferred, since the call center cannot know the job performance of someone who was not hired. A world in which true positive and true negative were 100 percent would be very boring indeed.
It is useful to examine the costs and benefits of each type of false prediction. In this case, a false positive is a situation where the model predicts a good hire but the person was a bad fit. The cost of a false positive is high – thousands of dollars of hiring, on-boarding, training, and testing for no benefit. However, in the real world things happen. It is possible that the bad outcome came about from factors outside of the predictive model such as manager styles, educational gaps, or temporary life issues.
A false negative is a situation where the model predicts a bad fit but the employee becomes a good hire. This example contained candidates who were hired in spite of a bad match to benchmark, and in fact did well. In a way, a false negative should be a cause to celebrate the discretion and wisdom of the hiring manager, and of the overachieving employee. The costs for a false negative are more hypothetical for the call center – the company may have missed hiring a good employee, perhaps in a tight employment market.
Tactics shift in different employment markets. In this case, the call center wanted to decrease false positives, which were 49 percent at the baseline.
Talent benchmarking increased true positives, finding more good hires. Talent benchmarking dramatically decreased false positives, saving the cost of bad hires.
The most expensive error type for the client was false positives, people who were hired but did not work out. This percentage went to 35 percent from 49 percent, a reduction of nearly 30 percent. For this eight-month sample it implies that the call centers hired 52 fewer future ex-employees, saving roughly a million dollars a year. Also during this period, they made 45 more good, long-term hires than they otherwise would have. Call center hiring is a high-volume, expensive business, so marginal improvements produce big savings.
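The relative changes quoted for good hires and bad fits follow from the baseline and production percentages. A short worked check, using only the four percentages reported in this study (the helper name `relative_change` is illustrative):

```python
def relative_change(before, after):
    """Percent change of `after` relative to `before`."""
    return 100.0 * (after - before) / before

good_hire_change = relative_change(47, 59)  # good hires: 47 -> 59 percent
bad_fit_change = relative_change(49, 35)    # bad fits: 49 -> 35 percent

print(round(good_hire_change, 1))  # 25.5
print(round(bad_fit_change, 1))    # -28.6
```

The first figure matches the 25 percent improvement in good hires, and the second is close to the roughly 30 percent reduction in bad fits quoted above.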
Strategy and Followup
We were fortunate to design and maintain this as a very contained project. The results would have rapidly become muddy if, for example, the scope were expanded to include attrition past Series 7. It was best to build confidence with this exercise, then move forward with new goals.
With more client data, we would be able to deploy more advanced modeling such as regression, decision trees, or neural networks. These methods can use the interactions among multiple factors and would likely provide even better predictive capacity. With better predictive power, the system would accurately find more useful good hires, and accurately reject more of the expensive bad fits.
The client expanded its use of the benchmarking procedure to include more call centers and types of agents.
Today’s media trumpets successful approaches using analytics to model human outcomes with customers. This study shows the value of using this same data science inside the organization to model and optimize employee outcomes.
Pasha Roberts is chief scientist at Talent Analytics Corp., a company that uses data science to model and optimize employee performance in areas such as call center staff, sales organizations and analytics professionals. He wrote the first implementation of the company’s software over a decade ago and continues to drive new features and platforms for the company. He holds a bachelor’s degree in economics and Russian studies from The College of William and Mary, and a master of science degree in financial engineering from the MIT Sloan School of Management.