When hiring data scientists, people tend to focus primarily on technical qualifications. It’s hard to find candidates who have the right mix of computational and statistical skills. But what’s even harder is finding people who have those skills and are good at communicating the story behind the data.
At The Data Incubator, we run a fellowship identifying the top 2% of STEM PhDs looking to work with our partner companies, which range from larger firms like Goldman Sachs or Genentech to smaller companies like Betterment or Yelp. Here are three attributes our partners look for in data scientists, and specific questions they use to identify those attributes:
Ability to articulate the business value of their work. It’s important to look for people who strive to benchmark their work with metrics. Ask a prospective candidate about a project they did and whether or not it was successful. During an interview, see if the candidate is talking about data science metrics or business metrics. A data science metric is one that measures the quality of a model: what was my r-squared? What was the root mean squared error? What was the model’s accuracy? While those are important questions for data science, they do not necessarily get at whether a project was successful. Project success is more often defined in terms of business metrics: how much did I decrease customer attrition? How much did we increase marketing effectiveness?
Good data scientists might lead with a data science metric, but give a business metric when prompted. The best data scientists immediately speak in terms of business metrics because they understand that their work has to have value for the organization, not just be interesting to data scientists. Managers should be wary of prospective hires who don’t know the business impact of their work (“I have to look that up and get back to you”)—that suggests a fundamental disconnect between management’s priorities and those of the potential new hires.
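To make the distinction concrete, here is a minimal sketch that puts the two kinds of metrics side by side. The numbers are made up for illustration, and the function names (`rmse`, `r_squared`, `attrition_reduction`) are hypothetical helpers, not part of any particular library. The first two describe how well a model fits; the third describes what changed for the business.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: a data science metric of model fit."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """R-squared: share of variance in `actual` explained by the model."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def attrition_reduction(baseline_rate, new_rate):
    """Business metric: relative drop in customer attrition vs. the baseline."""
    return (baseline_rate - new_rate) / baseline_rate


# Hypothetical data, for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]

print(f"RMSE: {rmse(actual, predicted):.2f}")            # model quality
print(f"R-squared: {r_squared(actual, predicted):.2f}")  # model quality
print(f"Attrition reduced by {attrition_reduction(0.20, 0.15):.0%}")  # business impact
```

A candidate who can report only the first two numbers has described the model. A candidate who can report the third has described the project.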
The right level of technical detail. When interviewing data scientists, we are tempted to grill them on technical intricacies like asymptotic bias or the functioning of Hadoop’s distributed cache. It’s important to remember that your potential new hire’s ability to communicate the right level of detail and to effectively tell the story behind the data is not often probed in technical interviews—but this is a huge part of the job. Ask prospective candidates to talk about previous data science work. Do they jump into the gory technical details or do they stay at an appropriately high level so that you can understand the message without being overwhelmed by buzzwords?
For example, any analysis is only as good as its assumptions. But does the candidate drone on about L2-integrability or does she tell you that she needed to assume customer flow is independent day-to-day? Using the right technique is also important in analysis. Does the candidate opine on the virtues of random forests, or can she concisely articulate the reason for choosing that model? Frequently, we encounter candidates who rely on dogma rather than science. Be wary of those who only know the ins and outs of a favorite model but cannot articulate why they chose to use it beyond the fact that it is the “industry standard.” When identifying good candidates, you need to find those who understand the technical underpinnings but who can also then translate them for non-analysts.
Getting visualizations right. One of the most important aspects of data science is visualization. Managers will want to identify candidates who can tell a clean story about data and do so with diagrams. Our brains process visual data 60,000 times faster than text, making visualization one of the most efficient means of communication. Ask candidates to present a previous project. And make sure that they don’t use deceptive visualizations. Do they use truncated or inverted axes? Are axes secretly log scale when they shouldn’t be? Are area visualizations improperly scaled? Even if visualizations are honest, that’s not enough. Does the candidate’s presentation jump directly into a litany of complex diagrams, or does it first illustrate points using a simple example? Business communication is not just about proving a point with scientific rigor. It’s also about convincing your audience and getting them to relate to the point you’re trying to make.
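As a rough illustration of why truncated axes and mis-scaled areas mislead, the sketch below compares how large a difference looks with how large it actually is. The figures are hypothetical, and `bar_height_ratio` and `circle_area_ratio` are illustrative helpers written for this example rather than functions from a plotting library.

```python
def bar_height_ratio(small, large, axis_start=0.0):
    """Ratio of drawn bar heights when the value axis begins at `axis_start`."""
    if axis_start >= small:
        raise ValueError("axis_start must be below the smaller value")
    return (large - axis_start) / (small - axis_start)


def circle_area_ratio(small, large):
    """Ratio of drawn areas when a circle's radius (not its area) tracks the value."""
    return (large / small) ** 2


# Hypothetical quarterly figures, for illustration only.
last_quarter, this_quarter = 50.0, 52.0

honest = bar_height_ratio(last_quarter, this_quarter)
truncated = bar_height_ratio(last_quarter, this_quarter, axis_start=48.0)

print(f"Axis from 0:  second bar is {honest:.2f}x the first")
print(f"Axis from 48: second bar is {truncated:.2f}x the first")
print(f"Radius-scaled circles for a 2x value: {circle_area_ratio(1.0, 2.0):.0f}x the area")
```

With the axis starting at zero, a 4% increase looks like a 4% increase. Starting the axis at 48 makes the same change look like a doubling, and scaling a circle's radius by the value makes a doubling look like a fourfold jump.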
Check out Michael’s session at Predictive Analytics World for Workforce, April 3-6, 2016 in San Francisco. Use code PATIMES16 to get 15% off pricing.
Author Bio:
Michael Li founded The Data Incubator, a New York-based training program that turns talented PhDs from academia into workplace-ready data scientists and quants. The program is free to Fellows, and routinely accepts just 1% of applicants. Employers engage with the Incubator as hiring partners.