As you might have heard before if you read McKinsey reports, the New York Times or just about any technology news site, data scientists are in high demand. Heck, the Harvard Business Review called it the sexiest job of the 21st century. But landing a gig as a data scientist isn’t easy — especially a top-notch gig at a major web or e-commerce company where merely talented people are a dime a dozen.
However, companies are starting to talk openly about what they look for in data scientists, including the skills someone should have and what they’ll need to know to survive an interview. I spent a day at the Predictive Analytics World conference on Monday and heard both Netflix and Orbitz give their two cents. That was also the day Hortonworks published a blog post about how to build a data science team.
Granted, “data scientist” is a nebulous term, perhaps as much so as “big data,” but these tips (a mashup of all three sources) are still broadly applicable. If you want to make the leap from person who knows data to data scientist, I suggest paying attention.
1. Know the core competencies.
For most of us, there’s readin’, ‘ritin’ and ‘rithmetic. For data scientists, there’s SQL, statistics, predictive modeling and programming (probably Python). If you don’t have at least a grounding in these skills, you’re probably not getting through the door, in part because they form a common language that lets people from different backgrounds talk to each other.
Hortonworks’ Ofer Mendelevitch describes the ideal data scientist as occupying a place on the spectrum between a software engineer and a research scientist. In distinguishing a great engineer, mathematician or data analyst from a data scientist, programming skills are probably the biggest variable. That’s because being able to write code means you’ll have an easier time testing out your hypotheses and algorithms, hacking through certain problems and generally thinking in ways that actually relate to the products your employer is building.
Chris Pouliot, director of algorithms and analytics at Netflix, said even being able to “pseudo-code” might be good enough if someone is otherwise a strong candidate. You can pick up SQL or Python or whatever you need pretty quickly, he noted.
Or, hinted Orbitz VP of Advanced Analytics Sameer Chopra, you could just suck it up and learn Python now: “If you were to leave today and ask ‘What specific skills should I learn?’: Python.”
2. Know a little more.
Of course, just meeting the minimum requirements never got anybody a job (well, almost nobody). What Pouliot is really looking for in a candidate is: an advanced degree in a quantitative field; hands-on experience hacking data (ideally using Hive, Pig, SQL or Python); good exploratory analysis skills; the ability to work with engineering teams; and the ability to create their own algorithms and models rather than relying on out-of-the-box ones.
Chopra’s advice was to get up to speed on machine learning, especially if you want to work in Silicon Valley, where machine learning has exploded in popularity. He’s also a big fan of honing those hacking skills, because data munging is invaluable when you’re dealing with many types of data that have to be processed before they’ll work together. If you can do quality analytics across myriad data sources, Chopra said, “you can write your own ticket in this day and age.”
Oh, and if you’re planning to work at a startup, he added, R is almost a must-know for anyone whose job will entail statistical analysis.
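For anyone wondering what the data munging Chopra mentions looks like in practice, here is a minimal sketch in Python. It is only an illustration: the two data sources, their field names and the `total_spend_by_city` function are all hypothetical and do not come from Orbitz, Netflix or Hortonworks. It cleans up a CSV export and a JSON feed that disagree on key names and types, then joins them.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export with string-typed values and stray spaces.
BOOKINGS_CSV = """user_id,amount
 101 ,250.00
102,99.50
101,40.25
"""

# Hypothetical source 2: JSON records that call the same key "id".
USERS_JSON = '[{"id": 101, "city": "Chicago"}, {"id": 102, "city": "Austin"}]'


def total_spend_by_city(bookings_csv, users_json):
    """Join the two sources on user ID and sum booking amounts per city."""
    # Normalize the JSON side into a lookup keyed by integer user ID.
    city_by_user = {int(u["id"]): u["city"] for u in json.loads(users_json)}

    totals = {}
    for row in csv.DictReader(io.StringIO(bookings_csv)):
        # Strip whitespace and cast so the IDs match the JSON side.
        user_id = int(row["user_id"].strip())
        # Bookings with no matching user are grouped under "unknown".
        city = city_by_user.get(user_id, "unknown")
        totals[city] = totals.get(city, 0.0) + float(row["amount"])
    return totals


if __name__ == "__main__":
    print(total_spend_by_city(BOOKINGS_CSV, USERS_JSON))
```

Real munging jobs are messier (missing values, conflicting formats, far more data), but the pattern of normalizing each source and then joining on a shared key is the same.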
3. Embrace online learning.
If it all sounds a little daunting, don’t be too worried, Chopra advised. That’s because there are plenty of opportunities to learn these new skills online via both massive open online courses (he’s particularly keen on Udacity’s Computer Science 101 and Andrew Ng’s machine learning course on Coursera) and universities’ own online curricula. Chopra also suggested joining professional groups on LinkedIn, participating in Kaggle competitions and maybe even getting out of the house by going to meetups.
Whatever you’re curious about, though — text mining, natural language processing, deep learning — you can probably find someone willing to teach you for free or nearly free, and any additional skills will help set you apart from the crowd.
4. Learn to tell a story.
Last month at Structure: Data, DJ Patil told me that one of the biggest skill shortcomings in data science is the ability to tell a story with data beyond just pointing to the numbers. Chopra agreed, noting that today’s new visualization tools make it easier to display data in formats that non-scientists might be able to (or at least want to) consume. A corollary of storytelling is good, old-fashioned communication: All the charts in the world won’t make a difference if you can’t communicate to product managers or executives why your findings matter.
Pouliot is a little less sold on communication skills, though, at least sometimes. If you’re an engineer primarily talking to other engineers, he told the room, you probably can speak all the jargon you want. It’s only when someone has a business-facing role that communication really becomes important.
5. Prepare to be tested (aka “Your pedigree means nothing”).
After you’ve learned all these skills, added them to your résumé and talked to a hiring manager about how good you are at them, it’s likely testing time. Prospective Netflix data scientists go through a battery of exercises, Pouliot says, including explaining projects they’ve worked on and answering questions designed to determine the depth of their knowledge. They’ll also be asked to devise a framework that solves a problem of the interviewer’s choice.
One thing Pouliot warned about is an over-reliance on what’s on your résumé. Right off the bat, for example, he’ll test the heck out of any skills or knowledge that someone claims, to ensure they really have them.
Having a Stanford degree and work experience at Google doesn’t necessarily make someone a shoo-in, either. Pouliot acknowledged during a quick chat after his presentation that he’s been seduced by the perfect résumé before (even going so far as to cut a few corners to get someone in for an interview) only to be disappointed in the end. Everyone has to pass the tests, he said, and some of the best applicants on paper crashed and burned very early in the process.
6. Exercise creativity.
It’s during the testing phase at places like Netflix that all those personal skills and experience can come into play. There’s often no right answer when it comes to answering the hypotheticals an interviewer like Pouliot might ask, and he gives bonus points for solutions he’s never seen before. “Creativity is one of the biggest things to look for when hiring data scientists,” he said. Later, he added, “Creativity is king, I think, for a great data scientist.”
Bonus tips for anyone hiring and managing data scientists
Technically, Pouliot’s talk at Predictive Analytics World was about hiring data scientists, but many of the insights were probably more valuable to aspiring data scientists. Some of them, though, were definitely for management, possibly at the C-level. A few points to consider:
Netflix has a standalone data science team that works closely with other departments but ultimately answers to itself. This helps the data scientists collaborate with one another, gives them upward mobility (i.e., they might never become director of marketing, but they could become director of data science) and makes them easier to manage: everyone speaks the same language, so employees know their boss understands the work.
However, he noted, the alternative approach of embedding data scientists within other departments does bring its own benefits. That type of setup can result in a better alignment of research efforts and business needs, and it can help products get built faster because everyone is on the same page. Pouliot suggests one compromise might be to keep a centralized data science team but locate it physically near the other teams it will be interacting with most often. Another is just to ensure you have representatives from every stakeholder department present for meetings and problem-solving exercises.
Actually, if you just cannot hire data scientists with all the skills you want them to have, Mendelevitch from Hortonworks suggests a similar tactic. It can be difficult to teach applied math to software engineers and vice versa, so, he writes, “[S]imply build a Hadoop data science team that combines data engineers and applied scientists, working in tandem to build your data products. Back when I was at Yahoo!, that’s exactly the structure we had: applied scientists working together with data engineers to build large-scale computational advertising systems.”
If you want to retain your good data scientists once you’ve hired them — especially in Silicon Valley where they can walk out the door and get five offers — paying them the market rate is a good start. Additionally, Pouliot said, letting them work on challenging products will keep them happy. Micro-managing them will not.