What does it mean today to say you are, or want to be, or want to hire, a “data scientist”? Not much, unfortunately. The job title has almost as much ambiguity as the term “Big Data.” If you really want to be one of these, or to hire someone who can help you with your Big Data projects, you need to be much more precise in your terminology.
I could see this coming more than a year ago. I had just finished a Harvard Business Review article on data scientists with D.J. Patil, who co-coined the term when he was at LinkedIn Inc. The article described the role and called it “the sexiest job of the 21st century.” Shortly after the article came out, a woman introduced herself to me at a health care analytics conference. Her business card said “Data Scientist,” but it was clear that she was a quantitative analyst at best. “Who can resist having the sexiest job of the century?” she asked.
Indeed. Not very many people, apparently. The data scientist term, which originally stood for quantitative experts who could also do a lot of the computational wizardry involved in analyzing unstructured Big Data, has come to mean almost anything. In addition to basic quantitative analysts, I’ve seen it applied to database administrator jobs, programmer jobs, and Web development jobs. I’ve seen it applied to jobs requiring a Ph.D., an MBA or master’s in analytics, or a B.A. “in some quantitative field.”
Some organizations have taken a “big tent” approach to job classifications in the role formerly known as data scientist. Icrunchdata.com, for example, an aggregator of such jobs, refers to them simply as “icrunchdata” jobs, with subcategories of “big data jobs,” “analytics jobs,” and “technology jobs.” The site aggregates over half a million jobs in the categories, which suggests at least two things: a) this is a huge category of work, and it matters how we refer to it if we want to acquire the right skills; and b) there is way too much variation to all fit under one job title.
I think the only answer is to be much more specific about the type of worker you want to be or hire. The specificity could include the type of data that needs to be analyzed (and not just “big”). For example, here are some admirably specific data type categories gleaned from job postings:
Bioinformatics with a strong genome data focus
Syndicated marketing data
Text from warranty claims
Video of customers in stores
Web analytics and web log files
You should also specify just what the job is going to do with the data. These categories might include:
Query and reporting
Data warehouse management
Finally, just so no one gets confused, you need to specify the kind of tools that the data person will need to know. This might encompass:
Hadoop, Pig, Hive, Python (typical big data manipulation tools);
SAS, SPSS, R (typical statistical analysis tools);
SQL, Business Objects, Cognos (typical query and reporting tools);
Excel (capable of a lot, but typically used for small-scale reporting and financial analysis);
Teradata, Informatica (data warehouse and loading tools).
If you’ve specified all these things, then all that’s left is the kind of educational credentials you prefer. Ph.D.’s in quantitative or science fields are often sought by organizations that want a combination of deep analytical skills, the ability to learn new tools easily, and an experimental orientation. M.A.’s or M.S.’s in some form of analytics are likely to be able to do fairly routine statistical or data manipulation work, but may not be able to develop cutting-edge algorithms. MBAs typically can do good spreadsheet work, and perhaps a regression analysis, but aren’t normally taught much beyond that. And B.A.’s in computer science, business, or a quantitative field are probably going to learn a lot of what they need to do on the job. They may be very smart, but they are unlikely to know a lot about the business use of analytics on their first day at work.
If you are thinking about getting more schooling in this domain, you should pin down just what skills an educational program purports to impart, and at what level. Again, you might ask what type of data you’ll learn to work with, what you will learn to do with the data, and what kinds of tools you’ll become facile with. As you can imagine, it’s virtually impossible to teach you everything you could possibly need to know in the data crunching field in a year or two. And just as with job titles, the name of the program isn’t always a good guide to what is included in it.
No matter what your detailed focus in the quant domain, keep in mind that if you plan to go into business, you had better be able to communicate what you do to non-specialists. The rest of the world doesn’t understand—and generally doesn’t care—what methods you used to create your analytical results, and how you handled the multicollinearity problem in your data. It cares only what your results say about how a decision should be made or a new product should be developed. Survey results say that employers care more about the ability to communicate analytics than any other trait. We can start by communicating more effectively just what our jobs involve.
By Thomas H. Davenport, Distinguished Professor, Babson College. Originally published at mobile.blogs.wsj.com.
Thomas H. Davenport is a Distinguished Professor at Babson College, a Research Fellow at the Center for Digital Business, Director of Research at the International Institute for Analytics, and a Senior Advisor to Deloitte Analytics.