“Data scientist” is a popular role these days. Everyone seems to have one — or claims to be one.
But how do executives know whether they have the “real thing” and how that talent is best employed?
I met with Kaiser Fung, leader of the Applied Analytics program at Columbia University and co-founder of a new school for data scientists called RSquare Edge, to discuss this question. Here, I offer our perspective on how a corporation can capture true value from its data science talent.
To start, the term “data scientist” has been diluted, and companies don’t always get what they pay for, so to speak. There’s a difference between software operators who appear to find real relationships in data and the people who can truly validate those relationships.
“Data science” can be defined as using data and quantitative or scientific methods to help solve business problems. But even the use of the word “science” is tricky, because many problems are quantitative and require data without being scientific. A good example is Google’s PageRank algorithm, which treats a web page’s rank as a measure of its authority, a quality that cannot be scientifically determined.
Real data scientists (i.e., the most capable, but also most expensive, resources) are professionals who sometimes have data or computer science degrees and related work experience and who apply machine-learning methodologies, data mining, and statistical models. They typically work in technology companies, especially startups, that build web or mobile applications and need to work out how to incorporate data into them.
Data analysts are somewhat less technical but have much broader training, closer proximity to the business unit, and more accountability to it. They report on, track, and explain business metrics. This may include measuring marketing campaigns and understanding the effectiveness and efficiency of operational expenses. Their educational background is usually in economics, statistics, engineering, or other quantitative disciplines.
Data infrastructure engineers build and manage databases. This includes monitoring data quality and possible issues of privacy and security. They make sure the data are available, encoded properly, and of the right quality (that is, validated) so that the team manipulates and analyzes accurate data.
An experienced and well-educated data scientist typically takes a holistic view of data, centered on figuring out what kinds of data the business should be collecting, understanding the quality of the data, and reporting on business metrics from a backward-looking, or descriptive, view.
A data scientist not only analyzes the trends, but also takes the time to discover the causes. Data scientists conduct “what if” scenarios with the end goal of defining actions that influence the company’s objectives. For example, by conducting predictive activities (experiments aimed at improving the business metrics based on observations about the past), a qualified data scientist can predict an outcome.
Ideally, a strong data scientist leads the smart design of these “what if” scenarios, which can then be predictively tested. This type of advanced predictive analysis is more complex than data quality and integrity know-how.
Someone who has experience with data mining and regression models, application of statistical methodologies, and machine learning can use past outcomes to define the actions an organization should take. In a sense, the data scientist builds a framework on the past, present, and future of the data he’s responsible for.
Software development skills are also paramount, as they are especially helpful in the predictive stage. Beyond that, deep experience working with code and a willingness to use whatever code solves the problem most effectively are vital attributes.
Top-notch data scientists are comfortable working with real-time, continuously updated algorithms and with large amounts of data that tend to drive action within a relatively short time.
John Kelly leads the predictive analytics practice at Berkeley Research Group, which works with marketing, sales, and operations leadership across a range of industries to leverage the power of econometrics and data science. This work results in evidence-driven management, delivering dramatic growth and performance improvement.