Machine Learning Times

10 years ago
Searching for data scientists? They come in sets of 3!


Data scientists have been called the most sought-after IT professionals of 2014. But the search for them is distorted on two fronts, according to one practicing data scientist.

Dr. Michael Wu, chief scientist (in other words, a data scientist) at Lithium Technologies, spoke with FierceCIO this week about the growing demand for data scientists, what the role requires and how organizations can best acquire such talent.

Wu began the conversation by addressing two points of confusion in the industry quest for skilled data analysts.

First is the notion that ‘data scientist’ is a single job role. It is at least three distinct but related roles, and perhaps more.

Second is the idea that these much-desired professionals will come from the ranks of IT–many will not and need not, Wu says.

Lithium is a data analytics firm working with a number of large clients that are trying to make sense of the mountains of data they collect from customers. Its clients include AT&T and Home Depot.

“They are trying to better understand customer behavior,” Wu explains. “They are collecting data from a lot of social channels and mobile channels, and each set behaves differently.” Specifically, these clients are trying to determine who the real “influencers” are: the people who can help drive sales and brand loyalty.

Wu’s specific tasks are to build data models that can be used to predict social behaviors, and to run tests on those models to verify their credibility. He says that what he is doing now at Lithium is nearly identical to the work he did to earn his PhD in statistics and data modeling.

Lithium’s clients benefit most from the company’s collective work, rather than from their own portion of it. The company analyzes the pooled data of all clients to identify patterns and trends in social media and consumer behavior. When Wu is at work, he is thinking not about a specific client but about the story the data tells. He wants to know that the data model he is developing actually makes credible predictions.

Since so much of Lithium’s work supports sales and marketing, the holy grail of the search is the true influencer–and not the kind so often reported on in the general press.

“We find that the ones that are the loudest are not the true influencers,” Wu says.

To identify who the real influencers are in social media, Wu helped develop a data modeling process that uses six criteria:

  • Credibility–is the person properly aligned with the subject?
  • Bandwidth–how much, or how frequently, does the person write or speak about the topic?
  • Content relevance–is the content properly aligned to the client’s needs?
  • Temporal relevance–is the content’s appearance well timed to the client’s needs?
  • Channel alignment–is the content found on the desired sources, e.g. Facebook, Twitter?
  • Trust–do readers trust the person to be objective and honest?

“We built this into our model, and all six factors have to be present for someone to be a true influencer,” Wu explains. “We then validated it, and can throw it into our product.”
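The "all six factors" rule Wu describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Lithium's actual model: it assumes each criterion has already been reduced to a yes/no judgment, and the names `InfluencerCriteria`, `is_true_influencer` and `missing_criteria` are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class InfluencerCriteria:
    """Hypothetical yes/no judgments for one person on one topic."""
    credibility: bool         # properly aligned with the subject?
    bandwidth: bool           # writes or speaks about the topic often enough?
    content_relevance: bool   # content aligned to the client's needs?
    temporal_relevance: bool  # content well timed to the client's needs?
    channel_alignment: bool   # content found on the desired channels?
    trust: bool               # readers trust the person to be objective?


def is_true_influencer(c: InfluencerCriteria) -> bool:
    """Return True only when all six criteria are present."""
    return all(getattr(c, f.name) for f in fields(c))


def missing_criteria(c: InfluencerCriteria) -> list[str]:
    """List the criteria that are absent, in declaration order."""
    return [f.name for f in fields(c) if not getattr(c, f.name)]


# A "loud" account: plenty of bandwidth, but lacking credibility and trust.
loud_account = InfluencerCriteria(
    credibility=False,
    bandwidth=True,
    content_relevance=True,
    temporal_relevance=True,
    channel_alignment=True,
    trust=False,
)

print(is_true_influencer(loud_account))  # False
print(missing_criteria(loud_account))    # ['credibility', 'trust']
```

Because the rule is a strict conjunction, a high score on any single factor, such as sheer volume, cannot compensate for a missing one, which matches Wu's point that the loudest voices are not the true influencers.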

From there, a client can customize the product to help create its own influencers.

Wu acknowledges that reports on how much data scientists are in demand are accurate. But he says there is a great deal of confusion among executives about what that really means. ‘Data scientist’ is not a single job function–but three.

“There are three types of data scientists,” Wu says. “And everybody in any of the three areas call themselves data scientists.”

The three roles that make up the complete data science triangle are:

  • Business analyst: “Working at the decision layer, this person is responsible for the final analysis and business intelligence. They present the data to the decision maker.”
  • Machine learning expert: “This person focuses on the statistics, builds data models, ensures that the data is accurate, unbiased, is easy to explain and to understand so that the analyst can interpret it effectively.” Wu counts himself in these ranks.
  • Data engineer: “This person works at the infrastructure and platform layer. They ensure data quality, scale and relevance.”

Organizations that have their sights set on acquiring a data scientist in 2014 may be in for a shock, Wu indicates. Most will need to hire two or three individuals from the above list, not one. Some organizations may even need more than one individual in a given role.

Wu also says the majority of those hired in 2014 to perform the above roles will probably not come from a traditional IT background. These are data and statistical analysis roles, and anyone with a deep-rooted background in those fields is just what the doctor ordered.

So where should a hiring manager look for ‘data scientists’ in 2014? “Engineering, the sciences,” Wu says. “Anyone who has worked with data modeling, and has scientific training.”

Finally, “Organizations need to understand the required skills to do the work, rather than focusing on the title. It is three roles, not one,” Wu says.

By: David Weldon, senior editor and contributor to FierceCIO, FierceEnterpriseCommunications, and FierceMobileIT
Originally published at
