Is this the beginning of the end for the vaunted data scientist? That’s the clever pitch from Tableau Software, one of a handful of business intelligence vendors pushing the envelope on self-service BI tools.
Because the software’s user interface offers drag-and-drop features, even users without a strong math background can build visualizations and interrogate the data, the vendor promises. Tableau isn’t the only one with a strong self-service play. More and more vendors are offering packaged analytical applications that hide the complexity of back-end analytics behind easy-to-use features, which raises the question: Are data scientists just experiencing their 15 minutes of fame?
Dan Sommer, a Gartner analyst, doesn’t think so. At the Gartner BI and Analytics Summit, he argued that while access to self-service tools may put analytics into the hands of just about every employee, it won’t eliminate the role of the data scientist altogether. “You don’t give a Ferrari to someone who just got their driver’s license,” Sommer said.
Or, perhaps more to the point, you don’t give a bunch of raw materials to just anybody and say, “Build a Ferrari.” That’s not a job for the average mechanic; nor is sniffing out never-before-seen relationships from a company’s diverse data sources a job for data dabblers.
There are just too many hard-to-detect data traps, and even data smarties like Carmen Reinhart, professor of international finance at Harvard Kennedy School, and Kenneth Rogoff, professor of public policy and economics at Harvard University, fall victim to them. They co-authored Growth in a Time of Debt, a study of the relationship between government debt and economic growth. The paper argues that as countries take on significant debt, their economic growth slows.
When Thomas Herndon, a University of Massachusetts Amherst graduate student in economics, tried to duplicate the findings, however, he “basically found the biggest spreadsheet error in the history of mankind,” Sommer said. Turns out, some of the conclusions in the popular paper were based on incomplete data sets. Although the paper’s basic finding didn’t change with more complete information, Herndon found the conclusion wasn’t nearly as black and white.
Businesses, it would seem, will not only need to keep their data scientists — as O’Reilly Media’s Mike Loukides has long argued — they’ll also need to encourage data skepticism.
One surprising takeaway from the Gartner Magic Quadrant on data warehousing and database management systems? Traditional data warehousing “came back with a vengeance in terms of demand,” said Mark Beyer, Gartner analyst, at the BI Summit.
Most traditional vendors seen as “leaders” in this space, including IBM, Teradata and SAP, with its HTAP or hybrid transaction/analytical processing, are building logical data warehouse roadmaps. A term coined by Beyer, the logical data warehouse is a relatively new approach to data management that veers away from the central repository. Instead, data stays wherever it fits best (in a traditional data warehouse, an analytical database or the Hadoop Distributed File System), and virtual layers provide views into the data.
The traditional vendors are “getting into a title fight and coming after each other,” Beyer said. But they also need to watch their backs: Cloud provider Amazon Redshift, Hadoop distributor Cloudera and NoSQL database provider MarkLogic found their way into the quadrant this year. They didn’t debut as “leaders,” but they didn’t give a weak performance, either.
Welcome to the self-quantified self. Federico Zannier of Brooklyn, N.Y., data-mined himself and then launched a Kickstarter campaign to hawk his personal data in a project he called “a bit(e) of me.”
“I violated my own privacy,” he said in his campaign video. Why? U.S. advertisers, known to buy and sell customer data, raked in $30 billion in revenue in 2012, Zannier explained. “In 2012, I personally made zero dollars. … Is my personal data worthless to me?”
To find out, Zannier put a price on his data. For a mere $2, anyone could buy a day’s worth of Zannier’s self-quantified self — bundled into a single folder. The data included websites he visited that day, photos of his face looking at his computer taken every 30 seconds, screenshots of the pages he was looking at, his GPS location, the positions of his mouse and a list of applications he used.
He attracted 213 backers and raised $2,733. Not exactly a goldmine, but more than five times his goal of $500. And he promised to use the funds “to finish a browser extension and an iPhone app that allows you to do the same.”
By: Nicole Laskowski, senior news writer, SearchCIO.com & SearchCIO-Midmarket.com
Originally published at http://searchcio.techtarget.com
The new tools that abstract away the “gory” math behind data science will help businesses make decisions based on their data. But much as high-level programming languages and RAD products make application development easier than writing assembly language, you still have to know what you are doing to write great programs. This matters even more in data science because of the complexity of data analytics. You may be able to drag and drop a neural network with these new tools, but you still need to choose the parameters: how many layers, how many neurons, which training technique to use and so on. What happens if your “point and click” solution generates insights that are incorrect? You still need to know which algorithm is best suited to your specific problem and how to handle problems in your data, such as missing values, given the problem you are trying to solve. There are also feature engineering and validation techniques. The list goes on. I believe these new tools will make data scientists’ jobs easier, but you still need the data scientist.
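To make the point about missing values concrete, here is a minimal Python sketch. The numbers are made up for illustration, and the helper names `drop_missing` and `fill_with` are hypothetical, not taken from any particular tool. It shows how three common ways of handling gaps can give different answers from the same data.

```python
from statistics import mean

# Hypothetical readings; None marks a missing value.
values = [10.0, None, 30.0, None, 50.0]


def drop_missing(xs):
    """Return only the observed values, discarding the gaps."""
    return [x for x in xs if x is not None]


def fill_with(xs, fill):
    """Return a copy of xs with every gap replaced by `fill`."""
    return [fill if x is None else x for x in xs]


observed = drop_missing(values)

# Three strategies a drag-and-drop tool might apply silently.
mean_dropped = mean(observed)                                   # ignore the gaps
mean_zero_filled = mean(fill_with(values, 0.0))                 # treat gaps as zero
mean_mean_filled = mean(fill_with(values, mean(observed)))      # impute the mean

print(mean_dropped, mean_zero_filled, mean_mean_filled)
```

In this toy case, dropping the gaps and imputing the mean agree on the average, while treating gaps as zero pulls it down sharply. Which choice is right depends on why the values are missing, and that is a judgment the tool cannot make for you.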