The top 5 most used tools were R (used by 70% of data miners), IBM SPSS Statistics, RapidMiner, SAS, and Weka, while STATISTICA, KNIME, SAS JMP, IBM SPSS Modeler, and RapidMiner had the highest satisfaction ratings. Big Data is actually used in only a small fraction of projects.
Last week at Predictive Analytics World in Boston, Karl Rexer, the president of Rexer Analytics, presented the initial results of the very popular Data Miner Survey that his company has conducted since 2007. I attended his talk, and he kindly shared his findings for publication in KDnuggets.
Full results will be published later in 2013, and results of all past surveys are freely available at www.rexeranalytics.com/ .
This was the 6th survey since 2007, and over 1,200 data miners from 75 countries responded to 68 questions. The breakdown of respondents by occupation was:
The geographic distribution was:
Some of the highlights from the survey:
The average data miner reports using 5 different software tools. The top 10 most used tools were R (used by 70% of data miners), IBM SPSS Statistics, RapidMiner, SAS, Weka, Matlab, Microsoft SQL, IBM SPSS Modeler, SAS Enterprise Miner, and KNIME.
Here is the chart:
The top 10 tools ranked by usage as the primary tool were:
The survey also measured tool satisfaction (with vendors excluded) and STATISTICA, KNIME, SAS JMP, IBM SPSS Modeler, and RapidMiner received the highest satisfaction ratings – see chart below.
The survey also looked at Big Data. While reported data volumes have increased since 2007, only about 8% of respondents work with really big data (over 100,000,000 records), vs. 7% in 2007. Only 13% report having an Active Big Data program.
Full results will be freely available at www.rexeranalytics.com.