There are many vendors selling products classified as big data analytics software. However, it’s challenging to differentiate these products based on functionality alone, because many of the tools share similar features and capabilities, and the differences between some of them are extremely subtle. As a result, your key differentiating factors will likely focus on balancing ease of use, algorithmic sophistication and price in relation to your organization’s capability and level of maturity in analytics.
In this article, we examine products from nine big data analytics software vendors: Alteryx, IBM, KNIME.com, Microsoft, Oracle, RapidMiner, SAP, SAS and Teradata. Some of them provide more than one tool (see the “Leading vendors of big data analytics software” sidebar below for more details about the specific product offerings). These vendors represent different facets of the big data analytics market. Based on the characteristics described in our earlier articles, let’s compare and contrast the ways that the products are targeted to meet the business needs of user organizations.
Analyst expertise and skills. Some of the tools are targeted to novice users, some are targeted to expert data analysts, and some are engineered to appeal to both types of users.
Products such as IBM SPSS Modeler, RapidMiner’s tools, Oracle Advanced Analytics and the Automated Analytics version of SAP Predictive Analytics are generally designed to enable users who have little or no background in statistics or data analysis to analyze data, develop analytical models and design analytics workflows with little or no coding. While each vendor wraps the core analytics components with an intuitive user interface to guide the analyst’s progress in data preparation, analysis and then model design and validation, the approach taken may differ, especially when comparing a standalone product (such as RapidMiner) with one that’s a component of a larger suite (such as the Oracle product).
Tools such as IBM SPSS Statistics, KNIME Analytics Platform, the Expert Analytics module of SAP Predictive Analytics, Microsoft Revolution Analytics and the Teradata Aster Discovery Platform provide the more sophisticated functionality that expert users expect. Oracle R Advanced Analytics for Hadoop (ORAAH), which is one of the components in the Oracle Big Data Software Connectors Suite, provides an R interface for manipulating Hadoop Distributed File System data and writing mapper and reducer functions in R. This flexibility may be appealing to more advanced data scientists.
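For readers unfamiliar with the mapper and reducer pattern mentioned above, the following is a minimal, generic sketch of the idea. It is written in Python rather than R, runs in memory rather than on Hadoop, and does not use ORAAH's actual interface; the function names `mapper`, `reducer` and `run_job` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(key: str, values: Iterable[int]) -> tuple[str, int]:
    """Combine all values emitted for one key into a single total."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Hypothetical in-memory driver: map, group by key, then reduce.

    A real Hadoop job distributes these phases across a cluster;
    this stand-in only illustrates the data flow.
    """
    grouped: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


counts = run_job(["big data analytics", "big data tools"])
print(counts)
```

The analyst supplies only the two small functions; the framework handles grouping and distribution. That split is what lets an expert user express custom analyses over large data sets.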
Alteryx and SAS Enterprise Miner offer functionality adapted to the user’s level of expertise, and essentially fall into both categories. Overall, SAS Enterprise Miner and IBM’s SPSS tools stand out when it comes to supporting more advanced analytical techniques and model scoring, as well as a broader array of analysis functions including neural networks, association analysis and visualization capabilities.
Analytical diversity. Depending on the use case and application, your organization’s users will need different types of analytics capabilities that rely on specific types of modeling (e.g., regression, clustering, segmentation, behavior modeling and decision trees). That demand has resulted in broad support across products for the various forms of analytical modeling at a high level, but some vendors have invested decades of work into refining their algorithms and adding more sophisticated functionality. It’s important to understand which models are most relevant to your business problem and to evaluate the products in terms of how well they serve your users’ business needs.
The more mature and higher-end (and, accordingly, higher-priced) tools will exhibit the greatest analytical breadth. Oracle Data Miner includes an array of well-known machine learning approaches to support clustering, predictive mining and text mining. Both of IBM’s SPSS products provide a diverse set of analytical techniques and models. And SAS Enterprise Miner supports many algorithms and techniques, including decision trees, time series, neural networks, linear and logistic regression, sequence and Web path analysis, market basket analysis and link analysis.
The newer generation (and, in some cases, lower-priced) products also support a variety of models, but perhaps with a narrower range of algorithmic sophistication. The model inventory in Alteryx Analytics Gallery includes such capabilities as regression analysis, decision trees, association rule analysis, classification and time series analysis. KNIME includes methods for text mining, image mining and time series analysis, and also integrates with other open source projects, including Weka and R for machine learning algorithms and JFreeChart for charting.
About the author
David Loshin, managing director at DecisionWorx, is a recognized thought leader, speaker and expert consultant. He has also written numerous books, including Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL and Graph. He can be contacted at david [dot] loshin [at] decisionworx [dot] com.