Predictive Analytics Times

4 months ago
The Top JavaScript Data Visualization Packages, Ranked

 

Transitioning from academia to industry in Data Science can be a daunting task. The sheer breadth of available tools can make it difficult to figure out where to start with learning Data Science skills, and which are most important to learn. To help guide learners on their Data Science journey, we thought it would be useful to rank the most popular tools Data Scientists use. These tools are popular for several reasons, including their capability and usability, and skill with these tools is in high demand for Data Science professionals.

At The Data Incubator, we train students at every level on the latest in-demand tools and technologies for Data Science – ultimately training students at the post-graduate level for positions in the Data Science industry through our free Fellowship program, and training students at the corporate level to increase organizations’ Data Science capabilities. Our curriculum is based on feedback from our corporate and government partners, but we wanted to develop a more data-driven approach to what we should be teaching. So, we decided to analyze the popularity of certain tools and packages, with the understanding that the more people who use a tool, the more capable and usable it must be.

In this article, we rank the top JavaScript data visualization packages. First we present and discuss the top 20 packages, in order, and then we describe our methodology.

The Rankings

Below is a ranking of the top 20 of 110 JavaScript data visualization packages that are useful for Data Science, based on GitHub and Stack Overflow activity, as well as npm (the JavaScript package manager) downloads. The table shows standardized scores, where a value of 1 means one standard deviation above average (average = score of 0). For example, chart.js is 3.29 standard deviations above average in GitHub activity, while plotly.js is close to average. See below for methods.

Results and Discussion

The ranking weights its three components equally: GitHub (stars and forks), Stack Overflow (tags and questions), and npm downloads (totals and compound monthly growth rate). These were obtained using available APIs. Coming up with a comprehensive list of JavaScript visualization packages was tricky. In the end, we scraped four different lists that we thought were representative (see methods below for details) and ranked 110 JS packages (excluding 191 d3 modules, which we ranked separately). Computing standardized scores for each metric allows us to see which packages stand out in each category. The full ranking is here, while the raw data is here.

d3.js and its derivatives dominate the field

d3.js is at least four standard deviations above the mean on all calculated metrics. d3.js offers users full control of all aspects of their data visualizations. With this power comes a trade-off: d3.js does not come with built-in charts, and making a simple bar graph can become quite time-consuming. For this reason, dozens of reusable charting packages have been built upon d3.js. d3.js derivatives with premade components make up six of the top 20 packages on our list. These include: plottable (4), plotly.js (5), britecharts (7), c3 (9), recharts (15), and dc.js (18). These derivatives tend to provide charting options for bar, line, and scatter plots. For more specialized visualizations, such as maps and networks, additional packages are necessary.

leaflet.js is the most popular map visualization package

leaflet.js (6) is the only package dedicated to mapping to break into the top 20 on our list with scores above the mean on all of our metrics. In addition to specializing in interactive maps, leaflet.js is lightweight (38KB of JS) and mobile-friendly. cesium (27) is the highest ranking package to offer 3D globes and maps. cartodb (29), rickshaw (37), and datamaps (46) also offer powerful geospatial/mapping visualizations.

sigma.js beats cytoscape for the most popular graph/network visualization package

sigma.js (17) is a JavaScript library dedicated solely to graph drawing, and it is the only package in our top 20 capable of graph/network visualization (besides the customizations offered by d3.js). Another package specializing in graph theory, cytoscape (38), has a strong showing, slightly outperforming sigma.js in Stack Overflow and npm download activity. However, sigma.js has more than twice as many stars and forks on GitHub.

britecharts has the largest growth rate for 2017

With so many data visualization options (we ranked 110), one might think it would be hard for a new charting package to gain a following. britecharts, a reusable charting library based on d3.js and created by Eventbrite, was first made publicly available less than two years ago. britecharts earned the number 7 spot in our overall rankings and had the highest compound monthly growth rate (110%) over the last six months. The next closest package is graphael, with a 56% growth rate.
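To make the growth figures concrete, here is a minimal sketch of how a compound monthly growth rate can be computed from monthly download totals. It uses the standard definition (ratio of last to first month, taken to the power of one over the number of monthly steps, minus one); we assume this matches the calculation behind the numbers above. The function name and the download counts are hypothetical.

```python
def compound_monthly_growth_rate(monthly_downloads):
    """Return the compound monthly growth rate for a series of monthly totals.

    Standard definition: (last / first) ** (1 / steps) - 1,
    where steps is the number of month-to-month intervals.
    """
    if len(monthly_downloads) < 2:
        raise ValueError("need at least two months of data")
    first, last = monthly_downloads[0], monthly_downloads[-1]
    if first <= 0:
        raise ValueError("first month must be positive")
    steps = len(monthly_downloads) - 1
    return (last / first) ** (1 / steps) - 1


# Made-up counts for six months: downloads double every month.
downloads = [100, 200, 400, 800, 1600, 3200]
rate = compound_monthly_growth_rate(downloads)
print(f"{rate:.0%}")  # about 100% per month
```

Under this definition, a rate of 100% means downloads doubled each month on average, so a rate of 110% corresponds to slightly more than doubling every month.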

There’s a place near the top for both flot and flotr2

flot (rhymes with plot) comes in just one spot behind its successor flotr2. flot is a pure JavaScript plotting library for jQuery, while flotr2 has a similar syntax but no dependence on jQuery. Although it was released over five years ago, flotr2 has yet to outperform flot in GitHub or Stack Overflow activity. flotr2 has a larger growth rate, but both packages rank highly on our list and continue to be actively maintained by separate groups.

Limitations

As with any analysis, decisions were made along the way. All source code and data is on our GitHub Page. The full list of JavaScript visualization packages came from a few sources, and some packages were left unranked because download and GitHub data were unavailable. These include Google Charts, KoolCharts, TeeCharts, and ZoomCharts.

Further, packages that have been around longer will naturally have higher metrics, and therefore a higher ranking. The Stack Overflow and GitHub metrics are not adjusted for this. The download metrics are restricted to the past six months.

The data presented a few difficulties:

  • plottable has inflated Stack Overflow (SO) question metrics, since it’s a common word.
  • SO data for plotly may also be inflated, as it’s both an R and Python package.

Methods

All source code and data is on our Github Page.

We first generated a list of 141 Data Science packages from these four sources, and then collected metrics for all of them, to come up with the ranking. GitHub data is based on both stars and forks, while Stack Overflow data is based on tags and questions containing the package name. Downloads data is from npm. Downloads were totaled over a six-month period, and the compound monthly growth rate was calculated over the same period. After scraping other sites for JS visualization package names, we had gathered over 200 package names. Many of them were aliases for the same packages (d3, D3JS). If the first result of a GitHub search returned the same repo as another package, we treated them as the same package, but saved the aliases to search Stack Overflow questions.
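The alias handling described above can be sketched roughly as follows. This is an illustration rather than the code used for the analysis: it assumes the first GitHub search result for each scraped name has already been looked up and stored in a dictionary, and the function name and sample entries are hypothetical.

```python
from collections import defaultdict


def group_aliases(first_repo_by_name):
    """Group scraped names whose first GitHub search hit is the same repo.

    Returns {repo: [names]}; every name in a list is kept as an alias
    so it can later be used to search Stack Overflow questions.
    """
    groups = defaultdict(list)
    for name, repo in first_repo_by_name.items():
        # Compare repos case-insensitively (an assumption of this sketch).
        groups[repo.lower()].append(name)
    return dict(groups)


# Illustrative lookup results, not the article's data.
first_repo_by_name = {
    "d3": "d3/d3",
    "D3JS": "d3/d3",
    "leaflet": "Leaflet/Leaflet",
}
packages = group_aliases(first_repo_by_name)
print(packages)
```

Here "d3" and "D3JS" collapse into a single package entry with two aliases, while "leaflet" stays on its own.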

A few other notes:

  • Any unavailable Stack Overflow counts were converted to zero count.
  • Counts were standardized to mean 0 and standard deviation 1, then averaged to get GitHub, Stack Overflow, and Download scores, which were combined with equal weight to get the Overall score.
  • Some manual checks were done to confirm Github repository location.
  • The 191 d3 modules were removed, and their data collection, analysis, and ranking were done separately.
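The scoring steps in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the published analysis code: it uses the population standard deviation, averages the standardized metrics within each component, and takes an unweighted mean of the three component scores for the overall score. The function names and all counts are made up.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw counts to z-scores (mean 0, standard deviation 1)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def score_packages(metrics_by_component):
    """Return (component_scores, overall) from {component: {metric: [counts]}}.

    Each list holds one value per package, in the same package order.
    """
    component_scores = {}
    for component, metrics in metrics_by_component.items():
        z_columns = [standardize(column) for column in metrics.values()]
        # Average the standardized metrics package by package.
        component_scores[component] = [mean(z) for z in zip(*z_columns)]
    # Equal weight for each component in the overall score.
    overall = [mean(z) for z in zip(*component_scores.values())]
    return component_scores, overall


# Made-up counts for three hypothetical packages.
raw = {
    "github": {"stars": [100, 50, 0], "forks": [20, 10, 0]},
    "stack_overflow": {"tags": [30, 0, 15], "questions": [60, 0, 30]},
    "downloads": {"total": [0, 500, 1000], "growth": [0.0, 0.1, 0.2]},
}
component_scores, overall = score_packages(raw)
print([round(score, 2) for score in overall])
```

Because every metric is put on the same scale before averaging, a package with a very large raw count in one metric (for example, total downloads) does not automatically dominate the overall score.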

All data was downloaded on August 6, 2017.

About the Authors

Rachel Allen is Lead Scientist at Booz Allen Hamilton. Prior to joining Booz, she worked as an instructor at The Data Incubator after graduating from their free eight-week fellowship. Rachel holds a PhD in Systems Neuroscience from University of Virginia and completed her post-doctoral research at the National Children’s Health System.

Michael Li is the founder and CEO of The Data Incubator, a company he started to help organizations hire and train professional Data Scientists. As a data scientist, he has worked at NASA, Google, Foursquare, and Andreessen Horowitz. He is a regular contributor to VentureBeat, The Next Web, and Harvard Business Review.
