We analyze Top 30 LinkedIn Groups for Analytics, Big Data, Data Mining, and Data Science. Overall activity drops about 25%, but membership growth accelerates in Q4 2013. We identify 4 group quadrants and find which groups are fastest growing and most active.
We update our December 2013 analysis of the top 30 LinkedIn groups for Analytics, Big Data, Data Mining, and Data Science and find several interesting trends.
First, we found that growth slowed down in 2013 Q3 but resumed in 2013 Q4 and 2014 Q1.
Figure 1 (below) shows quarterly growth rates in the top 30 groups. Two groups, Machine Learning and SAS & Analytics Users (not shown in Figure 1), had big growth in 1 or 2 quarters and no growth in 2 other quarters. Apart from them, most groups show a surprisingly similar pattern: a decline in growth in 13Q3, followed by acceleration in 13Q4 and 14Q1.
Fig 1: Top LinkedIn Analytics Groups, Quarterly Growth, 2013Q2 to 2014Q1. The thick black line is the overall average growth rate.
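The post does not spell out its growth formulas. One plausible reading, sketched below purely as an assumption, is simple period-over-period growth for the quarterly rates in Fig 1, and compounding to a 365-day rate for the "annualized" 12-month figures, since the measurement window (March 25, 2013 to March 31, 2014) is 371 days long. The function names and membership counts are hypothetical.

```python
from datetime import date


def period_growth(members_start: int, members_end: int) -> float:
    """Fractional growth over one period (e.g. a quarter): 0.10 means +10%."""
    if members_start <= 0:
        raise ValueError("members_start must be positive")
    return members_end / members_start - 1.0


def annualized_growth(members_start: int, members_end: int,
                      start: date, end: date) -> float:
    """Compound the observed growth to a 365-day rate (an assumed convention)."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    return (members_end / members_start) ** (365 / days) - 1.0


# Hypothetical membership counts, not taken from the LinkedIn data.
q = period_growth(10_000, 11_000)
a = annualized_growth(10_000, 15_000, date(2013, 3, 25), date(2014, 3, 31))
print(f"quarterly growth: {q:.1%}")   # quarterly growth: 10.0%
print(f"annualized growth: {a:.1%}")  # slightly below 50% because the window is 371 days
```

Whether the original figures were compounded or scaled linearly is not stated; over a window this close to one year the two conventions give nearly the same number.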
Here are the 10 largest groups (by membership as of March 31, 2014). We note that the 7 largest were in the same order as in Nov 2013, and that the 6 largest grew significantly faster than the next 4.
However, there seems to be no strong correlation between group size and growth rate among all 30 groups.
Here are the 10 groups with the fastest growth in the past 12 months (March 25, 2013 to March 31, 2014):
The chart below shows group growth vs group size. Color corresponds to age – redder is younger, bluer is older. Group name abbreviations are in the table below.
Fig 2: Top LinkedIn Analytics, Big Data, Data Science Groups: 2014 Size vs Growth
There are 2 main measures of group activity: discussions (posts) per week and comments per week. Since these numbers clearly depend on group size, we measure them per 1000 members. We measure overall group activity as (discussions + comments) per week per 1000 members.
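A minimal sketch of this normalization, assuming we have counts of new discussions and comments over a window of known length and use a single membership figure (for example, end-of-window membership) as the denominator. The `GroupActivity` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupActivity:
    members: int       # membership used for normalization (assumed: end of window)
    discussions: int   # new discussions (posts) in the window
    comments: int      # new comments in the window
    weeks: float       # length of the window in weeks

    def per_1k_per_week(self, count: int) -> float:
        # Weekly rate divided by membership expressed in thousands.
        return count / self.weeks / (self.members / 1000)

    @property
    def discussions_rate(self) -> float:
        return self.per_1k_per_week(self.discussions)

    @property
    def comments_rate(self) -> float:
        return self.per_1k_per_week(self.comments)

    @property
    def activity(self) -> float:
        # Overall activity = (discussions + comments) per week per 1000 members.
        return self.discussions_rate + self.comments_rate


# Hypothetical group: 10,000 members, 120 discussions and 60 comments in 4 weeks.
g = GroupActivity(members=10_000, discussions=120, comments=60, weeks=4)
print(g.discussions_rate, g.comments_rate, g.activity)  # 3.0 1.5 4.5
```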
For the 4 months ending in March 2014, the activity level was 2.99/week per 1000 members, about 25% less than the 3.97/week measured in Nov 2013.
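As a quick arithmetic check, this sketch sums the per-1K-member discussion and comment rates reported in this post for each period (2.03 + 0.96 for 2014, 2.57 + 1.40 for Nov 2013) and computes the relative decline. The variable and function names are illustrative.

```python
# Per-1K-member weekly rates quoted in this post.
mar_2014 = {"discussions": 2.03, "comments": 0.96}
nov_2013 = {"discussions": 2.57, "comments": 1.40}


def total_activity(rates: dict) -> float:
    """Overall activity = discussions/week + comments/week, per 1000 members."""
    return rates["discussions"] + rates["comments"]


new = total_activity(mar_2014)  # about 2.99
old = total_activity(nov_2013)  # about 3.97
drop = (old - new) / old        # relative decline
print(f"{new:.2f} vs {old:.2f}: {drop:.1%} lower")  # 2.99 vs 3.97: 24.7% lower
```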
The chart below shows group activity vs group size. Color corresponds to age – redder is younger, bluer is older. Group name abbreviations are in the table below.
Fig 3: Top LinkedIn Analytics, Big Data, Data Science Groups: 2014 Activity vs Growth
In the 4 months ending in March 2014, the average activity level was 2.03 discussions/week and 0.96 comments/week per 1K members, or about 2.1 discussions per comment. Both rates are well below the 2.57 discussions/week and 1.40 comments/week per 1K members measured in Nov 2013 (1.8 discussions per comment). This means that while activity has slowed down, the gap between discussions and comments has widened.
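The discussions-per-comment ratio is a simple quotient of the two per-1K rates; this short sketch (the function name is illustrative) reproduces the 2.1 and 1.8 figures from the rates quoted above.

```python
def discussions_per_comment(discussions_rate: float, comments_rate: float) -> float:
    """Ratio of new discussions to comments; higher means fewer replies per post."""
    return discussions_rate / comments_rate


r_2014 = discussions_per_comment(2.03, 0.96)  # 4 months ending March 2014
r_2013 = discussions_per_comment(2.57, 1.40)  # measured Nov 2013
print(f"{r_2014:.1f} vs {r_2013:.1f}")  # 2.1 vs 1.8
```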
The chart below shows average comments/week vs average discussions/week (per 1K members) for all 30 groups, with circle size proportional to group size and circle color corresponding to activity change: green means an increase, red a decrease. We also show the median line for each dimension; together these lines divide the groups into 4 quadrants.
Fig 4: 4 Quadrants of Top LinkedIn Analytics, Big Data, Data Science Groups: Commenting vs Posting
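A minimal sketch of the quadrant split, assuming each group is placed by comparing its two per-1K rates with the medians. The quadrant labels are neutral placeholders, not the author's characterization. For brevity, the medians are taken over only five rows of the table in this post rather than all 30 groups, so the assignments may differ from Fig 4. Rows are keyed by member count because the group names are not used here.

```python
from statistics import median


def quadrant(cmt: float, disc: float, cmt_med: float, disc_med: float) -> str:
    """Place a group relative to the two median lines.

    Values equal to a median count as 'high' (an arbitrary tie-breaking choice).
    """
    c = "high" if cmt >= cmt_med else "low"
    d = "high" if disc >= disc_med else "low"
    return f"{c} comments / {d} discussions"


# (comments/week, discussions/week) per 1K members for five table rows.
groups = {
    "121816 members": (1.86, 1.69),
    "43761 members": (1.92, 3.24),
    "23368 members": (2.13, 1.28),
    "4886 members": (2.57, 9.68),
    "3596 members": (0.32, 0.23),
}
cmt_med = median(c for c, _ in groups.values())
disc_med = median(d for _, d in groups.values())
for name, (c, d) in groups.items():
    print(name, "->", quadrant(c, d, cmt_med, disc_med))
```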
Several groups stand out: KDnuggets has the highest number of discussions per 1000 members, while RDM has the highest number of comments. The median lines divide the groups into 4 quadrants, which we can characterize as
The details are in the table below, with groups ordered by number of members. The link to the raw data is at the end of the post.
Growth, comments, and discussions values are in green font if the value is 25% above average, in red if it is 25% below average, and in black otherwise.
We note that there are only 4 “triple green” groups that are significantly above average on growth, comments, and discussions:
| LinkedIn Group | Members (Mar 31, 2014) | Founded | 12-month Growth (annualized) | Comments/week per 1K members | Discussions/week per 1K members |
|---|---|---|---|---|---|
| Average | 23058 | 22-Dec-08 | 53% | 0.96 | 2.03 |
|  | 121816 | 28-Sep-07 | 74% | 1.86 | 1.69 |
|  | 95638 | 20-Feb-09 | 82% | 0.76 | 1.54 |
|  | 74350 | 1-Mar-12 | 100% | 1.97 | 1.86 |
|  | 53345 | 3-Mar-08 | 43% | 0.71 | 0.62 |
|  | 43761 | 25-Jul-08 | 116% | 1.92 | 3.24 |
|  | 30792 | 1-Sep-08 | 92% | 0.84 | 3.31 |
|  | 23368 | 26-Sep-07 | 15% | 2.13 | 1.28 |
|  | 20941 | 25-Jun-08 | 32% | 0.48 | 0.50 |
|  | 20000 | 6-Jan-08 | 4% | 0.21 | 1.48 |
|  | 19389 | 23-May-08 | 11% | 0.22 | 1.91 |
|  | 19087 | 12-Mar-08 | 52% | 0.78 | 0.53 |
| Pattern Recognition, Data Mining, Machine Intelligence (closed) (PR) | 16297 | 2-Oct-08 | 60% | 1.06 | 0.19 |
|  | 15121 | 13-Apr-08 | 34% | 0.22 | 1.20 |
|  | 14930 | 24-Sep-08 | 28% | 0.17 | 0.25 |
|  | 14929 | 10-Apr-09 | 35% | 1.05 | 1.29 |
|  | 13947 | 2-Jun-08 | 27% | 0.76 | 1.04 |
|  | 13112 | 10-Feb-12 | 48% | 0.64 | 3.45 |
|  | 12025 | 11-Jan-09 | 66% | 0.98 | 2.51 |
|  | 8450 | 31-Mar-08 | 50% | 0.81 | 1.80 |
|  | 8003 | 24-Sep-07 | 22% | 0.32 | 1.35 |
|  | 7623 | 16-Mar-09 | 72% | 0.90 | 5.79 |
|  | 7278 | 10-Jul-08 | 41% | 1.06 | 0.12 |
|  | 7052 | 8-Jun-09 | 114% | 2.18 | 6.72 |
|  | 6349 | 17-Apr-11 | 22% | 0.08 | 0.66 |
|  | 4972 | 30-Aug-11 | 126% | 2.88 | 2.63 |
|  | 4886 | 4-Feb-08 | 73% | 2.57 | 9.68 |
|  | 4034 | 20-Jun-08 | 33% | 0.19 | 1.43 |
|  | 3596 | 24-Sep-09 | 18% | 0.32 | 0.23 |
|  | 3594 | 11-Jul-08 | 63% | 0.63 | 0.75 |
|  | 3044 | 2-Jul-08 | 42% | 0.25 | 1.81 |
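The color coding can be reproduced from the table itself. This sketch assumes "25% above average" means at least 1.25 times the value in the table's Average row, and "25% below average" means at most 0.75 times it. Under that reading it flags the same 4 triple-green groups the text mentions, identified here by member count because the group names did not survive. The helper names are hypothetical.

```python
# (members, 12-month growth %, comments/week per 1K, discussions/week per 1K),
# transcribed from the table above; group names are not included.
ROWS = [
    (121816, 74, 1.86, 1.69), (95638, 82, 0.76, 1.54), (74350, 100, 1.97, 1.86),
    (53345, 43, 0.71, 0.62), (43761, 116, 1.92, 3.24), (30792, 92, 0.84, 3.31),
    (23368, 15, 2.13, 1.28), (20941, 32, 0.48, 0.50), (20000, 4, 0.21, 1.48),
    (19389, 11, 0.22, 1.91), (19087, 52, 0.78, 0.53), (16297, 60, 1.06, 0.19),
    (15121, 34, 0.22, 1.20), (14930, 28, 0.17, 0.25), (14929, 35, 1.05, 1.29),
    (13947, 27, 0.76, 1.04), (13112, 48, 0.64, 3.45), (12025, 66, 0.98, 2.51),
    (8450, 50, 0.81, 1.80), (8003, 22, 0.32, 1.35), (7623, 72, 0.90, 5.79),
    (7278, 41, 1.06, 0.12), (7052, 114, 2.18, 6.72), (6349, 22, 0.08, 0.66),
    (4972, 126, 2.88, 2.63), (4886, 73, 2.57, 9.68), (4034, 33, 0.19, 1.43),
    (3596, 18, 0.32, 0.23), (3594, 63, 0.63, 0.75), (3044, 42, 0.25, 1.81),
]
AVERAGE = (53, 0.96, 2.03)  # growth %, comments, discussions from the Average row


def color(value: float, average: float) -> str:
    """Assumed rule: green if >= 125% of average, red if <= 75%, else black."""
    if value >= 1.25 * average:
        return "green"
    if value <= 0.75 * average:
        return "red"
    return "black"


def row_colors(row: tuple) -> list:
    _members, *metrics = row
    return [color(v, a) for v, a in zip(metrics, AVERAGE)]


triple_green = [row[0] for row in ROWS if row_colors(row) == ["green"] * 3]
print(triple_green)  # [43761, 7052, 4972, 4886]
```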
Note: You can get the actual data from the HTML source code of a LinkedIn group's Statistics/Activity page.
Look for the dataset with `seriesName="Comments"` and parse its data. Do the same for Discussions and Members.
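The post does not show the page markup, so the following is only a sketch. The `dataset`/`seriesName` naming resembles FusionCharts-style chart XML (an assumption), with one `<set value=...>` element per data point. If the chart data is embedded that way, Python's standard `xml.etree.ElementTree` can pull out one series. The sample XML and the `parse_series` helper are hypothetical, and on a real page the XML would first have to be extracted from the surrounding HTML or JavaScript.

```python
import xml.etree.ElementTree as ET

# Hypothetical chart XML; the real page markup is not shown in the post.
SAMPLE = """
<chart>
  <categories><category label="Jan"/><category label="Feb"/></categories>
  <dataset seriesName="Comments"><set value="12"/><set value="9"/></dataset>
  <dataset seriesName="Discussions"><set value="30"/><set value="27"/></dataset>
</chart>
"""


def parse_series(xml_text: str, series_name: str) -> list:
    """Return the values of the <dataset> whose seriesName matches, in order."""
    root = ET.fromstring(xml_text.strip())
    for ds in root.iter("dataset"):
        if ds.get("seriesName") == series_name:
            return [float(s.get("value")) for s in ds.iter("set")]
    raise KeyError(series_name)


print(parse_series(SAMPLE, "Comments"))     # [12.0, 9.0]
print(parse_series(SAMPLE, "Discussions"))  # [30.0, 27.0]
```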
Thanks to Anmol Rajpurohit for collecting the membership, comments, and discussions data.
Here is raw data (csv) for the top 30 LinkedIn groups.
Let me know which relevant groups were missed and what other trends you see.
By Gregory Piatetsky, KDnuggets
Originally published at www.kdnuggets.com