Machine Learning Times
EXCLUSIVE HIGHLIGHTS
For Managing Business Uncertainty, Predictive AI Eclipses GenAI
  Originally published in Forbes The future is the ultimate...
AI Business Value Is Not an Oxymoron: How Predictive AI Delivers Real ROI for Enterprises
  Originally published in AI Realized Now “Shouldn’t a great...
How To Un-Botch Predictive AI: Business Metrics
  Originally published in Forbes Predictive AI offers tremendous potential...
2 More Ways To Hybridize Predictive AI And Generative AI
  Originally published in Forbes Predictive AI and generative AI...
SHARE THIS:

5 years ago
Clustergam: Visualisation of Cluster Analysis

 
Originally published in MARTIN FLEISCHMANN, April 27, 2021.

When we want to do some cluster analysis to identify groups in our data, we often use algorithms like K-Means, which require the specification of a number of clusters. But the issue is that we usually don’t know how many clusters there are.

There are many methods on how to determine the correct number, like silhouettes or elbow plot, to name a few. But they usually don’t give much insight into what is happening between different options, so the numbers are a bit abstract.

Matthias Schonlau proposed another approach – a clustergram. Clustergram is a two-dimensional plot capturing the flows of observations between classes as you add more clusters. It tells you how your data reshuffles and how good your splits are. Tal Galili later implemented clustergram for K-Means in R. And I have used Tal’s implementation, ported it to Python and created clustergram – a Python package to make clustergrams.

clustergram currently supports K-Means and using scikit-learn (inlcuding Mini-Batch implementation) and RAPIDS.AI cuML (if you have a CUDA-enabled GPU), Gaussian Mixture Model (scikit-learn only) and hierarchical clustering based on scipy.hierarchy. Alternatively, we can create clustergram based on labels and data derived from alternative custom clustering algorithms. It provides a sklearn-like API and plots clustergram using matplotlib, which gives it a wide range of styling options to match your publication style.

To continue reading this article, click here.

Comments are closed.