Comparison of Methods to Inform K-Means Clustering
A Brief Tutorial

Photo by Nabeel Hussain on Unsplash

K-Means is a popular unsupervised algorithm for clustering tasks. Despite its popularity, it can be difficult to use in some contexts because the number of clusters (k) must be chosen before the algorithm is run.

Two quantitative methods to address this issue are the elbow plot and the silhouette score. Some authors regard the elbow plot as “coarse” and recommend that data scientists use the silhouette score.

Although general advice is useful in…
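To make the two methods named above concrete, here is a minimal sketch that computes both quantities for a range of candidate k values. It assumes scikit-learn is available and uses a synthetic dataset of four well-separated blobs; the data, the candidate range, and the variable names are illustrative assumptions, not taken from the tutorial itself.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (an assumption for illustration): four well-separated blobs.
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=0)

# The silhouette score needs at least two clusters, so start at k=2.
candidate_ks = range(2, 9)
inertias = {}      # within-cluster sum of squares: the y-axis of an elbow plot
silhouettes = {}   # mean silhouette coefficient over all samples

for k in candidate_ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# The silhouette method picks the k with the highest mean score.
best_k = max(silhouettes, key=silhouettes.get)

for k in candidate_ks:
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")
print("k with the highest silhouette score:", best_k)
```

Plotting the inertia values against k gives the elbow plot, where the "elbow" is read off by eye as the point at which further increases in k stop reducing inertia sharply. The silhouette score instead yields a single number per k, so the candidate with the highest score can be chosen directly.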