arxiv: 1608.07494 · v4 · pith:5ASL24OBnew · submitted 2016-08-26 · 📊 stat.ML

Estimating the Number of Clusters via Normalized Cluster Instability

keywords: cluster, instability, normalized, clusters, current, instability-based, measure, number

We improve current instability-based methods for selecting the number of clusters $k$ in cluster analysis by developing a normalized cluster instability measure that corrects for the distribution of cluster sizes, a previously unaccounted-for driver of cluster instability. We show that our normalized instability measure outperforms current instability-based measures across the whole sequence of possible $k$ and, in particular, overcomes their limitations for large $k$. We also compare, for the first time, model-based and model-free approaches to determining cluster instability and find their performance to be comparable. We make our method available in the R package cstab.
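
To illustrate why cluster sizes can drive instability, the following is a minimal Python sketch, not the cstab implementation and not necessarily the normalization used in the paper. It assumes instability is measured as the fraction of object pairs that are grouped together in one clustering but apart in the other, and it divides that value by the disagreement expected by chance when only the cluster sizes are kept fixed. All function names are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_disagreement(a, b):
    """Fraction of object pairs that share a cluster in exactly one labeling."""
    pairs = list(combinations(range(len(a)), 2))
    differing = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return differing / len(pairs)


def same_cluster_prob(labels):
    """Probability that a randomly drawn pair of objects shares a cluster.

    Depends only on the cluster sizes, not on which objects are in which cluster.
    """
    n = len(labels)
    sizes = Counter(labels).values()
    return sum(s * (s - 1) for s in sizes) / (n * (n - 1))


def expected_disagreement(a, b):
    """Disagreement expected if labeling b were randomly permuted (sizes kept)."""
    pa, pb = same_cluster_prob(a), same_cluster_prob(b)
    return pa * (1 - pb) + (1 - pa) * pb


def normalized_disagreement(a, b):
    """Observed disagreement relative to its size-dependent chance level.

    Assumption for this sketch: a simple ratio; returns 0.0 when the chance
    level is zero (for example, both labelings put everything in one cluster).
    """
    chance = expected_disagreement(a, b)
    return pair_disagreement(a, b) / chance if chance > 0 else 0.0


if __name__ == "__main__":
    balanced = [0] * 5 + [1] * 5
    unbalanced = [0] * 9 + [1]
    # Chance-level disagreement differs purely because of the cluster sizes.
    print(expected_disagreement(balanced, balanced))
    print(expected_disagreement(unbalanced, unbalanced))
```

In this sketch, two balanced clusters of five objects give a higher chance-level disagreement than a 9-to-1 split, so raw disagreement values are not directly comparable across clusterings with different size distributions (and hence across different $k$) unless some such correction is applied.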
