Demystifying Information-Theoretic Clustering

Aram Galstyan; Fei Sha; Greg Ver Steeg; Simon DeDeo

arXiv: 1310.4210 · v2 · pith:XJU3ZMPDnew · submitted 2013-10-15 · cs.LG · cs.IT · math.IT · physics.data-an · stat.ML


classification: cs.LG · cs.IT · math.IT · physics.data-an · stat.ML

keywords: clustering · data · information · define · information-theoretic · theory · amount · assumption-free


We propose a novel method for clustering data which is grounded in information-theoretic principles and requires no parametric assumptions. Previous attempts to use information theory to define clusters in an assumption-free way are based on maximizing mutual information between data and cluster labels. We demonstrate that this intuition suffers from a fundamental conceptual flaw that causes clustering performance to deteriorate as the amount of data increases. Instead, we return to the axiomatic foundations of information theory to define a meaningful clustering measure based on the notion of consistency under coarse-graining for finite data.
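As a small illustration of the mutual-information objective the abstract refers to, the sketch below computes a plug-in (empirical) estimate of the mutual information between data points and cluster labels on hypothetical toy data. It is not the paper's estimator or experiment; the function names and data are assumptions made for this example. It relies only on a standard identity: when every point is distinct and each label is a deterministic function of its point, I(X;Y) = H(Y), so the plug-in score depends only on how balanced the labels are, not on where the points lie. The abstract does not say whether this is the specific flaw the paper analyzes.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy (in bits) of a list of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def plugin_mutual_information(xs, ys):
    """Plug-in estimate I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy data: two well-separated groups of distinct 1-D points.
points = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3]

# A labeling that follows the visible group structure.
natural = [0, 0, 0, 0, 1, 1, 1, 1]
# A balanced labeling that ignores the structure entirely.
arbitrary = [0, 1, 0, 1, 0, 1, 0, 1]

mi_natural = plugin_mutual_information(points, natural)
mi_arbitrary = plugin_mutual_information(points, arbitrary)

print(mi_natural, mi_arbitrary)
```

Both labelings receive the same score here, equal to the entropy of a balanced two-way split, which suggests why a naive estimate of this objective cannot by itself distinguish a sensible partition from an arbitrary one.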
