arxiv: 1607.01179 · v2 · pith:RHKYIEBFnew · submitted 2016-07-05 · 📊 stat.ME

Robust clustering tools based on optimal transportation

classification 📊 stat.ME
keywords: clustering, data, barycenters, distributions, probabilities, results, robust, space

A robust clustering method for probabilities in Wasserstein space is introduced. This new "trimmed $k$-barycenters" approach relies on recent results on barycenters in Wasserstein space that allow intensive computation, as required by clustering algorithms. The possibility of trimming the most discrepant distributions results in a gain in stability and robustness, highly convenient in this setting. As a remarkable application we consider a parallelized estimation setup in which each of $m$ units processes a portion of the data, producing an estimate of $k$-features, encoded as $k$ probabilities. We prove that the trimmed $k$-barycenter of the $m\times k$ estimates produces a consistent aggregation. We illustrate the methodology with simulated and real data examples. These include clustering populations by age distributions and analysis of cytometric data.
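To make the idea concrete, here is a minimal sketch of trimmed $k$-barycenters for distributions on the real line. It is not the paper's algorithm, only an illustration under stated assumptions: in one dimension the 2-Wasserstein distance between two distributions equals the $L^2$ distance between their quantile functions, and the barycenter of a group is the average of its quantile functions, so the problem reduces to a trimmed $k$-means on discretized quantile curves. The function names (`quantile_embedding`, `trimmed_k_barycenters`), the Lloyd-type iteration, the random restarts, and the grid size are all choices made for this example, not taken from the source.

```python
import numpy as np

def quantile_embedding(samples, n_grid=100):
    # Represent each 1-D sample by its empirical quantile function on a fixed grid.
    levels = (np.arange(n_grid) + 0.5) / n_grid
    return np.array([np.quantile(s, levels) for s in samples])

def _assign(Q, centers, n_keep):
    # Squared W2 distance in 1-D, approximated by the mean squared quantile difference.
    d2 = ((Q[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
    labels = d2.argmin(axis=1)
    nearest = d2[np.arange(len(Q)), labels]
    keep = np.argsort(nearest)[:n_keep]  # trim the most discrepant distributions
    return labels, keep, nearest[keep].mean()

def trimmed_k_barycenters(samples, k, alpha, n_starts=10, n_iter=100, seed=0):
    # Hypothetical helper: alpha is the fraction of distributions to trim.
    rng = np.random.default_rng(seed)
    Q = quantile_embedding(samples)
    n = len(Q)
    n_keep = n - int(np.floor(alpha * n))
    best = None
    for start in range(n_starts):  # random restarts to reduce the risk of poor local optima
        centers = Q[rng.choice(n, size=k, replace=False)].copy()
        for it in range(n_iter):
            labels, keep, _cost = _assign(Q, centers, n_keep)
            new_centers = centers.copy()
            for j in range(k):
                members = keep[labels[keep] == j]
                if members.size:
                    # 1-D Wasserstein barycenter: average of the quantile functions.
                    new_centers[j] = Q[members].mean(axis=0)
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels, keep, cost = _assign(Q, centers, n_keep)
        if best is None or cost < best[0]:
            out = np.full(n, -1)  # label -1 marks a trimmed distribution
            out[keep] = labels[keep]
            best = (cost, centers, out)
    return best[1], best[2]

# Usage on synthetic data: two groups of distributions plus two outlying ones.
data_rng = np.random.default_rng(1)
samples = (
    [data_rng.normal(0.0, 1.0, 200) for _ in range(10)]
    + [data_rng.normal(10.0, 1.0, 200) for _ in range(10)]
    + [data_rng.normal(100.0, 1.0, 200), data_rng.normal(-80.0, 5.0, 200)]
)
centers, labels = trimmed_k_barycenters(samples, k=2, alpha=0.1)
```

With `alpha=0.1` and 22 input distributions, two are trimmed. Returning the label `-1` for trimmed inputs keeps the output aligned with the input order, which is convenient when inspecting which distributions were discarded.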
