On the power of linear programming for K-means clustering

· 2024 · math.OC · arXiv 2402.01061

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

In [SIAM J. Optim., 2022], the authors introduced a new linear programming (LP) relaxation for K-means clustering. In this paper, we further investigate both theoretical and computational properties of this relaxation. As evident from our numerical experiments with both synthetic real-world data sets, the proposed LP relaxation is almost always tight; i.e. its optimal solution is feasible for the original nonconvex problem. To better understand this unexpected behaviour, on the theoretical side, we focus on K-means clustering with two clusters, and we obtain sufficient conditions under which the LP relaxation is tight. We further analyze the sufficient conditions when the input is generated according to a popular stochastic model and obtain recovery guarantees for the LP relaxation. We conclude our theoretical study by constructing a family of inputs for which the LP relaxation is never tight. Denoting by $n$ the number of data points to be clustered, the LP relaxation contains $\Omega(n^3)$ inequalities making it impractical for large data sets. To address the scalability issue, by building upon a cutting-plane algorithm together with the GPU implementation of PDLP, a first-order method LP solver, we develop an efficient algorithm that solves the proposed LP and hence the K-means clustering problem, for up to $n \leq 4000$ data points.

representative citing papers

Accessible Complexity Bounds for Restarted PDHG on Linear Programs with a Unique Optimizer

math.OC · 2024-10-05 · unverdicted · novelty 7.0

Derives accessible O(κΦ ln(κΦ ||w*||/ε)) iteration bound for rPDHG on unique-optima LPs, with computable Φ, two-stage performance, and equivalence to stability and sharpness.

From Sequential Nodes to GPU Batches: Parallel Branch and Bound for Optimal $k$-Sparse GLMs

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

A modular CPU-GPU batching framework for branch-and-bound delivers 10-100x speedups with zero optimality gap when certifying optimal cardinality-constrained GLMs.

citing papers explorer

Showing 2 of 2 citing papers.

Accessible Complexity Bounds for Restarted PDHG on Linear Programs with a Unique Optimizer math.OC · 2024-10-05 · unverdicted · none · ref 17 · internal anchor
Derives accessible O(κΦ ln(κΦ ||w*||/ε)) iteration bound for rPDHG on unique-optima LPs, with computable Φ, two-stage performance, and equivalence to stability and sharpness.
From Sequential Nodes to GPU Batches: Parallel Branch and Bound for Optimal $k$-Sparse GLMs cs.LG · 2026-05-21 · unverdicted · none · ref 18 · internal anchor
A modular CPU-GPU batching framework for branch-and-bound delivers 10-100x speedups with zero optimality gap when certifying optimal cardinality-constrained GLMs.

On the power of linear programming for K-means clustering

fields

years

verdicts

representative citing papers

citing papers explorer