Recognition: 3 theorem links
· Lean Theorem · The Condition-Number Principle for Prototype Clustering
Pith reviewed 2026-05-10 18:27 UTC · model grok-4.3
The pith
A small clustering condition number ensures near-optimal prototypes recover the true partition with low misclassification error.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We define a clustering condition number that compares within-cluster scale to the minimum loss increase required to move a point across a cluster boundary. When this quantity is small, any solution with a small suboptimality gap must also have a small misclassification error relative to a benchmark partition. The framework clarifies a fundamental trade-off between robustness and sensitivity to cluster imbalance, leading to sharp phase transitions for exact recovery under different objectives. The guarantees are deterministic and non-asymptotic, and they separate the role of algorithmic accuracy from the intrinsic geometric difficulty of the instance. We further show that errors concentrate near cluster boundaries and that sufficiently deep cluster cores are recovered exactly under strengthened local margins.
What carries the argument
The clustering condition number, the ratio of within-cluster scale to the smallest loss increase needed to cross a cluster boundary, which bounds misclassification error in terms of objective suboptimality.
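As a concrete illustration of the quantity being described, here is a minimal numerical sketch assuming a squared-Euclidean (k-means-style) loss and a fixed benchmark partition; the function name, the use of the worst within-cluster loss as the scale, and the single-point reassignment rule with prototypes held fixed are our assumptions, since the paper's admissible-loss definition is more general.

```python
import numpy as np

def clustering_condition_number(X, labels, centers):
    """Toy condition number under squared-Euclidean loss: within-cluster scale
    divided by the smallest loss increase from moving one point across a
    cluster boundary, with prototypes held at their benchmark values."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # point-to-center losses
    idx = np.arange(len(X))
    d2_own = d2[idx, labels]

    within_scale = d2_own.max()          # worst within-cluster loss under the benchmark

    d2[idx, labels] = np.inf             # exclude each point's own cluster
    boundary_increase = (d2.min(axis=1) - d2_own).min()  # cheapest single-point defection

    return within_scale / boundary_increase

# Two well-separated blobs: the condition number should come out small.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
labels = np.repeat([0, 1], 50)
centers = np.vstack([X[:50].mean(axis=0), X[50:].mean(axis=0)])
print(clustering_condition_number(X, labels, centers))
```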
If this is right
- Any clustering solution with small objective gap must have small misclassification error when the condition number is small.
- Different admissible losses produce distinct phase transitions between robust and imbalance-sensitive exact recovery.
- Misclassification errors localize near cluster boundaries while deep cores are recovered exactly under local margin conditions.
- The geometric difficulty of an instance can be quantified independently of the solver used to optimize the objective.
Where Pith is reading between the lines
- The condition number could be computed on real data sets to decide whether low objective values are trustworthy indicators of structure.
- Algorithm design might focus on reducing the condition number itself rather than only chasing smaller objective values.
- The same geometric comparison might be adapted to other partitioning methods whose losses admit a scale-versus-boundary interpretation.
Load-bearing premise
The loss functions belong to a broad class of admissible losses for which the geometric comparison between within-cluster scale and boundary loss increase is well-defined and meaningful.
What would settle it
Find a data set with small clustering condition number together with a near-optimal assignment that nevertheless has large misclassification error relative to the benchmark partition.
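A hedged sketch of what such a falsification attempt could look like, continuing the toy example (numpy, X, labels, centers) and the illustrative clustering_condition_number from the sketch above; the k-means objective as the loss, the benchmark-relative gap, and the two-cluster misclassification rate are our simplifications, not the paper's protocol.

```python
def kmeans_objective(X, labels, k):
    """Sum of squared distances to each cluster's mean (prototypes refit per assignment)."""
    return sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum() for j in range(k))

def misclassification_rate(pred, benchmark):
    """Two-cluster disagreement rate, minimized over the label swap."""
    err = float(np.mean(pred != benchmark))
    return min(err, 1.0 - err)

# Candidate assignment from any solver; here, the benchmark with two points flipped.
candidate = labels.copy()
candidate[:2] = 1 - candidate[:2]

gap = kmeans_objective(X, candidate, 2) - kmeans_objective(X, labels, 2)   # benchmark-relative gap
kappa = clustering_condition_number(X, labels, centers)
error = misclassification_rate(candidate, labels)

# A counterexample would need small kappa and small gap alongside a large error.
print(f"kappa={kappa:.3f}, gap={gap:.3f}, misclassification={error:.3f}")
```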
read the original abstract
We develop a geometric framework that links objective accuracy to structural recovery in prototype-based clustering. The analysis is algorithm-agnostic and applies to a broad class of admissible loss functions. We define a clustering condition number that compares within-cluster scale to the minimum loss increase required to move a point across a cluster boundary. When this quantity is small, any solution with a small suboptimality gap must also have a small misclassification error relative to a benchmark partition. The framework also clarifies a fundamental trade-off between robustness and sensitivity to cluster imbalance, leading to sharp phase transitions for exact recovery under different objectives. The guarantees are deterministic and non-asymptotic, and they separate the role of algorithmic accuracy from the intrinsic geometric difficulty of the instance. We further show that errors concentrate near cluster boundaries and that sufficiently deep cluster cores are recovered exactly under strengthened local margins. Together, these results provide a geometric principle for interpreting low objective values as reliable evidence of meaningful clustering structure.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a geometric framework for prototype-based clustering that is algorithm-agnostic and applies to a broad class of admissible loss functions. It introduces a clustering condition number that compares within-cluster scale to the minimum loss increase required to move a point across a cluster boundary. The central claim is that a small value of this quantity ensures any solution with a small suboptimality gap also has small misclassification error relative to a benchmark partition. Additional results include deterministic non-asymptotic guarantees, phase transitions for exact recovery under varying objectives, error concentration near boundaries, and exact recovery of sufficiently deep cluster cores under strengthened local margins.
Significance. If the guarantees hold, the work supplies a principled geometric criterion for interpreting low objective values as evidence of meaningful structure, cleanly separating algorithmic accuracy from intrinsic instance difficulty. The deterministic, non-asymptotic, and algorithm-agnostic character, together with the explicit treatment of admissible losses and the robustness-imbalance trade-off, is a genuine strength that could inform both theoretical analysis and practical validation of clustering outputs.
major comments (2)
- [Definition of clustering condition number (early sections)] The definition and properties of the clustering condition number (introduced early, prior to the main theorems) must be shown to be independent of any fitted quantities or self-referential choices; the geometric comparison between within-cluster scale and boundary loss increase should be verified to remain well-defined and non-circular for the stated class of admissible losses.
- [Phase transition results] The phase-transition claims for exact recovery (appearing after the main implication theorem) require an explicit derivation showing how the condition number produces the stated thresholds under imbalance; without this step the trade-off between robustness and sensitivity remains stated rather than derived.
minor comments (3)
- [Introduction / Preliminaries] The precise definition of 'admissible loss functions' should be stated formally at the outset rather than referenced only in the abstract and later sections.
- [Throughout] Notation for the benchmark partition and misclassification error should be introduced once and used consistently; occasional shifts in symbols hinder readability.
- [Preliminaries] The manuscript would benefit from a short table or diagram summarizing the admissible losses and the corresponding geometric quantities they induce.
Simulated Author's Rebuttal
We thank the referee for the positive and constructive report. The comments help clarify key aspects of the framework, and we address them point by point below. We will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [Definition of clustering condition number (early sections)] The definition and properties of the clustering condition number (introduced early, prior to the main theorems) must be shown to be independent of any fitted quantities or self-referential choices; the geometric comparison between within-cluster scale and boundary loss increase should be verified to remain well-defined and non-circular for the stated class of admissible losses.
Authors: The clustering condition number is defined using only the input data geometry, the admissible loss function, and a fixed benchmark partition that serves as the reference for measuring misclassification error. The within-cluster scale is the maximum intra-cluster dispersion computed directly from the points in each benchmark cluster, while the boundary loss increase is the infimum loss increment obtained by reassigning a single point across the boundary (with prototypes held at their benchmark values or adjusted by a minimal admissible shift). Because the benchmark partition is an arbitrary but fixed reference and is never obtained from the optimization procedure itself, the definition contains no dependence on fitted prototypes or algorithmic outputs. To make this independence fully explicit and to verify non-circularity for the full class of admissible losses, we will insert a short dedicated paragraph immediately after the definition, including a brief argument that the quantities remain finite and well-defined whenever the loss satisfies the stated admissibility conditions. revision: yes
-
Referee: [Phase transition results] The phase-transition claims for exact recovery (appearing after the main implication theorem) require an explicit derivation showing how the condition number produces the stated thresholds under imbalance; without this step the trade-off between robustness and sensitivity remains stated rather than derived.
Authors: We agree that an explicit derivation of the phase-transition thresholds would strengthen the presentation. The main implication theorem already bounds misclassification error by a constant multiple of the condition number times the suboptimality gap; the phase transitions then follow by requiring this error bound to be strictly less than the minimum cluster size (adjusted for imbalance). In the revision we will add a concise derivation, placed either in the main text after the implication theorem or in a short appendix, that starts from the error bound, incorporates the imbalance factor explicitly, and obtains the critical thresholds for exact recovery under both balanced and imbalanced regimes for the admissible objectives considered. revision: yes
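For concreteness, a hedged reconstruction of the derivation pattern the authors describe, in the κ notation quoted in the Lean-link annotations below; the exact-recovery step via 1/n and the loss-dependent constant C_loss are our illustrative assumptions rather than the paper's stated argument.

```latex
% Sketch: from the implication bound to an exact-recovery threshold.
\begin{align*}
  p(\hat{C}, C^{*}) &\lesssim \kappa\,(\delta + \delta_{\mathrm{approx}})
    && \text{implication bound (Theorem 3.4 / Corollary 3.5 notation below)}\\
  p(\hat{C}, C^{*}) &< \tfrac{1}{n}
    && \text{exact recovery: fewer than one misclassified point}\\
  \kappa\,(\delta + \delta_{\mathrm{approx}}) &< \tfrac{1}{n}
    && \text{sufficient condition combining the two}\\
  \kappa &< \tfrac{c_b}{C_{\mathrm{loss}}}
    && \text{after expressing the admissible gap via the smallest (imbalance-adjusted) cluster}
\end{align*}
% Taking C_loss = 4 for the k-means objective and C_loss = 1 for the linear
% (median-type) loss reproduces the stated regimes kappa_means < c_b/4 and
% kappa_med < c_b.
```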
Circularity Check
No significant circularity
full rationale
The derivation introduces a clustering condition number as a geometric ratio of within-cluster scale to minimum boundary-crossing loss increase for admissible losses. All subsequent claims (a small condition number together with a small suboptimality gap yields a small misclassification error; phase transitions; exact core recovery) are logical consequences of this definition and the stated geometric comparisons, without any reduction to fitted parameters, self-referential equations, or load-bearing self-citations. The framework is self-contained as a deterministic geometric principle.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: loss functions belong to a broad class of admissible loss functions.
invented entities (1)
- clustering condition number (no independent evidence)
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel; Jcost_pos_of_ne_one; cost_alpha_one_eq_jcost · matches?
MATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.
κ(g, γ, D_eff) := g(D_eff)/Δg(γ;D_eff) ... p(Ĉ,C*) ≲ κ·(δ+δ_approx) + displacement term (Corollary 3.5, Theorem 3.4)
-
IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection; RCLCombiner_isCoupling_iff · echoes?
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Phase transitions: Δ/D > 2 + 2/√c_b (k-means) vs. 2 + 1/c_b (linear loss); exact recovery when κ_means < c_b/4 or κ_med < c_b (Theorem 4.1; a quick numeric plug-in of these thresholds follows this list)
-
IndisputableMonolith/Foundation/Atomicity.lean · atomic_tick; sequential_preserves_conservation · refines?
REFINES: relation between the paper passage and the cited Recognition theorem.
Core(s) = {i : d(x_i,θ*_j) ≤ D_eff − s}; enhanced margin γ + 2s; zero-error cores when local κ(s)·δ < 1/n (Lemma 5.1, Proposition 5.2)
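As a quick plug-in check of how the two separation thresholds quoted above trade off with imbalance (our arithmetic, not a statement from the paper):

```latex
\[
  c_b = \tfrac{1}{2}:\quad 2 + \tfrac{2}{\sqrt{c_b}} \approx 4.83,
  \qquad 2 + \tfrac{1}{c_b} = 4;
  \qquad\qquad
  c_b = \tfrac{1}{10}:\quad 2 + \tfrac{2}{\sqrt{c_b}} \approx 8.32,
  \qquad 2 + \tfrac{1}{c_b} = 12.
\]
% The two thresholds coincide at c_b = 1/4, where both equal 6: the k-means
% threshold is larger for near-balanced instances, the linear-loss threshold
% is larger under strong imbalance.
```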
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] ANGELIDAKIS, H., MAKARYCHEV, K. and MAKARYCHEV, Y. (2017). Algorithms for stable and perturbation-resilient problems. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing 438–451. ACM. https://doi.org/10.1145/3055399.3055487
- [2] ANTOS, A., GYORFI, L. and GYORGY, A. (2005). Individual convergence rates in empirical vector quantizer design. IEEE Transactions on Information Theory 51 4013–4022. https://doi.org/10.1109/TIT.2005.856976
- [3] ARTHUR, D. and VASSILVITSKII, S. (2006). k-means++: the advantages of careful seeding.
- [4] AWASTHI, P., BLUM, A. and SHEFFET, O. (2010). Stability yields a PTAS for k-median and k-means clustering. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science 309–318. https://doi.org/10.1109/FOCS.2010.36
- [5] BALCAN, M.-F., BLUM, A. and GUPTA, A. (2009). Approximate clustering without the approximation. In Proceedings of the 2009 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) 1068–1077. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611973068.116
- [6] BALCAN, M.-F., BLUM, A. and GUPTA, A. (2013). Clustering under approximation stability. J. ACM 60 8:1–8:34. https://doi.org/10.1145/2450142.2450144
- [7] BALCAN, M.-F., BLUM, A. and VEMPALA, S. (2008). A discriminative framework for clustering via similarity functions. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC ’08, 671–680. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/1374376.1374474
- [8] BALCAN, M. F. and LIANG, Y. (2016). Clustering under perturbation resilience. arXiv:1112.0826 [cs]. https://doi.org/10.48550/arXiv.1112.0826
- [9] BEN-DAVID, S. (2018). Clustering - what both theoreticians and practitioners are doing wrong. Proceedings of the AAAI Conference on Artificial Intelligence 32. https://doi.org/10.1609/aaai.v32i1.12221
- [10] BEN-DAVID, S., VON LUXBURG, U. and PÁL, D. (2006). A sober look at clustering stability. In Learning Theory (G. LUGOSI and H. U. SIMON, eds.) 5–19. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_4
- [11] BIAU, G., DEVROYE, L. and LUGOSI, G. (2008). On the performance of clustering in Hilbert spaces. IEEE Transactions on Information Theory 54 781–790. https://doi.org/10.1109/TIT.2007.913516
- [12] BILU, Y. and LINIAL, N. (2012). Are stable instances easy? 21 643–660. https://doi.org/10.1017/S0963548312000193
- [13] CAO, J., HANSEN, C., KOZBUR, D. and VILLACORTA, L. (2025). Inference for dependent data with learned clusters. The Review of Economics and Statistics 107 1684–1701. https://doi.org/10.1162/rest_a_01460
- [14] CHEN, Y. T. and WITTEN, D. M. (2023). Selective inference for k-means clustering. Journal of Machine Learning Research 24 1–41.
- [15] CUESTA-ALBERTOS, J. A., GORDALIZA, A. and MATRÁN, C. (1997). Trimmed k-means: an attempt to robustify quantizers. The Annals of Statistics 25. https://doi.org/10.1214/aos/1031833664
- [16] DASGUPTA, S. (1999). Learning mixtures of Gaussians. In 40th Annual Symposium on Foundations of Computer Science 634–644. https://doi.org/10.1109/SFFCS.1999.814639
- [17] FEI, Y. and CHEN, Y. (2018). Hidden integrality of SDP relaxations for sub-Gaussian mixture models. In Proceedings of the 31st Conference on Learning Theory 1931–1965. PMLR.
- [18] GAO, L. L., BIEN, J. and WITTEN, D. (2024). Selective inference for hierarchical clustering. Journal of the American Statistical Association 119 332–342. https://doi.org/10.1080/01621459.2022.2116331
- [19]
- [20] GARCÍA-ESCUDERO, L. Á., GORDALIZA, A., MATRÁN, C. and MAYO-ISCAR, A. (2008). A general trimming approach to robust cluster analysis. The Annals of Statistics 36 1324–1345. https://doi.org/10.1214/07-AOS515
- [21] HUBER, P. J. (1964). Robust estimation of a location parameter. The Annals of Mathematical Statistics 35 73–101. https://doi.org/10.1214/aoms/1177703732
- [22] INABA, M., KATOH, N. and IMAI, H. (1994). Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). In Proceedings of the Tenth Annual Symposium on Computational Geometry, SCG ’94, 332–339. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/177424.178042
- [23] JAFFE, A. Q. (2025). Asymptotic theory of geometric and adaptive k-means clustering. The Annals of Statistics 53 1559–1586. https://doi.org/10.1214/25-AOS2514
- [24] JAIN, A. K. and DUBES, R. C. (1988). Algorithms for Clustering Data. Prentice-Hall, Inc., USA.
- [25] JIANG, H. and ARIAS-CASTRO, E. (2021). On the consistency of metric and non-metric K-medoids. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics 2485–2493. PMLR.
- [26] KANUNGO, T., MOUNT, D. M., NETANYAHU, N. S., PIATKO, C. D., SILVERMAN, R. and WU, A. Y. (2004). A local search approximation algorithm for k-means clustering. Computational Geometry 28 89–112. https://doi.org/10.1016/j.comgeo.2004.03.003
- [27] KAUFMAN, L. and ROUSSEEUW, P. J. (2009). Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons.
- [28] KLOCHKOV, Y., KROSHNIN, A. and ZHIVOTOVSKIY, N. (2021). Robust k-means clustering for distributions with two moments. The Annals of Statistics 49 2206–2230.
- [29] LALOË, T. (2010). L1-quantization and clustering in Banach spaces. Mathematical Methods of Statistics 19 136–150. https://doi.org/10.3103/S1066530710020031
- [30] LEMBER, J. (2003). On minimizing sequences for k-centres. Journal of Approximation Theory 120 20–35. https://doi.org/10.1016/S0021-9045(02)00010-2
- [31] LEUNG, M. P. (2023). Network cluster-robust inference. Econometrica 91 641–667. https://doi.org/10.3982/ECTA19816
- [32] LU, Y. and ZHOU, H. H. (2016). Statistical and computational guarantees of Lloyd’s algorithm and its variants.
- [33] LUXBURG, U. V., BELKIN, M. and BOUSQUET, O. (2008). Consistency of spectral clustering. The Annals of Statistics 36 555–586. https://doi.org/10.1214/009053607000000640
- [34] MACQUEEN, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics 5.1 281–298. University of California Press.
- [35] MAMMEN, E. and TSYBAKOV, A. B. (1999). Smooth discrimination analysis. The Annals of Statistics 27 1808–1829.
- [36] MIXON, D. G., VILLAR, S. and WARD, R. (2016). Clustering subgaussian mixtures by semidefinite programming. https://doi.org/10.48550/arXiv.1602.06612
- [37] MOITRA, A. and VALIANT, G. (2010). Settling the polynomial learnability of mixtures of Gaussians. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science 93–102. https://doi.org/10.1109/FOCS.2010.15
- [38] NDAOUD, M. (2022). Sharp optimal recovery in the two component Gaussian mixture model. The Annals of Statistics 50. https://doi.org/10.1214/22-AOS2178
- [39] PENG, J. and WEI, Y. (2007). Approximating k-means-type clustering via semidefinite programming. 18 186–205. https://doi.org/10.1137/050641983
- [40] PENG, R., SUN, H. and ZANETTI, L. (2015). Partitioning well-clustered graphs: spectral clustering works! In Proceedings of the 28th Conference on Learning Theory 1423–1455. PMLR.
- [41] POLLARD, D. (1981). Strong consistency of K-means clustering. The Annals of Statistics 9 135–140.
- [42] POLLARD, D. (1982). A central limit theorem for k-means clustering. The Annals of Probability 10 919–. https://doi.org/10.1214/aop/1176993713
- [44] TEICHER, H. (1963). Identifiability of finite mixtures. The Annals of Mathematical Statistics 34 1265–1269.
- [45] TEICHER, H. (1967). Identifiability of mixtures of product measures. The Annals of Mathematical Statistics 38 1300–1302. https://doi.org/10.1214/aoms/1177698805
- [46] TELGARSKY, M. J. and DASGUPTA, S. (2013). Moment-based uniform deviation bounds for k-means and friends. In Advances in Neural Information Processing Systems 26. Curran Associates, Inc.
- [47] THORPE, M., THEIL, F., JOHANSEN, A. M. and CADE, N. (2015). Convergence of the k-means minimization problem using Γ-convergence. SIAM Journal on Applied Mathematics 75 2444–2474. https://doi.org/10.1137/140974365
- [47] VON LUXBURG, U. (2007). A tutorial on spectral clustering. Statistics and Computing 17 395–416. https://doi.org/10.1007/s11222-007-9033-z
- [48] YUN, Y.-J. and BARBER, R. F. (2023). Selective inference for clustering with unknown variance. Electronic Journal of Statistics 17 1923–1946. https://doi.org/10.1214/23-EJS2143
discussion (0)