Drawing Lines in Psychological Space: What K-means Clustering Reveals in Simulated and Real Psychometric Data
Pith reviewed 2026-05-11 00:49 UTC · model grok-4.3
The pith
K-means clustering can generate stable, coherent groups even when the data has no true subgroup structure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
K-means clustering partitions multidimensional space into regions around centroids, favoring compact, approximately spherical clusters defined by geometric distance. In a sequence of controlled simulated datasets drawn from continuous Gaussian latent spaces and in the SMARVUS survey of university student responses across 35 countries, this partitioning produces stable and visually coherent solutions even though no true subgroup structure is present.
What carries the argument
K-means partitioning of space by minimizing within-cluster sums of squared distances to centroids, which imposes compact spherical geometry regardless of the underlying distribution.
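For reference, the within-cluster sum of squares described above can be written out as follows. The symbols ($x_i$ for an observation, $C_j$ for a cluster, $\mu_j$ for its centroid, $k$ for the number of clusters) are notation introduced here for illustration and are not taken from the paper.

```latex
\[
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i .
\]
```

Because the criterion depends only on squared Euclidean distance to the nearest centroid, the resulting regions are convex Voronoi cells whatever the shape of the underlying distribution.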
Load-bearing premise
The chosen simulated datasets and the international survey responses are representative enough of typical psychometric data for the observed clustering behavior to hold more generally.
What would settle it
Replicating the Gaussian simulation in the same dimensionality as the survey items, running K-means with varied random starts, and finding that the resulting partitions are unstable across runs or lack visual coherence in low-dimensional projections.
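A minimal sketch of the first two steps of such a check, assuming a standard Gaussian with identity covariance, 5 dimensions, and 500 observations (illustrative values, not the paper's settings). The functions `sample_gaussian`, `sq_dist`, and `kmeans` are hypothetical helpers written for this example with a plain Lloyd's iteration; they are not the authors' code.

```python
import random

def sample_gaussian(n, dim, rng):
    """Draw n points from a standard Gaussian with identity covariance."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, n_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids, inertia)."""
    # Initialise centroids at k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

rng = random.Random(0)            # fixed seed, an assumption for repeatability
data = sample_gaussian(500, 5, rng)
_, _, inertia_1 = kmeans(data, 1, rng)
labels, centroids, inertia_2 = kmeans(data, 2, rng)
sizes = [labels.count(j) for j in range(2)]
print(inertia_1, inertia_2, sizes)
```

The data contain no subgroups by construction, yet the two-cluster solution still lowers the within-cluster sum of squares relative to a single cluster, so a drop in that criterion alone says nothing about whether groups exist.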
Original abstract
K-means clustering is widely used in psychological and psychometric research to identify profiles, subgroups, and potential typologies, yet its classical formulation does not test whether such groups exist as latent psychological categories. Instead, K-means partitions multidimensional space into regions around centroids, favoring compact, approximately spherical clusters defined by geometric distance. In this paper, we examine this limitation through a sequence of controlled simulated datasets. We then extend the analysis to the SMARVUS dataset, a large international psychometric dataset comprising survey responses from university students across 35 countries, to evaluate whether similar geometric partitioning patterns emerge in empirical psychological data. By contrasting simulated and empirical data, this paper argues that K-means can produce stable and visually coherent clustering solutions even in continuous Gaussian latent spaces without true subgroup structure.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper demonstrates through controlled simulations that K-means clustering can produce stable and visually coherent partitions in data drawn from a single multivariate Gaussian distribution without any latent subgroup structure. It then applies the same clustering approach to the SMARVUS dataset of psychometric survey responses from university students in 35 countries, observing similar patterns, and concludes that K-means may impose apparent structure on continuous psychological data.
Significance. This provides a clear cautionary tale for the use of clustering methods in psychological research. By showing that the method can yield interpretable-looking clusters even when none exist by construction in the simulations, and that analogous results appear in real data, the paper underscores the need for caution in interpreting K-means results as evidence of psychological typologies. The controlled nature of the simulations is a strength, offering a direct counterexample to the assumption that coherent clusters imply discrete categories.
major comments (2)
- §3 (Simulations): The simulation setup is central to the existence claim, but the manuscript does not provide sufficient detail on the data generation process, such as the specific covariance structure, number of dimensions, or sample sizes used in the Gaussian simulations. Without these, it is difficult to assess how general the finding is or to replicate the exact conditions under which stable clusters emerge.
- §4 (SMARVUS application): There is no quantitative assessment of cluster stability (e.g., using metrics like the adjusted Rand index across multiple runs or bootstrap resampling). The reliance on visual inspection alone makes the claim of 'stable' solutions less rigorous, especially since the central argument hinges on stability in the absence of true structure.
minor comments (3)
- The abstract could more explicitly state the number of clusters tested or the criteria used for choosing K.
- Figure legends should include more detail on what the axes represent and any preprocessing applied to the data points.
- A brief discussion of alternative clustering methods (e.g., Gaussian mixture models) that might better handle continuous data would contextualize the findings.
Simulated Author's Rebuttal
We thank the referee for their supportive summary and for identifying areas where additional detail and rigor would strengthen the manuscript. We address each major comment below and will incorporate revisions to improve reproducibility and quantitative support for our claims.
Point-by-point responses
- Referee: §3 (Simulations): The simulation setup is central to the existence claim, but the manuscript does not provide sufficient detail on the data generation process, such as the specific covariance structure, number of dimensions, or sample sizes used in the Gaussian simulations. Without these, it is difficult to assess how general the finding is or to replicate the exact conditions under which stable clusters emerge.
  Authors: We agree that the simulation details require expansion for reproducibility and to clarify the scope of the findings. In the revised version we will add explicit descriptions of the covariance matrices (identity or low-variance diagonal structures), the dimensionality (5–10 variables chosen to mimic typical psychometric scales), and the sample sizes (n = 500–2000 per replication). We will also deposit the exact data-generation code in a public repository and reference it in the text.
  Revision: yes
- Referee: §4 (SMARVUS application): There is no quantitative assessment of cluster stability (e.g., using metrics like the adjusted Rand index across multiple runs or bootstrap resampling). The reliance on visual inspection alone makes the claim of 'stable' solutions less rigorous, especially since the central argument hinges on stability in the absence of true structure.
  Authors: We accept that quantitative stability metrics would make the argument more rigorous. While the original manuscript relied on repeated visual inspection of partitions and silhouette widths, the revision will include adjusted Rand indices computed across 50 random initializations of K-means for both the simulated and SMARVUS data, together with a brief bootstrap resampling exercise. These additions will be presented alongside the existing figures without changing the substantive conclusion.
  Revision: yes
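The stability metric named in this exchange can be sketched in a few lines. The block below is a standard-library implementation of the adjusted Rand index (Hubert and Arabie form) with a hypothetical helper, `mean_pairwise_ari`, that averages it over every pair of runs. The toy labelings stand in for K-means outputs; in the setting the rebuttal describes, `runs` would hold the label vectors from the 50 random initializations.

```python
from collections import Counter
from itertools import combinations
from math import comb

def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same items."""
    total = comb(len(a), 2)              # number of item pairs
    if total == 0:
        return 1.0                       # fewer than two items: trivially equal
    # Pairs of items that share a cluster in both, in a only, in b only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_rows * sum_cols / total   # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:            # degenerate: both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

def mean_pairwise_ari(labelings):
    """Average ARI over every pair of labelings (hypothetical helper)."""
    pairs = list(combinations(labelings, 2))
    return sum(adjusted_rand_index(x, y) for x, y in pairs) / len(pairs)

runs = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],   # same partition as the first, labels permuted
    [0, 0, 0, 1, 1, 1],   # a genuinely different partition
]
score = mean_pairwise_ari(runs)
print(score)
```

The index is invariant to label permutation, which matters here because K-means assigns arbitrary cluster numbers on each run. Note that a high value across initializations shows that the partition is reproducible, not that the clusters correspond to latent groups.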
Circularity Check
No significant circularity
full rationale
The paper is an empirical demonstration rather than a derivation: it generates data from a single multivariate Gaussian (by construction, no latent subgroups), applies K-means, and shows that stable partitions still appear; it then applies the same procedure to the SMARVUS survey responses. No equations, fitted parameters, or predictions are claimed; the central existence claim follows directly from the controlled simulation design and the real-data application. No self-citations, ansatzes, or uniqueness theorems are invoked as load-bearing steps. The analysis is therefore self-contained.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear · K-means can produce stable and visually coherent clustering solutions even in continuous Gaussian latent spaces without true subgroup structure.