Fast and effective algorithms for fair clustering at scale

arXiv: 2605.13759 · v1 · pith:FM357X3Y · submitted 2026-05-13 · 💻 cs.LG

Claudio Mantuano, Manuel Kammermann, Philipp Baumann

Pith reviewed 2026-05-14 19:35 UTC · model grok-4.3

keywords: fair clustering, k-means, heuristics, scalability, fairness constraints, protected groups, clustering cost, trade-off control

The pith

Heuristics achieve scalable fair k-means clustering by enforcing group representation targets while minimizing sum of squared distances.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework for partitioning data into a fixed number of clusters such that each protected group meets a user-specified minimum representation in every cluster, all while keeping the total within-cluster sum of squared Euclidean distances low. Three heuristics are built on this framework: one that emphasizes solution quality and extra constraints, one that trades some quality for better speed, and one optimized purely for handling millions of objects quickly. A reader would care because clustering now appears in fairness-sensitive settings like customer segmentation or student grouping, where violating group balance can produce systematically biased outcomes and prior methods either ignored scale or lost too much clustering quality.

Core claim

We propose a general framework for fair clustering that provides precise control over the cost-fairness trade-off and introduce three heuristics based on it. The first heuristic focuses on solution quality and the flexibility to incorporate additional constraints, the second improves scalability while retaining high solution quality, and the third is designed for maximum scalability, producing solutions for instances with millions of objects in seconds. The proposed heuristics outperform existing approaches in comprehensive numerical experiments on benchmark datasets.

What carries the argument

A general fair-clustering framework whose core mechanism adds explicit representation constraints for protected groups and solves the resulting trade-off via local-search or relaxation heuristics that adjust cluster assignments to meet fairness targets while reducing sum-of-squared-distance cost.
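
To make the two sides of this trade-off concrete, the following minimal Python sketch evaluates a given assignment: the clustering cost as the sum of squared Euclidean distances to the cluster means, and the smallest share that any protected group holds in any cluster. The function names, the toy data, and the use of a "minimum share" as the representation measure are assumptions made for illustration; the paper's own fairness measure may be defined differently.

```python
from collections import Counter


def clustering_cost(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    # Empty clusters get no center; they are never looked up below.
    centers = [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(k)
    ]
    return sum(
        sum((p[d] - centers[c][d]) ** 2 for d in range(dim))
        for p, c in zip(points, labels)
    )


def min_group_share(labels, groups, k):
    """Smallest share held by any protected group in any non-empty cluster."""
    group_ids = set(groups)
    sizes = Counter(labels)
    pair_counts = Counter(zip(labels, groups))
    return min(
        pair_counts[(c, g)] / sizes[c]
        for c in range(k) if sizes[c]
        for g in group_ids
    )


# Toy data (hypothetical): four 2-D points, two protected groups "a" and "b".
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
groups = ["a", "b", "a", "b"]
labels = [0, 0, 1, 1]

cost = clustering_cost(points, labels, k=2)
share = min_group_share(labels, groups, k=2)
print(cost, share)
```

In this toy case the compact clustering happens to be balanced as well; regrouping the points as `[0, 1, 0, 1]` puts only group "a" in one cluster and raises the cost, which shows that the two quantities have to be tracked separately.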

If this is right

Users can set an exact target fairness level and obtain a clustering whose cost is competitive with unconstrained k-means.
Datasets containing millions of points become practical for fair clustering because runtimes drop to seconds.
The same framework supports additional side constraints without losing the ability to control the cost-fairness balance.
Quality and scalability can be traded off by selecting among the three heuristics depending on data size.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may transfer to other partitioning tasks such as facility location or community detection when group-balance requirements are added.
In streaming or online settings the third heuristic could serve as a fast warm-start for incremental re-clustering under fairness rules.
Empirical success on Euclidean data raises the question whether similar constraint-handling ideas work for non-Euclidean distances or kernelized clustering.

Load-bearing premise

That fairness constraints requiring minimum group representation in each cluster can be satisfied without destroying the geometric structure that makes the sum of squared distances a meaningful clustering objective.

What would settle it

On a moderate-sized benchmark instance whose globally optimal fair clustering solution is known by exhaustive search, the heuristics would return assignments whose total cost exceeds the optimum by more than a few percent or that violate the stated fairness targets.
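
The check described above can be sketched on a tiny instance, where enumerating every assignment is still feasible. The sketch below is not the paper's method: the fairness rule (every group holds at least `min_share` of every cluster, and no cluster is empty), the function names, and the data are all assumptions chosen for illustration. It returns the cheapest fair assignment, against which a heuristic's cost could be compared through a relative optimality gap.

```python
from itertools import product


def sse(points, labels, k):
    """Sum of squared Euclidean distances to cluster means."""
    cost = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue
        center = [sum(x) / len(members) for x in zip(*members)]
        cost += sum((a - m) ** 2 for p in members for a, m in zip(p, center))
    return cost


def is_fair(labels, groups, k, min_share):
    """Assumed rule: each group holds at least min_share of each cluster."""
    for c in range(k):
        members = [g for g, l in zip(groups, labels) if l == c]
        if not members:
            return False  # every cluster must be non-empty
        for g in set(groups):
            if members.count(g) / len(members) < min_share:
                return False
    return True


def brute_force_fair(points, groups, k, min_share):
    """Enumerate all k**n assignments; only viable for very small n."""
    best_cost, best_labels = float("inf"), None
    for labels in product(range(k), repeat=len(points)):
        if is_fair(labels, groups, k, min_share):
            cost = sse(points, labels, k)
            if cost < best_cost:
                best_cost, best_labels = cost, labels
    return best_cost, best_labels


def optimality_gap(heuristic_cost, optimal_cost):
    """Relative excess of a heuristic's cost over the optimum."""
    return (heuristic_cost - optimal_cost) / optimal_cost


# Hypothetical 1-D instance: the two groups are geometrically separated.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
groups = ["a", "a", "b", "b"]

free_cost, _ = brute_force_fair(points, groups, k=2, min_share=0.0)
fair_cost, fair_labels = brute_force_fair(points, groups, k=2, min_share=0.5)
print(free_cost, fair_cost, fair_labels)
```

Because the groups are separated in space, requiring equal representation forces each cluster to span both ends of the line, so the fair optimum is far more expensive than the unconstrained one. That is the conflict between cost and fairness that the load-bearing premise is concerned with.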

Figures

Figures reproduced from arXiv: 2605.13759 by Claudio Mantuano, Manuel Kammermann, Philipp Baumann.

**Figure 2.** Decomposition scheme used by the proposed heuristics.

**Figure 3.** Applying MPFC to the illustrative example: (a) initialization of cluster centers; (b) assignment of …

**Figure 4.** Flow network used in each stage $l > 1$ of the assignment step of MS-FlowFC. … stages $l' = 1, \dots, l - 1$. For each previously assigned protected group $l'$, a lower bound can be computed by multiplying the target balance $B_s(X, \lambda)$ by the number of objects from this protected group assigned to center $j$, namely $|C_j \cap G_{l's}|$. The largest of these lower bounds is the one that must be imposed for protected group …

**Figure 5.** Applying MS-FlowFC to the illustrative example: (a) initialization of cluster centers and assignment of …

**Figure 6.** Applying S-MPFC to the illustrative example: (a) generation of …

**Figure 7.** Boxplots of clustering costs for solutions obtained with different random seeds on the …

**Figure 8.** Cost-fairness trade-off of O&C (a), MPFC (b), and MS-FlowFC (c) on the …

Original abstract

Clustering is an unsupervised machine learning task that consists of identifying groups of similar objects. It has numerous applications and is increasingly used in fairness-sensitive domains where objects represent individuals, such as customers, employees, or students. We address a fair clustering problem in which objects belong to protected groups. The problem consists of partitioning the objects into a predefined number of clusters while attaining a user-defined target level of fairness, meaning that each protected group is sufficiently represented in each cluster. The objective is to minimize the clustering cost, defined as the sum of squared Euclidean distances between the objects and the centers of their clusters. Since clustering cost and fairness are generally in conflict, managing the trade-off between them is essential in practical applications. Existing methods provide limited control over this trade-off and either fail to scale to large datasets or, when they scale, produce low-quality solutions. We propose a general framework for fair clustering that provides precise control over the cost-fairness trade-off and introduce three heuristics based on it. The first heuristic focuses on solution quality and the flexibility to incorporate additional constraints, the second improves scalability while retaining high solution quality, and the third is designed for maximum scalability, producing solutions for instances with millions of objects in seconds. The proposed heuristics outperform existing approaches in comprehensive numerical experiments on benchmark datasets. The source code of our heuristics and instructions for reproducing the experiments are publicly available on GitHub.
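
In symbols, the clustering cost described in the abstract can be written as below. The notation is assumed here for illustration and may differ from the paper's: $x_i$ are the objects, $C_1, \dots, C_k$ the clusters, and $\mu_j$ the center of cluster $C_j$, taken as the cluster mean as in standard k-means.

```latex
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - \mu_j \rVert_2^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i
```

The fair variant minimizes the same objective, restricted to partitions in which each protected group is sufficiently represented in each cluster.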

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical tunable framework and three heuristics for fair k-means that scale to millions of points, but the outperformance claims rest on experiments without enough statistical or theoretical backing.

The main takeaway is that this paper presents a new framework for fair clustering that gives users direct control over the fairness level while minimizing the standard k-means objective, along with three heuristics tailored for quality, balance, and maximum scalability. The max-scale heuristic stands out for handling very large instances quickly.

On the positive side, the approach is grounded in the usual clustering cost and an explicit fairness target, avoiding any circular fitting. The heuristics are presented as original engineering solutions rather than simple extensions, and the public GitHub code is a real plus for reproducibility. If the numerical experiments confirm better performance on benchmarks, this could fill a gap for applications needing both fairness and scale, like grouping large customer datasets without violating group representation rules.

The soft spots center on the strength of the performance claims. Without approximation guarantees, it's unclear how reliably the heuristics find good trade-off points, especially when the fairness constraint might pull against the geometric structure of the data. The stress-test note is on point here: the experiments are called comprehensive, but details like statistical tests, exact fairness metric definitions, and checks against exact solvers on small problems are missing from the abstract, leaving room for the wins to be artifacts of the tested cases or local search limitations. The interaction between the repair steps and the squared distance objective isn't bounded, so systematic misses on certain distributions remain possible.

This work is for practitioners and applied researchers who need implementable fair clustering methods that run on big data. A reader looking for ready-to-use algorithms with code will find it useful, while pure theorists might want more analysis. I would send this to peer review. The engineering focus and reproducibility make it worth a referee's time, even with expected questions on the experimental setup.

Referee Report

2 major / 1 minor

Summary. The paper introduces a general framework for fair k-means clustering that enforces a user-specified minimum representation of protected groups in each cluster while minimizing the sum of squared Euclidean distances to cluster centers. It proposes three heuristics (quality-focused local search, a scalable variant, and a maximum-scalability version) that are claimed to provide precise control over the cost-fairness trade-off, outperform prior methods on benchmark datasets, and scale to instances with millions of points in seconds, with publicly available code.

Significance. If the reported empirical dominance and scalability hold under rigorous validation, the framework and open-source implementation would offer a practical advance for deploying clustering in fairness-sensitive applications such as customer segmentation or educational grouping, where explicit trade-off control is needed.

major comments (2)

[Experiments] The central claim of outperformance rests on the numerical experiments, yet the manuscript provides no statistical significance tests, no comparison against exact solvers on small instances, and no Pareto-front coverage metrics; this is load-bearing because the heuristics lack approximation guarantees and their local-search/repair steps have no proven bound on deviation from globally optimal trade-offs.
[§4] The exact definition of the fairness metric, data-exclusion rules, and interaction between the fairness relaxation/repair steps and the squared-Euclidean objective are not fully specified, preventing verification that reported wins are not artifacts of favorable instance selection or local-optima bias.

minor comments (1)

[Framework] Notation for the fairness target parameter and the precise form of the relaxed objective could be clarified with an explicit equation in the framework section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to incorporate the suggested improvements for stronger empirical validation and clarity.

Referee: [Experiments] The central claim of outperformance rests on the numerical experiments, yet the manuscript provides no statistical significance tests, no comparison against exact solvers on small instances, and no Pareto-front coverage metrics; this is load-bearing because the heuristics lack approximation guarantees and their local-search/repair steps have no proven bound on deviation from globally optimal trade-offs.

Authors: We agree that statistical significance tests, comparisons to exact solvers on small instances, and Pareto-front coverage metrics would strengthen the presentation. In the revised manuscript we will add Wilcoxon signed-rank tests (or paired t-tests where appropriate) across repeated runs and datasets to assess significance of the reported improvements. We will also include a new subsection with experiments on small instances (n ≤ 500) solved to optimality via an ILP formulation using a commercial solver, reporting optimality gaps for our heuristics. Finally, we will report Pareto-front coverage metrics such as the fraction of the cost-fairness curve dominated by each method. While the heuristics are not accompanied by approximation guarantees, the new experiments will provide direct evidence of their practical performance relative to optima on small cases and will be discussed explicitly in the text. revision: yes
Referee: [§4] The exact definition of the fairness metric, data-exclusion rules, and interaction between the fairness relaxation/repair steps and the squared-Euclidean objective are not fully specified, preventing verification that reported wins are not artifacts of favorable instance selection or local-optima bias.

Authors: We apologize for the insufficient detail in §4. In the revision we will provide a complete mathematical definition of the fairness metric (including the precise minimum-representation thresholds per group and cluster), explicitly list all data-exclusion or preprocessing rules applied to the benchmark datasets, and add a formal description (with pseudocode) of how the relaxation and repair steps interact with the squared-Euclidean objective. These additions will enable independent verification and will clarify that the reported results are not driven by instance selection or local-optima artifacts. revision: yes
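
One plausible reading of the Pareto-front coverage metric mentioned in the first response is sketched below: the fraction of a rival method's solutions that are dominated by at least one solution of the method under study. The function names and the sample numbers are hypothetical and are not results from the paper; each solution is a `(cost, fairness)` pair where lower cost and higher fairness are better.

```python
def dominates(a, b):
    """True if a is at least as good as b in both criteria and better in one.

    a and b are (cost, fairness) pairs: lower cost and higher fairness win.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def dominated_fraction(method_points, rival_points):
    """Share of rival solutions dominated by some solution of the method."""
    if not rival_points:
        return 0.0
    hits = sum(
        any(dominates(m, r) for m in method_points) for r in rival_points
    )
    return hits / len(rival_points)


# Made-up (cost, fairness) pairs, for illustration only.
ours = [(100.0, 0.9), (80.0, 0.7), (60.0, 0.5)]
rival = [(110.0, 0.9), (85.0, 0.6), (55.0, 0.5), (90.0, 0.8)]

coverage = dominated_fraction(ours, rival)
print(coverage)
```

A metric of this kind depends on which fairness levels each method was run at, so it is most informative when all methods are evaluated on the same grid of fairness targets.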

Circularity Check

0 steps flagged

No circularity: framework and heuristics are defined independently of results

The paper introduces a fair clustering framework parameterized by an explicit user-supplied fairness target and the standard k-means objective (sum of squared Euclidean distances). The three heuristics are constructed via local-search, relaxation, and repair steps whose definitions do not presuppose the claimed performance outcomes. No equation or claim reduces a derived quantity to a fitted parameter or self-citation by construction. Self-citations, if present, are not load-bearing for the central algorithmic definitions or the empirical comparison. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the standard Euclidean k-means objective and a user-specified fairness target; no new free parameters are introduced beyond the usual cluster count k and the fairness level. No invented entities or non-standard axioms are required.

axioms (2)

domain assumption: Euclidean distance is an appropriate measure of dissimilarity for the objects being clustered. Invoked when defining the clustering cost as sum of squared Euclidean distances.

domain assumption: The fairness requirement can be expressed as a set of linear or convex constraints on cluster membership proportions. Required for the framework to admit efficient heuristics.
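
As an illustration of the second assumption, a minimum-representation requirement can be written as linear constraints on binary assignment variables. The formulation below is a generic sketch, not the paper's model, and all symbols are assumed: $y_{ij} = 1$ if object $i$ is assigned to cluster $j$, $G_g$ is the set of objects in protected group $g$, and $\tau_g \in [0, 1]$ is the required minimum share of group $g$ in every cluster.

```latex
\sum_{i \in G_g} y_{ij} \;\ge\; \tau_g \sum_{i=1}^{n} y_{ij}
\quad \forall\, g,\ \forall\, j,
\qquad
\sum_{j=1}^{k} y_{ij} = 1 \quad \forall\, i,
\qquad
y_{ij} \in \{0, 1\}
```

Because $\tau_g$ is a constant, the first constraint is linear in $y$ even though it expresses a proportion. With the cluster centers held fixed, an assignment step with these constraints is then a binary linear program.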

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "The assignment step is formulated as a binary linear program (BLP) that includes fairness constraints... controlled by a single tolerance parameter"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "multi-stage minimum-cost flow-based fair clustering (MS-FlowFC)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.



work page 2026

[20] [20]

Handbook of Big Data Analytics and Forensics , pages=

Evaluating performance of scalable fair clustering machine learning techniques in detecting cyber attacks in industrial control systems , author=. Handbook of Big Data Analytics and Forensics , pages=. 2022 , publisher=

work page 2022

[21] [21]

Handbook of Big Data Analytics and Forensics , pages=

Evaluation of scalable fair clustering machine learning methods for threat hunting in cyber-physical systems , author=. Handbook of Big Data Analytics and Forensics , pages=. 2022 , publisher=

work page 2022

[22] [22]

Handbook of Big Data Analytics and Forensics , pages=

Scalable fair clustering algorithm for Internet of Things malware classification , author=. Handbook of Big Data Analytics and Forensics , pages=. 2022 , publisher=

work page 2022

[23] [23]

Handbook of Big Data Analytics and Forensics , pages=

Security of industrial cyberspace: Fair clustering with linear time approximation , author=. Handbook of Big Data Analytics and Forensics , pages=. 2022 , publisher=

work page 2022

[24] [24]

International Conference on Machine Learning , pages=

Scalable fair clustering , author=. International Conference on Machine Learning , pages=. 2019 , organization=

work page 2019

[25] [25]

International Workshop on Approximation and Online Algorithms , pages=

Fair coresets and streaming algorithms for fair k-means , author=. International Workshop on Approximation and Online Algorithms , pages=. 2019 , organization=

work page 2019

[26] [26]

Advances in Neural Information Processing Systems , volume=

Coresets for clustering with fairness constraints , author=. Advances in Neural Information Processing Systems , volume=. 2019 , doi=

work page 2019

[27] [27]

On coresets for fair clustering in metric and

Bandyapadhyay, Sayan and Fomin, Fedor V and Simonov, Kirill , journal=. On coresets for fair clustering in metric and. 2024 , publisher=

work page 2024

[28] [28]

Privacy preserving clustering with constraints

Privacy preserving clustering with constraints , author=. arXiv preprint arXiv:1802.02497 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[29] [29]

On the cost of essentially fair clusterings

On the cost of essentially fair clusterings , author=. arXiv preprint arXiv:1811.10319 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[30] [30]

Advances in Neural Information Processing Systems , volume=

Fair algorithms for clustering , author=. Advances in Neural Information Processing Systems , volume=. 2019 , doi=

work page 2019

[31] [31]

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages=

Clustering without over-representation , author=. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages=. 2019 , doi=

work page 2019

[32] [32]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Variational fair clustering , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=. 2021 , doi=

work page 2021

[33] [33]

International Conference on Learning and Intelligent Optimization , pages=

A stochastic alternating balance k-means algorithm for fair clustering , author=. International Conference on Learning and Intelligent Optimization , pages=. 2022 , organization=

work page 2022

[34] [34]

INFORMS Journal on Data Science , volume=

An Optimization-Based Order-and-Cut Approach for Fair Clustering of Data Sets , author=. INFORMS Journal on Data Science , volume=. 2024 , publisher=

work page 2024

[35] [35]

Advances in Neural Information Processing Systems , volume=

The Fairness-Quality Tradeoff in Clustering , author=. Advances in Neural Information Processing Systems , volume=. 2024 , doi=

work page 2024

[36] [36]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Towards fairer centroids in k-means clustering , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=. 2024 , doi=

work page 2024

[37] [37]

arXiv preprint arXiv:2104.12116 , year=

Fair-capacitated clustering , author=. arXiv preprint arXiv:2104.12116 , year=

work page arXiv

[38] [38]

Tran, Vanessa and Kammermann, Manuel and Baumann, Philipp , booktitle=. The. 2023 , organization=

work page 2023

[39] [39]

Least squares quantization in

Lloyd, Stuart , journal=. Least squares quantization in. 1982 , publisher=

work page 1982

[40] [40]

arXiv preprint arXiv:2409.02963 , year=

Fair minimum representation clustering via integer programming , author=. arXiv preprint arXiv:2409.02963 , year=

work page arXiv

[41] [41]

Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms , volume=

k-means++: The advantages of careful seeding , author=. Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms , volume=. 2007 , doi=

work page 2007

[42] [42]

2022 , publisher=

Piccialli, Veronica and Sudoso, Antonio M and Wiegele, Angelika , journal=. 2022 , publisher=

work page 2022

[43] [43]

2025 , date=

Laurent Perron and Vincent Furnon , organization=. 2025 , date=

work page 2025

[44] [44]

Kelly, Markelle and Longjohn, Rachel and Nottingham, Kolby , year=

work page

[45] [45]

A Dataset to Support Research in the Design of Secure Water Treatment Systems , doi=

Goh, Jonathan and Adepu, Sridhar and Junejo, Khurum and Mathur, Aditya , year=. A Dataset to Support Research in the Design of Secure Water Treatment Systems , doi=

work page

[46] [46]

and Rita, P

Moro, S. and Rita, P. and Cortez, P. , title=. 2014 , howpublished=

work page 2014

[47] [47]

1996 , howpublished=

Becker, Barry and Kohavi, Ronny , title=. 1996 , howpublished=

work page 1996

[48] [48]

1994 , howpublished=

Kahn, Michael , title=. 1994 , howpublished=

work page 1994

[49] [49]

2009 , howpublished=

Yeh, I-Cheng , title=. 2009 , howpublished=

work page 2009

[50] [50]

2001 , howpublished=

Meek, Chris and Thiesson, Bo and Heckerman, David , title=. 2001 , howpublished=

work page 2001

[51] [51]

Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages=

Certifying and removing disparate impact , author=. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages=. 2015 , doi=

work page 2015

[52] [52]

2009 , publisher=

Aloise, Daniel and Deshpande, Amit and Hansen, Pierre and Popat, Preyas , journal=. 2009 , publisher=

work page 2009

[53] [53]

INFORMS Journal on Computing , volume=

An algorithm for clustering with confidence-based must-link and cannot-link constraints , author=. INFORMS Journal on Computing , volume=. 2025 , publisher=

work page 2025

[54] [54]

2019 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM) , pages=

A binary linear programming-based k-means approach for the capacitated centered clustering problem , author=. 2019 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM) , pages=. 2019 , organization=

work page 2019

[55] [55]

International Conference on Machine Learning , pages=

Fair k-center clustering for data summarization , author=. International Conference on Machine Learning , pages=. 2019 , organization=

work page 2019

[56] [56]

2020 IEEE international conference on industrial engineering and engineering management (IEEM) , pages=

A binary linear programming-based k-means algorithm for clustering with must-link and cannot-link constraints , author=. 2020 IEEE international conference on industrial engineering and engineering management (IEEM) , pages=. 2020 , organization=

work page 2020

[57] [57]

Douze, Matthijs and Guzhva, Alexandr and Deng, Chengqi and Johnson, Jeff and Szilvasy, Gergely and Mazaré, Pierre-Emmanuel and Lomeli, Maria and Hosseini, Lucas and Jégou, Hervé , journal=. The. 2026 , volume=

work page 2026