{"paper":{"title":"Fast and effective algorithms for fair clustering at scale","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Heuristics achieve scalable fair k-means clustering by enforcing group representation targets while minimizing sum of squared distances.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Claudio Mantuano, Manuel Kammermann, Philipp Baumann","submitted_at":"2026-05-13T16:40:07Z","abstract_excerpt":"Clustering is an unsupervised machine learning task that consists of identifying groups of similar objects. It has numerous applications and is increasingly used in fairness-sensitive domains where objects represent individuals, such as customers, employees, or students. We address a fair clustering problem in which objects belong to protected groups. The problem consists of partitioning the objects into a predefined number of clusters while attaining a user-defined target level of fairness, meaning that each protected group is sufficiently represented in each cluster. The objective is to mini"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"The proposed heuristics outperform existing approaches in comprehensive numerical experiments on benchmark datasets while providing precise control over the cost-fairness trade-off and scaling to instances with millions of objects in seconds.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the fairness constraint (each protected group sufficiently represented in each cluster) can be satisfied without destroying the geometric structure that makes the clustering cost meaningful, and that the heuristics' local-search or relaxation steps do not systematically miss globally better trade-off points.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A framework plus three heuristics for fair clustering that give precise cost-fairness control and scale to millions of objects while beating existing solvers on benchmark data.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Heuristics achieve scalable fair k-means clustering by enforcing group representation targets while minimizing sum of squared distances.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"7c0daec36670a63f22ebddb38c93cf914364c54cfcb2c49e9d7b44341e59997c"},"source":{"id":"2605.13759","kind":"arxiv","version":1},"verdict":{"id":"5ec1afc7-ad26-4f19-8ceb-279a10004459","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:35:36.279847Z","strongest_claim":"The proposed heuristics outperform existing approaches in comprehensive numerical experiments on benchmark datasets while providing precise control over the cost-fairness trade-off and scaling to instances with millions of objects in seconds.","one_line_summary":"A framework plus three heuristics for fair clustering that give precise cost-fairness control and scale to millions of objects while beating existing solvers on benchmark data.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the fairness constraint (each protected group sufficiently represented in each cluster) can be satisfied without destroying the geometric structure that makes the clustering cost meaningful, and that the heuristics' local-search or relaxation steps do not systematically miss globally better trade-off points.","pith_extraction_headline":"Heuristics achieve scalable fair k-means clustering by enforcing group representation targets while minimizing sum of squared distances."},"references":{"count":57,"sample":[{"doi":"","year":2010,"title":"Applied Soft Computing , volume=","work_id":"3d64c351-cebc-4ebd-8a88-4fa7a1734972","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Journal of the Operational Research Society , volume=","work_id":"ffaa2343-746b-4fb1-aa8f-8f372152bae4","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Socio-Economic Planning Sciences , volume=","work_id":"68269116-5051-4040-9ea8-aa84ee56d659","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Expert Systems with Applications , volume=","work_id":"dac170b7-b291-4c1f-8428-d657645f8ce5","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Journal of Mathematics , volume=","work_id":"2c97a650-d737-4b13-a075-301431e0760d","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":57,"snapshot_sha256":"b0c4c8afeae8dd5f916dddc023a381a68468bf6d340ce24ca777668ff5253a77","internal_anchors":2},"formal_canon":{"evidence_count":2,"snapshot_sha256":"ae9bccfe41e48021943834580a9dd79614e6a543acc536332c2f428538e4dadd"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}