Recognition: unknown
Coloring for dispersion: A polynomial-time algorithm for cardinality-constrained 2-anticlustering
Pith reviewed 2026-05-08 01:41 UTC · model grok-4.3
The pith
The 2-anticlustering problem with cardinality constraints can be solved exactly in polynomial time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that 2-MDCC can be solved in polynomial time. It achieves this by showing the problem reduces to a quadratic number of 2-COLCC instances. These are transformed into a restricted class of subset sum instances where the input values are bounded, making the pseudopolynomial dynamic programming algorithm run in polynomial time.
What carries the argument
The reduction of 2-MDCC to a quadratic number of cardinality-constrained 2-coloring instances, each converted to bounded subset sum problems solved by dynamic programming.
If this is right
- 2-MDCC belongs to the class of problems solvable in polynomial time.
- The algorithm processes instances with up to 10,000 items in less than one second.
- The method outperforms prior integer linear programming formulations by several orders of magnitude.
- Exact solutions become feasible for anticlustering tasks with two groups on large data sets.
Where Pith is reading between the lines
- The bounded-input property of the derived subset sum instances might allow similar polynomial-time results for related partition problems that involve dispersion or heterogeneity measures.
- Extending the reduction technique to k greater than 2 could be tested by checking whether the number of coloring instances remains manageable.
- The practical speed for n=10,000 suggests the approach could integrate into real-time group assignment tools in data analysis pipelines beyond the experiments shown.
Load-bearing premise
The subset sum instances produced by the transformation have input values that are sufficiently bounded for the dynamic programming to achieve polynomial runtime.
What would settle it
An instance of 2-MDCC for which the corresponding 2-COLCC instances produce subset sum problems whose dynamic programming solution exceeds polynomial time bounds or misses the optimal dispersion value.
read the original abstract
The $k$-Maximum Dispersion Problem with Cardinality Constraints ($k$-MDCC) asks for a partition of a given item set with pairwise dissimilarities into $k$ cardinality-constrained groups such that the minimum pairwise intra-group dissimilarity, which is also known as the dispersion, is maximized. The problem arises in the context of anticlustering, where the goal is to create maximally heterogeneous groups of items with applications in psychological research, bioinformatics, and data science. It is known that $k$-MDCC is NP-hard for $k \geq 3$ but it has been an open question whether it can be solved in polynomial time for $k = 2$. We give a positive answer to this question by showing that $2$-MDCC can be solved by a quadratic number of cardinality-constrained 2-coloring problem instances ($2$-COLCC). We solve these instances by transforming them into a restricted class of subset sum instances. Although subset sum is NP-complete in general, for this restricted class the input values are bounded, ensuring that the pseudopolynomial dynamic programming algorithm runs in polynomial time. As a consequence, we obtain a polynomial-time algorithm for $2$-MDCC. We demonstrate that a publicly available open-source implementation of our new algorithm outperforms the previous integer linear programming solution by several orders of magnitude so that even large datasets ($n = 10{,}000$) can be processed in less than a second.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims a polynomial-time algorithm for the cardinality-constrained 2-anticlustering problem (2-MDCC): it reduces 2-MDCC to O(n²) instances of the cardinality-constrained 2-coloring problem (2-COLCC) by enumerating candidate dispersion thresholds from the dissimilarity matrix, then transforms each 2-COLCC instance into a restricted subset-sum instance whose values are bounded by n (component-size differences), which is solvable in polynomial time via the standard pseudopolynomial DP. An open-source implementation is reported to solve instances with n=10,000 in under a second, outperforming prior ILP formulations by orders of magnitude.
Significance. If correct, the result closes the complexity status of 2-MDCC (NP-hard for k≥3, now in P for k=2) and supplies the first practical polynomial-time method for this anticlustering task. The explicit reduction to bounded subset sum and the accompanying open-source code constitute reproducible, falsifiable contributions that directly enable large-scale applications in psychology, bioinformatics, and data science.
minor comments (3)
- [§3] §3 (Reduction to 2-COLCC): the enumeration over possible dispersion thresholds is stated to produce O(n²) instances, but the precise mapping from matrix entries to candidate thresholds and the handling of ties or duplicate values should be stated explicitly to allow independent verification of the quadratic bound.
- [§4] §4 (Subset-sum transformation): the claim that all subset-sum values are bounded by n is central to the polynomial-time guarantee; a short lemma or paragraph confirming that the constructed values are exactly the differences |S1|−|S2| for feasible color classes would strengthen the argument.
- [Experiments] The experimental section reports runtimes but does not include a direct comparison of solution quality (dispersion values) against the ILP baseline on the same instances; adding a small table of objective values would confirm correctness of the new algorithm.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our contribution, as well as for the favorable assessment of its significance. The recommendation of minor revision is noted. No specific major comments or requested changes were provided in the report.
Circularity Check
No significant circularity
full rationale
The paper's derivation chain consists of explicit algorithmic reductions: 2-MDCC is reduced to a quadratic number of 2-COLCC instances by enumerating dispersion thresholds, and each 2-COLCC instance is transformed into a bounded subset-sum problem whose values are at most n. The polynomial-time claim then follows directly from the standard pseudopolynomial DP for subset sum, which runs in polynomial time under the bounded-input restriction. This is a self-contained reduction to independently known results with no self-definitional steps, no fitted parameters renamed as predictions, and no load-bearing self-citations or ansatzes. The central claim therefore does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption k-MDCC is NP-hard for k >= 3
- standard math Subset sum admits a pseudopolynomial-time DP algorithm
Forward citations
Cited by 1 Pith paper
-
An Optimal Algorithm for Cardinality-Constrained Diameter Partitioning
An O(n^2) optimal algorithm for cardinality-constrained diameter partitioning via reduction to bottleneck 2-coloring and tree DP on the maximum spanning tree, with a matching lower bound.
Reference graph
Works this paper leans on
-
[1]
2 K. Bringmann, R. Keusch, and J. Lengler. Geometric inhomogeneous random graphs.Theo- retical Computer Science, 760:35–54, 2019.doi:10.1016/j.tcs.2018.08.014. 3 P. Brucker. On the complexity of clustering problems. In R. Henn, B. Korte, and W. Oettli, editors,Optimization and Operations Research, Lecture Notes in Economics and Mathematical Systems, page ...
-
[2]
URL: https://www.sciencedirect.com/science/article/pii/ S0305048312001818,doi:10.1016/j.omega.2012.09.005. 6 M. Papenberg, M. Breuer, M. Diekhoff, N. K. Tran, and G. W Klau. Extending the Bicriterion Approach for Anticlustering: Exact and Hybrid Approaches.Psychometrika, 90(5):1789–1808, October 2025.doi:10.1017/psy.2025.10052. 7 H. Späth. Anticlustering:...
-
[3]
8 V. Valev. Set partition principles. InTransactions of the Ninth Prague Conference on Information Theory, Statistical Decision Functions, and Random Processes, page 251, 1983
1983
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.