FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies
Pith reviewed 2026-06-26 19:14 UTC · model grok-4.3
The pith
FOSC-X extracts the top-M globally optimal flat clusterings from a hierarchy in linear time, with or without limits on cluster count.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FOSC-X solves the top-M optimal local-cut problem on clustering hierarchies both with and without a cardinality constraint on the number of clusters. It does so by dynamic programming that, in the constrained case, maintains compact sets of feasible partial candidates bounded by feasibility intervals and discards combinations that cannot lead to a feasible global solution, thereby preserving optimality while achieving linear time.
What carries the argument
Dynamic programming over the cluster tree that maintains compact sets of feasible candidates with lower and upper feasibility bounds to combine local optima into globally optimal feasible solutions.
If this is right
- Users obtain an ordered list of the M best flat clusterings rather than a single optimum.
- The same linear-time procedure works whether or not a hard limit on the number of clusters is imposed.
- Alternative clusterings that reflect different structural aspects of the data become directly accessible from one hierarchy.
- The method scales linearly with both tree size and data-set size, enabling routine use on large hierarchies.
Where Pith is reading between the lines
- The same bounded-candidate dynamic-programming pattern could be reused for other tree-structured selection problems that mix optimality with cardinality constraints.
- In domains where the 'right' number of clusters is application-dependent, the top-M list supplies a ready menu of alternatives for downstream validation.
- Empirical checks on hierarchies whose depth or branching factor is extreme would test whether the linear bound remains tight in practice.
Load-bearing premise
Any hierarchical cluster tree admits a dynamic-programming decomposition in which locally optimal partial candidates, once filtered by feasibility bounds, can be recombined without ever missing a better globally optimal feasible solution.
What would settle it
A concrete hierarchy and cluster-count constraint for which the algorithm's reported top-M list omits at least one feasible solution whose objective value is strictly better than the M-th entry on that list.
read the original abstract
Extracting a flat clustering solution from a hierarchy is a common task in practical cluster analysis and can be formulated as an optimisation problem. Existing approaches focus on finding a single optimal solution. We introduce FOSC-X, a framework for extracting the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, while optionally enforcing constraints on the number of clusters. This enables automatic identification of multiple high-quality alternative clusterings that capture different aspects of the hierarchical structure. Without constraints, the top-M problem can be solved in polynomial time using dynamic programming, exploiting the property that locally optimal partial candidates within subtrees can be combined to form globally optimal solutions while automatically determining the number of clusters. However, this can lead to solutions with numbers of clusters that are ultimately undesirable -- e.g., too large to be meaningful or practically analysed within a particular application domain. Imposing cluster-count constraints breaks the optimality property underlying the unconstrained dynamic programming approach, since locally optimal partial candidates may no longer combine into feasible globally optimal solutions. FOSC-X addresses this challenge through a dynamic programming strategy that maintains compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations. The resulting method guarantees optimal rankings of the top-M solutions with linear-time complexity in the number of cluster nodes and dataset size, both with and without cluster-count constraints. Experiments show that FOSC-X efficiently reveals alternative clustering structures overlooked by single-solution extraction methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FOSC-X, an extension of prior FOSC work, for extracting the top-M globally optimal flat clusterings from non-horizontal cuts in a hierarchical cluster tree. It formulates the task as an optimization problem solved via dynamic programming. Without cluster-count constraints the top-M problem is solved in polynomial time by combining locally optimal partial candidates; with constraints the optimality property breaks, so the method maintains compact sets of feasible candidates via lower/upper feasibility bounds and prunes infeasible or dominated combinations, claiming to restore both optimality and linear-time complexity in the number of cluster nodes and dataset size. Experiments illustrate that the approach reveals alternative clusterings missed by single-solution extractors.
Significance. If the pruning argument is shown to preserve all globally optimal top-M solutions without discarding feasible combinations, the result would be significant for cluster analysis: it supplies an efficient, constraint-aware method to surface multiple high-quality alternative flat partitions from a single hierarchy, directly addressing a practical limitation of existing single-solution approaches. The linear-time claim (with and without constraints) would be a strong technical contribution if the set sizes remain bounded as asserted.
major comments (1)
- [Abstract / dynamic programming strategy for constrained case] Abstract and DP strategy description: the claim that 'maintaining compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations' restores the global optimality guarantee for top-M rankings (both with and without cluster-count constraints) is load-bearing for the central result. No explicit definition of domination for top-M ranking, no proof that every feasible global optimum has a surviving path through the maintained sets, and no argument establishing that set sizes remain O(1) per node (required for true linearity) are supplied. If any optimal combination is discarded by the bounds or pruning rule, both the optimality and linear-time claims fail.
Simulated Author's Rebuttal
We thank the referee for their careful reading and for highlighting the need for greater explicitness around the optimality and complexity arguments in the constrained case. We address the major comment point-by-point below and will strengthen the manuscript accordingly.
read point-by-point responses
-
Referee: [Abstract / dynamic programming strategy for constrained case] Abstract and DP strategy description: the claim that 'maintaining compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations' restores the global optimality guarantee for top-M rankings (both with and without cluster-count constraints) is load-bearing for the central result. No explicit definition of domination for top-M ranking, no proof that every feasible global optimum has a surviving path through the maintained sets, and no argument establishing that set sizes remain O(1) per node (required for true linearity) are supplied. If any optimal combination is discarded by the bounds or pruning rule, both the optimality and linear-time claims fail.
Authors: We agree that the abstract is necessarily concise and that the DP strategy description would benefit from more explicit formal statements. In the full manuscript, domination for top-M is defined (Section 3.3) as a partial candidate A dominating B when the objective of A is at least as good as B and the feasibility interval of A strictly contains that of B for the remaining subtree; the maintained sets are the non-dominated feasible partial solutions. The optimality argument proceeds by induction on the tree: at each node the union of combinations from child sets is filtered by the bounds, and any globally optimal top-M solution must have a non-dominated prefix at every subtree (otherwise a better or equal alternative would exist, contradicting optimality). Pruning therefore cannot discard a path to a global optimum. Set-size boundedness follows from the fact that only the top-M candidates per feasible cluster-count interval are retained and the number of distinct intervals is bounded by a small constant (dependent on M but independent of n); this yields O(n) total time. Nevertheless, to make these arguments fully self-contained and address the referee's concern directly, we will insert a dedicated subsection containing the formal definition of domination, the complete induction proof, and the explicit O(1) set-size bound in the revised version. revision: yes
Circularity Check
No circularity: standard DP optimality on trees with explicit feasibility pruning
full rationale
The paper's derivation chain rests on a dynamic-programming decomposition of the hierarchy that maintains compact feasible-candidate sets via lower/upper bounds and domination pruning. This is presented as an algorithmic extension of the known unconstrained optimality property (locally optimal partial solutions combine to global optima) to the constrained case. No parameter is fitted and then renamed as a prediction, no result is defined in terms of itself, and no load-bearing step reduces to a self-citation or ansatz smuggled from prior work by the same authors. The central guarantee is therefore an independent algorithmic claim rather than a tautology.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption A hierarchical cluster tree admits a dynamic-programming decomposition where locally optimal partial candidates can be combined into globally optimal solutions.
Reference graph
Works this paper leans on
-
[1]
Pattern Recognition Let- ters31(8), 651–666 (2010) https://doi.org/10.1016/j.patrec.2009.09.011
Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recognition Let- ters31(8), 651–666 (2010) https://doi.org/10.1016/j.patrec.2009.09.011 . Award winning papers from the 19th International Conference on Pattern Recognition (ICPR)
-
[2]
Von Luxburg, U., Williamson, R.C., Guyon, I.: Clustering: science or art? In: Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning Workshop. UTLW’11, vol. 27, pp. 65–79. JMLR.org, Washington, USA (2011).https://dl.acm.org/doi/10.5555/3045796.3045803
-
[3]
Philosophical Aspects of Pattern Recognition
Hennig, C.: What are the true clusters? Pattern Recognition Letters64, 53– 62 (2015) https://doi.org/10.1016/j.patrec.2015.04.009 . Philosophical Aspects of Pattern Recognition
-
[4]
Prentice-Hall, Inc., USA (1988)
Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Inc., USA (1988)
1988
-
[5]
Arnold, London (2001)
Everitt, B.S., Landau, S., Leese, M.: Cluster Analysis, 4th edn. Arnold, London (2001)
2001
-
[6]
Wiley Series in Probability and Statistics
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Clus- ter Analysis. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, NJ, USA (1990). https://doi.org/10.1002/9780470316801
-
[7]
In: Advances in Knowledge Discovery and Data Mining (PAKDD 2013)
Campello, R.J.G.B., Moulavi, D., Sander, J.: Density-based clustering based on hierarchical density estimates. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 160–172. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37456-2 14
-
[8]
Campello, R.J.G.B., Moulavi, D., Zimek, A., Sander, J.: Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans. Knowl. Discov. Data10(1) (2015) https://doi.org/10.1145/2733381
-
[9]
Engineering Applications of Artificial Intelligence , volume =
Ezugwu, A.E., Ikotun, A.M., Oyelade, O.O., Abualigah, L., Agushaka, J.O., Eke, C.I., Akinyelu, A.A.: A comprehensive survey of clustering algorithms: State-of- the-art machine learning applications, taxonomy, challenges, and future research prospects. Engineering Applications of Artificial Intelligence110, 104743 (2022) https://doi.org/10.1016/j.engappai....
-
[10]
Artificial Intelligence Review 56(8), 8219–8264 (2023) https://doi.org/10.1007/s10462-022-10366-3
Ran, X., Xi, Y., Lu, Y., Wang, X., Lu, Z.: Comprehensive survey on hierarchical clustering algorithms and the recent developments. Artificial Intelligence Review 56(8), 8219–8264 (2023) https://doi.org/10.1007/s10462-022-10366-3
-
[11]
30 In: Proceedings of the Sixth Annual International Conference on Computa- tional Biology
Segal, E., Koller, D.: Probabilistic hierarchical clustering for biological data. 30 In: Proceedings of the Sixth Annual International Conference on Computa- tional Biology. RECOMB ’02, pp. 273–280. Association for Computing Machin- ery, New York, NY, USA (2002). https://doi.org/10.1145/565196.565232 . https://doi.org/10.1145/565196.565232
-
[12]
Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D.: Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences95(25), 14863–14868 (1998) https://doi.org/10.1073/pnas.95.25.14863 https://www.pnas.org/doi/pdf/10.1073/pnas.95.25.14863
-
[13]
BMC Bioinformatics25(1) (2024) https: //doi.org/10.1186/s12859-024-05652-6
Han, W., Zhang, S., Gao, H., Bu, D.: Clustering on hierarchical heterogeneous data with prior pairwise relationships. BMC Bioinformatics25(1) (2024) https: //doi.org/10.1186/s12859-024-05652-6
-
[14]
Nielsen, F.: Hierarchical Clustering, pp. 195–211. Springer, Cham (2016). https: //doi.org/10.1007/978-3-319-21903-5 8
-
[15]
Psychometrika50(2), 159–179 (1985) https: //doi.org/10.1007/BF02294245
Milligan, G.W., Cooper, M.C.: An examination of procedures for determining the number of clusters in a data set. Psychometrika50(2), 159–179 (1985) https: //doi.org/10.1007/BF02294245
-
[16]
(eds.): Handbook of Cluster Analysis
Hennig, C., Meila, M., Murtagh, F., Rocci, R. (eds.): Handbook of Cluster Analysis. Chapman and Hall/CRC, New York (2015). https://doi.org/10.1201/ b19706
2015
-
[17]
Data Mining and Knowledge Discovery27(3), 344–371 (2013) https://doi.org/10.1007/ s10618-013-0311-4
Campello, R.J.G.B., Moulavi, D., Zimek, A., Sander, J.: A framework for semi- supervised and unsupervised optimal extraction of clusters from hierarchies. Data Mining and Knowledge Discovery27(3), 344–371 (2013) https://doi.org/10.1007/ s10618-013-0311-4
2013
-
[18]
ai/ml/2017/05/12/cut-a-dendrogram.html (2017)
Marti, G.: How and Where Should You Cut a Dendrogram? https://www.marti. ai/ml/2017/05/12/cut-a-dendrogram.html (2017)
2017
-
[19]
Ge, J., Tibshirani, R.: Optimal pruning of hierarchical clustering dendrograms. Communications in Statistics - Theory and Methods 55(7), 2105–2128 (2026) https://doi.org/10.1080/03610926.2025.2543191 https://doi.org/10.1080/03610926.2025.2543191
-
[20]
https://arxiv.org/abs/2512.08741
Boucherie, L., Ahn, Y.-Y., Lehmann, S.: Adaptive cut reveals multiscale com- plexity in networks (2025). https://arxiv.org/abs/2512.08741
arXiv 2025
-
[21]
Awasthi, P., Blum, A., Sheffet, O.: Center-based clustering under perturbation stability. Inf. Process. Lett.112(1–2), 49–54 (2012) https://doi.org/10.1016/j.ipl. 2011.10.006
-
[22]
Routledge, New York (2017)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification And Regression Trees. Routledge, New York (2017). https://doi.org/10.1201/ 31 9781315139470
2017
-
[23]
Vendramin, L., Campello, R.J.G.B., Hruschka, E.R.: Relative clustering validity criteria: A comparative overview. Statistical Analysis and Data Mining: The ASA Data Science Journal3(4), 209–235 (2010) https://doi.org/10.1002/sam.10080
-
[24]
Simpson, C., Campello, R.J.G.B., Stojanovski, E.: Benchmarking of cluster- ing validity measures revisited. Statistical Analysis and Data Mining: An ASA Data Science Journal19(1), 70061 (2026) https://doi.org/10.1002/sam.70061 https://onlinelibrary.wiley.com/doi/pdf/10.1002/sam.70061
-
[25]
Data Mining and Knowledge Discovery40(3) (2026) https: //doi.org/10.1007/s10618-026-01195-x
Simpson, C., Mu˜ noz, M.A., Campello, R.J.G.B.: Instance space of clustering val- idation measures. Data Mining and Knowledge Discovery40(3) (2026) https: //doi.org/10.1007/s10618-026-01195-x
-
[26]
In: Islam, R., Koh, Y.S., Zhao, Y., Warwick, G., Stirling, D., Li, C.-T., Islam, Z
Anjos, F.d.A.R., Gertrudes, J.C., Sander, J., Campello, R.J.G.B.: A modularity- based measure for cluster selection from clustering hierarchies. In: Islam, R., Koh, Y.S., Zhao, Y., Warwick, G., Stirling, D., Li, C.-T., Islam, Z. (eds.) Data Mining, pp. 253–265. Springer, Singapore (2019). https://doi.org/10.1007/ 978-981-13-6661-1 20
2019
-
[27]
Master’s thesis, University of Alberta, Edmonton, Canada (2025)
Thompson, J.M.: Partition-free cluster evaluation: Extending cluster validation and cluster extraction from hierarchies. Master’s thesis, University of Alberta, Edmonton, Canada (2025). https://doi.org/10.7939/81776 . https://ualberta. scholaris.ca/items/7740b0c9-91dd-4517-88e8-2a57eed4899b
-
[28]
Data Mining and Knowledge Discovery33(6), 1894–1952 (2019) https://doi.org/10
Castro Gertrudes, J., Zimek, A., Sander, J., Campello, R.J.G.B.: A unified view of density-based methods for semi-supervised clustering and classification. Data Mining and Knowledge Discovery33(6), 1894–1952 (2019) https://doi.org/10. 1007/s10618-019-00651-1
1952
-
[29]
http://cs.uef.fi/sipu/datasets/
Fr¨ anti, P., Sieranoja, S.: K-means properties on six clustering benchmark datasets (2018). http://cs.uef.fi/sipu/datasets/
2018
-
[30]
Journal of computational and applied mathematics , volume=
Rousseeuw, P.J.: Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics20, 53–65 (1987) https://doi.org/10.1016/0377-0427(87)90125-7
-
[31]
Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: Variants, properties, normalization and cor- rection for chance. J. Mach. Learn. Res.11, 2837–2854 (2010) https://dl.acm.org/doi/10.5555/1756006.1953024
-
[32]
Journal of Machine Learning Research12, 2825–2830 (2011) 32
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research12, 2825–2830 (2011) 32
2011
-
[33]
PerMetrics: A Framework of Performance Metrics forMachine Learning Models
McInnes, L., Healy, J., Astels, S.: hdbscan: Hierarchical density based clustering. The Journal of Open Source Software2(11) (2017) https://doi.org/10.21105/joss. 00205
-
[34]
Demˇ sar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res.7, 1–30 (2006) https://dl.acm.org/doi/10.5555/1248547.1248548 33 A Complexity Analysis A.1 Complexity underk min–kmax constraints We analyse the worst-case time complexity of FOSC-X under the cluster-count con- straintk min ≤ |z Ci| ≤k max. The analysis is co...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.