Lumbermark: Resistant Clustering by Chopping Up Mutual Reachability Minimum Spanning Trees

Marek Gagolewski

arxiv: 2604.07143 · v1 · submitted 2026-04-08 · 💻 cs.LG · stat.AP· stat.ML

Lumbermark: Resistant Clustering by Chopping Up Mutual Reachability Minimum Spanning Trees

Marek Gagolewski This is my paper

Pith reviewed 2026-05-10 18:08 UTC · model grok-4.3

classification 💻 cs.LG stat.APstat.ML

keywords clusteringdivisive clusteringminimum spanning treemutual reachabilityrobust clusteringdensity-based clusteringHDBSCAN

0 comments

The pith

Lumbermark detects clusters of varying sizes, densities, and shapes by iteratively chopping protruding segments from a mutual reachability minimum spanning tree.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Lumbermark as a divisive clustering algorithm that begins with the mutual reachability minimum spanning tree of a dataset and repeatedly severs large limbs at their narrow connecting segments. This process is designed to recover groups that differ in size, density, and geometry while limiting the distorting effect of scattered low-density points. Mutual reachability distances are used to smooth local variations, so that noise between clusters or outliers at edges exert less pull on the cuts. The resulting partitions allow direct user control over cluster sizes, offering a practical alternative when standard density-based methods leave the number or scale of groups unspecified. Practitioners would value such a tool for obtaining reliable groupings from irregular real data without extensive retuning.

Core claim

Lumbermark is a robust divisive clustering algorithm capable of detecting clusters of varying sizes, densities, and shapes. It works by iteratively chopping off large limbs connected by protruding segments of a dataset's mutual reachability minimum spanning tree. The use of mutual reachability distances smoothens the data distribution and decreases the influence of low-density objects, such as noise points between clusters or outliers at their peripheries. The algorithm produces partitions with user-specified sizes and serves as an alternative to HDBSCAN.

What carries the argument

The mutual reachability minimum spanning tree, which the algorithm repeatedly severs at protruding segments to isolate clusters while reducing the effect of density variations and noise.

If this is right

The method separates clusters even when they differ substantially in size, density, and shape.
Low-density noise and peripheral outliers exert reduced influence on the final partition.
Users obtain direct control over the sizes of the clusters in the output.
A fast implementation exists that can serve as a drop-in alternative when HDBSCAN output sizes are not controllable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The tree-chopping view may connect to other graph-based clustering techniques that operate on minimum spanning trees.
The user-specified size control suggests utility in applications where balanced or constrained group sizes are required.
Benchmark performance indicates the approach could extend to domains with high-dimensional or irregularly shaped data groups.

Load-bearing premise

Mutual reachability distances smooth the data distribution enough for the chopping process to separate clusters accurately despite the presence of noise and outliers.

What would settle it

Running the algorithm on benchmark datasets with known ground-truth partitions and user-specified sizes, then checking whether the recovered clusters match the true structure in cases with added noise bridges or density gradients.

Figures

Figures reproduced from arXiv: 2604.07143 by Marek Gagolewski.

**Figure 1.** Figure 1: Three copies of an example dataset with four clusters: two horizontal segments and two Gaussian blobs connected by a point chain. Additionally, its Euclidean minimum spanning tree (left) as well as two 25-mutual reachability minimum spanning trees (centre, right) are depicted. Edge colour intensities correspond to the underlying distances with the darkest edge being the longest in each case. By increasing … view at source ↗

**Figure 2.** Figure 2: Proportion of datasets with AR ≥ 0.95 and < 0.8 and the average AR as a function of min_cluster_factor 𝑓 and smoothing parameter 𝑀 in the case of the Lumbermark algorithm on 47 datasets. Datasets with imbalanced and balanced reference cluster counts are studied separately. M. Gagolewski: Preprint; Last updated on 9 April 2026 Page 11 of 13 [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗

**Figure 3.** Figure 3: Proportion of datasets with AR ≥ 0.95 and < 0.8 and the average AR as a function of the Gini index threshold 𝐺 and smoothing parameter 𝑀 in the case of the Genie algorithm on 47 datasets. Datasets with imbalanced and balanced reference cluster counts are studied separately. M. Gagolewski: Preprint; Last updated on 9 April 2026 Page 12 of 13 [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗

read the original abstract

We introduce Lumbermark, a robust divisive clustering algorithm capable of detecting clusters of varying sizes, densities, and shapes. Lumbermark iteratively chops off large limbs connected by protruding segments of a dataset's mutual reachability minimum spanning tree. The use of mutual reachability distances smoothens the data distribution and decreases the influence of low-density objects, such as noise points between clusters or outliers at their peripheries. The algorithm can be viewed as an alternative to HDBSCAN that produces partitions with user-specified sizes. A fast, easy-to-use implementation of the new method is available in the open-source 'lumbermark' package for Python and R. We show that Lumbermark performs well on benchmark data and hope it will prove useful to data scientists and practitioners across different fields.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Lumbermark is a new divisive clustering algorithm that chops limbs from a mutual reachability MST to produce user-specified cluster sizes, with working code in Python and R but thin support for its main robustness claim.

read the letter

The main point is that Lumbermark takes a mutual reachability MST and iteratively cuts off large limbs to form clusters of sizes the user picks in advance. It is meant to handle varying densities and shapes better than standard approaches while acting as a size-controlled alternative to HDBSCAN. The mutual reachability step is supposed to smooth the data and reduce the pull of noise or outliers between clusters. They ship a fast implementation in both Python and R, and they report decent results on benchmark sets. That combination of a clear new procedure plus immediately usable code is the practical contribution here. The open package lets people try the method without reimplementing the MST chopping logic from scratch, which is useful for anyone who needs partitions with fixed cardinalities. The benchmarks give at least a starting point for seeing how it behaves on common data. The soft spot is the reliance on the smoothing property of mutual reachability distances. The abstract presents this as the reason the chopping works reliably, yet there is no derivation showing when the smoothing actually occurs or how it behaves under bridge noise, density gradients, or higher dimensions. If the full paper only shows standard benchmark tables without targeted ablations that isolate the chopping rule or the distance transform, then the robustness advantage stays more stated than measured. The user-specified sizes also trade off full automation for control, which narrows the scenarios where it is the obvious choice. This paper is for clustering researchers and practitioners who already work with MST or density-based methods and want a new option that returns exact sizes. A reader who needs reproducible code to test on their own data will get immediate value from the package. It is worth sending to peer review because the algorithmic idea is distinct enough and the implementation is concrete enough that referees can check the claims directly rather than just reading another abstract.

Referee Report

2 major / 2 minor

Summary. The paper introduces Lumbermark, a divisive clustering algorithm that iteratively chops large limbs from the mutual reachability minimum spanning tree (MR-MST) of a dataset to identify clusters of varying sizes, densities, and shapes. It claims that mutual reachability distances smooth the distribution and reduce the effect of noise/outliers, positioning the method as a user-specified-size alternative to HDBSCAN. An open-source Python/R implementation is provided, with claims of good performance on benchmarks.

Significance. If the robustness claims hold, Lumbermark would offer a practical, interpretable alternative for noisy clustering tasks where cluster-size control is desired, extending MST-based ideas from HDBSCAN with a chopping heuristic. The open-source package supports reproducibility and adoption by practitioners.

major comments (2)

[Abstract and §3] Abstract and §3 (Method description): The central robustness claim—that mutual reachability distances 'smoothens the data distribution and decreases the influence of low-density objects' enabling accurate limb separation—lacks any theoretical analysis, bounds, or controlled failure-mode tests (e.g., for bridge noise, density gradients, or high dimensions), leaving the key assumption unverified despite being load-bearing for the 'resistant' guarantee.
[§4] §4 (Experiments): Benchmark results are asserted without reported details on datasets, baselines (including HDBSCAN variants), metrics, or ablation tests isolating the mutual-reachability smoothing effect, so the performance claims cannot be assessed for support.

minor comments (2)

[§3] The manuscript would benefit from pseudocode or a clear algorithmic listing of the chopping procedure (limb identification and cut criteria) to aid implementation and understanding.
[§4] Figure captions and axis labels in the experimental plots should explicitly state the evaluation metrics and number of runs for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline revisions to improve clarity and support for the claims.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (Method description): The central robustness claim—that mutual reachability distances 'smoothens the data distribution and decreases the influence of low-density objects' enabling accurate limb separation—lacks any theoretical analysis, bounds, or controlled failure-mode tests (e.g., for bridge noise, density gradients, or high dimensions), leaving the key assumption unverified despite being load-bearing for the 'resistant' guarantee.

Authors: We agree that the manuscript provides no formal theoretical analysis or bounds on the smoothing effect of mutual reachability distances. This property is drawn from the established use in HDBSCAN, where it reduces outlier influence via core-distance augmentation. In revision, we will expand §3 with additional intuition and controlled synthetic experiments (e.g., datasets with bridge noise, varying density gradients, and moderate dimensions) to demonstrate limb separation behavior empirically. Deriving general theoretical guarantees for the full heuristic procedure is beyond the current scope and noted as future work. revision: partial
Referee: [§4] §4 (Experiments): Benchmark results are asserted without reported details on datasets, baselines (including HDBSCAN variants), metrics, or ablation tests isolating the mutual-reachability smoothing effect, so the performance claims cannot be assessed for support.

Authors: The experimental section uses standard clustering benchmarks and compares against HDBSCAN, but we acknowledge the details may not have been presented with sufficient explicitness. In the revised manuscript, we will expand §4 to include: (i) a table of all datasets with sizes and characteristics, (ii) precise baseline configurations (HDBSCAN with multiple min_cluster_size settings), (iii) the full set of metrics (adjusted Rand index, normalized mutual information, etc.), and (iv) an ablation comparing results with and without mutual reachability distances. The open-source package already enables full reproduction of the reported runs. revision: yes

Circularity Check

0 steps flagged

No circularity; algorithm is a direct procedural description built on external MST concepts

full rationale

The paper presents Lumbermark as an iterative chopping procedure on a mutual reachability MST, stating its properties (smoothing low-density points) as a direct consequence of the established mutual reachability distance definition from prior literature. No equations, parameters, or steps are defined in terms of the target outputs; no predictions are fitted then renamed; no uniqueness theorems or ansatzes are smuggled via self-citation; the central claim does not reduce to its inputs by construction. The derivation chain is self-contained as a new algorithmic recipe rather than a tautological re-expression of fitted data or prior self-references.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim relies on the effectiveness of the chopping procedure on the mutual reachability MST, which is a new application of these concepts. No new entities are invented. Assessment is limited to abstract description.

free parameters (1)

user-specified cluster sizes
The algorithm produces partitions with user-specified sizes, making the target cluster size a key input parameter.

axioms (1)

domain assumption Mutual reachability distances smoothen the data distribution and decrease the influence of low-density objects
Directly stated in the abstract as a property of the approach.

pith-pipeline@v0.9.0 · 5425 in / 1381 out tokens · 63881 ms · 2026-05-10T18:08:25.938019+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Lumbermark iteratively chops off large limbs connected by protruding segments of a dataset's mutual reachability minimum spanning tree. The use of mutual reachability distances smoothens the data distribution...
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat.induction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Algorithm 1. Lumbermark... Let (min_cluster_size) s = f |T'| / k ... For each edge e in T' in decreasing order of weights

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

[1]

Asystematicreviewofmachinelearningmethodsinsoftware testing

Ajorloo,S.,Jamarani,A.,Kashfi,M.,HaghiKashani,M.,Najafizadeh,A.,2024. Asystematicreviewofmachinelearningmethodsinsoftware testing. Applied Soft Computing 162, 111805. DOI:10.1016/j.asoc.2024.111805

work page doi:10.1016/j.asoc.2024.111805 2024
[2]

Breunig, Hans-Peter Kriegel, and Joerg Sander

Ankerst, M., Breunig, M., Kriegel, H.P., Sander, J., 1999. OPTICS: Ordering points to identify the clustering structure, in: Proc. ACM Intl. Conf. Management of Data (SIGMOD), pp. 49–60. DOI:10.1145/304182.304187

work page doi:10.1145/304182.304187 1999
[3]

Communications of the ACM18(9), 509–517 (1975).https://doi.org/10.1145/ 361002.361007

Bentley, J., 1975. Multidimensional binary search trees used for associative searching. Communications of the ACM 18, 509–517. DOI:10. 1145/361002.361007

work page arXiv 1975
[4]

Foundations of Data Science

Blum, A., Hopcroft, J., Kannan, R., 2020. Foundations of Data Science. Cambridge University Press. URL:https://www.cs.cornell. edu/jeh/book.pdf

work page 2020
[5]

O jistém problému minimálním

Borůvka, O., 1926. O jistém problému minimálním. Práce Moravské Přírodovědecké Společnosti v Brně 3, 37–58

work page 1926
[6]

In: Advances in Knowledge Discovery and Data Mining

Campello, R.J., Moulavi, D., Sander, J., 2013. Density-based clustering based on hierarchical density estimates. LNAI 7819, 160–172. DOI:10.1007/978-3-642-37456-2_14

work page doi:10.1007/978-3-642-37456-2_14 2013
[7]

ACM Trans- actions on Knowledge Discovery from Data10(1), 1–51 (2015) https://doi.org/ 10.1145/2733381

Campello,R.J.,Moulavi,D.,Zimek,A.,Sander,J.,2015.Hierarchicaldensityestimatesfordataclustering,visualization,andoutlierdetection. ACM Transactions on Knowledge Discovery from Data 10, 5:1–5:51. DOI:10.1145/2733381

work page doi:10.1145/2733381 2015
[8]

Rates of convergence for the cluster tree, in: Advances in Neural Information Processing Systems, pp

Chaudhuri, K., Dasgupta, S., 2010. Rates of convergence for the cluster tree, in: Advances in Neural Information Processing Systems, pp. 343–351

work page 2010
[9]

Cluster expansions:T-walks, labeled pose=ts and matrix calculations

Dogan, A., Birant, D., 2021. Machinelearninganddatamininginmanufacturing. ExpertSystemswithApplications166. DOI:10.1016/j. eswa.2020.114060

work page doi:10.1016/j 2021
[10]

A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proc

Ester, M., Kriegel, H.P., Sander, J., Xu, X., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proc. KDD’96, pp. 226–231

work page 1996
[11]

Florek,K.,Łukaszewicz,J.,Perkal,J.,Steinhaus,H.,Zubrzycki,S.,1951.Surlaliaisonetladivisiondespointsd’unensemblefini.Colloquium Mathematicum 2, 282–285

work page 1951
[12]

Iterative shrinking method for clustering problems

Fränti, P., Virmajoki, O., 2006. Iterative shrinking method for clustering problems. Pattern Recognition 39, 761–765

work page 2006
[13]

Data-driven estimation of tbm performance in soft soils using density-based spatial clustering and random forest

Fu, X., Feng, L., Zhang, L., 2022. Data-driven estimation of tbm performance in soft soils using density-based spatial clustering and random forest. Applied Soft Computing 120, 108686. DOI:10.1016/j.asoc.2022.108686

work page doi:10.1016/j.asoc.2022.108686 2022
[14]

A framework for benchmarking clustering algorithms

Gagolewski, M., 2022. A framework for benchmarking clustering algorithms. SoftwareX 20, 101270. URL:https:// clustering-benchmarks.gagolewski.com/, DOI:10.1016/j.softx.2022.101270. M. Gagolewski:Preprint; Last updated on 9 April 2026Page 9 of 13 Lumbermark: Resistant Clustering by Chopping Up Mutual Reachability Minimum Spanning Trees

work page doi:10.1016/j.softx.2022.101270 2022
[15]

Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm

Gagolewski, M., Bartoszuk, M., Cena, A., 2016. Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences 363, 8–23

work page 2016
[16]

Clustering with minimum spanning trees: How good can it be? Journal of Classification 42, 90–112

Gagolewski, M., Cena, A., Bartoszuk, M., Brzozowski, L., 2025. Clustering with minimum spanning trees: How good can it be? Journal of Classification 42, 90–112. DOI:10.1007/s00357-024-09483-1

work page doi:10.1007/s00357-024-09483-1 2025
[17]

Minimum spanning trees and single linkage cluster analysis

Gower, J., Ross, G., 1969. Minimum spanning trees and single linkage cluster analysis. Journal of the Royal Statistical Society. Series C (Applied Statistics) 18, 54–64

work page 1969
[18]

Kernel-based fuzzy clustering: A comparative experimental study

Graves, D., Pedrycz, W., 2010. Kernel-based fuzzy clustering: A comparative experimental study. Fuzzy Sets and Systems 161, 522–543

work page 2010
[19]

What are the true clusters? Pattern Recognition Letters 64, 53–62

Hennig, C., 2015. What are the true clusters? Pattern Recognition Letters 64, 53–62. DOI:10.1016/j.patrec.2015.04.009

work page doi:10.1016/j.patrec.2015.04.009 2015
[20]

Comparing partitions

Hubert, L., Arabie, P., 1985. Comparing partitions. Journal of Classification 2, 193–218

work page 1985
[21]

Cluster analysis: A modern statistical review

Jaeger, A., Banks, D., 2023. Cluster analysis: A modern statistical review. Wiley Interdisciplinary Reviews: Computational Statistics 15, e1597. DOI:10.1002/wics.1597

work page doi:10.1002/wics.1597 2023
[22]

Data clustering: A user’s dilemma

Jain, A., Law, M., 2005. Data clustering: A user’s dilemma. Lecture Notes in Computer Science 3776, 1–10

work page 2005
[23]

CHAMELEON: Hierarchical clustering using dynamic modeling

Karypis, G., Han, E., Kumar, V., 1999. CHAMELEON: Hierarchical clustering using dynamic modeling. Computer 32, 68–75. DOI:10. 1109/2.781637

work page 1999
[24]

Ontheshortestspanningsubtreeofagraphandthetravelingsalesmanproblem

Kruskal,J.,1956. Ontheshortestspanningsubtreeofagraphandthetravelingsalesmanproblem. ProceedingsoftheAmericanMathematical Society 7, 48–50

work page 1956
[25]

A probability theory of cluster analysis

Ling, R.F., 1973. A probability theory of cluster analysis. Journal of the American Statistical Association 68, 159–164

work page 1973
[26]

A multi-stage hierarchical clustering algorithm based on centroid of tree and cut edge constraint

Ma, Y., Lin, H., Wang, Y., Huang, H., He, X., 2021. A multi-stage hierarchical clustering algorithm based on centroid of tree and cut edge constraint. Information Sciences 557, 194–219. DOI:10.1016/j.ins.2020.12.016

work page doi:10.1016/j.ins.2020.12.016 2021
[27]

It’sokaytobeskinny,ifyourfriendsarefat,in: The4thCGCWorkshoponComputationalGeometry

Maneewongvatana,S.,Mount,D.,1999. It’sokaytobeskinny,ifyourfriendsarefat,in: The4thCGCWorkshoponComputationalGeometry

work page 1999
[28]

Accelerated hierarchical density clustering, in: IEEE International Conference on Data Mining Workshops (ICDMW), pp

McInnes, L., Healy, J., 2017. Accelerated hierarchical density clustering, in: IEEE International Conference on Data Mining Workshops (ICDMW), pp. 33–42. DOI:10.1109/ICDMW.2017.12. see the extended version athttps://arxiv.org/pdf/1705.07321

work page doi:10.1109/icdmw.2017.12 2017
[29]

URL https://doi.org/10.21105/joss.00205

McInnes, L., Healy, J., Astels, S., 2017. hdbscan: Hierarchical density based clustering. The Journal of Open Source Software 2, 205. DOI:10.21105/joss.00205

work page doi:10.21105/joss.00205 2017
[30]

Information theoretic clustering using minimum spanning trees, in: Proc

Müller, A., Nowozin, S., Lampert, C., 2012. Information theoretic clustering using minimum spanning trees, in: Proc. German Conference on Pattern Recognition.https://github.com/amueller/information-theoretic-mst

work page 2012
[31]

Parallel algorithms for hierarchical clustering

Olson, C.F., 1995. Parallel algorithms for hierarchical clustering. Parallel Computing 21, 1313–1325

work page 1995
[32]

Scikit-learn: Machine learning in Python

Pedregosa, F., et al., 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830

work page 2011
[33]

Chothia and A.M

Prim,R.C.,1957. Shortestconnectionnetworksandsomegeneralizations. BellSystemTechnicalJournal36,1389–1401. DOI:10.1002/j. 1538-7305.1957.tb01515.x

work page doi:10.1002/j 1957
[34]

The ArborX library: Version 2.0

Prokopenko, A., Arndt, D., Lebrun-Grandié, D., Turcksin, B., 2025. The ArborX library: Version 2.0. ACM Transactions on Mathematical Software 51, 30. DOI:10.1145/3772288

work page doi:10.1145/3772288 2025
[35]

Asingle-treealgorithmtocomputetheEuclideanminimumspanningtreeonGPUs

Prokopenko,A.,Sao,P.,Lebrun-Grandié,D.,2023. Asingle-treealgorithmtocomputetheEuclideanminimumspanningtreeonGPUs. Proc. ICPP’22 , 14DOI:10.1145/3545008.3546185

work page doi:10.1145/3545008.3546185 2023
[36]

Optimizing search strategies in K-d trees, in: 5th WSES/IEEE Conf

Sample, N., Haines, M., Arnold, M., Purcell, T., 2001. Optimizing search strategies in K-d trees, in: 5th WSES/IEEE Conf. on Circuits, Systems, Communications & Computers (CSCC’01)

work page 2001
[37]

Clustering benchmark datasets exploiting the fundamental clustering problems

Thrun, M., Ultsch, A., 2020. Clustering benchmark datasets exploiting the fundamental clustering problems. Data in Brief 30, 105501. DOI:10.1016/j.dib.2020.105501

work page doi:10.1016/j.dib.2020.105501 2020
[38]

A white paper on good research practices in benchmarking: The case of cluster analysis

van Mechelen, I., Boulesteix, A.L., Dangl, R., et al., 2023. A white paper on good research practices in benchmarking: The case of cluster analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , e1511DOI:10.1002/widm.1511

work page doi:10.1002/widm.1511 2023
[39]

ResTune: Resource oriented tuning boosted by meta-learning for cloud databases

Wang,Y.,Yu,S.,Gu,Y.,Shun,J.,2021. FastparallelalgorithmsforEuclideanminimumspanningtreeandhierarchicalspatialclustering,in: Proc. SIGMOD’21, pp. 1982–1995. DOI:10.1145/3448016.3457296

work page doi:10.1145/3448016.3457296 2021
[40]

Expert Systems with Applications238, 122229 (2024) https://doi

Wei,Z.,Gao,Y.,Zhang,X.,Li,X.,Han,Z.,2024. Adaptivemarinetrafficbehaviourpatternrecognitionbasedonmultidimensionaldynamic time warping and dbscan algorithm. Expert Systems with Applications 238. URL:https://www.scopus.com/inward/record.uri? eid=2-s2.0-85174700827&doi=10.1016%2fj.eswa.2023.122229&partnerID=40&md5=5cdf395548b9461bc74e462f9df4332d, DOI:10.101...

work page doi:10.1016/j.eswa.2023.122229 2024
[41]

Modern algorithms of cluster analysis

Wierzchoń, S., Kłopotek, M., 2018. Modern algorithms of cluster analysis. Springer

work page 2018
[42]

Integrating semi-supervised learning with density tree structures for scalable clustering

Yang, H., Feng, Y., Cai, J., Wang, J., Li, Y., Zhao, X., Xun, Y., 2026. Integrating semi-supervised learning with density tree structures for scalable clustering. Applied Soft Computing 186, 114147. DOI:10.1016/j.asoc.2025.114147

work page doi:10.1016/j.asoc.2025.114147 2026
[43]

Densitypeakclusteringalgorithmbasedonboundaryeliminationandbackboneconstruction

Zhao,Z., Chen,S.,Hu, C.,2025. Densitypeakclusteringalgorithmbasedonboundaryeliminationandbackboneconstruction. AppliedSoft Computing 185, 113880. DOI:10.1016/j.asoc.2025.113880. M. Gagolewski:Preprint; Last updated on 9 April 2026Page 10 of 13 Lumbermark: Resistant Clustering by Chopping Up Mutual Reachability Minimum Spanning Trees 0.05 0.1 0.15 0.2 0.25...

work page doi:10.1016/j.asoc.2025.113880 2025

[1] [1]

Asystematicreviewofmachinelearningmethodsinsoftware testing

Ajorloo,S.,Jamarani,A.,Kashfi,M.,HaghiKashani,M.,Najafizadeh,A.,2024. Asystematicreviewofmachinelearningmethodsinsoftware testing. Applied Soft Computing 162, 111805. DOI:10.1016/j.asoc.2024.111805

work page doi:10.1016/j.asoc.2024.111805 2024

[2] [2]

Breunig, Hans-Peter Kriegel, and Joerg Sander

Ankerst, M., Breunig, M., Kriegel, H.P., Sander, J., 1999. OPTICS: Ordering points to identify the clustering structure, in: Proc. ACM Intl. Conf. Management of Data (SIGMOD), pp. 49–60. DOI:10.1145/304182.304187

work page doi:10.1145/304182.304187 1999

[3] [3]

Communications of the ACM18(9), 509–517 (1975).https://doi.org/10.1145/ 361002.361007

Bentley, J., 1975. Multidimensional binary search trees used for associative searching. Communications of the ACM 18, 509–517. DOI:10. 1145/361002.361007

work page arXiv 1975

[4] [4]

Foundations of Data Science

Blum, A., Hopcroft, J., Kannan, R., 2020. Foundations of Data Science. Cambridge University Press. URL:https://www.cs.cornell. edu/jeh/book.pdf

work page 2020

[5] [5]

O jistém problému minimálním

Borůvka, O., 1926. O jistém problému minimálním. Práce Moravské Přírodovědecké Společnosti v Brně 3, 37–58

work page 1926

[6] [6]

In: Advances in Knowledge Discovery and Data Mining

Campello, R.J., Moulavi, D., Sander, J., 2013. Density-based clustering based on hierarchical density estimates. LNAI 7819, 160–172. DOI:10.1007/978-3-642-37456-2_14

work page doi:10.1007/978-3-642-37456-2_14 2013

[7] [7]

ACM Trans- actions on Knowledge Discovery from Data10(1), 1–51 (2015) https://doi.org/ 10.1145/2733381

Campello,R.J.,Moulavi,D.,Zimek,A.,Sander,J.,2015.Hierarchicaldensityestimatesfordataclustering,visualization,andoutlierdetection. ACM Transactions on Knowledge Discovery from Data 10, 5:1–5:51. DOI:10.1145/2733381

work page doi:10.1145/2733381 2015

[8] [8]

Rates of convergence for the cluster tree, in: Advances in Neural Information Processing Systems, pp

Chaudhuri, K., Dasgupta, S., 2010. Rates of convergence for the cluster tree, in: Advances in Neural Information Processing Systems, pp. 343–351

work page 2010

[9] [9]

Cluster expansions:T-walks, labeled pose=ts and matrix calculations

Dogan, A., Birant, D., 2021. Machinelearninganddatamininginmanufacturing. ExpertSystemswithApplications166. DOI:10.1016/j. eswa.2020.114060

work page doi:10.1016/j 2021

[10] [10]

A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proc

Ester, M., Kriegel, H.P., Sander, J., Xu, X., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proc. KDD’96, pp. 226–231

work page 1996

[11] [11]

Florek,K.,Łukaszewicz,J.,Perkal,J.,Steinhaus,H.,Zubrzycki,S.,1951.Surlaliaisonetladivisiondespointsd’unensemblefini.Colloquium Mathematicum 2, 282–285

work page 1951

[12] [12]

Iterative shrinking method for clustering problems

Fränti, P., Virmajoki, O., 2006. Iterative shrinking method for clustering problems. Pattern Recognition 39, 761–765

work page 2006

[13] [13]

Data-driven estimation of tbm performance in soft soils using density-based spatial clustering and random forest

Fu, X., Feng, L., Zhang, L., 2022. Data-driven estimation of tbm performance in soft soils using density-based spatial clustering and random forest. Applied Soft Computing 120, 108686. DOI:10.1016/j.asoc.2022.108686

work page doi:10.1016/j.asoc.2022.108686 2022

[14] [14]

A framework for benchmarking clustering algorithms

Gagolewski, M., 2022. A framework for benchmarking clustering algorithms. SoftwareX 20, 101270. URL:https:// clustering-benchmarks.gagolewski.com/, DOI:10.1016/j.softx.2022.101270. M. Gagolewski:Preprint; Last updated on 9 April 2026Page 9 of 13 Lumbermark: Resistant Clustering by Chopping Up Mutual Reachability Minimum Spanning Trees

work page doi:10.1016/j.softx.2022.101270 2022

[15] [15]

Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm

Gagolewski, M., Bartoszuk, M., Cena, A., 2016. Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences 363, 8–23

work page 2016

[16] [16]

Clustering with minimum spanning trees: How good can it be? Journal of Classification 42, 90–112

Gagolewski, M., Cena, A., Bartoszuk, M., Brzozowski, L., 2025. Clustering with minimum spanning trees: How good can it be? Journal of Classification 42, 90–112. DOI:10.1007/s00357-024-09483-1

work page doi:10.1007/s00357-024-09483-1 2025

[17] [17]

Minimum spanning trees and single linkage cluster analysis

Gower, J., Ross, G., 1969. Minimum spanning trees and single linkage cluster analysis. Journal of the Royal Statistical Society. Series C (Applied Statistics) 18, 54–64

work page 1969

[18] [18]

Kernel-based fuzzy clustering: A comparative experimental study

Graves, D., Pedrycz, W., 2010. Kernel-based fuzzy clustering: A comparative experimental study. Fuzzy Sets and Systems 161, 522–543

work page 2010

[19] [19]

What are the true clusters? Pattern Recognition Letters 64, 53–62

Hennig, C., 2015. What are the true clusters? Pattern Recognition Letters 64, 53–62. DOI:10.1016/j.patrec.2015.04.009

work page doi:10.1016/j.patrec.2015.04.009 2015

[20] [20]

Comparing partitions

Hubert, L., Arabie, P., 1985. Comparing partitions. Journal of Classification 2, 193–218

work page 1985

[21] [21]

Cluster analysis: A modern statistical review

Jaeger, A., Banks, D., 2023. Cluster analysis: A modern statistical review. Wiley Interdisciplinary Reviews: Computational Statistics 15, e1597. DOI:10.1002/wics.1597

work page doi:10.1002/wics.1597 2023

[22] [22]

Data clustering: A user’s dilemma

Jain, A., Law, M., 2005. Data clustering: A user’s dilemma. Lecture Notes in Computer Science 3776, 1–10

work page 2005

[23] [23]

CHAMELEON: Hierarchical clustering using dynamic modeling

Karypis, G., Han, E., Kumar, V., 1999. CHAMELEON: Hierarchical clustering using dynamic modeling. Computer 32, 68–75. DOI:10. 1109/2.781637

work page 1999

[24] [24]

Ontheshortestspanningsubtreeofagraphandthetravelingsalesmanproblem

Kruskal,J.,1956. Ontheshortestspanningsubtreeofagraphandthetravelingsalesmanproblem. ProceedingsoftheAmericanMathematical Society 7, 48–50

work page 1956

[25] [25]

A probability theory of cluster analysis

Ling, R.F., 1973. A probability theory of cluster analysis. Journal of the American Statistical Association 68, 159–164

work page 1973

[26] [26]

A multi-stage hierarchical clustering algorithm based on centroid of tree and cut edge constraint

Ma, Y., Lin, H., Wang, Y., Huang, H., He, X., 2021. A multi-stage hierarchical clustering algorithm based on centroid of tree and cut edge constraint. Information Sciences 557, 194–219. DOI:10.1016/j.ins.2020.12.016

work page doi:10.1016/j.ins.2020.12.016 2021

[27] [27]

It’sokaytobeskinny,ifyourfriendsarefat,in: The4thCGCWorkshoponComputationalGeometry

Maneewongvatana,S.,Mount,D.,1999. It’sokaytobeskinny,ifyourfriendsarefat,in: The4thCGCWorkshoponComputationalGeometry

work page 1999

[28] [28]

Accelerated hierarchical density clustering, in: IEEE International Conference on Data Mining Workshops (ICDMW), pp

McInnes, L., Healy, J., 2017. Accelerated hierarchical density clustering, in: IEEE International Conference on Data Mining Workshops (ICDMW), pp. 33–42. DOI:10.1109/ICDMW.2017.12. see the extended version athttps://arxiv.org/pdf/1705.07321

work page doi:10.1109/icdmw.2017.12 2017

[29] [29]

URL https://doi.org/10.21105/joss.00205

McInnes, L., Healy, J., Astels, S., 2017. hdbscan: Hierarchical density based clustering. The Journal of Open Source Software 2, 205. DOI:10.21105/joss.00205

work page doi:10.21105/joss.00205 2017

[30] [30]

Information theoretic clustering using minimum spanning trees, in: Proc

Müller, A., Nowozin, S., Lampert, C., 2012. Information theoretic clustering using minimum spanning trees, in: Proc. German Conference on Pattern Recognition.https://github.com/amueller/information-theoretic-mst

work page 2012

[31] [31]

Parallel algorithms for hierarchical clustering

Olson, C.F., 1995. Parallel algorithms for hierarchical clustering. Parallel Computing 21, 1313–1325

work page 1995

[32] [32]

Scikit-learn: Machine learning in Python

Pedregosa, F., et al., 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830

work page 2011

[33] [33]

Chothia and A.M

Prim,R.C.,1957. Shortestconnectionnetworksandsomegeneralizations. BellSystemTechnicalJournal36,1389–1401. DOI:10.1002/j. 1538-7305.1957.tb01515.x

work page doi:10.1002/j 1957

[34] [34]

The ArborX library: Version 2.0

Prokopenko, A., Arndt, D., Lebrun-Grandié, D., Turcksin, B., 2025. The ArborX library: Version 2.0. ACM Transactions on Mathematical Software 51, 30. DOI:10.1145/3772288

work page doi:10.1145/3772288 2025

[35] [35]

Asingle-treealgorithmtocomputetheEuclideanminimumspanningtreeonGPUs

Prokopenko,A.,Sao,P.,Lebrun-Grandié,D.,2023. Asingle-treealgorithmtocomputetheEuclideanminimumspanningtreeonGPUs. Proc. ICPP’22 , 14DOI:10.1145/3545008.3546185

work page doi:10.1145/3545008.3546185 2023

[36] [36]

Optimizing search strategies in K-d trees, in: 5th WSES/IEEE Conf

Sample, N., Haines, M., Arnold, M., Purcell, T., 2001. Optimizing search strategies in K-d trees, in: 5th WSES/IEEE Conf. on Circuits, Systems, Communications & Computers (CSCC’01)

work page 2001

[37] [37]

Clustering benchmark datasets exploiting the fundamental clustering problems

Thrun, M., Ultsch, A., 2020. Clustering benchmark datasets exploiting the fundamental clustering problems. Data in Brief 30, 105501. DOI:10.1016/j.dib.2020.105501

work page doi:10.1016/j.dib.2020.105501 2020

[38] [38]

A white paper on good research practices in benchmarking: The case of cluster analysis

van Mechelen, I., Boulesteix, A.L., Dangl, R., et al., 2023. A white paper on good research practices in benchmarking: The case of cluster analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , e1511DOI:10.1002/widm.1511

work page doi:10.1002/widm.1511 2023

[39] [39]

ResTune: Resource oriented tuning boosted by meta-learning for cloud databases

Wang,Y.,Yu,S.,Gu,Y.,Shun,J.,2021. FastparallelalgorithmsforEuclideanminimumspanningtreeandhierarchicalspatialclustering,in: Proc. SIGMOD’21, pp. 1982–1995. DOI:10.1145/3448016.3457296

work page doi:10.1145/3448016.3457296 2021

[40] [40]

Expert Systems with Applications238, 122229 (2024) https://doi

Wei,Z.,Gao,Y.,Zhang,X.,Li,X.,Han,Z.,2024. Adaptivemarinetrafficbehaviourpatternrecognitionbasedonmultidimensionaldynamic time warping and dbscan algorithm. Expert Systems with Applications 238. URL:https://www.scopus.com/inward/record.uri? eid=2-s2.0-85174700827&doi=10.1016%2fj.eswa.2023.122229&partnerID=40&md5=5cdf395548b9461bc74e462f9df4332d, DOI:10.101...

work page doi:10.1016/j.eswa.2023.122229 2024

[41] [41]

Modern algorithms of cluster analysis

Wierzchoń, S., Kłopotek, M., 2018. Modern algorithms of cluster analysis. Springer

work page 2018

[42] [42]

Integrating semi-supervised learning with density tree structures for scalable clustering

Yang, H., Feng, Y., Cai, J., Wang, J., Li, Y., Zhao, X., Xun, Y., 2026. Integrating semi-supervised learning with density tree structures for scalable clustering. Applied Soft Computing 186, 114147. DOI:10.1016/j.asoc.2025.114147

work page doi:10.1016/j.asoc.2025.114147 2026

[43] [43]

Densitypeakclusteringalgorithmbasedonboundaryeliminationandbackboneconstruction

Zhao,Z., Chen,S.,Hu, C.,2025. Densitypeakclusteringalgorithmbasedonboundaryeliminationandbackboneconstruction. AppliedSoft Computing 185, 113880. DOI:10.1016/j.asoc.2025.113880. M. Gagolewski:Preprint; Last updated on 9 April 2026Page 10 of 13 Lumbermark: Resistant Clustering by Chopping Up Mutual Reachability Minimum Spanning Trees 0.05 0.1 0.15 0.2 0.25...

work page doi:10.1016/j.asoc.2025.113880 2025