pith. machine review for the scientific record.

arxiv: 2604.09370 · v1 · submitted 2026-04-10 · 🧬 q-bio.QM · cs.CV

Recognition: unknown

Cluster-First Labelling: An Automated Pipeline for Segmentation and Morphological Clustering in Histology Whole Slide Images

Damion Young, Jon Mason, Muhammad Haseeb Ahmad, Sharmila Rajendran


Pith reviewed 2026-05-10 16:28 UTC · model grok-4.3

classification 🧬 q-bio.QM cs.CV
keywords histology · whole slide images · image segmentation · morphological clustering · automated labeling · tissue analysis · annotation efficiency

The pith

A cluster-first pipeline segments histology images and groups similar tissue components so humans label clusters instead of individual objects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an automated pipeline for handling the labor-intensive task of labeling structures in histology whole slide images. It tiles slides, segments components such as cells and nuclei, extracts embeddings to capture morphology, reduces dimensions, and applies clustering to form groups of similar objects. A human annotator then assigns labels to these clusters rather than to each structure separately. Testing on 3,696 components across 13 tissue types from three species yields a weighted alignment accuracy of 96.8 percent with independent human labels, reaching perfect agreement in seven types. The approach shifts annotation from exhaustive per-object work to review of representative groups, making detailed analysis of large images more practical.

Core claim

The system tiles whole slide images, filters uninformative areas, segments tissue components, extracts neural embeddings, reduces dimensionality, and applies density-based clustering to produce groups of morphologically similar objects. Human labeling then occurs at the cluster level rather than for each individual component, producing a weighted cluster-label alignment accuracy of 96.8 percent across 3,696 evaluated structures from 13 tissue types in human, rat, and rabbit samples, with perfect agreement in seven of those types.
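
To make the cluster-first idea concrete, here is a minimal Python sketch of the final labelling step only: once every segmented object carries a cluster ID, one human decision per cluster is copied to all members of that cluster. The cluster IDs, the label names, and the helper `propagate_cluster_labels` are hypothetical illustrations, not taken from the paper or its released code. Treating DBSCAN noise (ID -1) as unlabelled is also an assumption of this sketch.

```python
def propagate_cluster_labels(cluster_ids, cluster_labels, noise_id=-1):
    """Give every object the label a human assigned to its cluster.

    Objects flagged as noise, or sitting in a cluster nobody labelled,
    get None so they can be routed to per-object review (an assumption).
    """
    return [
        None if cid == noise_id else cluster_labels.get(cid)
        for cid in cluster_ids
    ]


# Hypothetical clustering output for eight segmented objects (-1 = noise).
cluster_ids = [0, 0, 0, 1, 1, 2, 2, -1]
# One human decision per cluster instead of one per object (made-up labels).
cluster_labels = {0: "nucleus", 1: "adipocyte", 2: "osteocyte"}

object_labels = propagate_cluster_labels(cluster_ids, cluster_labels)
decisions = len(cluster_labels)
unlabelled = sum(label is None for label in object_labels)
labelled = len(object_labels) - unlabelled
print(object_labels)
print(f"{decisions} cluster decisions labelled {labelled} objects")
```

In this toy case three decisions label seven of the eight objects. The claimed saving grows with the ratio of objects to clusters, which is why the number and size of clusters matter for the effort argument.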

What carries the argument

The cluster-first paradigm, in which unsupervised morphological clustering of segmented objects occurs before any human labeling, shifting effort from individuals to representative groups.

If this is right

  • Annotation effort drops by orders of magnitude for slides containing tens of thousands of structures.
  • The pipeline handles diverse tissue types from multiple species with high measured alignment to human judgments.
  • Seven of the 13 tested tissue types reach perfect cluster-label agreement.
  • The full pipeline, companion web application, and evaluation code are released as open-source software.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The method could enable routine morphological analysis of slide repositories that are currently too large for manual labeling.
  • Similar cluster-first strategies might transfer to other high-volume biomedical imaging tasks beyond histology.
  • The open-source components could support community experiments on additional tissue datasets to test cluster stability.

Load-bearing premise

The unsupervised clusters formed from image embeddings correspond to categories that human annotators would consistently recognize and label in the same way.

What would settle it

A new collection of whole slide images in which many clusters contain structures receiving inconsistent human labels, resulting in alignment accuracy substantially below 96.8 percent.

Figures

Figures reproduced from arXiv: 2604.09370 by Damion Young, Jon Mason, Muhammad Haseeb Ahmad, Sharmila Rajendran.

Figure 1: Pipeline architecture. WSI slides (.ndpi) are tiled into 512 × 512 patches, quality-filtered, segmented with Cellpose-SAM, embedded with ResNet-50, reduced via UMAP, and clustered with DBSCAN. Optional downstream stages annotate images, extract representative tiles per cluster, and invoke a multimodal LLM for experimental classification.
Figure 2: Labelling application interface. The central panel shows a 512 × 512 histology tile with interactive cell polygons. The sidebar provides tile navigation, label assignment controls, per-tile and batch progress, and visual customisation settings.
Figure 4: Post-alignment tile comparison. Cell polygons are colour-coded: green = human and aligned-model labels agree; red = disagree. Labels are shown as human / aligned-model pairs. Bottom panel: rabbit femur tile with 207 cells, 206 correct (99.5% accuracy).
Original abstract

Labelling tissue components in histology whole slide images (WSIs) is prohibitively labour-intensive: a single slide may contain tens of thousands of structures--cells, nuclei, and other morphologically distinct objects--each requiring manual boundary delineation and classification. We present a cloud-native, end-to-end pipeline that automates this process through a cluster-first paradigm. Our system tiles WSIs, filters out tiles deemed unlikely to contain valuable information, segments tissue components with Cellpose-SAM (including cells, nuclei, and other morphologically similar structures), extracts neural embeddings via a pretrained ResNet-50, reduces dimensionality with UMAP, and groups morphologically similar objects using DBSCAN clustering. Under this paradigm, a human annotator labels representative clusters rather than individual objects, reducing annotation effort by orders of magnitude. We evaluate the pipeline on 3,696 tissue components across 13 diverse tissue types from three species (human, rat, rabbit), measuring how well unsupervised clusters align with independent human labels via per-tile Hungarian-algorithm matching. Our system achieves a weighted cluster-label alignment accuracy of 96.8%, with 7 of 13 tissue types reaching perfect agreement. The pipeline, a companion labelling web application, and all evaluation code are released as open-source software.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents a cloud-native, end-to-end pipeline for automating segmentation and morphological clustering in histology whole slide images (WSIs) under a cluster-first labeling paradigm. WSIs are tiled and filtered, tissue components (cells, nuclei, and similar structures) are segmented via Cellpose-SAM, embeddings are extracted with a pretrained ResNet-50, dimensionality is reduced with UMAP, and objects are grouped with DBSCAN. A human then labels representative clusters rather than individual objects. The pipeline is evaluated on 3,696 tissue components across 13 tissue types from three species (human, rat, rabbit), reporting a weighted cluster-label alignment accuracy of 96.8% (with perfect agreement in 7 tissue types) obtained via per-tile Hungarian-algorithm matching. The pipeline, a companion labeling web application, and all evaluation code are released as open source.

Significance. If the reported alignment holds under a global cluster-labeling regime, the work could reduce annotation effort in digital pathology by orders of magnitude, shifting the burden from labeling tens of thousands of individual objects to labeling a much smaller number of clusters. The open-source release of the full pipeline, web app, and reproducible evaluation code is a clear strength that supports adoption and extension. The significance is nevertheless conditional on whether the unsupervised clusters are morphologically consistent enough to receive a single, stable human label across their full extent.

major comments (1)
  1. [Evaluation procedure] Evaluation procedure (abstract and results): The 96.8% weighted cluster-label alignment accuracy is obtained via per-tile Hungarian matching after DBSCAN on the pooled set of 3,696 components. Because a single cluster ID can contain morphologically similar objects drawn from different tissue types or species that carry distinct human labels, the per-tile optimal assignment permits the same cluster to be matched to different labels in different tiles. This metric therefore does not test the central cluster-first claim that a single human-assigned label to an entire cluster would be correct across the cluster's full extent.
minor comments (2)
  1. [Methods] The exact tile-filtering criteria, the procedure used to select DBSCAN eps and min_samples (and UMAP n_neighbors, min_dist), and whether the 96.8% figure is a single run or an average are not stated, limiting reproducibility.
  2. [Results] The abstract and results would benefit from a brief statement of the number of clusters produced and the distribution of cluster sizes, which directly affects the claimed reduction in annotation effort.
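
The major comment above can be illustrated with a toy example. The Python sketch below scores the same hypothetical clustering twice: once with an optimal one-to-one cluster-to-label matching computed separately per tile, and once with a single matching computed over all tiles together. The data, tile names, and the helper `best_matching_hits` are invented for illustration; the brute-force search over permutations is a stand-in for the Hungarian algorithm (it reaches the same optimum on inputs this small) and is not the paper's evaluation code.

```python
from collections import Counter
from itertools import permutations


def best_matching_hits(pairs):
    """Most objects a one-to-one cluster-to-label assignment can get right.

    `pairs` is a list of (cluster_id, human_label). Brute force over all
    assignments; only suitable for a handful of clusters and labels.
    """
    counts = Counter(pairs)  # (cluster, label) -> number of objects
    clusters = sorted({c for c, _ in pairs})
    labels = sorted({l for _, l in pairs})
    # Pad with None so a cluster may stay unmatched when labels run out.
    padded = labels + [None] * max(0, len(clusters) - len(labels))
    best = 0
    for perm in permutations(padded, len(clusters)):
        hits = sum(counts[(c, l)] for c, l in zip(clusters, perm))
        best = max(best, hits)
    return best


# Hypothetical data: cluster 0 holds look-alike objects that humans
# label differently in two tiles from different tissues.
tiles = {
    "tile_A": [(0, "nucleus")] * 5 + [(1, "adipocyte")] * 3,
    "tile_B": [(0, "chondrocyte")] * 4 + [(1, "adipocyte")] * 2,
}

total = sum(len(pairs) for pairs in tiles.values())
per_tile_hits = sum(best_matching_hits(pairs) for pairs in tiles.values())
all_pairs = [pair for pairs in tiles.values() for pair in pairs]
global_hits = best_matching_hits(all_pairs)

print(f"per-tile matching: {per_tile_hits}/{total}")
print(f"global matching:   {global_hits}/{total}")
```

Here the per-tile score is 14/14 while the global score is 10/14, because cluster 0 is matched to "nucleus" in one tile and to "chondrocyte" in the other. Whether the paper's clusters actually behave this way cannot be determined from the abstract; the sketch only shows that the per-tile metric would not reveal it.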

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments, which help clarify the strengths and limitations of our evaluation. We address the major concern on the evaluation procedure below and will update the manuscript to strengthen the validation of the cluster-first paradigm.

point-by-point responses
  1. Referee: [Evaluation procedure] Evaluation procedure (abstract and results): The 96.8% weighted cluster-label alignment accuracy is obtained via per-tile Hungarian matching after DBSCAN on the pooled set of 3,696 components. Because a single cluster ID can contain morphologically similar objects drawn from different tissue types or species that carry distinct human labels, the per-tile optimal assignment permits the same cluster to be matched to different labels in different tiles. This metric therefore does not test the central cluster-first claim that a single human-assigned label to an entire cluster would be correct across the cluster's full extent.

    Authors: We agree that the per-tile Hungarian matching does not directly test global label consistency within each cluster, which is central to the cluster-first claim. Although DBSCAN is performed on the pooled embeddings and the high accuracy indicates effective morphological grouping, the per-tile optimal assignment can mask cases where a single cluster spans objects with differing human labels across tiles or tissue types. To address this, we will revise the Results and Evaluation sections to add two complementary global metrics: (1) cluster purity, defined as the fraction of each cluster's members sharing the majority human label, averaged across clusters weighted by size; and (2) a simulated cluster-first accuracy obtained by assigning the majority label to every member of a cluster and computing agreement with the full set of individual human labels. These additions will provide a direct assessment of whether a single label per cluster is reliable across its full extent. We will report both the original per-tile metric and the new global metrics for transparency.

    revision: yes
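
The two global metrics proposed in the rebuttal are simple to state in code. The following Python sketch computes size-weighted cluster purity and majority-label accuracy on hypothetical data; the function name and the toy labels are assumptions, and how DBSCAN noise points should be handled is not specified in the source, so the sketch simply treats every cluster ID it is given as a regular cluster.

```python
from collections import Counter, defaultdict


def cluster_purity_and_accuracy(cluster_ids, human_labels):
    """Return (size-weighted purity, majority-label accuracy)."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, human_labels):
        members[cid].append(label)

    total = len(cluster_ids)
    weighted_purity = 0.0
    correct = 0
    for labels in members.values():
        # Number of members that share the cluster's most common label.
        majority_count = Counter(labels).most_common(1)[0][1]
        purity = majority_count / len(labels)
        weighted_purity += purity * len(labels) / total
        # Simulated cluster-first labelling: everyone gets the majority label.
        correct += majority_count
    return weighted_purity, correct / total


# Hypothetical pooled data: cluster 0 mixes two human labels.
cluster_ids = [0] * 9 + [1] * 5
human_labels = ["nucleus"] * 5 + ["chondrocyte"] * 4 + ["adipocyte"] * 5

purity, accuracy = cluster_purity_and_accuracy(cluster_ids, human_labels)
print(f"weighted purity:         {purity:.3f}")
print(f"majority-label accuracy: {accuracy:.3f}")
```

One observation worth flagging: when purity is weighted by cluster size as described, the cluster sizes cancel and both quantities reduce to the sum of majority counts divided by the number of objects. Under that reading the two metrics are the same number, so reporting an unweighted purity or per-cluster purity distribution alongside might be more informative.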

Circularity Check

0 steps flagged

No significant circularity detected; evaluation is post-hoc and independent.

full rationale

The paper describes an unsupervised pipeline (Cellpose-SAM segmentation, ResNet-50 embeddings, UMAP dimensionality reduction, DBSCAN clustering) followed by a separate evaluation step that computes per-tile Hungarian matching between cluster IDs and independent human labels on 3,696 components. The reported 96.8% weighted alignment accuracy is a post-hoc measurement and is not used to fit, select, or optimize any pipeline parameters. No self-definitional equations, fitted-input predictions, load-bearing self-citations, uniqueness theorems, or ansatzes are present that would reduce the central claim to its own inputs by construction. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The pipeline depends on the correctness of several pretrained models and standard unsupervised techniques without new mathematical derivations or invented physical entities.

free parameters (2)
  • DBSCAN eps and min_samples
    Clustering hyperparameters that determine cluster formation; likely tuned on the evaluation data, though this is not detailed in the abstract.
  • UMAP n_neighbors and min_dist
    Dimensionality reduction settings that affect embedding quality and downstream clustering.
axioms (2)
  • domain assumption: Cellpose-SAM produces accurate instance segmentations of cells and nuclei in the tested histology images
    The pipeline treats the output of this pretrained model as reliable input for embedding and clustering.
  • domain assumption: ResNet-50 embeddings capture morphological similarity relevant to human labeling decisions
    The method assumes that features from a network pretrained on natural images transfer usefully to histology objects.
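
To show why the DBSCAN entries in the ledger above count as free parameters, the sketch below runs a small pure-Python re-implementation of DBSCAN on nine hypothetical 2-D points (standing in for UMAP-reduced embeddings) at three values of `eps`. This is a toy written for this page under stated assumptions, not the pipeline's implementation; the points and parameter values are made up, and `min_samples` counts the point itself as a neighbour.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: one cluster id per point, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices within eps of point i (includes i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_samples:
                queue.extend(reach)  # j is a core point: keep expanding
    return labels


def count_clusters(labels):
    return len(set(labels) - {-1})


# Two tight groups two units apart, plus one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (3, 0), (3, 1), (4, 0), (4, 1),
          (10, 10)]

for eps in (0.5, 1.1, 2.1):
    labels = dbscan(points, eps=eps, min_samples=3)
    print(f"eps={eps}: {count_clusters(labels)} clusters, labels={labels}")
```

On these points the cluster count goes from zero (everything is noise) to two to one as `eps` grows, so the same embeddings can yield quite different things for a human to label. That sensitivity is the reason the selection procedure for these hyperparameters matters for reproducibility.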

pith-pipeline@v0.9.0 · 5537 in / 1569 out tokens · 37030 ms · 2026-05-10T16:28:09.824944+00:00 · methodology

