pith. machine review for the scientific record.

arxiv: 2605.03053 · v1 · submitted 2026-05-04 · 💻 cs.CV · cond-mat.soft · q-bio.QM

Recognition: 1 theorem link

Approaching human parity in the quality of automated organoid image segmentation

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:46 UTC · model grok-4.3

classification 💻 cs.CV · cond-mat.soft · q-bio.QM
keywords organoid segmentation · image segmentation · Segment Anything Model · computer vision · stem cell imaging · automated analysis · inter-observer variability · spheroid measurement

The pith

A composite method pairing the Segment Anything Model with a domain-specific tool segments organoid images at or near inter-observer human accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests several existing tools for automatically outlining the size and shape of developing organoids in microscope images and finds that none works reliably across all conditions. It introduces a composite approach that first uses the general-purpose Segment Anything Model and then refines the result with an existing specialized tool. On the authors' test set this hybrid produces accurate outlines for nearly all images, including many where single tools fail. Its error rates reach the level of differences between independent human annotators on one metric and come close on others. This matters because organoids are used to model human development and disease, so reliable automated measurement can track their growth without constant manual tracing.

Core claim

No single existing segmentation tool delivers sufficient accuracy on every test image of pluripotent-stem-cell-derived spheroids, yet the composite method that combines the Segment Anything Model with a domain-specific tool produces consistent and accurate results on all but a very small fraction of the most challenging images; by one quantitative measure its performance equals inter-observer variability among human annotators and by others it lies very close to that benchmark.

What carries the argument

The composite segmentation pipeline that applies the Segment Anything Model (SAM) as a general-purpose foundation model and then refines its output with an existing domain-specific organoid segmentation tool.
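Figure 10 of the paper diagrams an "OrganoID Centroid + SAM" variant of this pipeline. A minimal sketch of the composite idea, assuming the domain tool's mask centroid serves as a point prompt and that the rough mask is kept when refinement fails (the helper names and the fall-back rule are illustrative assumptions, not the paper's code):

```python
import numpy as np

def centroid_prompt(rough_mask: np.ndarray) -> tuple[float, float]:
    """Centroid (row, col) of a binary mask, used as a point prompt."""
    rows, cols = np.nonzero(rough_mask)
    return float(rows.mean()), float(cols.mean())

def composite_segment(image, rough_mask, sam_point_fn):
    """Refine a domain tool's rough mask with a promptable model.

    `sam_point_fn(image, point)` is a stand-in for SAM's point-prompted
    predictor and must return a binary mask; a real pipeline would call
    segment-anything's SamPredictor here.
    """
    point = centroid_prompt(rough_mask)
    refined = sam_point_fn(image, point)
    # If the promptable model returns an empty mask, keep the rough mask.
    return refined if refined.any() else rough_mask
```

In this sketch the domain tool supplies only a seed location, so SAM's boundary quality dominates the final mask; whether the paper's composite uses exactly this hand-off is an inference from the figure titles.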

If this is right

  • Large-scale time-lapse studies of organoid development can replace most manual outlining with automated measurements.
  • Morphological changes during disease modeling become easier to quantify across hundreds of organoids.
  • The same hybrid strategy can be tested on other complex three-dimensional cell cultures where single tools currently fall short.
  • Routine monitoring of organoid size and shape no longer requires an expert annotator for every image.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same SAM-plus-refinement pattern may transfer to other biomedical imaging domains that already possess one reliable but narrow tool.
  • If the composite method maintains its performance on live-cell imaging sequences, it could enable fully automated tracking of organoid growth trajectories.
  • Adoption would shift the bottleneck in organoid research from image segmentation to downstream biological interpretation of the resulting shape and size data.

Load-bearing premise

That the selected test images and the manual annotations used as ground truth adequately represent the full range of real-world organoid imaging conditions, and that matching inter-observer variability is the appropriate target for acceptable automated performance.

What would settle it

Apply the composite method to a new collection of organoid images drawn from different laboratories, imaging modalities, or organoid types and measure whether its segmentation error exceeds the inter-observer variability measured on the same set.
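The check described above can be phrased as a small script: compute the method's IoU against one annotator and compare it with the IoU between the two annotators on the same images. The choice of annotator 1 as reference and of the mean as the summary statistic are assumptions for illustration, not the paper's protocol:

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = (a | b).sum()
    return float((a & b).sum() / union) if union else 0.0

def exceeds_inter_observer(method_masks, annot1_masks, annot2_masks) -> bool:
    """True if the method's mean IoU against annotator 1 falls below the
    mean IoU between the two annotators, i.e. its error exceeds
    inter-observer disagreement on this image set."""
    method_iou = np.mean([iou(m, a) for m, a in zip(method_masks, annot1_masks)])
    inter_obs = np.mean([iou(a, b) for a, b in zip(annot1_masks, annot2_masks)])
    return bool(method_iou < inter_obs)
```

The masks must be paired per image across all three lists; any image set from a new laboratory or modality can be dropped in unchanged.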

Figures

Figures reproduced from arXiv: 2605.03053 by Chase Cartwright, Christopher N. Mayhew, Gongbo Guo, Horacio E. Castillo, Mark Hester, Sai Teja Pusuluri.

Figure 1. Growth protocol for iPSC-derived spheroids. Media was replaced in well plates as shown. To generate a robust and diverse dataset to retrain OrganoID, 176 manually segmented and labeled images were used, taken from a variety of experimental conditions (either untreated, treated with CEPT at standard concentration, treated with CEPT at 1/2, 1/4, 1/10 or 1/20 of standard concentration, or treated w…
Figure 2. Demonstration of IOU Calculation. IOU is calculated by dividing the number of pixels which belong to both masks (the purple region) by the total number of pixels which appear in either mask (the blue, purple, and red regions combined), in order to quantify the agreement of two masks. … segmentation which is almost completely disjoint with the true spheroid.
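The IOU of Figure 2 is a one-liner over binary masks; a sketch in numpy (the array names are illustrative):

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Shared pixels (Figure 2's purple region) divided by all pixels in
    either mask (blue + purple + red regions combined)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    union = (a | b).sum()
    return float((a & b).sum() / union) if union else 0.0
```

Identical masks score 1.0; the almost-disjoint segmentation the caption mentions would score near 0.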
Figure 3. Example image and masks from Image Set A.
Figure 4. Example image and masks from Image Set B.
Figure 5. Example image and masks from Image Set C.
Figure 6. IOUs of Segmentation Methods. IOUs between the masks produced by each method and the corresponding ground truth masks. On the horizontal axis is the segmentation method used to produce the masks and on the vertical axis is the overlap between those masks and the corresponding ground truth masks, which ranges between 0 and 1, with 1 being perfect accuracy. Dotted lines indicate the mean IOU for the respectiv…
Figure 7. IOUs and normalized areas for each image and segmentation.
Figure 8. Untrained Segmentation Processes. a) Flowchart for the Grounding DINO + SAM process: Grounding DINO takes the original image along with the text prompt "a dark, solid cluster", and outputs bounding boxes representing regions that may contain an object fitting the given description. The Segment Anything Model then takes the original image and the Grounding DINO bounding boxes, generating masks for the object…
Figure 9. IOUs and normalized areas for Trained and Untrained OrganoID.
Figure 10. The OrganoID Centroid + SAM and OrganoID + SAM Composite processes.
Figure 11. IOUs and normalized areas of the OrganoID Centroid + SAM and OrganoID + SAM Composite.
Figure 12. Segmentation methods based on Untrained OrganoID.
Figure 13. IOUs for the Hybrid method and related methods.
Figure 14. Demonstration of Eccentricity Calculation.
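The eccentricity formula is not spelled out in the visible caption; a common moments-based definition, in the style of scikit-image's regionprops (treating this as an assumption about the paper's exact formula), can be sketched as:

```python
import numpy as np

def eccentricity(mask: np.ndarray) -> float:
    """Eccentricity from the second central moments of the mask's pixel
    coordinates: 0 for a circle, approaching 1 for elongated shapes."""
    coords = np.argwhere(mask)
    cov = np.cov(coords.T)                      # 2x2 coordinate covariance
    lam_min, lam_max = np.linalg.eigvalsh(cov)  # eigenvalues, ascending
    # Clamp to guard against tiny negative values from rounding.
    return float(np.sqrt(max(0.0, 1.0 - lam_min / lam_max)))
```

A square mask scores 0; a long thin rectangle scores close to 1.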
Figure 15. Demonstration of Solidity Calculation. a) The solidity of a shape is the ratio of its area to the area of its convex hull. The convex hull of a shape is the enclosing shape with the smallest possible perimeter, as shown in these 3 examples, which all have the same square-shaped convex hull, outlined by the green perimeter. b) Three example masks with their convex hulls outlined in green and their soliditie…
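Figure 15's solidity ratio can be approximated on a pixel grid by counting the pixels that fall inside the convex hull of the mask's coordinates; a sketch using scipy (the discrete hull fill via a Delaunay triangulation is an implementation choice, not the paper's code):

```python
import numpy as np
from scipy.spatial import Delaunay

def solidity(mask: np.ndarray) -> float:
    """Solidity = mask area / area of its filled convex hull, here
    computed by counting grid pixels inside the convex hull of the
    mask's pixel coordinates."""
    pts = np.argwhere(mask)
    hull = Delaunay(pts)                       # triangulates the hull interior
    grid = np.argwhere(np.ones_like(mask))     # every pixel coordinate
    inside = hull.find_simplex(grid, tol=1e-9) >= 0
    return float(mask.sum() / inside.sum())
```

A convex mask scores 1.0; a mask with a concave notch, like an L-shape, scores below 1 because hull pixels outside the mask inflate the denominator.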
Figure 16. OrganoID and successful enhancements thereof.
Figure 17. OrganoID and successful enhancements thereof (continued).
Figure 18. Agreement of segmentation methods with ground truth segmentation.
read the original abstract

Organoids are complex, three dimensional, self-organizing cell cultures which manifest organ-like features and represent a powerful platform for studying human disease and developing treatment options. Organoid development is characterized by dynamic morphological and cellular organization, which mimic some aspects of organ development. To study these rapid changes over the course of organoid development, advanced imaging and analytical tools are critical to accurately monitor the trajectory of organoid growth and investigate disease processes. In this work, we focus on computer vision and machine learning techniques to automatically measure the size and shape of developing spheroids derived from pluripotent stem cells (iPSCs), which are typically the starting material for generating organoid cultures. To facilitate this task, we introduce a composite method that combines the Segment Anything Model (SAM), a general-purpose foundation model, with an existing domain-specific tool. This composite method is evaluated together with several existing tools by testing them on organoid image data and comparing with the results of manual image segmentation. We find that no single existing tool is able to segment the test images with sufficient accuracy across all test conditions, but the newly introduced composite method produces consistent and accurate results for all but a very small fraction of the most challenging images. Finally, we compare the accuracy of this method to the variability between manual segmentations by independent annotators (inter-observer variability) and find that by one measure it performs at the level of inter-observer variability and by others it performs very close to it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a composite segmentation approach that combines the Segment Anything Model (SAM) with an existing domain-specific tool for automated analysis of organoid images derived from iPSCs. It evaluates the method against several existing tools and manual segmentations, claiming consistent and accurate results for all but a small fraction of challenging images, with performance reaching inter-observer variability on one metric and approaching it on others.

Significance. If the quantitative results hold with representative data, the work could enable scalable, reproducible monitoring of organoid morphology in developmental biology and disease modeling, reducing dependence on manual annotation. The direct comparison to inter-observer variability and use of a foundation model adapted to the domain are notable strengths that support practical utility.

major comments (2)
  1. Abstract: The central claim that the composite method 'performs at the level of inter-observer variability' by one measure and 'very close to it' by others is not supported by any reported numerical values for the metrics (Dice, IoU, boundary error, etc.), dataset cardinality, image selection protocol, or characterization of the 'very small fraction' of failure cases. These details are load-bearing for evaluating statistical robustness and external validity of the human-parity assertion.
  2. Evaluation section: The manuscript adopts inter-observer variability as the benchmark for acceptable automated performance without explicit justification or analysis of whether this reference distribution aligns with biological requirements (e.g., tolerance for error in downstream growth trajectory studies); this assumption requires concrete support to sustain the parity conclusion.
minor comments (2)
  1. Clarify the precise integration mechanism between SAM and the domain-specific tool, including any post-processing steps or parameter choices, to support reproducibility.
  2. Add representative failure-case images and quantitative breakdowns by image difficulty or morphology type to the results or supplementary material.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review of our manuscript. We address the major comments point by point below, proposing revisions to enhance the clarity and support for our claims.

read point-by-point responses
  1. Referee: Abstract: The central claim that the composite method 'performs at the level of inter-observer variability' by one measure and 'very close to it' by others is not supported by any reported numerical values for the metrics (Dice, IoU, boundary error, etc.), dataset cardinality, image selection protocol, or characterization of the 'very small fraction' of failure cases. These details are load-bearing for evaluating statistical robustness and external validity of the human-parity assertion.

    Authors: We acknowledge that the abstract would be improved by including specific supporting details. The manuscript does report the numerical values for the metrics, the dataset cardinality and selection protocol, and the characterization of failure cases in the Evaluation and Results sections. To make these load-bearing details immediately available to readers, we will revise the abstract to include key quantitative results and a summary of the dataset and failure cases. revision: yes

  2. Referee: Evaluation section: The manuscript adopts inter-observer variability as the benchmark for acceptable automated performance without explicit justification or analysis of whether this reference distribution aligns with biological requirements (e.g., tolerance for error in downstream growth trajectory studies); this assumption requires concrete support to sustain the parity conclusion.

    Authors: The use of inter-observer variability as a benchmark is standard practice in the field for assessing automated segmentation performance against human experts. However, we agree that explicit justification and analysis of its alignment with biological requirements would strengthen the manuscript. We will add a new paragraph in the Evaluation section that justifies this choice with references to prior work and discusses its implications for downstream applications such as growth trajectory studies. revision: yes

Circularity Check

0 steps flagged

No circularity: evaluation uses independent external manual annotations and inter-observer benchmarks

full rationale

The paper introduces a composite SAM-based segmentation method and evaluates it empirically on organoid images by direct comparison to manual segmentations performed by independent annotators. Performance is reported relative to inter-observer variability as an external reference. No equations, parameter fitting, or derivations are described that reduce the reported accuracy to the method's own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim rests on external test data and human annotations rather than any self-referential loop, satisfying the criteria for a self-contained empirical result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard assumptions in image segmentation evaluation rather than new postulates.

axioms (1)
  • domain assumption: Manual segmentations by independent annotators constitute a reliable reference standard for measuring automated accuracy.
    Used directly as the benchmark for human parity.

pith-pipeline@v0.9.0 · 5588 in / 1052 out tokens · 52465 ms · 2026-05-08T18:46:17.589367+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 28 canonical work pages · 2 internal anchors

  1. [1]

    Modeling Development and Disease with Organoids

    Clevers H. Modeling Development and Disease with Organoids. Cell. 2016;165(7):1586–1597. doi:10.1016/j.cell.2016.05.082

  2. [2]

    Organoids: Modeling Development and the Stem Cell Niche in a Dish

    Kretzschmar K, Clevers H. Organoids: Modeling Development and the Stem Cell Niche in a Dish. Developmental Cell. 2016;38(6):590–600. doi:10.1016/j.devcel.2016.08.014

  3. [3]

    Organoids — Preclinical Models of Human Disease

    Li M, Belmonte JCI. Organoids — Preclinical Models of Human Disease. New England Journal of Medicine. 2019;380(6):569–579. doi:10.1056/NEJMra1806175

  4. [5]

    Engineering organoids

    Hofer M, Lutolf MP. Engineering organoids. Nature Reviews Materials. 2021;6(5):402–420. doi:10.1038/s41578-021-00279-y

  5. [6]

    Importance of Organoids for Personalized Medicine

Perkhofer L, F Pierre-Olivier, M Martin, Kleger A. Importance of Organoids for Personalized Medicine. Personalized Medicine. 2018;15(6):461–465. doi:10.2217/pme-2018-0071

  6. [7]

    Oral Mucosal Organoids as a Potential Platform for Personalized Cancer Therapy

Driehuis E, Kolders S, Spelier S, Lõhmussaar K, Willems SM, Devriese LA, et al. Oral Mucosal Organoids as a Potential Platform for Personalized Cancer Therapy. Cancer Discovery. 2019;9(7):852–871. doi:10.1158/2159-8290.CD-18-1522

  7. [8]

    Organoid based personalized medicine: from bench to bedside

    Li Y, Tang P, Cai S, Peng J, Hua G. Organoid based personalized medicine: from bench to bedside. Cell Regeneration. 2020;9(1):21. doi:10.1186/s13619-020-00059-z

  8. [9]

    Organoid-based personalized medicine: from tumor outcome prediction to autologous transplantation

    Soto-Gamez A, Gunawan JP, Barazzuol L, Pringle S, Coppes RP. Organoid-based personalized medicine: from tumor outcome prediction to autologous transplantation. Stem Cells. 2024;42(6):499–508. doi:10.1093/stmcls/sxae023

  9. [10]

    Patient-Derived Organoids: A Game-Changer in Personalized Cancer Medicine

    Abbasian MH, Sobhani N, Sisakht MM, D’Angelo A, Sirico M, Roudi R. Patient-Derived Organoids: A Game-Changer in Personalized Cancer Medicine. Stem Cell Reviews and Reports. 2025;21(1):211–225. doi:10.1007/s12015-024-10805-4

  10. [11]

    Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions

Sarker IH. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Computer Science. 2021;2(6):1–20. doi:10.1007/s42979-021-00815-1

  11. [12]

    A review on deep learning in medical image analysis

    Suganyadevi S, Seethalakshmi V, Balasamy K. A review on deep learning in medical image analysis. International Journal of Multimedia Information Retrieval. 2021;11(1):19–38. doi:10.1007/s13735-021-00218-1

  12. [13]

    Deep Learning Applications in Medical Image Analysis

    Ker J, Wang L, Rao J, Lim T. Deep Learning Applications in Medical Image Analysis. IEEE Access. 2018;6:9375–9389. doi:10.1109/ACCESS.2017.2788044

  13. [14]

    Deep learning for cellular image analysis

    Moen E, Bannon D, Kudo T, Graf W, Covert M, Van Valen D. Deep learning for cellular image analysis. Nature Methods. 2019;16(12):1233–1246. doi:10.1038/s41592-019-0403-1

  14. [15]

CNN-Based Cell Analysis: From Image to Quantitative Representation

Allier C, Hervé L, Paviolo C, Mandula O, Cioni O, Pierré W, et al. CNN-Based Cell Analysis: From Image to Quantitative Representation. Frontiers in Physics. 2022; doi:10.3389/fphy.2021.776805

  15. [16]

    Growth of Epithelial Organoids in a Defined Hydrogel

    Broguiere N, Isenmann L, Hirt C, Ringel T, Placzek S, Cavalli E, et al. Growth of Epithelial Organoids in a Defined Hydrogel. Advanced Materials. 2018;30(43):1801621. doi:10.1002/adma.201801621

  16. [17]

    MOrgAna: accessible quantitative analysis of organoids with machine learning

Gritti N, Lim JL, Anlaş K, Pandya M, Aalderink G, Martínez-Ara G, et al. MOrgAna: accessible quantitative analysis of organoids with machine learning. Development. 2021;148(18):dev199611. doi:10.1242/dev.199611

  17. [18]

    Development of a deep learning based image processing tool for enhanced organoid analysis

    Park T, Kim TK, Han YD, Kim KA, Kim H, Kim HS. Development of a deep learning based image processing tool for enhanced organoid analysis. Scientific Reports. 2023;13(1):19841. doi:10.1038/s41598-023-46485-2

  18. [19]

    OrganoID: A versatile deep learning platform for tracking and analysis of single-organoid dynamics

    Matthews JM, Schuster B, Kashaf SS, Liu P, Ben-Yishay R, Ishay-Ronen D, et al. OrganoID: A versatile deep learning platform for tracking and analysis of single-organoid dynamics. PLOS Computational Biology. 2022;18(11):e1010584. doi:10.1371/journal.pcbi.1010584

  19. [20]

    U-Net: Convolutional Networks for Biomedical Image Segmentation

Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, editors. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Munich, Germany: Springer International Publishing; 2015. p. 234–241. Available from: http://arxiv.org/abs/1505.04597

  20. [21]

    MSU-Net: Multi-Scale U-Net for 2D Medical Image Segmentation

    Su R, Zhang D, Liu J, Cheng C. MSU-Net: Multi-Scale U-Net for 2D Medical Image Segmentation. Frontiers in Genetics. 2021;doi:10.3389/fgene.2021.639930

  21. [22]

    Segment Anything

Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, et al. Segment Anything. arXiv preprint arXiv:2304.02643; 2023. doi:10.48550/arXiv.2304.02643

  22. [23]

    Segment Anything for Microscopy

    Archit A, Freckmann L, Nair S, Khalid N, Hilt P, Rajashekar V, et al. Segment Anything for Microscopy. Nature Methods. 2025;22(3):579–591. doi:10.1038/s41592-024-02580-4

  23. [24]

    Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Liu S, Zeng Z, Ren T, Li F, Zhang H, Yang J, et al. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection; 2023. Available from: http://arxiv.org/abs/2303.05499

  25. [26]

    Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

Ren T, Liu S, Zeng A, Lin J, Li K, Cao H, et al. Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks; 2024. Available from: http://arxiv.org/abs/2401.14159

  26. [27]

Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO

    Mumuni F, Mumuni A. Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO; 2024. Available from:http://arxiv.org/abs/2406.19057

  27. [28]

    Automated microfluidic platform for dynamic and combinatorial drug screening of tumor organoids

    Schuster B, Junkin M, Kashaf SS, Romero-Calvo I, Kirby K, Matthews J, et al. Automated microfluidic platform for dynamic and combinatorial drug screening of tumor organoids. Nature Communications. 2020;11(1):5271. doi:10.1038/s41467-020-19058-4

  28. [29]

    A versatile polypharmacology platform promotes cytoprotection and viability of human pluripotent and differentiated cells

    Chen Y, Tristan CA, Chen L, Jovanovic VM, Malley C, Chu PH, et al. A versatile polypharmacology platform promotes cytoprotection and viability of human pluripotent and differentiated cells. Nat Methods. 2021;18(5):528–541. doi:10.1038/s41592-021-01126-2

  29. [30]

scikit-image: image processing in Python

Van Der Walt S, Schönberger JL, Nunez-Iglesias J, Boulogne F, Warner JD, Yager N, et al. scikit-image: image processing in Python. PeerJ. 2014;2:e453. doi:10.7717/peerj.453