pith. machine review for the scientific record.

arxiv: 2605.03053 · v1 · submitted 2026-05-04 · 💻 cs.CV · cond-mat.soft · q-bio.QM

Recognition: 1 theorem link

Approaching human parity in the quality of automated organoid image segmentation

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:46 UTC · model grok-4.3

classification 💻 cs.CV · cond-mat.soft · q-bio.QM
keywords organoid segmentation · image segmentation · Segment Anything Model · computer vision · stem cell imaging · automated analysis · inter-observer variability · spheroid measurement

The pith

A composite method pairing the Segment Anything Model with a domain-specific tool segments organoid images at or near inter-observer human accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests several existing tools for automatically outlining the size and shape of developing organoids in microscope images and finds that none works reliably across all conditions. It introduces a composite approach that first uses the general-purpose Segment Anything Model and then refines the result with an existing specialized tool. On the authors' test set this hybrid produces accurate outlines for nearly all images, including many where single tools fail. Its error rates reach the level of differences between independent human annotators on one metric and come close on others. This matters because organoids are used to model human development and disease, so reliable automated measurement can track their growth without constant manual tracing.

Core claim

No single existing segmentation tool delivers sufficient accuracy on every test image of pluripotent-stem-cell-derived spheroids, yet the composite method that combines the Segment Anything Model with a domain-specific tool produces consistent and accurate results on all but a very small fraction of the most challenging images; by one quantitative measure its performance equals inter-observer variability among human annotators and by others it lies very close to that benchmark.

What carries the argument

The composite segmentation pipeline that applies the Segment Anything Model (SAM) as a general-purpose foundation model and then refines its output with an existing domain-specific organoid segmentation tool.
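Figure 10 of the paper diagrams an "OrganoID Centroid + SAM" variant of this pipeline. A minimal sketch of the composite idea, assuming the domain tool's mask centroid serves as a point prompt and that the rough mask is kept when refinement fails (the helper names and the fall-back rule are illustrative assumptions, not the paper's code):

```python
import numpy as np

def centroid_prompt(rough_mask: np.ndarray) -> tuple[float, float]:
    """Centroid (row, col) of a binary mask, used as a point prompt."""
    rows, cols = np.nonzero(rough_mask)
    return float(rows.mean()), float(cols.mean())

def composite_segment(image, rough_mask, sam_point_fn):
    """Refine a domain tool's rough mask with a promptable model.

    `sam_point_fn(image, point)` is a stand-in for SAM's point-prompted
    predictor and must return a binary mask; a real pipeline would call
    segment-anything's SamPredictor here.
    """
    point = centroid_prompt(rough_mask)
    refined = sam_point_fn(image, point)
    # If the promptable model returns an empty mask, keep the rough mask.
    return refined if refined.any() else rough_mask
```

In this sketch the domain tool supplies only a seed location, so SAM's boundary quality dominates the final mask; whether the paper's composite uses exactly this hand-off is an inference from the figure titles.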

If this is right

  • Large-scale time-lapse studies of organoid development can replace most manual outlining with automated measurements.
  • Morphological changes during disease modeling become easier to quantify across hundreds of organoids.
  • The same hybrid strategy can be tested on other complex three-dimensional cell cultures where single tools currently fall short.
  • Routine monitoring of organoid size and shape no longer requires an expert annotator for every image.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same SAM-plus-refinement pattern may transfer to other biomedical imaging domains that already possess one reliable but narrow tool.
  • If the composite method maintains its performance on live-cell imaging sequences, it could enable fully automated tracking of organoid growth trajectories.
  • Adoption would shift the bottleneck in organoid research from image segmentation to downstream biological interpretation of the resulting shape and size data.

Load-bearing premise

That the selected test images and the manual annotations used as ground truth adequately represent the full range of real-world organoid imaging conditions, and that matching inter-observer variability is the appropriate target for acceptable automated performance.

What would settle it

Apply the composite method to a new collection of organoid images drawn from different laboratories, imaging modalities, or organoid types and measure whether its segmentation error exceeds the inter-observer variability measured on the same set.
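The check described above can be phrased as a small script: compute the method's IoU against one annotator and compare it with the IoU between the two annotators on the same images. The choice of annotator 1 as reference and of the mean as the summary statistic are assumptions for illustration, not the paper's protocol:

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = (a | b).sum()
    return float((a & b).sum() / union) if union else 0.0

def exceeds_inter_observer(method_masks, annot1_masks, annot2_masks) -> bool:
    """True if the method's mean IoU against annotator 1 falls below the
    mean IoU between the two annotators, i.e. its error exceeds
    inter-observer disagreement on this image set."""
    method_iou = np.mean([iou(m, a) for m, a in zip(method_masks, annot1_masks)])
    inter_obs = np.mean([iou(a, b) for a, b in zip(annot1_masks, annot2_masks)])
    return bool(method_iou < inter_obs)
```

The masks must be paired per image across all three lists; any image set from a new laboratory or modality can be dropped in unchanged.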

Figures

Figures reproduced from arXiv: 2605.03053 by Chase Cartwright, Christopher N. Mayhew, Gongbo Guo, Horacio E. Castillo, Mark Hester, Sai Teja Pusuluri.

Figure 1. Growth protocol for iPSC-derived spheroids. Media was replaced in well plates as shown. To generate a robust and diverse dataset to retrain OrganoID, 176 manually segmented and labeled images were used, taken from a variety of experimental conditions (either untreated, treated with CEPT at standard concentration, treated with CEPT at 1/2, 1/4, 1/10 or 1/20 of standard concentration, or treated w…
Figure 2. Demonstration of IOU Calculation. IOU is calculated by dividing the number of pixels which belong to both masks (the purple region) by the total number of pixels which appear in either mask (the blue, purple, and red regions combined), in order to quantify the agreement of two masks. … segmentation which is almost completely disjoint with the true spheroid.
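The IOU of Figure 2 is a one-liner over binary masks; a sketch in numpy (the array names are illustrative):

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Shared pixels (Figure 2's purple region) divided by all pixels in
    either mask (blue + purple + red regions combined)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    union = (a | b).sum()
    return float((a & b).sum() / union) if union else 0.0
```

Identical masks score 1.0; the almost-disjoint segmentation the caption mentions would score near 0.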
Figure 3. Example image and masks from Image Set A.
Figure 4. Example image and masks from Image Set B.
Figure 5. Example image and masks from Image Set C.
Figure 6. IOUs of Segmentation Methods. IOUs between the masks produced by each method and the corresponding ground truth masks. On the horizontal axis is the segmentation method used to produce the masks and on the vertical axis is the overlap between those masks and the corresponding ground truth masks, which ranges between 0 and 1, with 1 being perfect accuracy. Dotted lines indicate the mean IOU for the respectiv…
Figure 7. IOUs and normalized areas for each image and segmentation.
Figure 8. Untrained Segmentation Processes. a) Flowchart for the Grounding DINO + SAM process: Grounding DINO takes the original image along with the text prompt "a dark, solid cluster", and outputs bounding boxes representing regions that may contain an object fitting the given description. The Segment Anything Model then takes the original image and the Grounding DINO bounding boxes, generating masks for the object…
Figure 9. IOUs and normalized areas for Trained and Untrained OrganoID.
Figure 10. The OrganoID Centroid + SAM and OrganoID + SAM Composite processes.
Figure 11. IOUs and normalized areas of the OrganoID Centroid + SAM and OrganoID + SAM Composite.
Figure 12. Segmentation methods based on Untrained OrganoID.
Figure 13. IOUs for the Hybrid method and related methods.
Figure 14. Demonstration of Eccentricity Calculation.
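The eccentricity formula is not spelled out in the visible caption; a common moments-based definition, in the style of scikit-image's regionprops (treating this as an assumption about the paper's exact formula), can be sketched as:

```python
import numpy as np

def eccentricity(mask: np.ndarray) -> float:
    """Eccentricity from the second central moments of the mask's pixel
    coordinates: 0 for a circle, approaching 1 for elongated shapes."""
    coords = np.argwhere(mask)
    cov = np.cov(coords.T)                      # 2x2 coordinate covariance
    lam_min, lam_max = np.linalg.eigvalsh(cov)  # eigenvalues, ascending
    # Clamp to guard against tiny negative values from rounding.
    return float(np.sqrt(max(0.0, 1.0 - lam_min / lam_max)))
```

A square mask scores 0; a long thin rectangle scores close to 1.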
Figure 15. Demonstration of Solidity Calculation. a) The solidity of a shape is the ratio of its area to the area of its convex hull. The convex hull of a shape is the enclosing shape with the smallest possible perimeter, as shown in these 3 examples, which all have the same square-shaped convex hull, outlined by the green perimeter. b) Three example masks with their convex hulls outlined in green and their soliditie…
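Figure 15's solidity ratio can be approximated on a pixel grid by counting the pixels that fall inside the convex hull of the mask's coordinates; a sketch using scipy (the discrete hull fill via a Delaunay triangulation is an implementation choice, not the paper's code):

```python
import numpy as np
from scipy.spatial import Delaunay

def solidity(mask: np.ndarray) -> float:
    """Solidity = mask area / area of its filled convex hull, here
    computed by counting grid pixels inside the convex hull of the
    mask's pixel coordinates."""
    pts = np.argwhere(mask)
    hull = Delaunay(pts)                       # triangulates the hull interior
    grid = np.argwhere(np.ones_like(mask))     # every pixel coordinate
    inside = hull.find_simplex(grid, tol=1e-9) >= 0
    return float(mask.sum() / inside.sum())
```

A convex mask scores 1.0; a mask with a concave notch, like an L-shape, scores below 1 because hull pixels outside the mask inflate the denominator.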
Figure 16. OrganoID and successful enhancements thereof.
Figure 17. OrganoID and successful enhancements thereof (continued).
Figure 18. Agreement of segmentation methods with ground truth segmentation.
read the original abstract

Organoids are complex, three dimensional, self-organizing cell cultures which manifest organ-like features and represent a powerful platform for studying human disease and developing treatment options. Organoid development is characterized by dynamic morphological and cellular organization, which mimic some aspects of organ development. To study these rapid changes over the course of organoid development, advanced imaging and analytical tools are critical to accurately monitor the trajectory of organoid growth and investigate disease processes. In this work, we focus on computer vision and machine learning techniques to automatically measure the size and shape of developing spheroids derived from pluripotent stem cells (iPSCs), which are typically the starting material for generating organoid cultures. To facilitate this task, we introduce a composite method that combines the Segment Anything Model (SAM), a general-purpose foundation model, with an existing domain-specific tool. This composite method is evaluated together with several existing tools by testing them on organoid image data and comparing with the results of manual image segmentation. We find that no single existing tool is able to segment the test images with sufficient accuracy across all test conditions, but the newly introduced composite method produces consistent and accurate results for all but a very small fraction of the most challenging images. Finally, we compare the accuracy of this method to the variability between manual segmentations by independent annotators (inter-observer variability) and find that by one measure it performs at the level of inter-observer variability and by others it performs very close to it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a composite segmentation approach that combines the Segment Anything Model (SAM) with an existing domain-specific tool for automated analysis of organoid images derived from iPSCs. It evaluates the method against several existing tools and manual segmentations, claiming consistent and accurate results for all but a small fraction of challenging images, with performance reaching inter-observer variability on one metric and approaching it on others.

Significance. If the quantitative results hold with representative data, the work could enable scalable, reproducible monitoring of organoid morphology in developmental biology and disease modeling, reducing dependence on manual annotation. The direct comparison to inter-observer variability and use of a foundation model adapted to the domain are notable strengths that support practical utility.

major comments (2)
  1. Abstract: The central claim that the composite method 'performs at the level of inter-observer variability' by one measure and 'very close to it' by others is not supported by any reported numerical values for the metrics (Dice, IoU, boundary error, etc.), dataset cardinality, image selection protocol, or characterization of the 'very small fraction' of failure cases. These details are load-bearing for evaluating statistical robustness and external validity of the human-parity assertion.
  2. Evaluation section: The manuscript adopts inter-observer variability as the benchmark for acceptable automated performance without explicit justification or analysis of whether this reference distribution aligns with biological requirements (e.g., tolerance for error in downstream growth trajectory studies); this assumption requires concrete support to sustain the parity conclusion.
minor comments (2)
  1. Clarify the precise integration mechanism between SAM and the domain-specific tool, including any post-processing steps or parameter choices, to support reproducibility.
  2. Add representative failure-case images and quantitative breakdowns by image difficulty or morphology type to the results or supplementary material.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review of our manuscript. We address the major comments point by point below, proposing revisions to enhance the clarity and support for our claims.

read point-by-point responses
  1. Referee: Abstract: The central claim that the composite method 'performs at the level of inter-observer variability' by one measure and 'very close to it' by others is not supported by any reported numerical values for the metrics (Dice, IoU, boundary error, etc.), dataset cardinality, image selection protocol, or characterization of the 'very small fraction' of failure cases. These details are load-bearing for evaluating statistical robustness and external validity of the human-parity assertion.

    Authors: We acknowledge that the abstract would be improved by including specific supporting details. The manuscript does report the numerical values for the metrics, the dataset cardinality and selection protocol, and the characterization of failure cases in the Evaluation and Results sections. To make these load-bearing details immediately available to readers, we will revise the abstract to include key quantitative results and a summary of the dataset and failure cases. revision: yes

  2. Referee: Evaluation section: The manuscript adopts inter-observer variability as the benchmark for acceptable automated performance without explicit justification or analysis of whether this reference distribution aligns with biological requirements (e.g., tolerance for error in downstream growth trajectory studies); this assumption requires concrete support to sustain the parity conclusion.

    Authors: The use of inter-observer variability as a benchmark is standard practice in the field for assessing automated segmentation performance against human experts. However, we agree that explicit justification and analysis of its alignment with biological requirements would strengthen the manuscript. We will add a new paragraph in the Evaluation section that justifies this choice with references to prior work and discusses its implications for downstream applications such as growth trajectory studies. revision: yes

Circularity Check

0 steps flagged

No circularity: evaluation uses independent external manual annotations and inter-observer benchmarks

full rationale

The paper introduces a composite SAM-based segmentation method and evaluates it empirically on organoid images by direct comparison to manual segmentations performed by independent annotators. Performance is reported relative to inter-observer variability as an external reference. No equations, parameter fitting, or derivations are described that reduce the reported accuracy to the method's own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim rests on external test data and human annotations rather than any self-referential loop, satisfying the criteria for a self-contained empirical result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard assumptions in image segmentation evaluation rather than new postulates.

axioms (1)
  • domain assumption: Manual segmentations by independent annotators constitute a reliable reference standard for measuring automated accuracy.
    Used directly as the benchmark for human parity.

pith-pipeline@v0.9.0 · 5588 in / 1052 out tokens · 52465 ms · 2026-05-08T18:46:17.589367+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 28 canonical work pages · 2 internal anchors

  1. [1]

    Modeling Development and Disease with Organoids

    Clevers H. Modeling Development and Disease with Organoids. Cell. 2016;165(7):1586–1597. doi:10.1016/j.cell.2016.05.082

  2. [2]

    Organoids: Modeling Development and the Stem Cell Niche in a Dish

    Kretzschmar K, Clevers H. Organoids: Modeling Development and the Stem Cell Niche in a Dish. Developmental Cell. 2016;38(6):590–600. doi:10.1016/j.devcel.2016.08.014

  3. [3]

    Organoids — Preclinical Models of Human Disease

    Li M, Belmonte JCI. Organoids — Preclinical Models of Human Disease. New England Journal of Medicine. 2019;380(6):569–579. doi:10.1056/NEJMra1806175

  4. [5]

    Engineering organoids

    Hofer M, Lutolf MP. Engineering organoids. Nature Reviews Materials. 2021;6(5):402–420. doi:10.1038/s41578-021-00279-y

  5. [6]

    Importance of Organoids for Personalized Medicine

Perkhofer L, F Pierre-Olivier, M Martin, Kleger A. Importance of Organoids for Personalized Medicine. Personalized Medicine. 2018;15(6):461–465. doi:10.2217/pme-2018-0071

  6. [7]

    Oral Mucosal Organoids as a Potential Platform for Personalized Cancer Therapy

Driehuis E, Kolders S, Spelier S, Lõhmussaar K, Willems SM, Devriese LA, et al. Oral Mucosal Organoids as a Potential Platform for Personalized Cancer Therapy. Cancer Discovery. 2019;9(7):852–871. doi:10.1158/2159-8290.CD-18-1522

  7. [8]

    Organoid based personalized medicine: from bench to bedside

    Li Y, Tang P, Cai S, Peng J, Hua G. Organoid based personalized medicine: from bench to bedside. Cell Regeneration. 2020;9(1):21. doi:10.1186/s13619-020-00059-z

  8. [9]

    Organoid-based personalized medicine: from tumor outcome prediction to autologous transplantation

    Soto-Gamez A, Gunawan JP, Barazzuol L, Pringle S, Coppes RP. Organoid-based personalized medicine: from tumor outcome prediction to autologous transplantation. Stem Cells. 2024;42(6):499–508. doi:10.1093/stmcls/sxae023

  9. [10]

    Patient-Derived Organoids: A Game-Changer in Personalized Cancer Medicine

    Abbasian MH, Sobhani N, Sisakht MM, D’Angelo A, Sirico M, Roudi R. Patient-Derived Organoids: A Game-Changer in Personalized Cancer Medicine. Stem Cell Reviews and Reports. 2025;21(1):211–225. doi:10.1007/s12015-024-10805-4

  10. [11]

    Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions

Sarker IH. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Computer Science. 2021;2(6):1–20. doi:10.1007/s42979-021-00815-1

  11. [12]

    A review on deep learning in medical image analysis

    Suganyadevi S, Seethalakshmi V, Balasamy K. A review on deep learning in medical image analysis. International Journal of Multimedia Information Retrieval. 2021;11(1):19–38. doi:10.1007/s13735-021-00218-1

  12. [13]

    Deep Learning Applications in Medical Image Analysis

    Ker J, Wang L, Rao J, Lim T. Deep Learning Applications in Medical Image Analysis. IEEE Access. 2018;6:9375–9389. doi:10.1109/ACCESS.2017.2788044

  13. [14]

    Deep learning for cellular image analysis

    Moen E, Bannon D, Kudo T, Graf W, Covert M, Van Valen D. Deep learning for cellular image analysis. Nature Methods. 2019;16(12):1233–1246. doi:10.1038/s41592-019-0403-1

  14. [15]

CNN-Based Cell Analysis: From Image to Quantitative Representation

Allier C, Hervé L, Paviolo C, Mandula O, Cioni O, Pierré W, et al. CNN-Based Cell Analysis: From Image to Quantitative Representation. Frontiers in Physics. 2022; doi:10.3389/fphy.2021.776805

  15. [16]

    Growth of Epithelial Organoids in a Defined Hydrogel

    Broguiere N, Isenmann L, Hirt C, Ringel T, Placzek S, Cavalli E, et al. Growth of Epithelial Organoids in a Defined Hydrogel. Advanced Materials. 2018;30(43):1801621. doi:10.1002/adma.201801621

  16. [17]

    MOrgAna: accessible quantitative analysis of organoids with machine learning

Gritti N, Lim JL, Anlaş K, Pandya M, Aalderink G, Martínez-Ara G, et al. MOrgAna: accessible quantitative analysis of organoids with machine learning. Development. 2021;148(18):dev199611. doi:10.1242/dev.199611

  17. [18]

    Development of a deep learning based image processing tool for enhanced organoid analysis

    Park T, Kim TK, Han YD, Kim KA, Kim H, Kim HS. Development of a deep learning based image processing tool for enhanced organoid analysis. Scientific Reports. 2023;13(1):19841. doi:10.1038/s41598-023-46485-2

  18. [19]

    OrganoID: A versatile deep learning platform for tracking and analysis of single-organoid dynamics

    Matthews JM, Schuster B, Kashaf SS, Liu P, Ben-Yishay R, Ishay-Ronen D, et al. OrganoID: A versatile deep learning platform for tracking and analysis of single-organoid dynamics. PLOS Computational Biology. 2022;18(11):e1010584. doi:10.1371/journal.pcbi.1010584

  19. [20]

    U-Net: Convolutional Networks for Biomedical Image Segmentation

Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, editors. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Munich, Germany: Springer International Publishing; 2015. p. 234–241. Available from: http://arxiv.org/abs/1505.04597

  20. [21]

    MSU-Net: Multi-Scale U-Net for 2D Medical Image Segmentation

    Su R, Zhang D, Liu J, Cheng C. MSU-Net: Multi-Scale U-Net for 2D Medical Image Segmentation. Frontiers in Genetics. 2021;doi:10.3389/fgene.2021.639930

  21. [22]

    Segment Anything

Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, et al. Segment Anything. arXiv preprint arXiv:2304.02643; 2023. doi:10.48550/arXiv.2304.02643

  22. [23]

    Segment Anything for Microscopy

    Archit A, Freckmann L, Nair S, Khalid N, Hilt P, Rajashekar V, et al. Segment Anything for Microscopy. Nature Methods. 2025;22(3):579–591. doi:10.1038/s41592-024-02580-4

  23. [24]

    Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Liu S, Zeng Z, Ren T, Li F, Zhang H, Yang J, et al. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection; 2023. Available from: http://arxiv.org/abs/2303.05499

  25. [26]

    Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

Ren T, Liu S, Zeng A, Lin J, Li K, Cao H, et al. Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks; 2024. Available from: http://arxiv.org/abs/2401.14159

  26. [27]

Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO

    Mumuni F, Mumuni A. Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO; 2024. Available from:http://arxiv.org/abs/2406.19057

  27. [28]

    Automated microfluidic platform for dynamic and combinatorial drug screening of tumor organoids

    Schuster B, Junkin M, Kashaf SS, Romero-Calvo I, Kirby K, Matthews J, et al. Automated microfluidic platform for dynamic and combinatorial drug screening of tumor organoids. Nature Communications. 2020;11(1):5271. doi:10.1038/s41467-020-19058-4

  28. [29]

    A versatile polypharmacology platform promotes cytoprotection and viability of human pluripotent and differentiated cells

    Chen Y, Tristan CA, Chen L, Jovanovic VM, Malley C, Chu PH, et al. A versatile polypharmacology platform promotes cytoprotection and viability of human pluripotent and differentiated cells. Nat Methods. 2021;18(5):528–541. doi:10.1038/s41592-021-01126-2

  29. [30]

scikit-image: image processing in Python

Van Der Walt S, Schönberger JL, Nunez-Iglesias J, Boulogne F, Warner JD, Yager N, et al. scikit-image: image processing in Python. PeerJ. 2014;2:e453. doi:10.7717/peerj.453