Multi-scale Cell Instance Segmentation with Keypoint Graph based Bounding Boxes
Pith reviewed 2026-05-24 18:29 UTC · model grok-4.3
The pith
A keypoint graph groups five detected points per cell into bounding boxes that then guide instance segmentation inside those boxes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We first detect the five pre-defined points of a cell via keypoints detection. Then we group these points according to a keypoint graph and subsequently extract the bounding box for each cell. Finally, cell segmentation is performed on feature maps within the bounding boxes, producing superior results compared with other instance segmentation techniques on two cell datasets.
What carries the argument
The keypoint graph that groups the five detected points into per-cell bounding boxes before segmentation occurs inside those boxes.
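The grouping step can be illustrated with a minimal sketch. It assumes, which the summary above does not state, that the five points are one center plus four box corners, and it attaches each corner to the nearest center lying in the matching direction. The name `group_keypoints`, the corner labels, and the nearest-center rule are hypothetical stand-ins for the paper's keypoint graph, not the authors' implementation.

```python
from math import hypot

# Assumed corner types, with the expected sign of (corner - center) in x and y.
# Image coordinates are used: y grows downward.
CORNER_SIGNS = {"tl": (-1, -1), "tr": (1, -1), "bl": (-1, 1), "br": (1, 1)}

def group_keypoints(centers, corners):
    """Attach corners to centers and return one box per complete group.

    centers: list of (x, y) center points.
    corners: dict mapping a corner type in CORNER_SIGNS to a list of (x, y).
    Returns a list of (x_min, y_min, x_max, y_max), in center order.
    """
    groups = [dict() for _ in centers]
    for kind, points in corners.items():
        sx, sy = CORNER_SIGNS[kind]
        for px, py in points:
            best, best_d = None, float("inf")
            for i, (cx, cy) in enumerate(centers):
                # Skip centers for which this corner is on the wrong side.
                if (px - cx) * sx <= 0 or (py - cy) * sy <= 0:
                    continue
                d = hypot(px - cx, py - cy)
                if d < best_d:
                    best, best_d = i, d
            if best is None:
                continue
            # Keep only the closest corner of each type per center.
            prev = groups[best].get(kind)
            if prev is None or best_d < prev[0]:
                groups[best][kind] = (best_d, (px, py))
    boxes = []
    for group in groups:
        if len(group) == 4:  # discard cells with a missing corner
            xs = [p[0] for _, p in group.values()]
            ys = [p[1] for _, p in group.values()]
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

In this sketch two touching cells share corner locations along their common edge, and the direction test is what keeps a shared point from being assigned to the wrong cell.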
If this is right
- Touching cells are separated more reliably than by methods that segment without boxes.
- Class imbalance problems of anchor-based detectors are avoided.
- The same pipeline works across cell datasets that have visibly different shapes.
- Segmentation computation is restricted to the interior of the extracted boxes.
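The last consequence, restricting segmentation to box interiors, amounts to cropping the feature map before the mask head runs. The sketch below shows one way to do that with NumPy; the function name `crop_features`, the `(C, H, W)` layout, and the exclusive upper bounds are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def crop_features(feature_map, box):
    """Return the slice of a (C, H, W) feature map that lies inside a box.

    box is (x_min, y_min, x_max, y_max) in integer feature-map pixels,
    with the max coordinates exclusive. Coordinates are clipped to the map.
    """
    _, height, width = feature_map.shape
    x0, y0, x1, y1 = box
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return feature_map[:, y0:y1, x0:x1]
```

A mask head applied to the returned crop only touches `h * w` positions per cell instead of the full `H * W` map.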
Where Pith is reading between the lines
- The same construction (five keypoints plus a grouping graph) could be tested on non-cell objects that have consistent landmark positions.
- If the graph grouping step is made differentiable, end-to-end training might further reduce errors on crowded scenes.
- The bounding-box restriction might also cut memory use in high-resolution whole-slide images.
Load-bearing premise
Five pre-defined keypoints can be detected reliably on every cell and the graph will group them correctly even when cells touch or overlap.
What would settle it
The claim would be undermined if, on a held-out set of images containing many overlapping cells, the method produced more merged instances than a strong anchor-based box segmentation baseline.
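Counting merged instances requires a working definition. The sketch below uses one plausible choice, which is an assumption rather than anything specified by the paper: a predicted instance counts as merged when it covers at least `min_cover` of the area of two or more ground-truth cells. Inputs are integer label maps with 0 as background.

```python
import numpy as np

def count_merged(pred_labels, gt_labels, min_cover=0.5):
    """Count predicted instances that swallow two or more ground-truth cells."""
    merged = 0
    for p in np.unique(pred_labels):
        if p == 0:  # background
            continue
        pred_mask = pred_labels == p
        covered = 0
        # Only ground-truth cells that overlap this prediction can be covered.
        for g in np.unique(gt_labels[pred_mask]):
            if g == 0:
                continue
            gt_mask = gt_labels == g
            cover = np.logical_and(pred_mask, gt_mask).sum() / gt_mask.sum()
            if cover >= min_cover:
                covered += 1
        if covered >= 2:
            merged += 1
    return merged
```

Running the same function on the proposed method and on the baseline, over the same held-out images, would give the comparison described above.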
Original abstract
Most existing methods handle cell instance segmentation problems directly without relying on additional detection boxes. These methods generally fails to separate touching cells due to the lack of global understanding of the objects. In contrast, box-based instance segmentation solves this problem by combining object detection with segmentation. However, existing methods typically utilize anchor box-based detectors, which would lead to inferior instance segmentation performance due to the class imbalance issue. In this paper, we propose a new box-based cell instance segmentation method. In particular, we first detect the five pre-defined points of a cell via keypoints detection. Then we group these points according to a keypoint graph and subsequently extract the bounding box for each cell. Finally, cell segmentation is performed on feature maps within the bounding boxes. We validate our method on two cell datasets with distinct object shapes, and empirically demonstrate the superiority of our method compared to other instance segmentation techniques. Code is available at: https://github.com/yijingru/KG_Instance_Segmentation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a box-based cell instance segmentation pipeline: detect five pre-defined keypoints per cell, group them with a keypoint graph to extract per-cell bounding boxes, then run segmentation inside those boxes. It argues this avoids the touching-cell failures of direct instance segmentation and the class-imbalance problems of anchor-based detectors, and reports empirical superiority on two cell datasets with different shapes.
Significance. If the quantitative claims hold, the keypoint-graph approach could provide a practical way to obtain instance boxes without anchors, potentially improving separation of touching cells in dense biomedical images. Code release is a positive factor for reproducibility.
major comments (2)
- [Abstract] Abstract: the central claim of superiority over other instance segmentation techniques is stated without any quantitative metrics (AP, Dice, IoU, etc.), error bars, ablation results, or breakdown by touching-cell subsets, making it impossible to evaluate the asserted performance gains.
- [Method] Pipeline description (method section): the load-bearing assumptions that five keypoints are reliably detected on every cell and that the keypoint graph correctly clusters points even when cells touch or overlap are presented without per-stage recall/precision numbers or failure analysis on dense/overlapping regions; any systematic error here would directly produce incorrect or merged boxes and negate the claimed advantage.
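For reference, two of the mask-level metrics named in the first major comment (IoU and Dice) can be computed for a pair of binary masks as in the sketch below. Treating two empty masks as a perfect match is a convention chosen here, not something the report specifies.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over union of two binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

def dice(mask_a, mask_b):
    """Dice coefficient of two binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    total = a.sum() + b.sum()
    return 2 * np.logical_and(a, b).sum() / total if total else 1.0
```

The two are related by Dice = 2 IoU / (1 + IoU), so reporting either one per instance determines the other; AP additionally requires ranking predictions by confidence and is not covered here.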
minor comments (2)
- [Abstract] Abstract: grammatical error ('These methods generally fails' should be 'fail').
- [Title] Title refers to 'Multi-scale' but the abstract and pipeline description do not clarify where or how multi-scale processing is applied.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight opportunities to strengthen the presentation of our results and intermediate validation. We address each major comment below and will incorporate revisions accordingly.
- Referee: [Abstract] Abstract: the central claim of superiority over other instance segmentation techniques is stated without any quantitative metrics (AP, Dice, IoU, etc.), error bars, ablation results, or breakdown by touching-cell subsets, making it impossible to evaluate the asserted performance gains.
  Authors: We agree that the abstract would be strengthened by including quantitative support for the superiority claims. The current abstract emphasizes the methodological novelty and high-level empirical demonstration but omits specific numbers. In the revised version we will add key metrics (AP, Dice, IoU) with comparisons to baselines, note the presence of error bars or standard deviations from our experiments, and briefly reference the touching-cell performance gains shown in the main text and supplementary material.
  Revision: yes
- Referee: [Method] Pipeline description (method section): the load-bearing assumptions that five keypoints are reliably detected on every cell and that the keypoint graph correctly clusters points even when cells touch or overlap are presented without per-stage recall/precision numbers or failure analysis on dense/overlapping regions; any systematic error here would directly produce incorrect or merged boxes and negate the claimed advantage.
  Authors: The referee correctly identifies that the reliability of the two core stages is central to the approach. While the manuscript reports strong end-to-end instance segmentation results on two datasets, it does not isolate recall/precision for keypoint detection or graph clustering, nor does it provide a dedicated failure analysis on dense/overlapping regions. We will add these per-stage metrics and a short failure-case discussion in the revised method section to directly address potential systematic errors.
  Revision: yes
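A per-stage metric of the kind promised here could look like the following sketch, which scores keypoint detection by greedy one-to-one matching within a pixel radius. The function name, the greedy matching rule, and the default `max_dist` of 3 pixels are all assumptions for illustration; the paper's own evaluation protocol is not described in this report.

```python
from math import hypot

def keypoint_precision_recall(pred, gt, max_dist=3.0):
    """Precision and recall of predicted keypoints against ground truth.

    pred, gt: lists of (x, y). A prediction is a true positive when it is
    matched one-to-one to a ground-truth point at most max_dist away.
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (hypot(px - gx, py - gy), i, j)
        for i, (px, py) in enumerate(pred)
        for j, (gx, gy) in enumerate(gt)
    )
    used_pred, used_gt = set(), set()
    for dist, i, j in pairs:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if i in used_pred or j in used_gt:
            continue
        used_pred.add(i)
        used_gt.add(j)
    true_pos = len(used_pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gt) if gt else 0.0
    return precision, recall
```

Computing this separately on images with many touching cells would give the dense-region breakdown the referee asks for.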
Circularity Check
Empirical pipeline with no self-referential derivations
The paper proposes a sequential computer-vision pipeline (keypoint detection of five fixed points, graph-based grouping, box extraction, then mask prediction inside boxes) and validates it empirically on two external cell datasets. No equations, fitted parameters, or self-citations are described that would reduce any reported performance metric to a definition or input by construction. The central claims rest on standard detection and segmentation stages whose correctness is assessed by external benchmarks rather than internal redefinition.