arxiv: 2606.10735 · v1 · pith:6ETSGMYMnew · submitted 2026-06-09 · 💻 cs.CV · physics.med-ph

Patient-Level Diagnosis of Acute Myeloid Leukemia via Deep Learning Analysis of Bone Marrow Smear

Pith reviewed 2026-06-27 13:33 UTC · model grok-4.3

keywords: acute myeloid leukemia, bone marrow smear, deep learning, cell classification, patient-level diagnosis, YOLO, EfficientNet, composite blast-like cells

The pith

A deep learning pipeline aggregates cell classifications into CBLC ratios to support patient-level AML diagnosis from bone marrow smears.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a cell-to-patient pipeline that detects individual cells in bone marrow smear images and classifies them to estimate the ratio of an expert-defined composite blast-like cell (CBLC) category. This ratio is then used to assist in diagnosing acute myeloid leukemia (AML) at the patient level rather than relying on manual single-cell review. The approach pairs a fixed YOLO-based segmentation module with an EfficientNet-B0 classifier trained through a two-stage strategy that corrects for class imbalance and adds morphology-assisted supervision. Validation on an external cohort of 89 patients from three additional centers shows that the method maintains performance beyond the training data. The work focuses on turning many cellular observations into a single diagnostic ratio that pathologists could use for AML assessment.

Core claim

By defining a composite category of blast-like cells (CBLC) from eight specific morphological types (N, N1, M, M1, R, R1, J, and J1) and pairing a fixed YOLO-based segmentation module with an EfficientNet-B0 classifier trained on that target, cell predictions can be aggregated into patient-level CBLC ratios that support AML diagnosis. The pipeline produces stable results internally and generalizes externally, reaching ensemble weighted F1-scores of 0.9076, 0.8696, and 0.9124 on held-out Centers 4, 5, and 6, respectively.
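
The aggregation step can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `cblc_ratio`, the input layout (a list of `(patient_id, predicted_label)` pairs), and the placeholder label `"OTHER"` are assumptions. Only the eight CBLC labels are taken from the abstract.

```python
from collections import defaultdict

# The eight labels that make up CBLC, as listed in the abstract.
CBLC_LABELS = frozenset({"N", "N1", "M", "M1", "R", "R1", "J", "J1"})


def cblc_ratio(cell_predictions):
    """Aggregate per-cell labels into one CBLC ratio per patient.

    cell_predictions: iterable of (patient_id, predicted_label) pairs.
    Returns {patient_id: fraction of that patient's cells predicted as CBLC}.
    """
    totals = defaultdict(int)
    cblc = defaultdict(int)
    for patient_id, label in cell_predictions:
        totals[patient_id] += 1
        if label in CBLC_LABELS:
            cblc[patient_id] += 1
    return {pid: cblc[pid] / totals[pid] for pid in totals}


# Toy input; "OTHER" is a hypothetical stand-in for any non-CBLC class.
predictions = [
    ("P001", "N"), ("P001", "M1"), ("P001", "OTHER"), ("P001", "OTHER"),
    ("P002", "OTHER"), ("P002", "J"),
]
ratios = cblc_ratio(predictions)
print(ratios)  # {'P001': 0.5, 'P002': 0.5}
```

How the ratio is then turned into a diagnostic call (a cutoff, a second model, or clinician review) is not stated on this page, so the sketch stops at the ratio.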

What carries the argument

YOLO-based cell detection matched to expert contours followed by EfficientNet-B0 classification of the expert-defined CBLC composite category, with patient-level aggregation of the resulting cell ratios.
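
One way to picture the contour-matching step is a rasterized intersection-over-union followed by greedy one-to-one assignment. The sketch below is an assumption-laden illustration: the page does not give the IoU threshold or the matching rule, so the `0.5` threshold, the greedy strategy, and all function names are hypothetical. It uses only the standard library and assumes non-negative pixel coordinates.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test against a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon):
    """Set of integer pixels whose centers fall inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    pixels = set()
    for px in range(int(min(xs)), int(max(xs)) + 1):
        for py in range(int(min(ys)), int(max(ys)) + 1):
            if point_in_polygon(px + 0.5, py + 0.5, polygon):
                pixels.add((px, py))
    return pixels


def contour_iou(poly_a, poly_b):
    """Pixel-level IoU between two polygons."""
    a, b = rasterize(poly_a), rasterize(poly_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_contours(predicted, expert, iou_threshold=0.5):
    """Greedy one-to-one matching, highest-IoU pairs first.

    Returns a list of (predicted_index, expert_index, iou).
    """
    scored = []
    for i, pred in enumerate(predicted):
        for j, exp in enumerate(expert):
            iou = contour_iou(pred, exp)
            if iou >= iou_threshold:
                scored.append((iou, i, j))
    scored.sort(reverse=True)
    used_pred, used_exp, matches = set(), set(), []
    for iou, i, j in scored:
        if i not in used_pred and j not in used_exp:
            used_pred.add(i)
            used_exp.add(j)
            matches.append((i, j, iou))
    return matches


# One predicted square, one overlapping expert square, one distant expert square.
predicted = [[(0, 0), (10, 0), (10, 10), (0, 10)]]
expert = [
    [(2, 0), (12, 0), (12, 10), (2, 10)],
    [(50, 50), (60, 50), (60, 60), (50, 60)],
]
matches = match_contours(predicted, expert)
print(matches)  # one match: predicted 0 with expert 0, IoU = 80/120
```

A production pipeline would more likely compute polygon IoU with a geometry library or on binary masks; the rasterized version is used here only to keep the example self-contained.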

If this is right

  • The same pipeline can be applied to new centers without retraining while keeping the ensemble weighted F1-score at or above 0.8696, the lowest of the three external results.
  • Patient diagnosis can be derived from the ratio of one composite cell category instead of exhaustive manual counting of all cell types.
  • Two-stage training with contour matching and morphology supervision produces consistent single-cell crops across variable smear preparations.
  • The classifier ensemble reaches weighted F1-scores between 0.8696 and 0.9124 on external data; these are cell-level scores, and no patient-level metric is reported.
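
The page reports "ensemble" scores but does not say how the individual classifiers are combined. A common choice, assumed here purely for illustration, is to average the class probabilities of the per-fold models and take the argmax. The function name `ensemble_predict` and the toy probabilities are hypothetical.

```python
def ensemble_predict(per_model_probs):
    """Average class probabilities across models.

    per_model_probs: list of probability vectors, one per model, equal length.
    Returns (index of the winning class, list of mean probabilities).
    """
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    mean_probs = [
        sum(probs[c] for probs in per_model_probs) / n_models
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: mean_probs[c])
    return best, mean_probs


# Three hypothetical fold models scoring one cell over three classes.
fold_outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
]
label, probs = ensemble_predict(fold_outputs)
print(label)  # 1
```

Majority voting or a weighted average would be equally plausible readings of "ensemble"; the paper itself would need to be consulted for the actual rule.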

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The CBLC ratio could be tracked over serial smears to monitor disease progression or treatment response without new model training.
  • Similar composite-category targeting might reduce annotation burden when adapting the pipeline to related blood disorders.
  • Embedding the ratio output into existing laboratory software could allow pathologists to flag cases for review rather than replace their judgment.

Load-bearing premise

The expert grouping of eight specific cell types into a single CBLC composite accurately serves as a morphological proxy for AML diagnosis.

What would settle it

A blinded comparison in which pathologists diagnose AML from the same smears without using the model's CBLC ratios and the two sets of diagnoses disagree on a substantial fraction of patients.

Figures

Figures reproduced from arXiv: 2606.10735 by Fajin Tao, Gen Yang, Hongru Chen, Lin An, Qunxian Lu, Tianyi Wang, Weihua Meng, Xiaodong Mo, Yuqi Ma.

Figure 2: Global distribution of the 16 annotated cell categories and CBLC definition. The upper panel shows center-
Figure 3: Overall cell-to-patient pipeline for AML-assisted diagnosis using CBLC. The workflow links de-identified bone marrow smear images, fixed cell segmentation, IoU-based contour matching, single-cell crop generation, patient-level five-fold splitting, two-stage EfficientNet-B0 training, cell-level prediction, and patient-level aggregation
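
The "patient-level five-fold splitting" named in the Figure 3 caption means that folds are assigned per patient, so all cells from one patient stay on the same side of each split. A minimal sketch, assuming a simple shuffled round-robin assignment (the helper name `patient_level_folds`, the seed, and the toy cell table are hypothetical):

```python
import random


def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Assign each unique patient to exactly one fold.

    Returns {patient_id: fold_index}. Cells inherit the fold of their patient,
    so no patient contributes cells to both training and validation.
    """
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    return {pid: i % n_folds for i, pid in enumerate(unique)}


# Hypothetical cell table of (cell_id, patient_id): 60 cells from 12 patients.
cells = [(f"c{i}", f"P{i % 12:03d}") for i in range(60)]
fold_of = patient_level_folds([pid for _, pid in cells])

for k in range(5):
    val_patients = {pid for _, pid in cells if fold_of[pid] == k}
    train_patients = {pid for _, pid in cells if fold_of[pid] != k}
    # No patient appears on both sides of fold k.
    assert not (val_patients & train_patients)
```

Splitting by cell instead of by patient would let near-identical cells from one smear leak between training and validation, which tends to inflate cell-level scores.
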
Abstract

Bone marrow smear review remains important for acute myeloid leukemia (AML) assessment, but manual single-cell interpretation is labor-intensive and patient-level diagnosis requires aggregation of many cellular observations. We present a cell-to-patient deep learning pipeline for AML-assisted diagnosis from bone marrow smear images. The study included 258 patients from six anonymized centers, including a main cohort of 169 patients from Centers 1-3 and an external validation cohort of 89 patients from Centers 4-6. A 16-category cell annotation vocabulary was used to describe the global cellular composition, including granulocytic, monocytic, erythroid, lymphoid, eosinophilic, and other cells. Rather than identifying strict AML blasts or leukemic blasts, the model targets an expert-defined composite category termed Composite Blast-like Cells (CBLC), comprising N, N1, M, M1, R, R1, J, and J1 according to the project-wide morphological standard. A fixed YOLO-based segmentation module detected cells, predicted contours were matched to expert polygon annotations by contour IoU, and standardized single-cell crops were generated. An EfficientNet-B0 classifier was trained through a two-stage GT-to-YOLO and YOLO-to-YOLO strategy with class-imbalance correction, center-border regularization, and morphology-assisted supervision. Cell-level predictions were aggregated into patient-level CBLC ratios for AML-oriented diagnostic support. The pipeline achieved stable internal validation and maintained external generalization, with ensemble weighted F1-scores of 0.9076, 0.8696, and 0.9124 on Centers 4, 5, and 6, respectively.
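
For readers unfamiliar with the headline metric: a weighted F1-score is conventionally the mean of per-class F1 scores weighted by each class's share of the true labels. The sketch below assumes that convention (the same one scikit-learn uses for `average="weighted"`); whether the paper computes it over the 16 categories or over a CBLC-versus-rest split is not stated here, and the labels in the toy example are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n_cls / total) * f1
    return score


# Toy binary example: 3 CBLC cells and 5 other cells, two errors in total.
y_true = ["CBLC", "CBLC", "CBLC", "other", "other", "other", "other", "other"]
y_pred = ["CBLC", "CBLC", "other", "other", "other", "other", "other", "CBLC"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.75
```

Because the weights follow class frequency, a weighted F1 can stay high even when a rare class is classified poorly, which is one reason the referee report below asks for patient-level metrics as well.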

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a deep learning pipeline for bone marrow smear analysis to support AML diagnosis. It employs YOLO-based detection followed by EfficientNet-B0 classification of cells into 16 morphological categories, with aggregation of an expert-defined Composite Blast-like Cells (CBLC) group (N, N1, M, M1, R, R1, J, J1) into patient-level ratios. The study uses 258 patients across six centers (169 in internal cohort from Centers 1-3; 89 in external from Centers 4-6), reporting ensemble weighted F1 scores of 0.9076, 0.8696, and 0.9124 on the external centers.

Significance. If the patient-level CBLC ratio aggregation were shown to reliably separate AML from non-AML cases and correlate with expert blast counts, the work would address a clinically relevant task in hematopathology by reducing manual review burden. The multi-center external validation setup and two-stage training strategy with imbalance correction are positive elements that strengthen potential generalizability.

major comments (2)
  1. [Abstract] Abstract and Results: The central claim of a 'cell-to-patient' pipeline supporting AML diagnosis via aggregated CBLC ratios is not supported by any reported patient-level metrics. No table, figure, or text provides AML vs. non-AML classification accuracy, AUC, sensitivity/specificity at diagnostic thresholds, or correlation between automated CBLC percentages and expert blast counts. Only cell-level weighted F1 scores are supplied, which is load-bearing for the title, abstract, and stated purpose.
  2. [Abstract] Abstract: No information is given on per-center patient counts within the external cohort, exact train/validation/test splits, staining normalization procedures, or any baseline comparisons (e.g., against standard blast counting or other classifiers). These omissions prevent evaluation of whether the reported F1 scores reflect robust generalization.
minor comments (1)
  1. [Methods] The definition and morphological criteria for the CBLC composite category could be clarified with an explicit table or figure showing example cells from each subclass to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive feedback on our manuscript describing a cell-to-patient pipeline for bone marrow smear analysis in AML. We address each major comment below and commit to revisions that will strengthen the presentation of patient-level aspects and methodological details.

Point-by-point responses
  1. Referee: [Abstract] Abstract and Results: The central claim of a 'cell-to-patient' pipeline supporting AML diagnosis via aggregated CBLC ratios is not supported by any reported patient-level metrics. No table, figure, or text provides AML vs. non-AML classification accuracy, AUC, sensitivity/specificity at diagnostic thresholds, or correlation between automated CBLC percentages and expert blast counts. Only cell-level weighted F1 scores are supplied, which is load-bearing for the title, abstract, and stated purpose.

    Authors: We acknowledge that the manuscript reports detailed cell-level weighted F1 scores as the primary quantitative result and describes the aggregation into patient-level CBLC ratios for diagnostic support without providing explicit AML vs. non-AML classification metrics, AUC, sensitivity/specificity, or direct correlations with expert blast counts. The cell-level performance is presented as the enabling step for the pipeline. In revision we will add a dedicated results subsection and table reporting patient-level CBLC ratio statistics, Pearson/Spearman correlations with expert blast percentages, and binary diagnostic performance (e.g., sensitivity/specificity at standard blast thresholds) on the external cohorts to directly support the title and abstract claims.

    Revision: yes

  2. Referee: [Abstract] Abstract: No information is given on per-center patient counts within the external cohort, exact train/validation/test splits, staining normalization procedures, or any baseline comparisons (e.g., against standard blast counting or other classifiers). These omissions prevent evaluation of whether the reported F1 scores reflect robust generalization.

    Authors: We will revise the abstract, methods, and supplementary material to specify the per-center breakdown of the 89 external patients across Centers 4-6, the exact patient-level train/validation/test splits used in the two-stage training, the staining normalization procedures applied, and baseline comparisons (including agreement with manual blast counting and at least one alternative classifier). These additions will allow readers to assess generalization more rigorously.

    Revision: yes
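
The patient-level metrics requested by the referee and promised in the rebuttal are simple to compute once per-patient CBLC ratios exist. The sketch below is illustrative only: every number in it is made up, the function names are hypothetical, and the `0.20` cutoff is a placeholder. Because CBLC is broader than a strict blast count, a cutoff borrowed from blast-percentage criteria would need to be recalibrated before any real use.

```python
import math


def sensitivity_specificity(ratios, is_aml, threshold):
    """Call a patient positive when the CBLC ratio is >= threshold."""
    tp = fp = tn = fn = 0
    for ratio, truth in zip(ratios, is_aml):
        predicted = ratio >= threshold
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-patient values, for illustration only.
model_cblc_ratio = [0.05, 0.12, 0.18, 0.25, 0.40, 0.65]
expert_blast_fraction = [0.04, 0.10, 0.22, 0.24, 0.45, 0.60]
is_aml = [False, False, True, True, True, True]

sens, spec = sensitivity_specificity(model_cblc_ratio, is_aml, threshold=0.20)
print(sens, spec)  # 0.75 1.0
r = pearson(model_cblc_ratio, expert_blast_fraction)
```

Sweeping `threshold` over a range and recording the resulting sensitivity and specificity pairs would give the ROC curve from which the AUC mentioned in the referee report is derived.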

Circularity Check

0 steps flagged

No circularity; empirical cell classification with external validation

The manuscript presents an empirical ML pipeline: a YOLO segmenter and EfficientNet-B0 classifier are trained on annotated cells from centers 1-3 and evaluated via weighted F1 on held-out centers 4-6. No equations, parameter-fitting steps, or derivations are described that would reduce any output to its inputs by construction. The CBLC category is introduced as an expert-defined grouping whose patient-level ratio is asserted to support diagnosis, but this assertion is not derived from any fitted quantity or self-citation chain within the paper. External-center testing supplies independent evidence, satisfying the self-contained criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests on the domain assumption that the CBLC grouping is clinically meaningful and that cell composition ratios generalize across centers; no free parameters or invented entities are introduced.

axioms (1)
  • Domain assumption: The 16-category cell annotation vocabulary and the CBLC composite definition accurately capture morphological features relevant to AML assessment.
    Invoked when the model is trained to predict CBLC rather than strict blasts and when patient-level ratios are used for diagnosis support.

