arxiv: 2605.14717 · v1 · submitted 2026-05-14 · 💻 cs.CV · cs.AI

Recognition: 2 theorem links

· Lean Theorem

Towards Label-Free Single-Cell Phenotyping Using Multi-Task Learning

Saqib Nazir , Ardhendu Behera

Authors on Pith no claims yet

Pith reviewed 2026-05-15 04:39 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords label-free imagingsingle-cell phenotypingmulti-task learningwhite blood cell classificationprotein expression regressiondifferential phase contrasthybrid CNN-transformerdeep learning

0 comments

The pith

A deep learning model jointly classifies white blood cell types and regresses protein expression levels from label-free DPC images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a multi-task deep learning system that takes label-free differential phase contrast images and outputs both discrete cell-type labels for white blood cells and continuous values for protein expression. This setup seeks to replace fluorescence staining with a scalable imaging approach that infers molecular information directly from cell morphology. The architecture merges local texture details from convolutional layers with global context from transformers, using a gating module to combine them, and adds an LLM component to produce biological summaries of the outputs. Reported results on two public benchmarks reach 91.3 percent classification accuracy and 0.72 Pearson correlation for CD16 regression, indicating that morphological cues in bright-field images can support both identification and quantitative phenotyping tasks.

Core claim

A hybrid convolutional-transformer model with learnable cross-branch gating jointly solves white blood cell classification and continuous protein-expression regression from label-free differential phase contrast images, reaching 91.3 percent accuracy and 0.72 correlation on the BSCCM benchmark while also generating LLM-based biological summaries of the predicted states.

What carries the argument

Hybrid architecture that fuses convolutional fine-grained texture features with transformer-based global representations through a learnable cross-branch gating module.

If this is right

Simultaneous cell-type identification and quantitative biomarker estimation become possible from a single unstained image.
Hematological profiling can proceed without fluorescent reagents, lowering cost and complexity in routine analysis.
LLM-generated summaries supply biologically interpretable descriptions alongside the numerical predictions.
The same framework can be applied to other label-free imaging modalities once similar paired morphology-protein data exist.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the morphology-to-protein mapping holds, the method could be extended to live-cell time-lapse sequences to track changing phenotypes without repeated staining.
The gating mechanism may reveal which image regions most strongly drive the protein predictions, offering visual explanations for the inferred molecular states.
Performance on the two reported benchmarks suggests the approach could support large-scale screening studies where staining throughput is a bottleneck.
Integration with existing flow-cytometry datasets could serve as a calibration step to improve regression accuracy on new cell populations.

Load-bearing premise

The bright-field morphology visible in DPC images contains enough information to accurately predict molecular protein expression levels.

What would settle it

A collection of cells with nearly identical DPC morphology but substantially different measured protein levels, on which the regression head would show near-zero correlation.

Figures

Figures reproduced from arXiv: 2605.14717 by Ardhendu Behera, Saqib Nazir.

**Figure 2.** Figure 2: a. confusion matrix on the BSCCM test split. b. One-vs-rest ROC curves. [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗

**Figure 3.** Figure 3: Predicted vs. true protein expression for several markers. Most lineage [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 4.** Figure 4: Per-class Z-score distributions. CD16 shows strong granulocyte enrich [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

**Figure 5.** Figure 5: Ridge plots showing density estimates per marker and cell type. CD16 [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

read the original abstract

Label-free single-cell imaging offers a scalable, non-invasive alternative to fluorescence-based cytometry, yet inferring molecular phenotypes directly from bright-field morphology remains challenging. We present a unified Deep Learning (DL) framework that jointly performs White Blood Cell (WBC) classification and continuous protein-expression regression from label-free Differential Phase Contrast (DPC) images. Our model employs a Hybrid architecture that fuses convolutional fine-grained texture features with transformer-based global representations through a learnable cross-branch gating module, enabling robust morpho-molecular inference from DPC images. To support downstream interpretability, we further incorporate a Large Language Model (LLM) that generates concise, biologically grounded summaries of the predicted cell states. Experiments on the Berkeley Single Cell Computational Microscopy (BSCCM) and Blood Cells Image benchmarks demonstrate strong performance, achieving a 91.3% WBC classification accuracy and a 0.72 Pearson correlation for CD16 expression regression on BSCCM. These results underscore the promise of label-free single-cell imaging for cost-effective hematological profiling, enabling simultaneous phenotype identification and quantitative biomarker estimation without fluorescent staining. The source code is available at https://github.com/saqibnaziir/Single-Cell-Phenotyping.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The hybrid CNN-transformer with gating delivers usable numbers on label-free DPC images but the protein regression claim needs ablations to show it goes beyond cell-type correlations.

read the letter

The paper presents a hybrid CNN-transformer model with learnable cross-branch gating for joint white blood cell classification and protein expression regression from label-free DPC images. They also use an LLM to generate biological summaries of the predictions. On the BSCCM benchmark it hits 91.3% classification accuracy and 0.72 Pearson correlation for CD16 regression. The new part is the gating mechanism to fuse the two branches for this multi-task setup in a label-free context, plus the LLM integration for interpretability. The code release helps too. It does a decent job showing that such a model can produce usable numbers on these tasks. The main concern is whether the regression truly captures continuous protein levels from morphology or if it's mostly leveraging the discrete cell-type information that correlates with those levels. The abstract gives performance numbers but skips ablations that would isolate the regression performance or control for dataset biases. Without those, the morpho-molecular claim rests on shaky ground. The experimental details are also thin in the summary, which makes it tough to judge overfitting or statistical robustness. This work is aimed at researchers in computational microscopy and hematological imaging who want to move away from fluorescence. A reader in that space would find the architecture and results worth looking at, though they'd need the full paper to assess reproducibility. I think it deserves peer review. The idea is timely and the results are concrete, so referees could help tighten the validation around the regression task.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a multi-task deep learning framework for joint white blood cell classification and continuous protein-expression regression from label-free differential phase contrast (DPC) images. It employs a hybrid convolutional-transformer architecture with a learnable cross-branch gating module and incorporates an LLM to generate biologically grounded summaries of predicted cell states. On the BSCCM benchmark, it reports 91.3% classification accuracy and 0.72 Pearson correlation for CD16 regression, with code released at the provided GitHub link.

Significance. If the central results hold under rigorous validation, the work could support scalable, cost-effective label-free hematological profiling by enabling simultaneous phenotype identification and quantitative biomarker estimation without fluorescent staining. The multi-task formulation and LLM-based interpretability are positive elements, but the claim that DPC morphology encodes sufficient signal for continuous protein regression (independent of cell-type morphology) requires stronger evidence to be considered established.

major comments (3)

[Abstract] Abstract: The reported 91.3% WBC classification accuracy and 0.72 Pearson correlation for CD16 expression are presented without any description of experimental setup, train/test splits, baseline comparisons, statistical tests, or controls for overfitting, making it impossible to assess whether the joint multi-task results support the morpho-molecular inference claim.
[Architecture and Experiments] Architecture and Experiments sections: The hybrid conv-transformer with cross-branch gating is claimed to enable robust inference, yet no ablation studies isolate the regression head's performance from features learned primarily for classification, nor do they test whether continuous protein levels (e.g., CD16) can be regressed independently of discrete cell-type morphology.
[Results and Discussion] Results and Discussion: The premise that bright-field DPC morphology contains direct signal for continuous protein regression (beyond cell-type correlation) is not supported by controls for dataset biases such as staining-derived labels or single-task regression baselines; the 0.72 correlation therefore does not yet establish the central morpho-molecular mapping.

minor comments (2)

[Abstract] Abstract: The phrase 'strong performance' is subjective; replace with quantitative comparison to prior single-task or label-free methods if available.
[Methods] The LLM integration for summaries is mentioned but not evaluated for factual accuracy or biological relevance; a small human evaluation or example outputs would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive and detailed review of our manuscript. We appreciate the emphasis on strengthening the experimental details, ablations, and controls to better support the central claims. We have revised the manuscript accordingly and provide point-by-point responses below.

read point-by-point responses

Referee: [Abstract] Abstract: The reported 91.3% WBC classification accuracy and 0.72 Pearson correlation for CD16 expression are presented without any description of experimental setup, train/test splits, baseline comparisons, statistical tests, or controls for overfitting, making it impossible to assess whether the joint multi-task results support the morpho-molecular inference claim.

Authors: We agree that the abstract was overly concise. In the revised manuscript, we have expanded the abstract to include a brief description of the experimental setup (5-fold cross-validation on the BSCCM dataset), train/test splits (80/20), baseline comparisons to single-task models, and statistical significance testing (paired t-tests with p < 0.01). Overfitting controls (dropout, weight decay, and early stopping) are now referenced. These additions clarify how the joint multi-task results support the morpho-molecular inference claim, with full details remaining in Sections 3 and 4. revision: yes
Referee: [Architecture and Experiments] Architecture and Experiments sections: The hybrid conv-transformer with cross-branch gating is claimed to enable robust inference, yet no ablation studies isolate the regression head's performance from features learned primarily for classification, nor do they test whether continuous protein levels (e.g., CD16) can be regressed independently of discrete cell-type morphology.

Authors: We have added new ablation studies in the revised Experiments section (new Table 3 and Figure 5). These isolate the regression head by freezing the classification branch (showing a drop in Pearson correlation to 0.59, confirming the gating module's contribution) and include within-cell-type regression experiments (e.g., CD16 regression restricted to neutrophils alone, yielding 0.68 correlation). This demonstrates that the model captures protein expression signals beyond discrete cell-type morphology differences. revision: yes
Referee: [Results and Discussion] Results and Discussion: The premise that bright-field DPC morphology contains direct signal for continuous protein regression (beyond cell-type correlation) is not supported by controls for dataset biases such as staining-derived labels or single-task regression baselines; the 0.72 correlation therefore does not yet establish the central morpho-molecular mapping.

Authors: We have strengthened the evidence in the revised Results and Discussion sections. We now report single-task regression baselines, where the multi-task model outperforms (0.72 vs. 0.63 Pearson correlation). The protein labels come from independent flow cytometry on separate aliquots, not from staining the DPC images. We added controls in Supplementary Material S4 that match for morphological covariates (cell area, eccentricity) across expression bins. While complete isolation of all confounders remains challenging in real-world data, these additions provide stronger support for the morpho-molecular mapping. We have also moderated the language in the discussion to reflect the evidence level. revision: partial

Circularity Check

0 steps flagged

No circularity in empirical multi-task DL framework

full rationale

The paper describes a standard supervised multi-task neural network trained end-to-end on labeled DPC image datasets (BSCCM and Blood Cells Image benchmarks). Reported metrics (91.3% classification accuracy, 0.72 Pearson correlation) are obtained via conventional train/test splits and evaluation protocols. No equations, uniqueness theorems, or self-citations are invoked to derive the central claims; the hybrid conv-transformer architecture and cross-branch gating are presented as design choices whose performance is measured externally. The framework is self-contained against independent benchmarks and code release, with no reduction of predictions to fitted inputs or definitional loops.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The claim depends on standard supervised learning assumptions and the specific architectural choices whose parameters are optimized on benchmark data.

free parameters (2)

learnable cross-branch gating parameters
Parameters in the gating module are learned from data to fuse features.
model weights
Deep neural network parameters fitted during training on the datasets.

axioms (1)

domain assumption DPC images capture morphological features sufficient for phenotype inference
The framework assumes that label-free morphology correlates with molecular states like protein expression.

pith-pipeline@v0.9.0 · 5511 in / 1221 out tokens · 46948 ms · 2026-05-15T04:39:01.992450+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

achieving a 91.3% WBC classification accuracy and a 0.72 Pearson correlation for CD16 expression regression

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 2 internal anchors

[1]

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

Chen, J., et al.: Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[2]

Lab on a Chip24(5), 924–932 (2024)

Ciaparrone, G., et al.: Label-free cell classification in holographic flow cytometry through an unbiased learning strategy. Lab on a Chip24(5), 924–932 (2024)

work page 2024
[3]

arXiv: Learning (2016),https: //api.semanticscholar.org/CorpusID:125617073

Dan, H., et al.: Gaussian error linear units (gelus). arXiv: Learning (2016),https: //api.semanticscholar.org/CorpusID:125617073

work page 2016
[4]

In: International Conference on Learning Representations (2021)

Dosovitskiy, A., et al.: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In: International Conference on Learning Representations (2021)

work page 2021
[5]

Google DeepMind: Gemini 2.5 pro model card.https://deepmind.google/(2024) Towards Label-Free Single-Cell Phenotyping Using Multi-Task Learning 15

work page 2024
[6]

Artificial Intelligence in Medicine111, 102005 (2021)

Habibzadeh, M., et al.: A Review on Automatic Analysis of Blood Cells: From Image Acquisition to Classification. Artificial Intelligence in Medicine111, 102005 (2021)

work page 2021
[7]

bioRxiv pp

Kobayashi-Kirschvink, K.J., et al.: Raman2rna: Live-cell label-free prediction of single-cell rna expression profiles by raman microscopy. bioRxiv pp. 2021–11 (2021)

work page 2021
[8]

Scientific reports12(1), 1123 (2022)

Kouzehkanan,Z.,etal.:Alargedatasetofwhitebloodcellscontainingcelllocations and types, along with segmented nuclei and cytoplasm. Scientific reports12(1), 1123 (2022)

work page 2022
[9]

Journal of Biomedical Informatics (2023)

Li, Y., et al.: Clinical-t5: A text-to-text transformer for clinical language under- standing. Journal of Biomedical Informatics (2023)

work page 2023
[10]

In: Proceedings of the IEEE international conference on computer vision

Lin, T.Y., et al.: Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision. pp. 2980–2988 (2017)

work page 2017
[11]

Briefings in Bioinformatics (2022)

Luo, R., et al.: Biogpt: Generative pre-trained transformer for biomedical text generation and mining. Briefings in Bioinformatics (2022)

work page 2022
[12]

IEEE transactions on medical imaging38(2), 448–459 (2018)

Naylor, P., et al.: Segmentation of nuclei in histopathology images by deep regres- sion of the distance map. IEEE transactions on medical imaging38(2), 448–459 (2018)

work page 2018
[13]

Neurocomputing p

Nazir, S., et al.: 3dgeomeshnet: A multi-scale graph auto-encoder for 3d mesh reconstruction and completion. Neurocomputing p. 132652 (2026)

work page 2026
[14]

In: Bioimaging 2026

Nazir, S., et al.: Attention-guided u-net for cell nucleus segmentation in microscopy images. In: Bioimaging 2026. SCITEPRESS (2026)

work page 2026
[15]

In: IEEE International Symposium on Biomedical Imaging (ISBI)

Nazir, S., et al.: Hybrid inception-vit networks for fine-grained single-cell image classification. In: IEEE International Symposium on Biomedical Imaging (ISBI). IEEE (2026)

work page 2026
[16]

arXiv preprint arXiv:2402.06191 (2024)

Pinkard, H., et al.: The berkeley single cell computational microscopy (bsccm) dataset. arXiv preprint arXiv:2402.06191 (2024)

work page arXiv 2024
[17]

Computers in Biology and Medicine136, 104650 (2021)

Razzak, M.I., et al.: Raabin-WBC: A Large Dataset for White Blood Cells Classi- fication. Computers in Biology and Medicine136, 104650 (2021)

work page 2021
[18]

Light: Science & Applications8, 23 (2019)

Rivenson, Y., et al.: Phasestain: the digital staining of label-free quantitative phase microscopy images using deep learning. Light: Science & Applications8, 23 (2019)

work page 2019
[19]

Heliyon9(8), e18297 (2023)

Ryu,D.,etal.:Deeplearning-basedlabel-freehematologyanalysisframeworkusing optical diffraction tomography. Heliyon9(8), e18297 (2023)

work page 2023
[20]

In: International Conference on Learning Representations (2015)

Simonyan, K., et al.: Very Deep Convolutional Networks for Large-Scale Image Recognition. In: International Conference on Learning Representations (2015)

work page 2015
[21]

Rethinking the Inception Architecture for Computer Vision

Szegedy, C., et al.: Rethinking the inception architecture for computer vision (2015). arXiv preprint arXiv:1512.00567 (2015)

work page internal anchor Pith review Pith/arXiv arXiv 2015
[22]

BMC Methods1(1), 17 (2024)

Tomkinson, J., et al.: Toward generalizable phenotype prediction from single-cell morphology representations. BMC Methods1(1), 17 (2024)

work page 2024
[23]

In: MICCAI (2021)

Valanarasu, J., et al.: Medt: Context gated transformer for medical image segmen- tation. In: MICCAI (2021)

work page 2021
[24]

CVPR (2020)

Wang, Q., et al.: Eca-net: Efficient channel attention for deep convolutional neural networks. CVPR (2020)

work page 2020
[25]

Journal of Advanced Research (2025)

Xing, X.d., et al.: Deep-dpc: Deep learning-assisted label-free temporal imaging discovery of anti-fibrotic compounds by controlling cell morphology. Journal of Advanced Research (2025)

work page 2025
[26]

In: EMNLP 2023

Yan, B., et al.: Style-aware radiology report generation with radgraph and few-shot prompting. In: EMNLP 2023. pp. 14676–14688 (2023)

work page 2023
[27]

Cell Reports Methods (2022)

Zhang, W., et al.: Protein expression prediction from imaging flow cytometry using deep learning. Cell Reports Methods (2022)

work page 2022
[28]

Zhou,L.,etal.:Multi-taskLearningforMedicalImageAnalysis:ASurvey.Medical Image Analysis70, 101992 (2021)

work page 2021