
arxiv: 2606.23744 · v1 · pith:WYGIRHJJnew · submitted 2026-06-21 · 🧬 q-bio.QM · cs.CV

Performance and Interpretability of Convolutional, Transformer, and Hybrid Deep Learning Models in Colorectal Histology Classification

Pith reviewed 2026-06-26 09:17 UTC · model grok-4.3

classification 🧬 q-bio.QM cs.CV
keywords colorectal histopathology · deep learning classification · transformer models · convolutional neural networks · Kather dataset · transfer learning · image classification

The pith

Transformer models reach the highest accuracy in colorectal histology classification but the edge over top CNNs remains small.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates twelve ImageNet-pretrained CNN, transformer, and hybrid models on the Kather dataset of 5,000 colorectal tissue image tiles across eight classes. It applies one fixed transfer-learning and fine-tuning protocol to every model and records accuracy, F1-score, ROC-AUC, and related metrics. Transformer architectures post the strongest numbers overall, with EVA-02 at 97.1 percent accuracy and 97.0 percent F1-score, while ResNet34 and ConvNeXt-Tiny among the CNNs stay within one percentage point (96.4 and 96.3 percent accuracy, respectively). The work therefore supplies a direct head-to-head benchmark suggesting that architectural family matters less than expected once modern CNNs are included.

Core claim

All twelve models reached high performance on the eight-class Kather task, with accuracies between 93.2 percent and 97.1 percent. Transformer models produced the best scores across metrics, yet the gap separating the leading transformer from the leading CNN was only 0.7 percentage points (EVA-02 at 97.1 percent versus ResNet34 at 96.4 percent). EVA-02 recorded the single highest result, followed closely by ViT-B/16; ResNet34 and ConvNeXt-Tiny led the convolutional networks.

What carries the argument

Standardized transfer-learning and fine-tuning protocol applied uniformly to ImageNet-pretrained CNN, transformer, and hybrid models on the Kather colorectal histopathology dataset.
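What "applied uniformly" means in practice can be sketched as one immutable settings object shared by every run. The sketch below is illustrative only: the class name `FineTuneConfig`, its fields, and every numeric value are hypothetical placeholders, not settings reported by the paper, and only the four model names that appear in the abstract are listed.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FineTuneConfig:
    # All values are hypothetical placeholders; the paper's actual
    # hyperparameters are not given in the abstract.
    optimizer: str = "adamw"
    base_lr: float = 1e-4
    lr_schedule: str = "cosine"
    weight_decay: float = 0.05
    batch_size: int = 32
    epochs: int = 30
    augmentation: str = "flip+rotate"


# Only the models named in the abstract; the full list of twelve is not reproduced here.
MODELS = {
    "EVA-02": "transformer",
    "ViT-B/16": "transformer",
    "ResNet34": "cnn",
    "ConvNeXt-Tiny": "cnn",
}


def build_runs(models, config):
    """Attach the same frozen config to every model, whatever its family."""
    return [
        {"model": name, "family": family, **asdict(config)}
        for name, family in models.items()
    ]


def settings_are_uniform(runs):
    """True if every run shares identical training settings."""
    skip = {"model", "family"}
    settings = [{k: v for k, v in run.items() if k not in skip} for run in runs]
    return all(s == settings[0] for s in settings)


runs = build_runs(MODELS, FineTuneConfig())
print(settings_are_uniform(runs))
```

Freezing the dataclass makes it harder to tune one family by accident. It does not address whether a single setting is equally well suited to every family, which is the fairness question the referee report raises.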

If this is right

  • Modern CNNs can be chosen when model size or inference speed matters more than the last percent of accuracy.
  • Complex Stroma remains the hardest class for every architecture tested.
  • Hybrid models sit between the two families but do not exceed the best pure transformer.
  • The small performance gap implies that further gains may come from data scale or training tricks rather than architecture alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The ranking could shift if the evaluation moved to whole-slide images instead of pre-cropped tiles.
  • Interpretability results promised by the title are not reported in the performance-focused abstract, so separate analysis would be needed to compare explainability across families.
  • Because all models start from ImageNet weights, the benchmark tests fine-tuning behavior more than raw architectural capacity.

Load-bearing premise

A single standardized transfer-learning and fine-tuning protocol on ImageNet-pretrained models produces a fair comparison of architectural families when tested only on the Kather dataset.

What would settle it

Repeating the comparison after training the same models from random initialization or on a different colorectal histology dataset would show whether the current ranking of transformers over CNNs holds.

Original abstract

Deep learning has become an important tool in computational pathology, enabling automated analysis of histopathological images. While convolutional neural networks (CNNs) have traditionally dominated this field, transformer-based and hybrid architectures have recently demonstrated promising performance. However, comprehensive comparisons of these approaches for colorectal histopathology remain limited. This study evaluated twelve ImageNet-pretrained CNN, transformer, and hybrid architectures using the Kather colorectal histopathology dataset containing 5,000 image tiles from eight tissue classes. All models were trained using a standardized transfer-learning and fine-tuning protocol and assessed using multiple performance metrics, including accuracy, precision, sensitivity, specificity, F1-score, ROC-AUC, Cohen's kappa, and Matthews correlation coefficient. All evaluated models achieved high classification performance, with accuracies ranging from 93.2% to 97.1%. EVA-02 achieved the highest overall performance (97.1% accuracy, 97.0% F1-score), closely followed by ViT-B/16. Among CNNs, ResNet34 and ConvNeXt-Tiny demonstrated highly competitive performance, achieving accuracies of 96.4% and 96.3%, respectively. Transformer architectures generally produced the strongest results across evaluation metrics, although the performance gap between the best transformer and CNN models was relatively small. Per-class analysis showed consistently strong classification performance across all tissue categories, with Complex Stroma representing the most challenging class. Overall, transformer-based architectures achieved the highest predictive performance, whereas modern CNNs provided a favorable balance between accuracy and model complexity. These findings provide a comprehensive benchmark of major deep learning paradigms for colorectal histopathology classification.
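Most of the metrics listed in the abstract can be derived from a single multi-class confusion matrix. The following is a minimal sketch of those definitions in plain Python, not the authors' evaluation code. The function name `multiclass_metrics` and the 3x3 toy matrix are hypothetical and unrelated to the paper's results. ROC-AUC is omitted because it needs per-class scores rather than hard predictions, and macro-averaging is an assumption, since the abstract does not say how per-class values were aggregated.

```python
from math import sqrt


def multiclass_metrics(cm):
    """cm[i][j] = number of tiles whose true class is i and predicted class is j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    true_tot = [sum(cm[i]) for i in range(k)]
    pred_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    correct = sum(cm[i][i] for i in range(k))

    recall, precision, specificity, f1 = [], [], [], []
    for c in range(k):
        tp = cm[c][c]
        fn = true_tot[c] - tp
        fp = pred_tot[c] - tp
        tn = n - tp - fn - fp
        r = tp / (tp + fn) if tp + fn else 0.0   # sensitivity
        p = tp / (tp + fp) if tp + fp else 0.0
        recall.append(r)
        precision.append(p)
        specificity.append(tn / (tn + fp) if tn + fp else 0.0)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)

    accuracy = correct / n
    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    cross = sum(true_tot[c] * pred_tot[c] for c in range(k))
    p_e = cross / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    # Multi-class Matthews correlation coefficient.
    den = sqrt((n * n - sum(t * t for t in pred_tot))
               * (n * n - sum(t * t for t in true_tot)))
    mcc = (correct * n - cross) / den if den else 0.0

    return {
        "accuracy": accuracy,
        "macro_precision": sum(precision) / k,
        "macro_sensitivity": sum(recall) / k,
        "macro_specificity": sum(specificity) / k,
        "macro_f1": sum(f1) / k,
        "kappa": kappa,
        "mcc": mcc,
        "per_class_recall": recall,
    }


# Hypothetical 3-class toy matrix, only to exercise the function.
toy_cm = [[9, 1, 0],
          [2, 7, 1],
          [0, 1, 9]]
metrics = multiclass_metrics(toy_cm)
hardest = min(range(len(toy_cm)), key=lambda c: metrics["per_class_recall"][c])
print(metrics["accuracy"], metrics["kappa"], hardest)
```

The class with the lowest per-class recall (`hardest` above) is one simple way to operationalize "most challenging class", the role the abstract assigns to Complex Stroma.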

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims to provide a comprehensive benchmark comparing twelve ImageNet-pretrained CNN, transformer, and hybrid deep learning models on the Kather colorectal histopathology dataset (5,000 tiles, 8 tissue classes). Using a single standardized transfer-learning and fine-tuning protocol, all models achieve high performance (accuracies 93.2%–97.1%), with EVA-02 (transformer) highest at 97.1% accuracy / 97.0% F1, closely followed by ViT-B/16; top CNNs (ResNet34 96.4%, ConvNeXt-Tiny 96.3%) are competitive. The abstract concludes that transformer architectures generally produce the strongest results across metrics (accuracy, F1, ROC-AUC, kappa, MCC), though gaps are small, modern CNNs balance accuracy and complexity well, and Complex Stroma is the hardest class.

Significance. If the comparison holds, the work supplies a useful multi-metric empirical benchmark for architecture selection in computational pathology, showing transformers can achieve top performance on this task while documenting that recent CNNs remain competitive. The inclusion of Cohen’s kappa, MCC, and per-class results strengthens its reference value.

major comments (1)
  1. [Methods] Methods (standardized protocol): The paper applies one fixed transfer-learning/fine-tuning protocol to all twelve models but does not state whether learning-rate schedules, augmentation strength, regularization, or optimizer settings were held constant across families or tuned separately. This directly affects the central claim that “transformer architectures generally produced the strongest results” (abstract), because the 0.7 pp gap between EVA-02 (97.1 %) and ResNet34 (96.4 %) could reflect protocol bias rather than intrinsic architectural merit; an ablation or per-family hyperparameter search is required to support the architectural ranking.
minor comments (2)
  1. [Abstract] Abstract and Results: No information is given on train-test split ratios, number of random seeds, or statistical significance testing of the reported accuracy differences, weakening the strength of the performance claims.
  2. [Results] Results: The statement that Complex Stroma is the most challenging class is noted, but no confusion matrix, per-class error breakdown, or qualitative error analysis is referenced to support this observation.
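The concern about statistical significance can be made concrete with a rough uncertainty estimate. The sketch below computes a 95% Wilson score interval for a single reported accuracy. The test-set size `N_TEST = 1000` is purely an assumption (a 20% hold-out of the 5,000 tiles), since the split is not stated; the two accuracies are the ones quoted in the abstract.

```python
from math import sqrt


def wilson_interval(acc, n, z=1.96):
    """Wilson score interval for a proportion `acc` observed on `n` samples."""
    denom = 1 + z * z / n
    center = (acc + z * z / (2 * n)) / denom
    half = z * sqrt(acc * (1 - acc) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


N_TEST = 1000  # hypothetical: the paper's actual test-set size is not reported here

eva02_acc, resnet34_acc = 0.971, 0.964  # accuracies quoted in the abstract
lo, hi = wilson_interval(eva02_acc, N_TEST)
resnet_inside = lo <= resnet34_acc <= hi
print(f"EVA-02 95% interval: {lo:.3f} to {hi:.3f}; ResNet34 inside: {resnet_inside}")
```

Under that assumed test-set size, the interval runs from roughly 95.9% to 98.0%, so the ResNet34 accuracy falls inside it. Because both models would be scored on the same tiles, a paired test such as McNemar's, repeated over several seeds, would be the more appropriate check; this interval only indicates the scale of sampling noise.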

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential reference value of this benchmark. We address the single major comment below.

point-by-point responses
  1. Referee: [Methods] Methods (standardized protocol): The paper applies one fixed transfer-learning/fine-tuning protocol to all twelve models but does not state whether learning-rate schedules, augmentation strength, regularization, or optimizer settings were held constant across families or tuned separately. This directly affects the central claim that “transformer architectures generally produced the strongest results” (abstract), because the 0.7 pp gap between EVA-02 (97.1 %) and ResNet34 (96.4 %) could reflect protocol bias rather than intrinsic architectural merit; an ablation or per-family hyperparameter search is required to support the architectural ranking.

    Authors: We appreciate the referee drawing attention to the need for greater methodological transparency. The protocol was deliberately standardized and identical for all models: the same learning-rate schedule, augmentation pipeline, regularization, optimizer, and training hyperparameters were applied uniformly to the twelve architectures. This design choice enables a controlled comparison of architectural families under equivalent training conditions rather than an optimized per-model comparison. We acknowledge that the original Methods section did not explicitly enumerate this uniformity at the level of detail requested. We will revise the Methods section to state explicitly that all listed settings were held constant across CNN, transformer, and hybrid models. We do not believe a per-family hyperparameter search is required for the stated contribution, which is a standardized benchmark rather than an architecture-optimization study; performing such a search would change the experimental question being answered.

    revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical benchmark with held-out evaluation

full rationale

The paper reports classification accuracies, F1-scores and other metrics obtained by training ImageNet-pretrained models on the Kather dataset under a fixed transfer-learning protocol and measuring performance on held-out tiles. No equations, first-principles derivations, or 'predictions' appear; all numerical claims are direct experimental outputs. The standardized protocol is an explicit methodological choice whose fairness can be debated on external grounds, but it does not create any self-definitional, fitted-input, or self-citation reduction inside the reported results. This matches the default expectation for an empirical architecture-comparison study.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the representativeness of the Kather dataset for colorectal histopathology and the assumption that a uniform transfer-learning protocol produces architecture-neutral comparisons; no new entities are postulated and no free parameters are fitted to produce the headline metrics.

free parameters (2)
  • model selection
    Choice of the twelve specific architectures (EVA-02, ViT-B/16, ResNet34, etc.)
  • training protocol details
    Exact learning rates, batch sizes, and epoch counts within the standardized fine-tuning procedure
axioms (2)
  • domain assumption: The Kather dataset tiles are representative of real-world colorectal histopathology images
    Used as the sole benchmark without external validation
  • domain assumption: ImageNet pretraining followed by fine-tuning is a fair and sufficient adaptation method for all architectures
    Applied uniformly to CNNs, transformers, and hybrids

pith-pipeline@v0.9.1-grok · 5826 in / 1369 out tokens · 40831 ms · 2026-06-26T09:17:37.935664+00:00 · methodology

