Performance and Interpretability of Convolutional, Transformer, and Hybrid Deep Learning Models in Colorectal Histology Classification
Pith reviewed 2026-06-26 09:17 UTC · model grok-4.3
The pith
Transformer models reach the highest accuracy in colorectal histology classification but the edge over top CNNs remains small.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
All twelve models reached high performance on the eight-class Kather task, with accuracies between 93.2 percent and 97.1 percent. Transformer models produced the best scores across metrics, yet the gap separating the leading transformer from the leading CNN was relatively small. EVA-02 recorded the single highest result; ResNet34 and ConvNeXt-Tiny followed closely among convolutional networks.
What carries the argument
Standardized transfer-learning and fine-tuning protocol applied uniformly to ImageNet-pretrained CNN, transformer, and hybrid models on the Kather colorectal histopathology dataset.
If this is right
- Modern CNNs can be chosen when model size or inference speed matters more than the last percent of accuracy.
- Complex Stroma remains the hardest class for every architecture tested.
- Hybrid models sit between the two families but do not exceed the best pure transformer.
- The small performance gap implies that further gains may come from data scale or training tricks rather than architecture alone.
Where Pith is reading between the lines
- The ranking could shift if the evaluation moved to whole-slide images instead of pre-cropped tiles.
- Interpretability results promised by the title are not reported in the performance-focused abstract, so separate analysis would be needed to compare explainability across families.
- Because all models start from ImageNet weights, the benchmark tests fine-tuning behavior more than raw architectural capacity.
Load-bearing premise
A single standardized transfer-learning and fine-tuning protocol on ImageNet-pretrained models produces a fair comparison of architectural families when tested only on the Kather dataset.
What would settle it
Repeating the comparison after training the same models from random initialization or on a different colorectal histology dataset would show whether the current ranking of transformers over CNNs holds.
Original abstract
Deep learning has become an important tool in computational pathology, enabling automated analysis of histopathological images. While convolutional neural networks (CNNs) have traditionally dominated this field, transformer-based and hybrid architectures have recently demonstrated promising performance. However, comprehensive comparisons of these approaches for colorectal histopathology remain limited. This study evaluated twelve ImageNet-pretrained CNN, transformer, and hybrid architectures using the Kather colorectal histopathology dataset containing 5,000 image tiles from eight tissue classes. All models were trained using a standardized transfer-learning and fine-tuning protocol and assessed using multiple performance metrics, including accuracy, precision, sensitivity, specificity, F1-score, ROC-AUC, Cohen's kappa, and Matthews correlation coefficient. All evaluated models achieved high classification performance, with accuracies ranging from 93.2% to 97.1%. EVA-02 achieved the highest overall performance (97.1% accuracy, 97.0% F1-score), closely followed by ViT-B/16. Among CNNs, ResNet34 and ConvNeXt-Tiny demonstrated highly competitive performance, achieving accuracies of 96.4% and 96.3%, respectively. Transformer architectures generally produced the strongest results across evaluation metrics, although the performance gap between the best transformer and CNN models was relatively small. Per-class analysis showed consistently strong classification performance across all tissue categories, with Complex Stroma representing the most challenging class. Overall, transformer-based architectures achieved the highest predictive performance, whereas modern CNNs provided a favorable balance between accuracy and model complexity. These findings provide a comprehensive benchmark of major deep learning paradigms for colorectal histopathology classification.
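Several of the metrics listed in the abstract can be derived from a single confusion matrix. The sketch below shows one way to compute accuracy, macro-averaged F1, Cohen's kappa, and the multiclass Matthews correlation coefficient in plain Python. The function name `metrics_from_confusion` and the three-class toy matrix are illustrative assumptions, not the paper's code or data.

```python
from math import sqrt

def metrics_from_confusion(cm):
    """cm[i][j] = number of tiles with true class i predicted as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_tot = [sum(cm[i]) for i in range(k)]                        # row sums
    pred_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # column sums

    accuracy = correct / n

    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (row sum + column sum)
    f1_scores = []
    for i in range(k):
        denom = true_tot[i] + pred_tot[i]
        f1_scores.append(2 * cm[i][i] / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / k

    # Cohen's kappa: agreement corrected for chance agreement p_e
    p_e = sum(t * p for t, p in zip(true_tot, pred_tot)) / n**2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0

    # Multiclass MCC written in terms of the confusion-matrix marginals
    cov_tp = correct * n - sum(t * p for t, p in zip(true_tot, pred_tot))
    cov_tt = n * n - sum(t * t for t in true_tot)
    cov_pp = n * n - sum(p * p for p in pred_tot)
    mcc = cov_tp / sqrt(cov_tt * cov_pp) if cov_tt and cov_pp else 0.0

    return {"accuracy": accuracy, "macro_f1": macro_f1, "kappa": kappa, "mcc": mcc}

# Toy 3-class matrix with made-up counts (NOT results from the paper)
toy_cm = [[9, 1, 0],
          [1, 8, 1],
          [0, 1, 9]]
toy_metrics = metrics_from_confusion(toy_cm)
print(toy_metrics)
```

On a balanced toy matrix like this one, kappa and MCC coincide; on imbalanced data they can diverge, which is one reason to report both.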
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a benchmark comparing twelve ImageNet-pretrained CNN, transformer, and hybrid deep learning models on the Kather colorectal histopathology dataset (5,000 tiles, 8 tissue classes). Under a single standardized transfer-learning and fine-tuning protocol, all models achieve high performance (accuracies 93.2%–97.1%). EVA-02 (transformer) is highest at 97.1% accuracy / 97.0% F1, closely followed by ViT-B/16; the top CNNs (ResNet34 at 96.4%, ConvNeXt-Tiny at 96.3%) are competitive. The abstract draws three conclusions: transformer architectures generally produce the strongest results across metrics (accuracy, F1, ROC-AUC, kappa, MCC), though the gaps are small; modern CNNs balance accuracy and complexity well; and Complex Stroma is the hardest class.
Significance. If the comparison holds, the work supplies a useful multi-metric empirical benchmark for architecture selection in computational pathology, showing transformers can achieve top performance on this task while documenting that recent CNNs remain competitive. The inclusion of Cohen’s kappa, MCC, and per-class results strengthens its reference value.
major comments (1)
- [Methods] Methods (standardized protocol): The paper applies one fixed transfer-learning/fine-tuning protocol to all twelve models but does not state whether learning-rate schedules, augmentation strength, regularization, or optimizer settings were held constant across families or tuned separately. This directly affects the central claim that “transformer architectures generally produced the strongest results” (abstract), because the 0.7 pp gap between EVA-02 (97.1%) and ResNet34 (96.4%) could reflect protocol bias rather than intrinsic architectural merit; an ablation or per-family hyperparameter search is required to support the architectural ranking.
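To put the 0.7 pp gap in context, one rough check is to compare binomial confidence intervals for the two reported accuracies. The sketch below uses the Wilson score interval. The held-out set size `N_TEST = 1000` is purely an assumption for illustration, since the split is not stated in the material reviewed here, and overlapping intervals are only a coarse indication, not a formal test.

```python
from math import sqrt

def wilson_interval(p_hat, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion p_hat observed on n items."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# ASSUMPTION: hypothetical number of held-out tiles; the real split is not given.
N_TEST = 1000

eva_lo, eva_hi = wilson_interval(0.971, N_TEST)   # EVA-02 reported accuracy
res_lo, res_hi = wilson_interval(0.964, N_TEST)   # ResNet34 reported accuracy

intervals_overlap = eva_lo <= res_hi and res_lo <= eva_hi
print(f"EVA-02:   [{eva_lo:.3f}, {eva_hi:.3f}]")
print(f"ResNet34: [{res_lo:.3f}, {res_hi:.3f}]")
print("intervals overlap:", intervals_overlap)
```

Under this assumed test size the two intervals overlap. Because both models would be scored on the same tiles, a paired test such as McNemar's on per-tile predictions, repeated over several seeds, would be a more appropriate way to assess the difference.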
minor comments (2)
- [Abstract] Abstract and Results: The paper gives no information on train-test split ratios, the number of random seeds, or statistical significance testing of the reported accuracy differences, which weakens the performance claims.
- [Results] Results: The paper states that Complex Stroma is the most challenging class but references no confusion matrix, per-class error breakdown, or qualitative error analysis to support this observation.
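The kind of per-class error breakdown requested in the last comment can be read directly off a confusion matrix. The following is a minimal sketch: the helper `per_class_report`, the three class labels, and all counts are hypothetical and chosen only to illustrate the calculation; they are not results from the paper.

```python
def per_class_report(cm, labels):
    """Return recall and the most frequent wrong prediction for each true class.

    cm[i][j] = number of tiles with true class labels[i] predicted as labels[j].
    """
    report = {}
    for i, name in enumerate(labels):
        support = sum(cm[i])
        recall = cm[i][i] / support if support else 0.0
        # Off-diagonal entries in row i are the errors made on class i
        errors = [(cm[i][j], labels[j]) for j in range(len(labels)) if j != i]
        count, confused_with = max(errors)
        report[name] = {
            "recall": recall,
            "top_confusion": confused_with if count else None,
        }
    return report

# Made-up counts for three illustrative classes (NOT data from the paper)
labels = ["tumor", "stroma", "complex_stroma"]
cm = [[48, 1, 1],
      [2, 46, 2],
      [3, 5, 42]]

report = per_class_report(cm, labels)
hardest = min(report, key=lambda name: report[name]["recall"])
print(hardest, report[hardest])
```

Reporting the lowest-recall class together with its dominant confusion partner makes a claim such as "this class is the hardest" checkable rather than anecdotal.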
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for recognizing the potential reference value of this benchmark. We address the single major comment below.
Point-by-point responses
Referee: [Methods] Methods (standardized protocol): The paper applies one fixed transfer-learning/fine-tuning protocol to all twelve models but does not state whether learning-rate schedules, augmentation strength, regularization, or optimizer settings were held constant across families or tuned separately. This directly affects the central claim that “transformer architectures generally produced the strongest results” (abstract), because the 0.7 pp gap between EVA-02 (97.1%) and ResNet34 (96.4%) could reflect protocol bias rather than intrinsic architectural merit; an ablation or per-family hyperparameter search is required to support the architectural ranking.
Authors: We appreciate the referee drawing attention to the need for greater methodological transparency. The protocol was deliberately standardized and identical for all models: the same learning-rate schedule, augmentation pipeline, regularization, optimizer, and training hyperparameters were applied uniformly to the twelve architectures. This design choice enables a controlled comparison of architectural families under equivalent training conditions rather than an optimized per-model comparison. We acknowledge that the original Methods section did not explicitly enumerate this uniformity at the level of detail requested. We will revise the Methods section to state explicitly that all listed settings were held constant across CNN, transformer, and hybrid models. We do not believe a per-family hyperparameter search is required for the stated contribution, which is a standardized benchmark rather than an architecture-optimization study; performing such a search would change the experimental question being answered.
Revision: yes
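One way to make the uniformity described in the rebuttal auditable is to define the protocol once as an immutable object and derive every run from it. The sketch below illustrates that idea only. Every field name and value in `Protocol` is a hypothetical placeholder, not a setting reported by the authors; only the four model names come from the abstract.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)  # frozen: settings cannot be changed after creation
class Protocol:
    # ASSUMPTION: all values below are illustrative placeholders.
    optimizer: str = "adamw"
    base_lr: float = 1e-4
    lr_schedule: str = "cosine"
    weight_decay: float = 0.05
    augmentation: str = "flip+rotate"
    epochs: int = 30
    seed: int = 0

SHARED = Protocol()
MODELS = ["EVA-02", "ViT-B/16", "ResNet34", "ConvNeXt-Tiny"]  # named in the abstract

def build_runs(models, protocol):
    """Create one run specification per model, all sharing the same protocol."""
    return [{"model": name, **asdict(protocol)} for name in models]

runs = build_runs(MODELS, SHARED)

# Sanity check: apart from the model name, every run has identical settings.
settings = [{k: v for k, v in run.items() if k != "model"} for run in runs]
all_identical = all(s == settings[0] for s in settings)
print("protocol identical across models:", all_identical)
```

Publishing such a specification alongside the results would let readers verify the "held constant" claim directly, and a per-family search could later be expressed as explicit overrides of the shared object.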
Circularity Check
No circularity: purely empirical benchmark with held-out evaluation
Full rationale
The paper reports classification accuracies, F1-scores and other metrics obtained by training ImageNet-pretrained models on the Kather dataset under a fixed transfer-learning protocol and measuring performance on held-out tiles. No equations, first-principles derivations, or 'predictions' appear; all numerical claims are direct experimental outputs. The standardized protocol is an explicit methodological choice whose fairness can be debated on external grounds, but it does not create any self-definitional, fitted-input, or self-citation reduction inside the reported results. This matches the default expectation for an empirical architecture-comparison study.
Axiom & Free-Parameter Ledger
free parameters (2)
- model selection
- training protocol details
axioms (2)
- domain assumption: The Kather dataset tiles are representative of real-world colorectal histopathology images.
- domain assumption: ImageNet pretraining followed by fine-tuning is a fair and sufficient adaptation method for all architectures.