Analogical Reasoning as a Doctor: A Foundation Model for Gastrointestinal Endoscopy Diagnosis
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-10 19:56 UTC · model grok-4.3
The pith
A foundation model with analogical reasoning outperforms prior models in gastrointestinal endoscopy diagnosis.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RATNet is a foundation model for gastrointestinal endoscopy imaging based on analogical reasoning. It acquires and transfers knowledge from heterogeneous expert annotations across five datasets via cyclic pre-training. The model consists of an encoder, a relevance-knowledge acquisition and transfer (RAT) module, a projector, and a multi-task head. Its analogical reasoning mechanism matches image-derived posterior knowledge to a learned prior knowledge base and transfers relative knowledge to guide diagnosis, which the authors credit with improved generalization and resistance to bias.
What carries the argument
The relevance-knowledge acquisition and transfer (RAT) module that performs analogical reasoning by matching posterior knowledge from input images to a prior knowledge base and transferring relative knowledge to support diagnosis decisions.
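The theorem-link excerpt later in this review describes that matching step as cosine similarity between the image-derived posterior vector and entries of the prior knowledge base, followed by softmax weighting and a weighted sum, with an orthogonality constraint on the base. A minimal PyTorch-style sketch of that reading is below; the function and variable names (analogical_transfer, k_p, knowledge_base) are illustrative choices of this review, not the authors' implementation, and how the transferred vector is fused with the image features before the multi-task head is not specified here.

```python
import torch
import torch.nn.functional as F

def analogical_transfer(k_p, knowledge_base, temperature=1.0):
    """Sketch of the matching step described in the review.

    k_p:            (d,) posterior knowledge vector derived from the input image
    knowledge_base: (m, d) learned prior knowledge base with entries b_i

    The posterior is compared to every prior entry by cosine similarity, the
    similarities are softmax-normalised, and the weighted sum k_a is returned
    as the knowledge transferred to guide diagnosis.
    """
    sim = F.cosine_similarity(k_p.unsqueeze(0), knowledge_base, dim=-1)  # (m,)
    weights = F.softmax(sim / temperature, dim=-1)                       # (m,)
    k_a = weights @ knowledge_base                                       # (d,)
    return k_a, weights

def orthogonality_penalty(knowledge_base):
    """Illustrative penalty that keeps prior entries near-orthogonal, mirroring
    the orthogonality constraint on the knowledge base named in the excerpt."""
    b = F.normalize(knowledge_base, dim=-1)
    gram = b @ b.T
    eye = torch.eye(b.shape[0], device=b.device)
    return ((gram - eye) ** 2).sum()
```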
Load-bearing premise
The cyclic pre-training strategy successfully acquires transferable knowledge from heterogeneous annotations, and the analogical reasoning mechanism is the key driver of improved generalization and bias resistance.
What would settle it
An experiment where the RAT module is ablated or replaced with a standard attention mechanism, resulting in performance equal to or worse than baseline models like GastroVision on the six evaluation scenarios.
Original abstract
Gastrointestinal diseases impose a growing global health burden, and endoscopy is a primary tool for early diagnosis. However, routine endoscopic image interpretation still suffers from missed lesions and limited efficiency. Although AI-assisted diagnosis has shown promise, existing models often lack generalizability, adaptability, robustness, and scalability because of limited medical data, domain shift, and heterogeneous annotations. To address these challenges, we develop RATNet, a foundation model for gastrointestinal endoscopy imaging based on analogical reasoning. RATNet acquires and transfers knowledge from heterogeneous expert annotations across five gastrointestinal endoscopy datasets through a cyclic pre-training strategy. Its architecture consists of an encoder, a relevance-knowledge acquisition and transfer (RAT) module, a projector, and a multi-task head, and supports fine-tuning, linear probing, and zero-shot transfer. Evaluations show that RATNet outperforms existing foundation models, including GastroNet and GastroVision, across six scenarios: diagnosis of common gastrointestinal diseases, few-shot learning for rare diseases, zero-shot transfer to new medical sites, robustness under long-tailed disease distributions, adaptation to novel diseases, and privacy-preserving deployment via federated learning. Its advantage comes from an analogical reasoning mechanism that matches image-derived posterior knowledge to a learned prior knowledge base and transfers relative knowledge to guide diagnosis, improving generalization and resistance to bias. RATNet is open and cost-effective, supports automatic integration of heterogeneous annotations without manual label unification, and reduces data acquisition costs, making it a practical foundation for intelligent gastrointestinal diagnosis, especially in resource-limited settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces RATNet, a foundation model for gastrointestinal endoscopy diagnosis based on analogical reasoning. It acquires and transfers knowledge from heterogeneous expert annotations across five datasets via a cyclic pre-training strategy without manual label unification. The architecture comprises an encoder, a relevance-knowledge acquisition and transfer (RAT) module, a projector, and a multi-task head, supporting fine-tuning, linear probing, and zero-shot transfer. The paper claims that RATNet outperforms existing foundation models such as GastroNet and GastroVision across six scenarios—diagnosis of common GI diseases, few-shot learning for rare diseases, zero-shot transfer to new sites, robustness under long-tailed distributions, adaptation to novel diseases, and privacy-preserving federated learning—due to its analogical mechanism that matches image-derived posterior knowledge to a learned prior knowledge base and transfers relative knowledge to guide diagnosis.
Significance. If the empirical claims hold after addressing the noted gaps, this work could meaningfully advance medical computer vision by offering a more generalizable and data-efficient foundation model for endoscopy that handles real-world annotation heterogeneity and supports federated deployment. The automatic integration of heterogeneous annotations without manual unification and the open-source release are explicit strengths that lower barriers to adoption in resource-limited clinical settings.
major comments (2)
- §5 (Experimental Results) and §4.3 (Ablation Studies): The central claim that the analogical reasoning mechanism (via the RAT module) is responsible for outperformance across the six scenarios is not isolated. No ablation is described that keeps the encoder, projector, multi-task head, and cyclic pre-training fixed while removing or replacing the relevance-knowledge acquisition/transfer logic; all comparisons are only to external models (GastroNet, GastroVision), leaving open whether gains derive from the analogical component or simply from broader multi-dataset exposure.
- §3.2 (Cyclic Pre-training Strategy): The description of how the cyclic pre-training acquires and transfers knowledge from heterogeneous annotations across five datasets without manual label unification does not specify the independence of the prior knowledge base construction from the downstream evaluation metrics used in the zero-shot and few-shot scenarios, raising a potential circularity risk for the generalization claims.
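One concrete way a revision could demonstrate the requested independence is a split audit that fails loudly if any patient or site used to build the prior knowledge base reappears in an evaluation set. The sketch below is illustrative only; the column names (patient_id, site_id) and data structures are hypothetical, not taken from the paper.

```python
import pandas as pd

def assert_disjoint(pretrain_index: pd.DataFrame,
                    eval_indices: dict[str, pd.DataFrame]) -> None:
    """Check that identifiers used for knowledge-base construction never occur
    in any downstream evaluation split (hypothetical schema)."""
    for key in ("patient_id", "site_id"):
        pretrain_ids = set(pretrain_index[key])
        for name, eval_index in eval_indices.items():
            overlap = pretrain_ids & set(eval_index[key])
            if overlap:
                raise ValueError(
                    f"{name}: {len(overlap)} overlapping {key} values between "
                    "knowledge-base construction and evaluation"
                )
```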
minor comments (2)
- Abstract: The abstract asserts outperformance across six scenarios but supplies no quantitative metrics, statistical tests, or error bars; including at least one representative result (e.g., mean AUC or accuracy delta with significance) would strengthen the summary.
- Figure 1 (Architecture Diagram): The RAT module internals are not labeled with equation references or data-flow arrows, making it difficult to trace how posterior-to-prior matching occurs.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We appreciate the referee's careful reading and address each major comment below. We have prepared revisions to strengthen the presentation of our results and clarify methodological details.
Point-by-point responses
- Referee: §5 (Experimental Results) and §4.3 (Ablation Studies): The central claim that the analogical reasoning mechanism (via the RAT module) is responsible for outperformance across the six scenarios is not isolated. No ablation is described that keeps the encoder, projector, multi-task head, and cyclic pre-training fixed while removing or replacing the relevance-knowledge acquisition/transfer logic; all comparisons are only to external models (GastroNet, GastroVision), leaving open whether gains derive from the analogical component or simply from broader multi-dataset exposure.
  Authors: We agree that an internal ablation isolating the RAT module's analogical reasoning logic would provide stronger causal evidence for its contribution. In the revised manuscript, we will add a new ablation in §4.3 that retains the encoder, projector, multi-task head, and cyclic pre-training strategy exactly as described, but replaces the relevance-knowledge acquisition and transfer logic with a baseline (e.g., standard self-attention without the prior-posterior matching and relative knowledge transfer; see the sketch after these responses). Performance will be reported across all six evaluation scenarios to quantify the incremental benefit of the analogical mechanism beyond multi-dataset exposure alone. Revision: yes.
- Referee: §3.2 (Cyclic Pre-training Strategy): The description of how the cyclic pre-training acquires and transfers knowledge from heterogeneous annotations across five datasets without manual label unification does not specify the independence of the prior knowledge base construction from the downstream evaluation metrics used in the zero-shot and few-shot scenarios, raising a potential circularity risk for the generalization claims.
  Authors: We thank the referee for highlighting this point of potential ambiguity. The prior knowledge base is built exclusively from the training splits of the five pre-training datasets during the cyclic process; all zero-shot, few-shot, and other downstream evaluations use held-out test sets drawn from entirely disjoint sites, patients, or disease distributions with no overlap in images or labels. In the revision, we will expand §3.2 with an explicit subsection on data partitioning and independence, including a table or diagram confirming that no evaluation metrics or test data influence the construction of the prior knowledge base. This will remove any risk of circularity in the generalization claims. Revision: yes.
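A minimal sketch of the ablation control proposed in the first response, assuming a transformer-style encoder output: keep everything else fixed and swap the prior-posterior matching for ordinary self-attention, so no learned knowledge base is consulted. Module and argument names are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn

class StandardAttentionControl(nn.Module):
    """Ablation control: the encoder, projector, and multi-task head stay as
    described, but the RAT matching is replaced by plain self-attention over
    the encoder tokens. Any gain of the full model over this control can then
    be attributed to the analogical component rather than multi-dataset
    exposure alone."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq, dim) encoder output; no prior knowledge base is used.
        out, _ = self.attn(tokens, tokens, tokens)
        return out
```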
Circularity Check
No significant circularity; empirical claims rest on external comparisons
full rationale
The paper describes RATNet's architecture (encoder + RAT module + projector + multi-task head) and cyclic pre-training on five heterogeneous datasets, then reports empirical outperformance versus external baselines (GastroNet, GastroVision) across six evaluation scenarios. No equations or derivations reduce the claimed advantage to a fitted parameter or self-citation by construction; the analogical-reasoning attribution is presented as an interpretive explanation of observed results rather than a tautological re-labeling of inputs. The absence of internal ablations is a separate methodological limitation but does not create a self-definitional or load-bearing circular step within the derivation chain itself.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Cyclic pre-training integrates heterogeneous annotations without manual unification
invented entities (1)
- RAT module: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Tag: unclear (the relation between the paper passage and the cited Recognition theorem is ambiguous).
  Paper passage: RAT module computes cosine similarity sim_i = k_p · b_i / (‖k_p‖ ‖b_i‖), softmax weights, weighted sum k_a, orthogonality constraint on KB, and L_ts loss (an equation-form sketch follows this list).
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · absolute_floor_iff_bare_distinguishability
  Tag: unclear (the relation between the paper passage and the cited Recognition theorem is ambiguous).
  Paper passage: cyclic pre-training on five heterogeneous endoscopy datasets without label unification.
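For reference, the first excerpt above can be read in equation form as the following hedged reconstruction; the symbols B, w_i, and the Frobenius-norm penalty are notational choices of this review, not taken from the paper.

```latex
\mathrm{sim}_i = \frac{k_p \cdot b_i}{\lVert k_p \rVert \, \lVert b_i \rVert}, \qquad
w_i = \frac{\exp(\mathrm{sim}_i)}{\sum_j \exp(\mathrm{sim}_j)}, \qquad
k_a = \sum_i w_i \, b_i, \qquad
\mathcal{L}_{\mathrm{orth}} = \lVert B B^{\top} - I \rVert_F^2
```

Here B stacks the knowledge-base entries b_i row-wise, and the matching would be trained jointly with the task-specific loss L_ts named in the excerpt.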
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Wang, Y., Huang, Y., Chase, R.C., Li, T., Ramai, D., Li, S., Huang, X., Antwi, S.O., Keaveny, A.P., Pang, M.: Global burden of digestive diseases: a systematic analysis of the global burden of diseases study, 1990 to 2019. Gastroenterology 165(3), 773–783 (2023)
- [2] Arnold, M., Abnet, C.C., Neale, R.E., Vignat, J., Giovannucci, E.L., McGlynn, K.A., Bray, F.: Global burden of 5 major types of gastrointestinal cancer. Gastroenterology 159(1), 335–349 (2020)
- [3] Sung, H., Ferlay, J., Siegel, R.L., Laversanne, M., Soerjomataram, I., Jemal, A., Bray, F.: Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians 71(3), 209–249 (2021)
- [4] Correa, P.: Gastric cancer: overview. Gastroenterology Clinics of North America 42(2), 211 (2013)
- [5] Tang, Y., Anandasabapathy, S., Richards-Kortum, R.: Advances in optical gastrointestinal endoscopy: a technical review. Molecular Oncology 15(10), 2580–2599 (2021)
- [6] Martins, B.C., Moura, R.N., Kum, A.S.T., Matsubayashi, C.O., Marques, S.B., Safatle-Ribeiro, A.V.: Endoscopic imaging for the diagnosis of neoplastic and pre-neoplastic conditions of the stomach. Cancers 15(9), 2445 (2023)
- [7] Tham, C., Rea, D., Tham, T.: Artificial intelligence in endoscopy: A narrative review. The Ulster Medical Journal 94(1), 16 (2025)
- [8] Xu, Z., Li, Y., Su, P., Zhong, Z., Zeng, Z., Chen, M., Chen, D., Lan, C.: Artificial intelligence system improves the quality of digestive endoscopy: A prospective pretest and post-test single-center clinical trial. Digestive and Liver Disease (2025)
- [9] Mushtaq, K., Lim, Y.J., Spada, C., Mussetto, A., Koulaouzidis, A., Kaung, T., Borrow, D.-M., Casadei, C., Patel, P., Rahman, I.: AI-assisted double-headed capsule endoscopy: Multicentre prospective diagnostic accuracy study across small bowel indications. Diagnostics 16(2), 239 (2026)
- [10] Shi, C., Rezai, R., Yang, J., Dou, Q., Li, X.: A survey on trustworthiness in foundation models for medical image analysis. arXiv preprint arXiv:2407.15851 (2024)
- [11] Zhang, S., Metaxas, D.: On the challenges and perspectives of foundation models for medical image analysis. Medical Image Analysis 91, 102996 (2024)
- [12] Ma, D., Pang, J., Gotway, M.B., Liang, J.: A fully open AI foundation model applied to chest radiography. Nature, 1–11 (2025)
- [13] Wu, C., Zhang, X., Zhang, Y., Hui, H., Wang, Y., Xie, W.: Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data. Nature Communications 16(1), 7866 (2025)
- [14] Wang, X., Zhao, J., Marostica, E., Yuan, W., Jin, J., Zhang, J., Li, R., Tang, H., Wang, K., Li, Y., et al.: A pathology foundation model for cancer diagnosis and prognosis prediction. Nature 634(8035), 970–978 (2024)
- [15] Boers, T.G., Fockens, K.N., Putten, J.A., Jaspers, T.J., Kusters, C.H., Jukema, J.B., Jong, M.R., Struyvenberg, M.R., Groof, J., Bergman, J.J., et al.: Foundation models in gastrointestinal endoscopic AI: Impact of architecture, pre-training approach and data efficiency. Medical Image Analysis 98, 103298 (2024)
- [16] He, Y., Chen, Q., Liu, B., Cao, Y.: Foundational multi-task multimodal model for upper GI endoscopy. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6612–6621 (2025)
- [17] Zhang, B., Chen, Y., Bai, L., Zhao, Y., Sun, Y., Yuan, Y., Zhang, J., Ren, H.: Learning to adapt foundation model DINOv2 for capsule endoscopy diagnosis. Procedia Computer Science 250, 188–194 (2024)
- [18] Dermyer, P., Kalra, A., Schwartz, M.: EndoDINO: A foundation model for GI endoscopy. arXiv preprint arXiv:2501.05488 (2025)
- [19] Wang, W., Tian, J., Zhang, C., Luo, Y., Wang, X., Li, J.: An improved deep learning approach and its applications on colonic polyp images detection. BMC Medical Imaging 20(1), 83 (2020)
- [20] Polat, G., Kani, H.T., Ergenc, I., Ozen Alahdab, Y., Temizel, A., Atug, O.: Improving the computer-aided estimation of ulcerative colitis severity according to Mayo endoscopic score by using regression-based deep learning. Inflammatory Bowel Diseases 29(9), 1431–1439 (2023)
- [21] Pogorelov, K., Randel, K.R., Griwodz, C., Eskeland, S.L., Lange, T., Johansen, D., Spampinato, C., Dang-Nguyen, D.-T., Lux, M., Schmidt, P.T., et al.: Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection. In: Proceedings of the 8th ACM on Multimedia Systems Conference, pp. 164–169 (2017)
- [22] Borgli, H., Thambawita, V., Smedsrud, P.H., Hicks, S., Jha, D., Eskeland, S.L., Randel, K.R., Pogorelov, K., Lux, M., Nguyen, D.T.D., et al.: HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Scientific Data 7(1), 283 (2020)
- [23] Devkota, A., Amireskandari, A., Palko, J., Thakkar, S., Adjeroh, D., Jiang, X., Bhattarai, B., Gyawali, P.K.: Federated foundation model for GI endoscopy images. arXiv preprint arXiv:2505.24108 (2025)
- [24] Kondrateva, E., Pominova, M., Popova, E., Sharaev, M., Bernstein, A., Burnaev, E.: Domain shift in computer vision models for MRI data analysis: an overview. In: Thirteenth International Conference on Machine Vision, vol. 11605, pp. 126–133 (2021). SPIE
- [25] Ayana, G., Dese, K., Abagaro, A.M., Jeong, K.C., Yoon, S.-D., Choe, S.-w.: Multistage transfer learning for medical images. Artificial Intelligence Review 57(9), 232 (2024)
- [26] Ali, S., Jha, D., Ghatwary, N., Realdon, S., Cannizzaro, R., Salem, O.E., Lamarque, D., Daul, C., Riegler, M.A., Anonsen, K.V., et al.: A multi-centre polyp detection and segmentation dataset for generalisability assessment. Scientific Data 10(1), 75 (2023)
- [27] Ball, L.J., Thompson, V.A.: International Handbook of Thinking and Reasoning. Routledge (2017)
- [28] Gust, H., Krumnack, U., Kühnberger, K.-U., Schwering, A.: Analogical reasoning: a core of cognition. Künstliche Intelligenz 22(1), 8–12 (2008)
- [29] Gentner, D., Holyoak, K.J.: Reasoning and learning by analogy: Introduction. American Psychologist 52(1), 32 (1997)
- [30] Ribeiro, H.J.: Systematic Approaches to Argument by Analogy, vol. 25. Springer (2014)
- [31] Mesejo, P., Pizarro, D., Abergel, A., Rouquette, O., Beorchia, S., Poincloux, L., Bartoli, A.: Computer-aided classification of gastrointestinal lesions in regular colonoscopy. IEEE Transactions on Medical Imaging 35(9), 2051–2063 (2016)
- [32] Jha, D., Sharma, V., Dasu, N., Tomar, N.K., Hicks, S., Bhuyan, M.K., Das, P.K., Riegler, M.A., Halvorsen, P., Bagci, U., et al.: GastroVision: A multi-class endoscopy image dataset for computer aided gastrointestinal disease detection. In: Workshop on Machine Learning for Multimodal Healthcare Data, pp. 125–140 (2023). Springer
- [33] Smedsrud, P.H., Thambawita, V., Hicks, S.A., Gjestang, H., Nedrejord, O.O., Næss, E., Borgli, H., Jha, D., Berstad, T.J.D., Eskeland, S.L., et al.: Kvasir-Capsule, a video capsule endoscopy dataset. Scientific Data 8(1), 142 (2021)
- [34] Maaten, L., Hinton, G.: Visualizing data using t-SNE. Journal of Machine Learning Research 9(11) (2008)
- [35] Bravo, D., Ruano, J., Gómez, M., González, F.A., Romero, E.: Self-supervised learning for multi-category endoscopy classification and data quality evaluation using masked autoencoders. In: 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), pp. 1–5 (2025). IEEE
- [36] Bravo, D., Frias, J., Vera, F., Trejos, J., Martínez, C., Gómez, M., González, F., Romero, E.: GastroHUN, an endoscopy dataset of complete systematic screening protocol for the stomach. Scientific Data 12(1), 102 (2025)
- [37] Ji, G.-P., Liu, J., Xu, P., Barnes, N., Khan, F.S., Khan, S., Fan, D.-P.: Frontiers in intelligent colonoscopy. Machine Intelligence Research 23(1), 70–114 (2026)
- [38] Xu, W., Liu, F., Tang, W., Gu, Y., Zhong, J., Cui, L., Du, P.: The Mayo endoscopic score is a novel predictive indicator for malignant transformation in ulcerative colitis: a long-term follow-up multicenter study. Frontiers in Surgery 9, 832219 (2022)
- [39] Xu, Z., Ali, S., Gupta, S., Leedham, S., East, J.E., Rittscher, J.: Patch-level instance-group discrimination with pretext-invariant learning for colitis scoring. In: International Workshop on Machine Learning in Medical Imaging, pp. 101–110 (2022). Springer
- [40] Jia, Z., Zeng, X., Duan, H., Lu, X., Li, H.: A patient-similarity-based model for diagnostic prediction. International Journal of Medical Informatics 135, 104073 (2020)
- [41] Grüger, J., Kuhn, M., Amri, K., Bergmann, R.: Enhancing healthcare decision-making with analogy-based reasoning. In: International Conference on Process Mining, pp. 447–459 (2024). Springer
- [42] Kim, Y., Shin, J., Yang, E., Hwang, S.J.: Few-shot visual reasoning with meta-analogical contrastive learning. Advances in Neural Information Processing Systems 33, 16846–16856 (2020)
- [43] Doumas, L.A., Puebla, G., Martin, A.E., Hummel, J.E.: A theory of relation learning and cross-domain generalization. Psychological Review 129(5), 999 (2022)
- [44] Jin, J., Hu, D., Pu, W., Luo, Y., Feng, X.: Few-shot learning with task adaptation for multi-category gastrointestinal endoscopy classification. Biomedical Signal Processing and Control 95, 106387 (2024)
- [45] Ngiam, K.Y., Khor, W.: Big data and machine learning algorithms for health-care delivery. The Lancet Oncology 20(5), 262–273 (2019)
- [46] Sheller, M.J., Reina, G.A., Edwards, B., Martin, J., Bakas, S.: Multi-institutional deep learning modeling without sharing patient data: A feasibility study on brain tumor segmentation. In: International MICCAI Brainlesion Workshop, pp. 92–104 (2018). Springer
- [47] Sheller, M.J., Edwards, B., Reina, G.A., Martin, J., Pati, S., Kotrotsou, A., Milchenko, M., Xu, W., Marcus, D., Colen, R.R., et al.: Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Scientific Reports 10(1), 12598 (2020)
- [48] Wahab, H., Mehmood, I., Ugail, H., Del Ser, J., Muhammad, K.: Federated deep learning for wireless capsule endoscopy analysis: Enabling collaboration across multiple data centers for robust learning of diverse pathologies. Future Generation Computer Systems 152, 361–371 (2024)
- [49] Li, M., Xu, P., Hu, J., Tang, Z., Yang, G.: From challenges and pitfalls to recommendations and opportunities: Implementing federated learning in healthcare. Medical Image Analysis, 103497 (2025)
- [50] Senthil Velan, S.: Benchmarking and boosting localizers for chest x-ray images. Technical report, Arizona State University (2024)
- [51] Saravanan, M.: Benchmarking and boosting of 3D segmentation models. Technical report, Arizona State University (2024)
- [52] Islam, N.U., Ma, D., Pang, J., Velan, S.S., Gotway, M., Liang, J.: Foundation X: integrating classification, localization, and segmentation through lock-release pretraining strategy for chest x-ray analysis. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3647–3656 (2025). IEEE
- [53] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021 (2021)
- [54] Zhang, H., Li, F., Liu, S., Zhang, L., Su, H., Zhu, J., Ni, L.M., Shum, H.: DINO: DETR with improved denoising anchor boxes for end-to-end object detection. In: The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023 (2023)
- [55] He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16000–16009 (2022)
- [56] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986 (2022)
- [57] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
- [58] Chen, J., Zhu, D., Shen, X., Li, X., Liu, Z., Zhang, P., Krishnamoorthi, R., Chandra, V., Xiong, Y., Elhoseiny, M.: MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning. arXiv preprint arXiv:2310.09478 (2023)
- [59] Liu, H., Li, C., Wu, Q., Lee, Y.J.: Visual instruction tuning. Advances in Neural Information Processing Systems 36, 34892–34916 (2023)
- [60] Liu, H., Li, C., Li, Y., Lee, Y.J.: Improved baselines with visual instruction tuning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 26296–26306 (2024)
- [61] He, M., Liu, Y., Wu, B., Yuan, J., Wang, Y., Huang, T., Zhao, B.: Efficient multimodal learning from data-centric perspective. arXiv preprint arXiv:2402.11530 (2024)
- [62] Li, Y., Zhang, Y., Wang, C., Zhong, Z., Chen, Y., Chu, R., Liu, S., Jia, J.: Mini-Gemini: Mining the potential of multi-modality vision language models. arXiv preprint arXiv:2403.18814 (2024)
- [63] Chu, X., Qiao, L., Lin, X., Xu, S., Yang, Y., Hu, Y., Wei, F., Zhang, X., Zhang, B., Wei, X., et al.: MobileVLM: A fast, strong and open vision language assistant for mobile devices. arXiv preprint arXiv:2312.16886 (2023)
- [64] Li, C., Wong, C., Zhang, S., Usuyama, N., Liu, H., Yang, J., Naumann, T., Poon, H., Gao, J.: LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day. Advances in Neural Information Processing Systems 36, 28541–28564 (2023)