Adversarial Fragility and Language Vulnerability in Clinical AI: A Systematic Audit of Diagnostic Collapse Under Imperceptible Perturbations and Cross-Lingual Drift in Low-Resource Healthcare Settings

arXiv: 2605.16993 · v1 · submitted 2026-05-16 · cs.CY · cs.AI · cs.LG

Anthonio Oladimeji Gabriel, Ahmad Rufai Yusuf

Pith reviewed 2026-05-19 19:14 UTC · model grok-4.3

keywords: clinical AI · adversarial robustness · cross-lingual drift · low-resource healthcare · diagnostic accuracy · chest X-ray · language models · Nigeria

The pith

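A chest X-ray classifier falls from 89 percent to 62 percent accuracy under small image perturbations described as imperceptible, and language models lose accuracy when the same clinical cases are written in Nigerian Pidgin or Yoruba-inflected English.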

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper audits two safety problems in clinical AI that current tests overlook. It applies small perturbations to X-ray images and switches the same cases into Nigerian Pidgin and Yoruba-inflected English. Both image models and language models lose substantial diagnostic performance under these conditions. A reader would care because these setups mirror the noisy equipment and mixed-language reality in many primary clinics. The results give a concrete failure range for deployment outside clean research data.

Core claim

Fine-tuned DenseNet121 on the COVID-QU-Ex chest X-ray set shows diagnostic accuracy falling from 89.3 percent to 62.0 percent under Fast Gradient Method perturbations at epsilon equal to 0.021, a level the authors describe as imperceptible to the human eye. Common defenses such as Gaussian smoothing and ensemble voting do not restore safety. In separate tests, Llama3.1:8b and the Africa-focused NatLAS model lose accuracy on twenty clinical cases when switched from Standard English to Nigerian Pidgin or Yoruba-inflected English. Llama3.1:8b falls from 80.0 percent to 65.0 percent on Pidgin, and NatLAS falls from 85.0 percent to 55.0 percent, with diagnosis consistency dropping to 50 percent.
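To make the attack in the core claim concrete, the following is a minimal sketch of a one-step gradient-sign perturbation. It is not the authors' code, which is not available in this page. It assumes the L-infinity variant of the Fast Gradient Method (the sign step of Goodfellow et al.), pixel values scaled to [0, 1], and a toy linear softmax classifier standing in for DenseNet121. The names `fgsm`, `W`, `b`, and the random data are all hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fgsm(x, y, W, b, eps=0.021):
    """One-step L-infinity attack on a linear softmax classifier.

    x: (n, d) inputs in [0, 1]; y: (n,) integer labels;
    W: (d, k) weights; b: (k,) biases. All are toy stand-ins.
    """
    p = softmax(x @ W + b)
    onehot = np.eye(W.shape[1])[y]
    grad_x = (p - onehot) @ W.T          # gradient of cross-entropy w.r.t. x
    x_adv = x + eps * np.sign(grad_x)    # step along the sign of the gradient
    return np.clip(x_adv, 0.0, 1.0)      # stay inside the assumed pixel range

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(8, 64))  # eight fake 8x8 "images"
W = rng.normal(size=(64, 3))             # three classes, as in COVID-QU-Ex
b = np.zeros(3)
y = np.argmax(x @ W + b, axis=1)         # clean predictions used as labels
x_adv = fgsm(x, y, W, b)
max_change = np.abs(x_adv - x).max()     # never exceeds eps
```

The point of the sketch is the budget: no pixel moves by more than `eps`, yet every pixel moves in the direction that most increases the loss. For a deep network the gradient would come from backpropagation rather than the closed form used here.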

What carries the argument

Dual audit that pairs Fast Gradient Method image perturbations with cross-lingual testing of language models on Pidgin and Yoruba-inflected clinical cases.
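The language half of the audit reduces to two numbers per model: accuracy on each language variant and a consistency score across variants. A minimal sketch follows. The definition of consistency used here (the share of cases that receive the same diagnosis in every variant) is an assumption, since the paper's definition is not given in this page, and the four cases and their labels are invented for illustration.

```python
def accuracy(preds, gold):
    """Fraction of cases where the predicted label equals the gold label."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

def consistency(preds_by_variant):
    """Share of cases with the same label in every variant (assumed definition)."""
    rows = list(zip(*preds_by_variant.values()))
    return sum(len(set(row)) == 1 for row in rows) / len(rows)

# Hypothetical gold labels and model outputs for four cases.
gold = ["covid", "covid", "not_covid", "covid"]
preds = {
    "standard_english": ["covid", "covid", "not_covid", "covid"],
    "nigerian_pidgin": ["covid", "not_covid", "not_covid", "covid"],
    "yoruba_inflected": ["covid", "not_covid", "covid", "covid"],
}
scores = {name: accuracy(p, gold) for name, p in preds.items()}
agreement = consistency(preds)
```

Under this definition a model can be consistent and wrong, so consistency and accuracy need to be read together.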

If this is right

Standard defensive techniques such as Gaussian smoothing fail to restore reliable performance.
The measured accuracy drops define a failure envelope relevant to Primary Health Centre use in Nigeria.
Current evaluation practices that rely on clean English inputs do not predict real-world behavior.
New model designs must incorporate adversarial hardening and dialect coverage to be clinically safe.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fragility patterns likely appear in other imaging tasks and languages, pointing to a need for wider robustness benchmarks.
Adding simulated noise and dialect examples during training may reduce these drops in later models.
Without fixes, rollout of current clinical AI in diverse settings carries measurable risk of misdiagnosis.

Load-bearing premise

The twenty chosen clinical cases and the fixed perturbation size of epsilon equal to 0.021 stand in for the actual noisy images and spoken dialects found in Nigerian Primary Health Centres.

What would settle it

Apply the same models to chest X-ray images taken with typical low-resource equipment noise in Nigerian clinics or to real patient transcripts in local dialects and check whether accuracy remains above 80 percent.

Original abstract

Current clinical artificial intelligence (AI) systems are evaluated almost exclusively on clean, standardised, English-language inputs, conditions that do not reflect the realities of healthcare delivery in low-resource settings. This study presents the first systematic dual audit of two orthogonal safety vulnerabilities in clinical AI: adversarial image fragility and cross-lingual diagnostic drift. Using DenseNet121, the architecture underlying CheXNet, fine-tuned on the COVID-QU-Ex chest X-ray dataset (85,318 images; COVID-19, Non-COVID Pneumonia, Normal), we demonstrate that diagnostic accuracy collapses from 89.3% to 62.0% under a Fast Gradient Method (FGM) perturbation of epsilon=0.021, a magnitude imperceptible to the human eye. Standard defensive strategies including Gaussian smoothing and ensemble voting failed to restore clinical safety. In a parallel language fragility experiment, we tested Llama3.1:8b and NatLAS (N-ATLAS) on 20 COVID-19 clinical cases presented in Standard English, Nigerian Pidgin (Naija), and Yoruba-inflected English. Both models exhibited significant accuracy degradation: Llama3.1:8b dropped from 80.0% to 65.0% on Pidgin; NatLAS, an African-context model, collapsed from 85.0% to 55.0%, with diagnosis consistency falling to 50%. These findings establish a quantitative failure envelope for clinical AI under conditions representative of Primary Health Centre (PHC) deployment in Nigeria, and motivate urgent calls for adversarially hardened, linguistically inclusive clinical AI architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This audit flags accuracy drops in clinical AI under small image perturbations and dialect shifts, but the evidence is too thin on methods and validation to trust the numbers for deployment decisions.

The main things to know are that a DenseNet121 model on chest X-rays loses accuracy from 89.3% to 62.0% under a small FGM perturbation and that Llama3.1:8b and NatLAS also lose accuracy on Nigerian Pidgin and Yoruba-inflected inputs. These are positioned as risks for primary health centers in Nigeria. What is new are the specific quantitative results on the COVID-QU-Ex dataset for the image side and the side-by-side language test on 20 cases with an Africa-focused model. The paper does well by using a public dataset for the vision experiments and by drawing attention to evaluation gaps in low-resource multilingual settings that standard benchmarks miss.

The soft spots are real and center on scale and grounding. The language results use only 20 cases with no details on selection, no confidence intervals, and no controls for bias. The imperceptibility claim for epsilon=0.021 has no clinician review or comparison to typical scanner noise in Nigerian clinics, so it is unclear whether the perturbation matches real conditions. Defenses are said to fail, but without code, exact setups, or statistical tests the central claims are hard to check.

This is for researchers working on AI robustness and global health deployment who want examples of failure modes outside clean English data. A reader focused on practical safety in diverse settings could find the ideas useful as a starting point. It deserves peer review because the topic is relevant to deployment and the image experiments use accessible data, though the current version would need clearer methods and larger samples to stand up.

Referee Report

3 major / 2 minor

Summary. The manuscript audits clinical AI for adversarial image fragility and cross-lingual diagnostic drift in low-resource healthcare settings. Using DenseNet121 on the COVID-QU-Ex dataset, it reports diagnostic accuracy collapsing from 89.3% to 62.0% under FGM perturbation with epsilon=0.021. In parallel, language models like Llama3.1:8b and NatLAS show accuracy drops on Nigerian Pidgin and Yoruba-inflected English inputs for 20 COVID-19 cases, with NatLAS dropping to 55.0%. The study concludes with calls for hardened and inclusive clinical AI architectures.

Significance. If substantiated, these results would be significant for AI safety in healthcare, particularly in low-resource environments such as Nigerian Primary Health Centres. The empirical approach using public benchmarks and standard models provides a quantitative failure envelope. However, the lack of detailed methodology limits immediate impact. Strengths include addressing orthogonal vulnerabilities and focusing on underrepresented settings.

major comments (3)

Abstract: The assertion that epsilon=0.021 produces changes 'imperceptible to the human eye' is central to the fragility claim but lacks supporting evidence such as pixel-value histograms, comparisons to typical scanner noise in Nigerian PHCs, or results from a clinician blinded review. Without this, the transferability to real deployment conditions is not established.
Abstract (language experiment): The language fragility results are based on only 20 clinical cases without reported confidence intervals, details on case selection criteria, prompt templates used, or controls for selection bias. This small sample size and lack of statistical reporting undermine the reliability of the reported drops (e.g., to 55.0% for NatLAS) and the claim of representativeness for cross-lingual drift.
Methods (implied from abstract): The manuscript provides no details on statistical significance testing for the accuracy drops, exact dataset splits for the COVID-QU-Ex fine-tuning, or the code for perturbation generation, which are necessary to verify the central quantitative claims of collapse from 89.3% to 62.0%.

minor comments (2)

Abstract: Clarify the exact definition of 'diagnosis consistency' that fell to 50% in the language experiment.
Abstract: Provide more context on why Gaussian smoothing and ensemble voting were chosen as defensive strategies and their specific implementation details.
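For readers unfamiliar with the two defenses named in the abstract, here is a minimal NumPy sketch of each. The paper's actual settings are not given in this page, so the kernel width `sigma`, the `radius`, the edge padding, and the tie-breaking rule are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalised 1-D Gaussian weights on the integer grid [-radius, radius]."""
    t = np.arange(-radius, radius + 1)
    k = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_smooth(img, sigma=1.0, radius=2):
    """Separable Gaussian blur of a 2-D image with edge padding."""
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def majority_vote(votes, n_classes):
    """votes: (n_models, n_samples) integer labels; ties go to the lowest label."""
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)
    return counts.argmax(axis=0)

img = np.zeros((5, 5))
img[2, 2] = 1.0                       # a single bright pixel
smoothed = gaussian_smooth(img)       # energy spreads to the neighbours

votes = np.array([[0, 1, 2],          # model A
                  [0, 1, 1],          # model B
                  [1, 1, 2]])         # model C
labels = majority_vote(votes, n_classes=3)
```

Smoothing is applied to the input before classification, while voting is applied to the outputs of several models. Neither changes the underlying classifier, which is one plausible reason such defenses can fall short against gradient-based attacks.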

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for their insightful comments, which have prompted us to clarify and strengthen several aspects of our manuscript. We respond to each major comment below, indicating the revisions made.

Referee: Abstract: The assertion that epsilon=0.021 produces changes 'imperceptible to the human eye' is central to the fragility claim but lacks supporting evidence such as pixel-value histograms, comparisons to typical scanner noise in Nigerian PHCs, or results from a clinician blinded review. Without this, the transferability to real deployment conditions is not established.

Authors: We concur that empirical support for the imperceptibility of the perturbations is important for the claim's validity. In the revised manuscript, we have added pixel-value difference histograms and comparisons to typical noise levels reported in medical imaging literature for low-resource scanners. We also cite established perceptual thresholds from adversarial example research indicating that epsilon values below 0.03 are generally imperceptible. A blinded clinician review was not performed in this study due to resource constraints but is acknowledged as a valuable direction for future validation.

revision: partial

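The size of the perturbation can be put on a more familiar scale with a short calculation. The sketch below assumes pixel values normalised to [0, 1] and an L-infinity budget, neither of which is confirmed in this page. It converts epsilon into integer gray levels and into the PSNR that results when every pixel moves by the full budget. The 12-bit case is included only because radiographs are often stored at higher bit depth, and the function names are hypothetical.

```python
import math

EPS = 0.021  # perturbation budget reported in the paper

def gray_levels(eps, bit_depth=8):
    """Largest per-pixel change in gray levels, assuming [0, 1] scaling."""
    return eps * (2 ** bit_depth - 1)

def psnr_floor_db(eps):
    """PSNR when every pixel moves by exactly eps (lowest PSNR for the budget)."""
    return 20.0 * math.log10(1.0 / eps)

levels_8bit = gray_levels(EPS)        # about 5.4 of 255 levels
levels_12bit = gray_levels(EPS, 12)   # about 86 of 4095 levels
psnr = psnr_floor_db(EPS)             # about 33.6 dB
```

These figures describe magnitude only. Whether a change of this size is visible on a diagnostic display, or comparable to scanner noise, is an empirical question that the calculation does not settle.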
Referee: Abstract (language experiment): The language fragility results are based on only 20 clinical cases without reported confidence intervals, details on case selection criteria, prompt templates used, or controls for selection bias. This small sample size and lack of statistical reporting undermine the reliability of the reported drops (e.g., to 55.0% for NatLAS) and the claim of representativeness for cross-lingual drift.

Authors: We appreciate this observation on the language experiment's limitations. We have expanded the Methods and Results sections to include bootstrap-derived 95% confidence intervals for all reported accuracies. Case selection criteria (random sampling from the available clinical cases), the exact prompt templates used for each language variant, and bias controls (such as averaging over three independent prompt phrasings) are now detailed. While the sample remains modest and we have moderated our language regarding broad representativeness, these additions improve transparency and allow readers to better assess the findings.

revision: yes

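A percentile bootstrap of the kind named in the response can be sketched in a few lines. This is an illustration, not the authors' procedure. The counts 17/20 and 11/20 are simply what 85.0% and 55.0% imply on twenty cases, and the resample count, seed, and function name are assumptions.

```python
import numpy as np

def bootstrap_ci(correct, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap interval for accuracy from a 0/1 outcome vector."""
    rng = np.random.default_rng(seed)
    correct = np.asarray(correct, dtype=float)
    # Resample cases with replacement and record the accuracy of each resample.
    idx = rng.integers(0, len(correct), size=(n_resamples, len(correct)))
    means = correct[idx].mean(axis=1)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(means, [tail, 1.0 - tail])
    return float(lo), float(hi)

# Counts implied by 85.0% and 55.0% accuracy on twenty cases.
english = [1] * 17 + [0] * 3
dialect = [1] * 11 + [0] * 9
ci_english = bootstrap_ci(english)
ci_dialect = bootstrap_ci(dialect)
```

With only twenty cases each interval spans tens of percentage points, which is the practical content of the referee's concern. Because the same cases are scored in each language, a paired analysis is a better fit for testing the drop itself than comparing two separate intervals.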
Referee: Methods (implied from abstract): The manuscript provides no details on statistical significance testing for the accuracy drops, exact dataset splits for the COVID-QU-Ex fine-tuning, or the code for perturbation generation, which are necessary to verify the central quantitative claims of collapse from 89.3% to 62.0%.

Authors: We thank the referee for highlighting these methodological gaps. The revised Methods section now specifies the dataset splits (70% training, 15% validation, 15% test) for the COVID-QU-Ex fine-tuning, includes statistical significance testing via McNemar's test for paired accuracy comparisons (with p-values reported), and provides a reference to the publicly available code repository containing the FGM perturbation generation scripts and model fine-tuning details. These changes enable full reproducibility of the reported accuracy collapses.

revision: yes
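McNemar's test looks only at the cases where the two conditions disagree. A minimal sketch of the exact (binomial) form is shown below. The paired outcomes are invented: the page reports only marginal accuracies, so the split into 7 lost and 1 gained cases is one hypothetical pairing consistent with 17/20 and 11/20, not the paper's data.

```python
from math import comb

def mcnemar_exact(clean_correct, perturbed_correct):
    """Two-sided exact McNemar p-value from paired 0/1 correctness vectors."""
    pairs = list(zip(clean_correct, perturbed_correct))
    lost = sum(1 for a, b in pairs if a and not b)    # right, then wrong
    gained = sum(1 for a, b in pairs if b and not a)  # wrong, then right
    n = lost + gained
    if n == 0:
        return lost, gained, 1.0
    # Under the null, each discordant pair is a fair coin flip.
    tail = sum(comb(n, k) for k in range(min(lost, gained) + 1)) / 2 ** n
    return lost, gained, min(1.0, 2.0 * tail)

# Hypothetical pairing: 17 correct on clean input, 11 correct after the shift.
clean = [1] * 17 + [0] * 3
shifted = [1] * 10 + [0] * 7 + [1] * 1 + [0] * 2
lost, gained, p_value = mcnemar_exact(clean, shifted)
```

The result depends on the pairing, not just the two accuracies, which is why per-case outputs are needed to verify any reported p-value.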

Circularity Check

0 steps flagged

Empirical audit with external benchmarks; no derivations or self-referential reductions

The manuscript reports experimental results from fine-tuning DenseNet121 on the public COVID-QU-Ex dataset (85,318 images) and applying standard FGM perturbations plus language tests on 20 cases. No equations, uniqueness theorems, ansatzes, or predictions are derived; accuracy drops (89.3% to 62.0%, 85.0% to 55.0%) are direct empirical measurements against external data and models. No self-citations are load-bearing, no fitted parameters are renamed as predictions, and the study is self-contained against public benchmarks without reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper is an empirical audit relying on standard machine learning evaluation practices rather than new derivations; no free parameters are fitted to produce the headline claims, and no new entities are postulated.

axioms (2)

domain assumption Standard assumptions of supervised learning on labeled medical images hold for the COVID-QU-Ex dataset.
Invoked implicitly when reporting accuracy on the fine-tuned DenseNet121.
domain assumption The 20 selected COVID-19 cases are representative of clinical presentation in low-resource Nigerian settings.
Required for generalizing the language fragility results.

pith-pipeline@v0.9.0 · 5849 in / 1286 out tokens · 34821 ms · 2026-05-19T19:14:59.408701+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: diagnostic accuracy collapses from 89.3% to 62.0% under a Fast Gradient Method (FGM) perturbation of epsilon=0.021

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Llama3.1:8b dropped from 80.0% to 65.0% on Pidgin; NatLAS collapsed from 85.0% to 55.0%

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

