pith. machine review for the scientific record.

arxiv: 2605.04079 · v1 · submitted 2026-04-14 · 💻 cs.LG · cs.AI

Recognition: unknown

Efficient Handwriting-Based Alzheimer,s Disease Diagnosis Using a Low-Rank Mixture of Experts Deep Learning Framework

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:39 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords Alzheimer's disease · handwriting analysis · mixture of experts · LoRA adapters · deep learning · digital biomarkers · DARWIN dataset · parameter efficiency

The pith

A low-rank mixture of experts model diagnoses Alzheimer's from handwriting signals while activating far fewer parameters than standard neural networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep learning system that identifies Alzheimer's disease by analyzing handwriting patterns captured in the DARWIN dataset. It structures the model as multiple expert subnetworks that each focus on distinct writing features, all built on a shared base but kept lightweight through low-rank adapters. A reader would care because this setup promises accurate, non-invasive screening that runs efficiently enough for widespread use in digital health tools. Experiments compare the approach against plain multilayer perceptrons and full mixture-of-experts models, while testing variations in expert count, adapter rank, and ensemble stacking methods.

Core claim

The Low-Rank Mixture of Experts framework lets separate expert modules specialize in different handwriting patterns while sharing a common base network, with each expert fitted only with compact low-rank adapters instead of full parameters. On the DARWIN dataset this yields diagnostic performance that matches or exceeds multilayer perceptrons and conventional mixture-of-experts architectures, yet activates markedly fewer parameters during inference. Ablation studies on hidden size, expert count, and adapter rank, together with StackMean and StackMax ensemble variants, confirm improved training stability and robustness for Alzheimer's screening.
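The parameter-efficiency claim can be made concrete with back-of-envelope arithmetic: a dense expert layer of size d_in × d_out holds d_in·d_out weights, while a rank-r adapter holds only r·(d_in + d_out). The dimensions, rank, and expert count below are illustrative assumptions, not values reported in the paper.

```python
# Back-of-envelope comparison of per-expert parameter counts.
# All dimensions here are hypothetical, not taken from the paper.
d_in, d_out = 256, 256   # assumed layer width
rank = 8                 # assumed LoRA rank
n_experts = 4            # assumed expert count

dense_expert = d_in * d_out            # full expert weight matrix
lora_expert = rank * (d_in + d_out)    # factors B (d_out x r) and A (r x d_in)

print(f"dense expert params: {dense_expert}")   # 65536
print(f"LoRA expert params:  {lora_expert}")    # 4096
print(f"savings per expert:  {dense_expert / lora_expert:.1f}x")  # 16.0x
print(f"adapter params for {n_experts} experts: {n_experts * lora_expert}")  # 16384
```

At these (assumed) sizes each adapter is 16× smaller than a dense expert, which is the mechanism behind the "markedly fewer activated parameters" claim.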

What carries the argument

The Low-Rank Mixture of Experts (LoRA-MoE) architecture, in which inputs are routed to specialized experts that apply low-rank adapters to a shared base network so that pattern-specific learning occurs with minimal added parameters.
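A minimal NumPy sketch of that routing-plus-adapter idea (the layer sizes, the linear gating network, and hard top-1 routing are all assumptions; the paper's exact architecture may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_experts = 16, 8, 2, 4  # illustrative sizes

# Shared base weight, common to all experts.
W_base = rng.normal(size=(d_out, d_in))
# Per-expert low-rank adapters: delta_e = B_e @ A_e has rank <= `rank`.
A = rng.normal(size=(n_experts, rank, d_in)) * 0.1
B = rng.normal(size=(n_experts, d_out, rank)) * 0.1
# Gating network producing one score per expert (an assumed linear gate).
W_gate = rng.normal(size=(n_experts, d_in))

def forward(x):
    """Route x to the top-1 expert; output = shared base + that expert's LoRA update."""
    scores = W_gate @ x
    e = int(np.argmax(scores))              # hard top-1 routing (an assumption)
    return W_base @ x + B[e] @ (A[e] @ x), e

x = rng.normal(size=d_in)
y, chosen = forward(x)
print(y.shape, chosen)  # (8,) and the index of the selected expert
```

Because every expert reuses W_base and only B_e, A_e differ, pattern-specific behavior costs r·(d_in + d_out) parameters per expert rather than d_in·d_out.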

If this is right

  • Experts can specialize on distinct handwriting features without mutual interference because they share the base network and use only low-rank updates.
  • Training stability increases and total trainable parameters drop compared with a full mixture-of-experts model.
  • Stacking ensembles built on the LoRA-MoE outputs further raise robustness without restoring the full parameter cost.
  • The resulting efficiency supports deployment in resource-limited digital-health screening applications.
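One plausible reading of the StackMean and StackMax variants named above (the paper's exact stacking recipe is not spelled out in this summary) is averaging versus max-pooling base-learner probabilities before thresholding:

```python
import numpy as np

# Hypothetical AD probabilities from three base learners on five subjects;
# the values are invented for illustration only.
probs = np.array([
    [0.9, 0.2, 0.6, 0.4, 0.7],
    [0.8, 0.3, 0.5, 0.6, 0.6],
    [0.7, 0.1, 0.7, 0.3, 0.9],
])

stack_mean = probs.mean(axis=0)   # average the learners' probabilities
stack_max = probs.max(axis=0)     # take the most confident learner

print((stack_mean >= 0.5).astype(int))  # [1 0 1 0 1]
print((stack_max >= 0.5).astype(int))   # [1 0 1 1 1]
```

The fourth subject illustrates the design trade-off: StackMax follows the single most confident learner, while StackMean requires broader agreement.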

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same lightweight-expert pattern could be applied to other motor or cognitive tasks that produce time-series signals, such as gait or speech recordings.
  • Because parameter activation is low, the model could run directly on smartphones or tablets for at-home collection and instant feedback.
  • Longitudinal handwriting data, if collected, would allow the framework to track progression rather than perform one-time classification.
  • Combining the handwriting pathway with brief cognitive tests inside the same app might raise overall screening sensitivity without adding clinical visits.

Load-bearing premise

Handwriting provides a reliable, non-invasive signal of early cognitive-motor decline in Alzheimer's, and the DARWIN dataset captures enough variation for the model to generalize to new patients.

What would settle it

A clear drop in diagnostic accuracy when the trained LoRA-MoE model is evaluated on an independent handwriting collection drawn from a different demographic or acquisition protocol than the DARWIN dataset.

Figures

Figures reproduced from arXiv: 2605.04079 by Fouzi Harrou, Wu Wang, Ying Sun, Yuang Cheng.

Figure 1: Overview of the proposed LoRA-MoE architecture for handwriting-based Alzheimer's disease diagnosis.
Figure 2: The end-to-end workflow of the proposed LoRA-MoE architecture for Alzheimer's disease diagnosis using …
Figure 3: Handwriting data acquisition setup using a graphic tablet to capture pen movements in real time [40].
Figure 4: Comparison of (a) training time, (b) base-learner accuracy distribution, and (c) efficiency index (perfor…)
Figure 5: Comparison of (a) training time, (b) base-learner accuracy distribution, and (c) efficiency index (perfor…)
Figure 6: Comparison of (a) training time, (b) base-learner accuracy distribution, and (c) efficiency index (perfor…)
Original abstract

Early and reliable detection of Alzheimer's disease (AD) is crucial for timely clinical intervention and improved patient management. It also supports the evaluation of emerging therapeutic strategies. In this paper, we propose a Low-Rank Mixture of Experts (LoRA-MoE) deep learning framework for Alzheimer's disease diagnosis based on handwriting analysis. Handwriting signals provide a non-invasive and scalable digital biomarker that captures subtle cognitive-motor impairments associated with early AD progression. The proposed architecture allows multiple experts to specialize in different handwriting patterns while sharing a common base network. This design enables efficient learning of general representations while reducing interference between experts. Each expert is equipped with lightweight low-rank adapters. This mechanism significantly reduces the number of trainable parameters compared with standard Mixture of Experts (MoE) models and improves training stability. The proposed framework is evaluated on the Diagnosis AlzheimeR WIth haNdwriting (DARWIN) dataset. Extensive experiments are conducted, including ablation studies on key architectural parameters such as hidden dimension size, number of experts, and LoRA rank. The method is compared with multilayer perceptron (MLP) and conventional MoE architectures. In addition, stacking ensemble strategies (StackMean and StackMax) are investigated to improve robustness and predictive performance. Experimental results show that the LoRA-MoE framework achieves powerful diagnostic performance while activating significantly fewer parameters during inference. These results highlight the potential of the proposed approach as an accurate and computationally efficient solution for handwriting-based Alzheimer's disease screening and digital health applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Low-Rank Mixture of Experts (LoRA-MoE) deep learning framework for Alzheimer's disease diagnosis from handwriting signals on the DARWIN dataset. Multiple experts with low-rank adapters specialize in different patterns while sharing a base network, reducing trainable parameters relative to standard MoE. The work includes ablations on hidden dimension size, number of experts, and LoRA rank; comparisons against MLP and conventional MoE; and stacking ensembles (StackMean, StackMax). The central claim is that LoRA-MoE delivers strong diagnostic performance while activating significantly fewer parameters at inference.

Significance. If the empirical results are robust and the DARWIN data support generalization, the parameter-efficient LoRA-MoE design could advance scalable, non-invasive digital biomarkers for early AD screening. The emphasis on training stability and reduced interference between experts addresses practical deployment constraints in digital health.

major comments (2)
  1. [Abstract and Evaluation] The abstract states that the framework 'achieves powerful diagnostic performance' and activates 'significantly fewer parameters' yet supplies no numerical metrics (accuracy, AUC, F1, parameter counts, or inference FLOPs), no statistical tests, and no baseline numbers. Without these, the central performance claim cannot be verified or compared.
  2. [Dataset and Experimental Setup] No sample size, demographic breakdown (age, education, disease stage), class balance, or train/test split details are provided for the DARWIN dataset. This information is load-bearing for the generalizability claim, as the representativeness of the handwriting signals is the key assumption underlying the diagnostic performance results.
minor comments (2)
  1. [Title] 'Alzheimer,s' contains a typographical error and should read 'Alzheimer's'.
  2. [Method] Notation: The description of 'low-rank adapters' and 'experts' would benefit from an explicit equation or diagram showing how the LoRA matrices are injected into the expert layers.
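The notation requested in the second minor comment has a standard form in the LoRA literature; whether the authors inject adapters in exactly this way is an assumption, and this formulation follows Hu et al. [24]:

```latex
% Expert e modifies the frozen shared weight W_0 with a rank-r update,
% with B_e in R^{d_out x r}, A_e in R^{r x d_in}, and r << min(d_in, d_out):
h = W_0\,x + \Delta W_e\,x, \qquad \Delta W_e = \frac{\alpha}{r}\, B_e A_e
```

Only B_e and A_e are trained per expert; the scaling α/r keeps the update magnitude comparable across ranks.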

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have addressed each major comment point by point below. Revisions have been made to the abstract and experimental setup sections to improve transparency and verifiability of the results.

Point-by-point responses
  1. Referee: [Abstract and Evaluation] The abstract states that the framework 'achieves powerful diagnostic performance' and activates 'significantly fewer parameters' yet supplies no numerical metrics (accuracy, AUC, F1, parameter counts, or inference FLOPs), no statistical tests, and no baseline numbers. Without these, the central performance claim cannot be verified or compared.

    Authors: We agree that the abstract would benefit from explicit numerical support for the performance claims. In the revised manuscript, we have updated the abstract to include key quantitative results from our experiments (accuracy, AUC, F1-score, activated parameter counts at inference, and inference FLOPs) together with direct comparisons to the MLP and standard MoE baselines. We also reference the relevant tables and note that statistical significance testing (e.g., McNemar’s test for classification comparisons) is reported in the experimental section. revision: yes

  2. Referee: [Dataset and Experimental Setup] No sample size, demographic breakdown (age, education, disease stage), class balance, or train/test split details are provided for the DARWIN dataset. This information is load-bearing for the generalizability claim, as the representativeness of the handwriting signals is the key assumption underlying the diagnostic performance results.

    Authors: We acknowledge that a fuller description of the DARWIN dataset is necessary for assessing generalizability. The revised manuscript now includes the total sample size, available demographic statistics (age, education level, and disease stage distribution), class balance between AD and control participants, and the precise train/validation/test split ratios employed in our experiments. These details are drawn from the dataset documentation and our experimental protocol and are presented in the Dataset and Experimental Setup section. revision: yes
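The McNemar's test the rebuttal cites can be sketched in a few lines of stdlib Python; the exact-binomial form is used here to avoid a chi-square dependency, and the discordant counts are invented for illustration:

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar test on discordant counts b, c.

    b = subjects the first model classifies correctly and the second does not;
    c = the reverse. Under H0 (equal error rates) each discordant pair is a
    fair coin flip, so the p-value is a doubled binomial tail.
    """
    n, k = b + c, min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Invented counts: LoRA-MoE right / MLP wrong on 9 subjects, the reverse on 2.
p = mcnemar_exact(9, 2)
print(f"p = {p:.4f}")  # p = 0.0654
```

Because only discordant pairs enter the statistic, the test directly asks whether the two classifiers disagree in a systematically lopsided way, which is what the comparison to the MLP baseline needs.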

Circularity Check

0 steps flagged

No circularity; empirical ML evaluation on external dataset with no derivations

full rationale

The paper proposes a LoRA-MoE architecture for handwriting-based AD diagnosis and reports performance via standard training, ablations, and comparisons on the external DARWIN dataset. No equations, derivations, or first-principles predictions exist that could reduce to self-referential inputs. Central claims are data-driven empirical results, not tautological by construction. Self-citations are absent from load-bearing steps in the provided text.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The model depends on standard supervised deep-learning assumptions plus dataset-specific architectural choices; no new physical entities are postulated.

free parameters (3)
  • LoRA rank
    Hyperparameter selected via ablation to trade off performance against parameter count.
  • number of experts
    Architectural choice evaluated through ablation studies on the DARWIN dataset.
  • hidden dimension size
    Model width tuned during experiments to balance capacity and efficiency.
axioms (2)
  • domain assumption Handwriting signals reliably encode early cognitive-motor changes due to Alzheimer's
    Invoked in the abstract as the basis for using handwriting as a digital biomarker.
  • domain assumption The DARWIN dataset distribution is representative of real-world early AD cases
    Implicit in the claim that experimental results support clinical screening potential.

pith-pipeline@v0.9.0 · 5573 in / 1353 out tokens · 39494 ms · 2026-05-10T15:39:41.466111+00:00 · methodology


Reference graph

Works this paper leans on

40 extracted references · 11 canonical work pages · 4 internal anchors

  1. [1] Clifford R. Jack Jr., David A. Bennett, Kaj Blennow, Maria C. Carrillo, Billy Dunn, Samantha Budd Haeberlein, David M. Holtzman, William Jagust, Frank Jessen, Jason Karlawish, et al. NIA-AA research framework: toward a biological definition of Alzheimer's disease. Alzheimer's & Dementia, 14(4):535–562, 2018.

  2. [2] Philip Scheltens, Bart De Strooper, Miia Kivipelto, Henne Holstege, Gael Chételat, Charlotte E. Teunissen, Jeffrey Cummings, and Wiesje M. van der Flier. Alzheimer's disease. The Lancet, 397(10284):1577–1590, 2021.

  3. [3] Kaj Blennow, Bruno Dubois, Anne M. Fagan, Piotr Lewczuk, Mony J. De Leon, and Harald Hampel. Clinical utility of cerebrospinal fluid biomarkers in the diagnosis of early Alzheimer's disease. Alzheimer's & Dementia, 11(1):58–69, 2015.

  4. [4] Michael W. Weiner, Dallas P. Veitch, Paul S. Aisen, Laurel A. Beckett, Nigel J. Cairns, Robert C. Green, Danielle Harvey, Clifford R. Jack, William Jagust, Enchi Liu, et al. The Alzheimer's Disease Neuroimaging Initiative: a review of papers published since its inception. Alzheimer's & Dementia, 9(5):e111–e194, 2013.

  5. [5] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen A.W.M. van der Laak, Bram van Ginneken, and Clara I. Sánchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88, 2017.

  6. [6] Wu Wang, Fouzi Harrou, Abdelkader Dairi, and Ying Sun. Stacked deep learning approach for efficient SARS-CoV-2 detection in blood samples. Artificial Intelligence in Medicine, 148:102767, 2024.

  7. [7] Wu Wang, Junho Lee, Fouzi Harrou, and Ying Sun. Early detection of Parkinson's disease using deep learning and machine learning. IEEE Access, 8:147635–147646, 2020.

  8. [8] Fouzi Harrou, Abdelkader Dairi, and Ying Sun. A manifold learning-based anomaly detection framework for cardiovascular disease diagnosis. Computational Intelligence, 41(5):e70130, 2025.

  9. [9] Amith Khandakar, Muhammad E.H. Chowdhury, Mamun Bin Ibne Reaz, Sawal Hamid Md Ali, Md Anwarul Hasan, Serkan Kiranyaz, Tawsifur Rahman, Rashad Alfkey, Ahmad Ashrif A. Bakar, and Rayaz A. Malik. A machine learning model for early detection of diabetic foot using thermogram images. Computers in Biology and Medicine, 137:104838, 2021.

  10. [10] Andre Esteva, Alexandre Robicquet, Bharath Ramsundar, Volodymyr Kuleshov, Mark DePristo, Katherine Chou, Claire Cui, Greg Corrado, Sebastian Thrun, and Jeff Dean. A guide to deep learning in healthcare. Nature Medicine, 25(1):24–29, 2019.

  11. [11] Jean Neils-Strunjas, Kathy Groves-Wright, Pauline Mashima, and Stacy Harnish. Dysgraphia in Alzheimer's disease: a review for clinical and research purposes. Journal of Speech, Language, and Hearing Research, 49(6):1313–1330, 2006.

  12. [12] Claudio De Stefano, Francesco Fontanella, Donato Impedovo, Giuseppe Pirlo, and Alessandra Scotto Di Freca. Handwriting analysis to support neurodegenerative diseases diagnosis: A review. Pattern Recognition Letters, 121:37–45, 2019.

  13. [13] Saturnino Luz, Fasih Haider, Sofia de la Fuente Garcia, Davida Fromm, and Brian MacWhinney. Alzheimer's dementia recognition through spontaneous speech, 2021.

  14. [14] Kathleen C. Fraser, Jed A. Meltzer, and Frank Rudzicz. Linguistic features identify Alzheimer's disease in narrative speech. Journal of Alzheimer's Disease, 49(2):407–422, 2015.

  15. [15] C. Kavitha, Vinodhini Mani, S.R. Srividhya, Osamah Ibrahim Khalaf, and Carlos Andres Tavera Romero. Early-stage Alzheimer's disease prediction using machine learning models. Frontiers in Public Health, 10:853294, 2022.

  16. [16] K.R. Kruthika, H.D. Maheshappa, Alzheimer's Disease Neuroimaging Initiative, et al. Multistage classifier-based approach for Alzheimer's disease prediction and retrieval. Informatics in Medicine Unlocked, 14:34–42, 2019.

  17. [17] Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991.

  18. [18] M.I. Jordan and R.A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. In Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan), volume 2, pages 1339–1344, 1993.

  19. [19] Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017.

  20. [20] Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Z. Chen. GShard: Scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668, 2020.

  21. [21] William Fedus, Barret Zoph, and Noam Shazeer. Switch Transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39, 2022.

  22. [22] Damai Dai, Chengqi Deng, Chenggang Zhao, Runxin Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Yu Wu, Zhenda Xie, Y.K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, and Wenfeng Liang. DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models. In Annual Meeting of the Association for Computational Linguistics, 2024.

  23. [23] Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, et al. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024.

  24. [24] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In ICLR 2022, April 2022.

  25. [25] Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, and Anna Rumshisky. ReLoRA: High-rank training through low-rank updates. arXiv preprint arXiv:2307.05695, 2023.

  26. [26] Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao. AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning. arXiv preprint arXiv:2303.10512, 2023.

  27. [27] Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. In Advances in Neural Information Processing Systems, volume 36, pages 10088–10115. Curran Associates, Inc., 2023.

  28. [28] Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, and Sanjeev Arora. A kernel-based view of language model fine-tuning. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 23610–23641. PMLR, 2023.

  29. [29] Yuchen Zeng and Kangwook Lee. The expressive power of low-rank adaptation. arXiv preprint arXiv:2310.17513, 2024.

  30. [30] Chengsong Huang, Qian Liu, Bill Yuchen Lin, Tianyu Pang, Chao Du, and Min Lin. LoraHub: Efficient cross-task generalization via dynamic LoRA composition. arXiv preprint arXiv:2307.13269, 2023.

  31. [31] Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, and Sara Hooker. Pushing mixture of experts to the limit: Extremely parameter efficient MoE for instruction tuning. arXiv preprint arXiv:2309.05444, 2023.

  32. [32] Wenfeng Feng, Chuzhan Hao, Yuewei Zhang, Yu Han, and Hao Wang. Mixture-of-LoRAs: An efficient multitask tuning method for large language models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11371–11380, Torino, Italia, May 2024. ELRA and ICCL.

  33. [33] Shu Yang, Muhammad Asif Ali, Cheng-Long Wang, Lijie Hu, and Di Wang. MoRAL: MoE augmented LoRA for LLMs' lifelong learning. arXiv preprint arXiv:2402.11260, 2024.

  34. [34] Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Wei Shen, Limao Xiong, Yuhao Zhou, Xiao Wang, Zhiheng Xi, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, and Xuanjing Huang. LoRAMoE: Alleviating world knowledge forgetting in large language models via MoE-style plugin. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024.

  35. [35] Yunhao Gou, Zhili Liu, Kai Chen, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James T. Kwok, and Yu Zhang. Mixture of cluster-conditional LoRA experts for vision-language instruction tuning. arXiv preprint arXiv:2312.12379, 2024.

  36. [36] Qidong Liu, Xian Wu, Xiangyu Zhao, Yuanshao Zhu, Derong Xu, Feng Tian, and Yefeng Zheng. When MoE meets LLMs: Parameter efficient fine-tuning for multi-task medical applications. arXiv preprint arXiv:2310.18339, 2024.

  37. [37] Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, and Yuxiong He. DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale. In ICML 2022, January 2022.

  38. [38] Aidan Clark, Diego De Las Casas, Aurelia Guy, Arthur Mensch, Michela Paganini, Jordan Hoffmann, Bogdan Damoc, Blake Hechtman, Trevor Cai, Sebastian Borgeaud, George van den Driessche, Eliza Rutherford, Tom Hennigan, Matthew J. Johnson, Albin Cassirer, Chris Jones, Elena Buchatskaya, David Budden, Laurent Sifre, Simon Osindero, Oriol Vinyals, et al. Unified scaling laws for routed language models. In International Conference on Machine Learning, 2022.

  39. [39] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 1026–1034, Los Alamitos, CA, USA, December 2015. IEEE Computer Society.

  40. [40] Nicole D. Cilia, Giuseppe De Gregorio, Claudio De Stefano, Francesco Fontanella, Angelo Marcelli, and Antonio Parziale. Diagnosing Alzheimer's disease from on-line handwriting: A novel dataset and performance benchmarking. Engineering Applications of Artificial Intelligence, 111:104822, 2022.