pith. machine review for the scientific record.

arxiv: 2605.04079 · v1 · submitted 2026-04-14 · 💻 cs.LG · cs.AI

Recognition: unknown

Efficient Handwriting-Based Alzheimer,s Disease Diagnosis Using a Low-Rank Mixture of Experts Deep Learning Framework

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:39 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords Alzheimer's disease · handwriting analysis · mixture of experts · LoRA adapters · deep learning · digital biomarkers · DARWIN dataset · parameter efficiency

The pith

A low-rank mixture of experts model diagnoses Alzheimer's from handwriting signals while activating far fewer parameters than standard neural networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep learning system that identifies Alzheimer's disease by analyzing handwriting patterns captured in the DARWIN dataset. It structures the model as multiple expert subnetworks that each focus on distinct writing features, all built on a shared base but kept lightweight through low-rank adapters. A reader would care because this setup promises accurate, non-invasive screening that runs efficiently enough for widespread use in digital health tools. Experiments compare the approach against plain multilayer perceptrons and full mixture-of-experts models, while testing variations in expert count, adapter rank, and ensemble stacking methods.

Core claim

The Low-Rank Mixture of Experts framework lets separate expert modules specialize in different handwriting patterns while sharing a common base network, with each expert fitted only with compact low-rank adapters instead of full parameters. On the DARWIN dataset this yields diagnostic performance that matches or exceeds multilayer perceptrons and conventional mixture-of-experts architectures, yet activates markedly fewer parameters during inference. Ablation studies on hidden size, expert count, and adapter rank, together with StackMean and StackMax ensemble variants, confirm improved training stability and robustness for Alzheimer's screening.
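The parameter-efficiency claim can be made concrete with back-of-envelope arithmetic: a dense expert layer of size d_in × d_out holds d_in·d_out weights, while a rank-r adapter holds only r·(d_in + d_out). The dimensions, rank, and expert count below are illustrative assumptions, not values reported in the paper.

```python
# Back-of-envelope comparison of per-expert parameter counts.
# All dimensions here are hypothetical, not taken from the paper.
d_in, d_out = 256, 256   # assumed layer width
rank = 8                 # assumed LoRA rank
n_experts = 4            # assumed expert count

dense_expert = d_in * d_out            # full expert weight matrix
lora_expert = rank * (d_in + d_out)    # factors B (d_out x r) and A (r x d_in)

print(f"dense expert params: {dense_expert}")   # 65536
print(f"LoRA expert params:  {lora_expert}")    # 4096
print(f"savings per expert:  {dense_expert / lora_expert:.1f}x")  # 16.0x
print(f"adapter params for {n_experts} experts: {n_experts * lora_expert}")  # 16384
```

At these (assumed) sizes each adapter is 16× smaller than a dense expert, which is the mechanism behind the "markedly fewer activated parameters" claim.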

What carries the argument

The Low-Rank Mixture of Experts (LoRA-MoE) architecture, in which inputs are routed to specialized experts that apply low-rank adapters to a shared base network so that pattern-specific learning occurs with minimal added parameters.
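A minimal NumPy sketch of that routing-plus-adapter idea (the layer sizes, the linear gating network, and hard top-1 routing are all assumptions; the paper's exact architecture may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_experts = 16, 8, 2, 4  # illustrative sizes

# Shared base weight, common to all experts.
W_base = rng.normal(size=(d_out, d_in))
# Per-expert low-rank adapters: delta_e = B_e @ A_e has rank <= `rank`.
A = rng.normal(size=(n_experts, rank, d_in)) * 0.1
B = rng.normal(size=(n_experts, d_out, rank)) * 0.1
# Gating network producing one score per expert (an assumed linear gate).
W_gate = rng.normal(size=(n_experts, d_in))

def forward(x):
    """Route x to the top-1 expert; output = shared base + that expert's LoRA update."""
    scores = W_gate @ x
    e = int(np.argmax(scores))              # hard top-1 routing (an assumption)
    return W_base @ x + B[e] @ (A[e] @ x), e

x = rng.normal(size=d_in)
y, chosen = forward(x)
print(y.shape, chosen)  # (8,) and the index of the selected expert
```

Because every expert reuses W_base and only B_e, A_e differ, pattern-specific behavior costs r·(d_in + d_out) parameters per expert rather than d_in·d_out.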

If this is right

  • Experts can specialize on distinct handwriting features without mutual interference because they share the base network and use only low-rank updates.
  • Training stability increases and total trainable parameters drop compared with a full mixture-of-experts model.
  • Stacking ensembles built on the LoRA-MoE outputs further raise robustness without restoring the full parameter cost.
  • The resulting efficiency supports deployment in resource-limited digital-health screening applications.
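One plausible reading of the StackMean and StackMax variants named above (the paper's exact stacking recipe is not spelled out in this summary) is averaging versus max-pooling base-learner probabilities before thresholding:

```python
import numpy as np

# Hypothetical AD probabilities from three base learners on five subjects;
# the values are invented for illustration only.
probs = np.array([
    [0.9, 0.2, 0.6, 0.4, 0.7],
    [0.8, 0.3, 0.5, 0.6, 0.6],
    [0.7, 0.1, 0.7, 0.3, 0.9],
])

stack_mean = probs.mean(axis=0)   # average the learners' probabilities
stack_max = probs.max(axis=0)     # take the most confident learner

print((stack_mean >= 0.5).astype(int))  # [1 0 1 0 1]
print((stack_max >= 0.5).astype(int))   # [1 0 1 1 1]
```

The fourth subject illustrates the design trade-off: StackMax follows the single most confident learner, while StackMean requires broader agreement.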

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same lightweight-expert pattern could be applied to other motor or cognitive tasks that produce time-series signals, such as gait or speech recordings.
  • Because parameter activation is low, the model could run directly on smartphones or tablets for at-home collection and instant feedback.
  • Longitudinal handwriting data, if collected, would allow the framework to track progression rather than perform one-time classification.
  • Combining the handwriting pathway with brief cognitive tests inside the same app might raise overall screening sensitivity without adding clinical visits.

Load-bearing premise

Handwriting provides a reliable, non-invasive signal of early cognitive-motor decline in Alzheimer's, and the DARWIN dataset captures enough variation for the model to generalize to new patients.

What would settle it

A clear drop in diagnostic accuracy when the trained LoRA-MoE model is evaluated on an independent handwriting collection drawn from a different demographic or acquisition protocol than the DARWIN dataset.

Figures

Figures reproduced from arXiv: 2605.04079 by Fouzi Harrou, Wu Wang, Ying Sun, Yuang Cheng.

Figure 1: Overview of the proposed LoRA-MoE architecture for handwriting-based Alzheimer's disease diagnosis.
Figure 2: The end-to-end workflow of the proposed LoRA-MoE architecture for Alzheimer's disease diagnosis using …
Figure 3: Handwriting data acquisition setup using a graphic tablet to capture pen movements in real time [40].
Figure 4: Comparison of (a) training time, (b) base-learner accuracy distribution, and (c) efficiency index (perfor…)
Figure 5: Comparison of (a) training time, (b) base-learner accuracy distribution, and (c) efficiency index (perfor…)
Figure 6: Comparison of (a) training time, (b) base-learner accuracy distribution, and (c) efficiency index (perfor…)
Original abstract

Early and reliable detection of Alzheimer's disease (AD) is crucial for timely clinical intervention and improved patient management. It also supports the evaluation of emerging therapeutic strategies. In this paper, we propose a Low-Rank Mixture of Experts (LoRA-MoE) deep learning framework for Alzheimer's disease diagnosis based on handwriting analysis. Handwriting signals provide a non-invasive and scalable digital biomarker that captures subtle cognitive-motor impairments associated with early AD progression. The proposed architecture allows multiple experts to specialize in different handwriting patterns while sharing a common base network. This design enables efficient learning of general representations while reducing interference between experts. Each expert is equipped with lightweight low-rank adapters. This mechanism significantly reduces the number of trainable parameters compared with standard Mixture of Experts (MoE) models and improves training stability. The proposed framework is evaluated on the Diagnosis AlzheimeR WIth haNdwriting (DARWIN) dataset. Extensive experiments are conducted, including ablation studies on key architectural parameters such as hidden dimension size, number of experts, and LoRA rank. The method is compared with multilayer perceptron (MLP) and conventional MoE architectures. In addition, stacking ensemble strategies (StackMean and StackMax) are investigated to improve robustness and predictive performance. Experimental results show that the LoRA-MoE framework achieves powerful diagnostic performance while activating significantly fewer parameters during inference. These results highlight the potential of the proposed approach as an accurate and computationally efficient solution for handwriting-based Alzheimer's disease screening and digital health applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Low-Rank Mixture of Experts (LoRA-MoE) deep learning framework for Alzheimer's disease diagnosis from handwriting signals on the DARWIN dataset. Multiple experts with low-rank adapters specialize in different patterns while sharing a base network, reducing trainable parameters relative to standard MoE. The work includes ablations on hidden dimension size, number of experts, and LoRA rank; comparisons against MLP and conventional MoE; and stacking ensembles (StackMean, StackMax). The central claim is that LoRA-MoE delivers strong diagnostic performance while activating significantly fewer parameters at inference.

Significance. If the empirical results are robust and the DARWIN data support generalization, the parameter-efficient LoRA-MoE design could advance scalable, non-invasive digital biomarkers for early AD screening. The emphasis on training stability and reduced interference between experts addresses practical deployment constraints in digital health.

major comments (2)
  1. [Abstract and Evaluation] The abstract states that the framework 'achieves powerful diagnostic performance' and activates 'significantly fewer parameters' yet supplies no numerical metrics (accuracy, AUC, F1, parameter counts, or inference FLOPs), no statistical tests, and no baseline numbers. Without these, the central performance claim cannot be verified or compared.
  2. [Dataset and Experimental Setup] No sample size, demographic breakdown (age, education, disease stage), class balance, or train/test split details are provided for the DARWIN dataset. This information is load-bearing for the generalizability claim, as the representativeness of the handwriting signals is the key assumption underlying the diagnostic performance results.
minor comments (2)
  1. [Title] 'Alzheimer,s' contains a typographical error and should read 'Alzheimer's'.
  2. [Method] Notation: The description of 'low-rank adapters' and 'experts' would benefit from an explicit equation or diagram showing how the LoRA matrices are injected into the expert layers.
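The notation requested in the second minor comment has a standard form in the LoRA literature; whether the authors inject adapters in exactly this way is an assumption, and this formulation follows Hu et al. [24]:

```latex
% Expert e modifies the frozen shared weight W_0 with a rank-r update,
% with B_e in R^{d_out x r}, A_e in R^{r x d_in}, and r << min(d_in, d_out):
h = W_0\,x + \Delta W_e\,x, \qquad \Delta W_e = \frac{\alpha}{r}\, B_e A_e
```

Only B_e and A_e are trained per expert; the scaling α/r keeps the update magnitude comparable across ranks.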

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have addressed each major comment point by point below. Revisions have been made to the abstract and experimental setup sections to improve transparency and verifiability of the results.

Point-by-point responses
  1. Referee: [Abstract and Evaluation] The abstract states that the framework 'achieves powerful diagnostic performance' and activates 'significantly fewer parameters' yet supplies no numerical metrics (accuracy, AUC, F1, parameter counts, or inference FLOPs), no statistical tests, and no baseline numbers. Without these, the central performance claim cannot be verified or compared.

    Authors: We agree that the abstract would benefit from explicit numerical support for the performance claims. In the revised manuscript, we have updated the abstract to include key quantitative results from our experiments (accuracy, AUC, F1-score, activated parameter counts at inference, and inference FLOPs) together with direct comparisons to the MLP and standard MoE baselines. We also reference the relevant tables and note that statistical significance testing (e.g., McNemar’s test for classification comparisons) is reported in the experimental section. revision: yes

  2. Referee: [Dataset and Experimental Setup] No sample size, demographic breakdown (age, education, disease stage), class balance, or train/test split details are provided for the DARWIN dataset. This information is load-bearing for the generalizability claim, as the representativeness of the handwriting signals is the key assumption underlying the diagnostic performance results.

    Authors: We acknowledge that a fuller description of the DARWIN dataset is necessary for assessing generalizability. The revised manuscript now includes the total sample size, available demographic statistics (age, education level, and disease stage distribution), class balance between AD and control participants, and the precise train/validation/test split ratios employed in our experiments. These details are drawn from the dataset documentation and our experimental protocol and are presented in the Dataset and Experimental Setup section. revision: yes
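The McNemar's test the rebuttal cites can be sketched in a few lines of stdlib Python; the exact-binomial form is used here to avoid a chi-square dependency, and the discordant counts are invented for illustration:

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar test on discordant counts b, c.

    b = subjects the first model classifies correctly and the second does not;
    c = the reverse. Under H0 (equal error rates) each discordant pair is a
    fair coin flip, so the p-value is a doubled binomial tail.
    """
    n, k = b + c, min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Invented counts: LoRA-MoE right / MLP wrong on 9 subjects, the reverse on 2.
p = mcnemar_exact(9, 2)
print(f"p = {p:.4f}")  # p = 0.0654
```

Because only discordant pairs enter the statistic, the test directly asks whether the two classifiers disagree in a systematically lopsided way, which is what the comparison to the MLP baseline needs.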

Circularity Check

0 steps flagged

No circularity; empirical ML evaluation on external dataset with no derivations

full rationale

The paper proposes a LoRA-MoE architecture for handwriting-based AD diagnosis and reports performance via standard training, ablations, and comparisons on the external DARWIN dataset. No equations, derivations, or first-principles predictions exist that could reduce to self-referential inputs. Central claims are data-driven empirical results, not tautological by construction. Self-citations are absent from load-bearing steps in the provided text.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The model depends on standard supervised deep-learning assumptions plus dataset-specific architectural choices; no new physical entities are postulated.

free parameters (3)
  • LoRA rank
    Hyperparameter selected via ablation to trade off performance against parameter count.
  • number of experts
    Architectural choice evaluated through ablation studies on the DARWIN dataset.
  • hidden dimension size
    Model width tuned during experiments to balance capacity and efficiency.
axioms (2)
  • domain assumption Handwriting signals reliably encode early cognitive-motor changes due to Alzheimer's
    Invoked in the abstract as the basis for using handwriting as a digital biomarker.
  • domain assumption The DARWIN dataset distribution is representative of real-world early AD cases
    Implicit in the claim that experimental results support clinical screening potential.

pith-pipeline@v0.9.0 · 5573 in / 1353 out tokens · 39494 ms · 2026-05-10T15:39:41.466111+00:00 · methodology


Reference graph

Works this paper leans on

40 extracted references · 11 canonical work pages · 4 internal anchors

  1. [1] Clifford R. Jack Jr., David A. Bennett, Kaj Blennow, Maria C. Carrillo, Billy Dunn, Samantha Budd Haeberlein, David M. Holtzman, William Jagust, Frank Jessen, Jason Karlawish, et al. NIA-AA research framework: toward a biological definition of Alzheimer's disease. Alzheimer's & Dementia, 14(4):535–562, 2018.

  2. [2] Philip Scheltens, Bart De Strooper, Miia Kivipelto, Henne Holstege, Gael Chételat, Charlotte E. Teunissen, Jeffrey Cummings, and Wiesje M. van der Flier. Alzheimer's disease. The Lancet, 397(10284):1577–1590, 2021.

  3. [3] Kaj Blennow, Bruno Dubois, Anne M. Fagan, Piotr Lewczuk, Mony J. De Leon, and Harald Hampel. Clinical utility of cerebrospinal fluid biomarkers in the diagnosis of early Alzheimer's disease. Alzheimer's & Dementia, 11(1):58–69, 2015.

  4. [4] Michael W. Weiner, Dallas P. Veitch, Paul S. Aisen, Laurel A. Beckett, Nigel J. Cairns, Robert C. Green, Danielle Harvey, Clifford R. Jack, William Jagust, Enchi Liu, et al. The Alzheimer's Disease Neuroimaging Initiative: a review of papers published since its inception. Alzheimer's & Dementia, 9(5):e111–e194, 2013.

  5. [5] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen A.W.M. van der Laak, Bram van Ginneken, and Clara I. Sánchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88, 2017.

  6. [6] Wu Wang, Fouzi Harrou, Abdelkader Dairi, and Ying Sun. Stacked deep learning approach for efficient SARS-CoV-2 detection in blood samples. Artificial Intelligence in Medicine, 148:102767, 2024.

  7. [7] Wu Wang, Junho Lee, Fouzi Harrou, and Ying Sun. Early detection of Parkinson's disease using deep learning and machine learning. IEEE Access, 8:147635–147646, 2020.

  8. [8] Fouzi Harrou, Abdelkader Dairi, and Ying Sun. A manifold learning-based anomaly detection framework for cardiovascular disease diagnosis. Computational Intelligence, 41(5):e70130, 2025.

  9. [9] Amith Khandakar, Muhammad E.H. Chowdhury, Mamun Bin Ibne Reaz, Sawal Hamid Md Ali, Md Anwarul Hasan, Serkan Kiranyaz, Tawsifur Rahman, Rashad Alfkey, Ahmad Ashrif A. Bakar, and Rayaz A. Malik. A machine learning model for early detection of diabetic foot using thermogram images. Computers in Biology and Medicine, 137:104838, 2021.

  10. [10] Andre Esteva, Alexandre Robicquet, Bharath Ramsundar, Volodymyr Kuleshov, Mark DePristo, Katherine Chou, Claire Cui, Greg Corrado, Sebastian Thrun, and Jeff Dean. A guide to deep learning in healthcare. Nature Medicine, 25(1):24–29, 2019.

  11. [11] Jean Neils-Strunjas, Kathy Groves-Wright, Pauline Mashima, and Stacy Harnish. Dysgraphia in Alzheimer's disease: a review for clinical and research purposes. Journal of Speech, Language, and Hearing Research, 49(6):1313–1330, 2006.

  12. [12] Claudio De Stefano, Francesco Fontanella, Donato Impedovo, Giuseppe Pirlo, and Alessandra Scotto Di Freca. Handwriting analysis to support neurodegenerative diseases diagnosis: A review. Pattern Recognition Letters, 121:37–45, 2019.

  13. [13] Saturnino Luz, Fasih Haider, Sofia de la Fuente Garcia, Davida Fromm, and Brian MacWhinney. Alzheimer's dementia recognition through spontaneous speech, 2021.

  14. [14] Kathleen C. Fraser, Jed A. Meltzer, and Frank Rudzicz. Linguistic features identify Alzheimer's disease in narrative speech. Journal of Alzheimer's Disease, 49(2):407–422, 2015.

  15. [15] C. Kavitha, Vinodhini Mani, S.R. Srividhya, Osamah Ibrahim Khalaf, and Carlos Andres Tavera Romero. Early-stage Alzheimer's disease prediction using machine learning models. Frontiers in Public Health, 10:853294, 2022.

  16. [16] K.R. Kruthika, H.D. Maheshappa, Alzheimer's Disease Neuroimaging Initiative, et al. Multistage classifier-based approach for Alzheimer's disease prediction and retrieval. Informatics in Medicine Unlocked, 14:34–42, 2019.

  17. [17] Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991.

  18. [18] M.I. Jordan and R.A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. In Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan), volume 2, pages 1339–1344, 1993.

  19. [19] Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017.

  20. [20] Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Z. Chen. GShard: Scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668, 2020.

  21. [21] William Fedus, Barret Zoph, and Noam Shazeer. Switch Transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39, 2022.

  22. [22] Damai Dai, Chengqi Deng, Chenggang Zhao, Runxin Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Yu Wu, Zhenda Xie, Y.K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, and Wenfeng Liang. DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models. In Annual Meeting of the Association for Computational Linguistics, 2024.

  23. [23] Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, et al. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024.

  24. [24] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In ICLR 2022, April 2022.

  25. [25] Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, and Anna Rumshisky. ReLoRA: High-rank training through low-rank updates. arXiv preprint arXiv:2307.05695, 2023.

  26. [26] Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao. AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning. arXiv preprint arXiv:2303.10512, 2023.

  27. [27] Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. In Advances in Neural Information Processing Systems, volume 36, pages 10088–10115. Curran Associates, Inc., 2023.

  28. [28] Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, and Sanjeev Arora. A kernel-based view of language model fine-tuning. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 23610–23641. PMLR, 2023.

  29. [29] Yuchen Zeng and Kangwook Lee. The expressive power of low-rank adaptation. arXiv preprint arXiv:2310.17513, 2024.

  30. [30] Chengsong Huang, Qian Liu, Bill Yuchen Lin, Tianyu Pang, Chao Du, and Min Lin. LoraHub: Efficient cross-task generalization via dynamic LoRA composition. arXiv preprint arXiv:2307.13269, 2023.

  31. [31] Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, and Sara Hooker. Pushing mixture of experts to the limit: Extremely parameter efficient MoE for instruction tuning. arXiv preprint arXiv:2309.05444, 2023.

  32. [32] Wenfeng Feng, Chuzhan Hao, Yuewei Zhang, Yu Han, and Hao Wang. Mixture-of-LoRAs: An efficient multitask tuning method for large language models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11371–11380, Torino, Italia, May 2024. ELRA and ICCL.

  33. [33] Shu Yang, Muhammad Asif Ali, Cheng-Long Wang, Lijie Hu, and Di Wang. MoRAL: MoE augmented LoRA for LLMs' lifelong learning. arXiv preprint arXiv:2402.11260, 2024.

  34. [34] Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Wei Shen, Limao Xiong, Yuhao Zhou, Xiao Wang, Zhiheng Xi, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, and Xuanjing Huang. LoRAMoE: Alleviating world knowledge forgetting in large language models via MoE-style plugin. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024.

  35. [35] Yunhao Gou, Zhili Liu, Kai Chen, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James T. Kwok, and Yu Zhang. Mixture of cluster-conditional LoRA experts for vision-language instruction tuning. arXiv preprint arXiv:2312.12379, 2024.

  36. [36] Qidong Liu, Xian Wu, Xiangyu Zhao, Yuanshao Zhu, Derong Xu, Feng Tian, and Yefeng Zheng. When MoE meets LLMs: Parameter efficient fine-tuning for multi-task medical applications. arXiv preprint arXiv:2310.18339, 2024.

  37. [37] Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, and Yuxiong He. DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale. In ICML 2022, January 2022.

  38. [38] Aidan Clark, Diego De Las Casas, Aurelia Guy, Arthur Mensch, Michela Paganini, Jordan Hoffmann, Bogdan Damoc, Blake Hechtman, Trevor Cai, Sebastian Borgeaud, George van den Driessche, Eliza Rutherford, Tom Hennigan, Matthew J. Johnson, Albin Cassirer, Chris Jones, Elena Buchatskaya, David Budden, Laurent Sifre, Simon Osindero, Oriol Vinyals, et al. Unified scaling laws for routed language models. In International Conference on Machine Learning, 2022.

  39. [39] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 1026–1034, Los Alamitos, CA, USA, December 2015. IEEE Computer Society.

  40. [40] Nicole D. Cilia, Giuseppe De Gregorio, Claudio De Stefano, Francesco Fontanella, Angelo Marcelli, and Antonio Parziale. Diagnosing Alzheimer's disease from on-line handwriting: A novel dataset and performance benchmarking. Engineering Applications of Artificial Intelligence, 111:104822, 2022.