Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment

Liwen Chen; Long Feng; Shuaida He

arxiv: 2605.21217 · v1 · pith:X74GBTCKnew · submitted 2026-05-20 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-21 01:43 UTC · model grok-4.3

keywords: federated learning, LoRA, low-rank adaptation, contamination detection, collaborative fine-tuning, LLM, subspace recovery, robust estimation

The pith

A low-rank plus block-sparse decomposition recovers the shared LoRA subspace across federated clients and identifies contaminated ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that federated LoRA fine-tuning of LLMs can align heterogeneous clients to a common subspace even when only partial structure is shared and a subset of clients may be contaminated. It does so by casting the collection of preliminary local LoRA matrices into a structured low-rank plus block-sparse decomposition that isolates the shared adaptation directions and flags the outliers. A sympathetic reader would care because the resulting cross-client averaging within the recovered subspace reduces off-subspace error while retaining client-specific variation, offering a concrete performance gain over purely local fine-tuning whenever that gain exceeds the cost of estimating the subspace. The analysis supplies exact recovery guarantees in the noiseless setting, stable recovery under preliminary estimation error, and consistent detection of the collaborative set under mild separation conditions.

Core claim

CLAIR recovers the shared LoRA subspace and detects contaminated clients via a structured low-rank plus block-sparse decomposition. We prove exact recovery of the shared LoRA subspace in the noiseless case, stable recovery under preliminary estimation error, and consistent collaborative-set recovery under mild separation conditions. We further quantify the gain from CLAIR refinement: it reduces off-subspace estimation error through cross-client averaging while preserving client-specific variation within the shared LoRA subspace, thus improves over local fine-tuning whenever this oracle gain outweighs the costs of subspace estimation and benign-client heterogeneity.

What carries the argument

The structured low-rank plus block-sparse decomposition of the matrix whose columns are the preliminary local LoRA updates, which separates the shared subspace from client-specific and contamination blocks.
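
To make the idea concrete, here is a minimal NumPy sketch of a low-rank plus block-sparse split. It is not the authors' estimator: it is a simple alternating heuristic written for illustration. It assumes each client update is a p x q matrix, that updates are stacked side by side as column blocks, that the target rank is known, and that a hand-picked residual threshold is available. The function name `split_shared_and_flagged` and the parameters `rank` and `thresh` are hypothetical.

```python
import numpy as np

def split_shared_and_flagged(deltas, rank, thresh, n_iter=10):
    """Heuristic alternating split of client updates into a shared low-rank
    column space and a block-sparse set of flagged clients (illustrative only)."""
    K = len(deltas)
    q = deltas[0].shape[1]
    M = np.hstack(deltas)              # p x (K*q): one column block per client
    flagged = []
    for _ in range(n_iter):
        # Low-rank step: fit a rank-`rank` column space to the unflagged blocks.
        kept = M.copy()
        for k in flagged:
            kept[:, k * q:(k + 1) * q] = 0.0
        U = np.linalg.svd(kept, full_matrices=False)[0][:, :rank]
        # Block-sparse step: flag clients whose off-subspace residual is large.
        resid = M - U @ (U.T @ M)
        norms = [np.linalg.norm(resid[:, k * q:(k + 1) * q]) for k in range(K)]
        flagged = [k for k in range(K) if norms[k] > thresh]
    return U, flagged

# Synthetic, noiseless example: 8 benign clients share a rank-2 column space,
# the last 2 clients send unstructured updates.
rng = np.random.default_rng(0)
p, q, r = 20, 5, 2
U0 = np.linalg.qr(rng.standard_normal((p, r)))[0]
deltas = [U0 @ rng.standard_normal((r, q)) for _ in range(8)]
deltas += [3.0 * rng.standard_normal((p, q)) for _ in range(2)]
U_hat, flagged = split_shared_and_flagged(deltas, rank=r, thresh=8.0)
print(flagged)
```

In this toy setting the threshold is chosen by hand from the known scale of the synthetic data. The paper's guarantees concern its own structured decomposition under its stated separation conditions, which this heuristic does not reproduce.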

If this is right

Exact recovery of the shared LoRA subspace holds in the noiseless case.
Stable recovery of the subspace is obtained when preliminary local estimators contain bounded error.
Consistent identification of the collaborative set of benign clients occurs under mild separation conditions.
Off-subspace estimation error is reduced by averaging the aligned components across benign clients.
Fine-tuning performance for benign clients exceeds that of independent local training when the alignment benefit outweighs subspace estimation and heterogeneity costs.
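
The trade-off in the last item can be illustrated with a simplified calculation. The notation is assumed here and is not taken from the paper: $\Delta_k$ is benign client $k$'s true update, assumed to lie exactly in the shared subspace with orthogonal projector $P$; $\widehat{\Delta}_k = \Delta_k + E_k$ is its preliminary local estimate; and $\widehat{P}$ is a projector estimated by pooling the benign clients.

```latex
\[
\begin{aligned}
\text{Oracle projector:}\quad
\|P\widehat{\Delta}_k - \Delta_k\|_F^2
  &= \|P E_k\|_F^2
   = \|E_k\|_F^2 - \|(I - P)E_k\|_F^2, \\
\text{Estimated projector:}\quad
\widehat{P}\widehat{\Delta}_k - \Delta_k
  &= \widehat{P}E_k + (\widehat{P} - P)\Delta_k, \\
\|\widehat{P}\widehat{\Delta}_k - \Delta_k\|_F
  &\le \|\widehat{P}E_k\|_F
     + \|\widehat{P} - P\|_{\mathrm{op}}\,\|\Delta_k\|_F .
\end{aligned}
\]
```

Under these assumptions, projection removes the off-subspace error $\|(I - P)E_k\|_F^2$ and pays a subspace estimation cost of at most $\|\widehat{P} - P\|_{\mathrm{op}}\|\Delta_k\|_F$. If benign updates deviate from the shared subspace, a further heterogeneity term appears. This is a generic bound given for intuition, not a restatement of the paper's theorem.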

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same decomposition could be applied to other matrix-valued parameter updates such as adapter layers or prompt-tuning matrices in federated settings.
The approach suggests a practical way to harden federated LLM training against a moderate fraction of faulty or adversarial clients without requiring data sharing.
Empirical verification on larger models and more diverse tasks would test whether the quantified oracle gain remains positive in high-dimensional regimes.
The framework may extend naturally to sequential or continual federated adaptation where the shared subspace evolves over time.

Load-bearing premise

Clients share a partial low-rank structure in their LoRA updates that remains separable from contaminated clients under mild separation conditions.

What would settle it

A controlled experiment in which the recovered subspace deviates substantially from the known ground-truth shared subspace once contamination is introduced at levels that still satisfy the stated separation conditions.
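
One way to quantify "deviates substantially" in such an experiment is through the principal angles between the recovered and ground-truth subspaces. The sketch below uses the standard QR-plus-SVD construction in NumPy; the function name `principal_angles` is hypothetical, and the paper may measure subspace error with a different metric.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)            # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)            # orthonormal basis for span(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Example: a line tilted by 0.3 radians away from the first coordinate axis.
t = 0.3
a = np.array([[1.0], [0.0], [0.0]])
b = np.array([[np.cos(t)], [np.sin(t)], [0.0]])
print(principal_angles(a, b))
```

The largest angle (the last entry) is the natural summary: it is zero only when the two subspaces of equal dimension coincide, so tracking it as contamination increases shows where recovery starts to fail.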

Figures

Figures reproduced from arXiv: 2605.21217 by Liwen Chen, Long Feng, Shuaida He.

**Figure 1.** Estimation error of $P_{\widehat{A}}$ compared to $K$ across $(p, q, n)$ regimes.

**Figure 2.** Homogeneous copying experiment evaluated on the common copying task. The top panel reports client-level masked next-token accuracy averaged over 100 replicates; the bottom panel reports relative accuracy change with respect to local LoRA fine-tuning.

**Figure 3.** Heterogeneous copying experiment evaluated on client-specific tasks. The top panel reports client-level masked next-token accuracy averaged over 100 replicates; the bottom panel reports relative accuracy change with respect to local LoRA fine-tuning.

Original abstract

Low-rank adaptation (LoRA) has emerged as a powerful tool for parameter-efficient fine-tuning of large language models (LLMs). This paper studies LoRA under a federated learning setting, enabling collaborative fine-tuning across clients while preserving parameter efficiency. We focus on a highly heterogeneous regime in which clients share only partial structure and a substantial subset may be contaminated. We propose Collaborative Low-rank Alignment and Identifiable Recovery (CLAIR), a contamination-aware framework that relies only on preliminary local estimators. Its formulation applies broadly, from linear regression to neural network and LLM modules, whenever local adaptation can be represented by matrix-valued updates. CLAIR recovers the shared LoRA subspace and detects contaminated clients via a structured low-rank plus block-sparse decomposition. We prove exact recovery of the shared LoRA subspace in the noiseless case, stable recovery under preliminary estimation error, and consistent collaborative-set recovery under mild separation conditions. We further quantify the gain from CLAIR refinement: it reduces off-subspace estimation error through cross-client averaging while preserving client-specific variation within the shared LoRA subspace, thus improves over local fine-tuning whenever this oracle gain outweighs the costs of subspace estimation and benign-client heterogeneity. Empirically, we demonstrate the benefits of CLAIR by fine-tuning a Transformer architecture on a text-copying task. The results show accurate contamination detection and improved benign-client performance compared with local fine-tuning and non-robust federated averaging.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

CLAIR adds a low-rank plus block-sparse decomposition for contamination handling in federated LoRA, but the collaborative-set recovery still depends on unverified separation conditions. The paper proves exact recovery of the shared subspace in the noiseless case and stable recovery when preliminary estimators have some error. It also claims consistent detection of the good clients under mild separation between the shared block and the contaminated blocks. The refinement step uses cross-client averaging to cut off-subspace error while keeping client-specific variation inside the subspace, which is a straightforward way to beat pure local fine-tuning when the gain exceeds the estimation cost. The formulation is written to cover any matrix-valued local updates, so it reaches beyond linear regression to Transformer modules. The text-copying experiment shows accurate detection and better benign-client performance than local fine-tuning or plain federated averaging.

The soft spot is the detection guarantee. It explicitly needs mild separation conditions, yet the abstract gives no numbers on the required gap or angle, and the single experiment does not check whether the observed updates actually meet them. In the LLM regime the paper targets, client heterogeneity can easily produce off-subspace components that sit too close to the shared subspace, so the condition may not hold even when partial structure is present.

This paper is for people working on robust federated fine-tuning of large models who already know LoRA and FedAvg. A reader who wants theoretical recovery results for contaminated distributed training could pull something useful from the proofs. It deserves a serious referee because the problem is practical, the decomposition is new in this setting, and there is both a proof sketch and an empirical check, even if the separation assumption needs closer examination during review.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces CLAIR, a contamination-aware framework for federated LoRA fine-tuning of LLMs in highly heterogeneous regimes where clients share only partial structure. It recovers the shared LoRA subspace and detects contaminated clients using a structured low-rank plus block-sparse decomposition applied to preliminary local estimators. Theoretical results claim exact recovery of the shared subspace in the noiseless case, stable recovery under estimation error, and consistent collaborative-set recovery under mild separation conditions. The work quantifies the gain from cross-client averaging in reducing off-subspace error while preserving client-specific variation, and demonstrates benefits on a Transformer text-copying task with accurate detection and improved benign-client performance over local fine-tuning and non-robust federated averaging.

Significance. If the recovery guarantees hold under the stated assumptions, the framework offers a principled way to achieve robust collaborative fine-tuning for LLMs while handling contamination, which is a practically relevant advance over standard federated averaging or purely local adaptation. The explicit separation of shared subspace recovery from client-specific components and the quantification of oracle gains versus estimation costs provide a clear analytical lens. The theoretical proofs for exact and stable recovery constitute a strength, though the empirical evaluation is limited to a single synthetic task.

major comments (1)

[Abstract] Abstract and theoretical recovery section: the central claim of consistent collaborative-set recovery under mild separation conditions is load-bearing, yet the manuscript provides no explicit quantification of the required separation (e.g., a minimum principal angle between the shared subspace and contaminated blocks or a Frobenius-norm gap threshold). In the LLM regime with high client heterogeneity, off-subspace components of local updates may violate this condition even when partial structure holds, undermining the detection guarantee.

minor comments (2)

The description of how preliminary local estimators are computed (mentioned as the only required input) would benefit from a short algorithmic outline or pseudocode early in the paper to clarify the end-to-end procedure.
The empirical section reports improved performance on the text-copying task but does not include an ablation or diagnostic verifying that the observed client updates satisfy the separation condition used in the consistency proof.
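
A diagnostic of the kind the second minor comment asks for could look like the following sketch. It is an editorial illustration, not something the manuscript reports: it measures, for each client, the fraction of update energy lying outside an estimated shared subspace, then reports the gap between the least-deviating contaminated client and the most-deviating benign one. The names `off_subspace_ratio` and `separation_gap` are hypothetical, and the paper's actual separation condition may be stated in different terms.

```python
import numpy as np

def off_subspace_ratio(delta, U):
    """Share of a client's update norm outside span(U); U has orthonormal columns."""
    resid = delta - U @ (U.T @ delta)
    return np.linalg.norm(resid) / np.linalg.norm(delta)

def separation_gap(deltas, U, benign):
    """Smallest contaminated ratio minus largest benign ratio, plus all ratios.
    Assumes at least one benign and one contaminated client."""
    ratios = np.array([off_subspace_ratio(d, U) for d in deltas])
    mask = np.zeros(len(deltas), dtype=bool)
    mask[list(benign)] = True
    return ratios[~mask].min() - ratios[mask].max(), ratios

# Toy check: one update inside the subspace, one entirely outside it.
E = np.eye(4)
U = E[:, :2]
deltas = [U @ np.ones((2, 3)), E[:, 2:] @ np.ones((2, 3))]
gap, ratios = separation_gap(deltas, U, benign=[0])
print(gap, ratios)
```

A gap that is clearly positive suggests that benign and contaminated clients are separable by off-subspace energy on the data at hand; a gap near zero or negative would indicate that a threshold-based detector cannot distinguish them.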

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive summary and for identifying this key point for clarification. We respond to the major comment below.

Referee: [Abstract] Abstract and theoretical recovery section: the central claim of consistent collaborative-set recovery under mild separation conditions is load-bearing, yet the manuscript provides no explicit quantification of the required separation (e.g., a minimum principal angle between the shared subspace and contaminated blocks or a Frobenius-norm gap threshold). In the LLM regime with high client heterogeneity, off-subspace components of local updates may violate this condition even when partial structure holds, undermining the detection guarantee.

Authors: We agree that the separation condition would benefit from more explicit quantification to strengthen the claims, particularly for the LLM setting. In the current manuscript, the condition is described as 'mild' to indicate that it permits a range of heterogeneity while ensuring the low-rank plus block-sparse decomposition uniquely identifies the shared subspace. To address this, we will revise the manuscript to include a precise statement of the separation requirement, such as a lower bound on the principal angle between the shared subspace and the contaminated directions, or an equivalent Frobenius norm gap. This will be added to the abstract and elaborated in the theoretical recovery section, along with a brief discussion of its implications for high-heterogeneity regimes. We believe this will clarify that the guarantee holds under the partial structure assumed in the problem setup.

revision: yes

Circularity Check

0 steps flagged

No circularity: recovery guarantees derived from explicit decomposition and separation assumptions

The paper's central claims consist of exact recovery, stable recovery, and consistent collaborative-set recovery proved from a low-rank plus block-sparse decomposition together with explicitly stated noiseless, preliminary-error, and mild-separation conditions. These are modeling assumptions and theorem hypotheses, not quantities fitted from the target data or imported via self-citation chains. No step renames a fitted parameter as a prediction, defines the target subspace in terms of the recovery result, or relies on an unverified uniqueness theorem from the authors' prior work. The empirical Transformer experiment is presented separately and does not enter the proofs. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claims rest on domain assumptions about partial client structure and matrix-valued local updates plus the introduction of a new decomposition technique; no explicit free parameters or invented physical entities are stated.

axioms (2)

domain assumption: Clients share only partial structure and a substantial subset may be contaminated.
Defines the highly heterogeneous regime targeted by CLAIR.
domain assumption: Local adaptation can be represented by matrix-valued updates.
Enables the formulation to cover linear regression through LLM modules.

invented entities (1)

structured low-rank plus block-sparse decomposition (no independent evidence)
purpose: To recover the shared LoRA subspace and identify contaminated clients
Core technical device introduced by the paper for the federated setting.

pith-pipeline@v0.9.0 · 5786 in / 1439 out tokens · 100090 ms · 2026-05-21T01:43:22.984861+00:00 · methodology

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 2 internal anchors

[1] Amir Beck. First-order methods in optimization. SIAM, 2017.
[2] Jieming Bian, Lei Wang, Letian Zhang, and Jie Xu. Lora-fair: Federated lora fine-tuning with aggregation and initialization refinement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3737–3746, 2025.
[3] Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. Domain separation networks. Advances in neural information processing systems, 29, 2016.
[4] Emmanuel Candes and Benjamin Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111–119, 2012.
[5] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.
[6] Venkat Chandrasekaran, Sujay Sanghavi, Pablo A Parrilo, and Alan S Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596, 2011.
[7] Shuxiao Chen, Qinqing Zheng, Qi Long, and Weijie J Su. Minimax estimation for personalized federated learning: an alternative between fedavg and local training? Journal of Machine Learning Research, 24(262):1–59, 2023.
[8] Kurtland Chua, Qi Lei, and Jason D Lee. How fine-tuning allows for effective meta-learning. Advances in Neural Information Processing Systems, 34:8871–8884, 2021.
[9] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025.
[10] Natalie Doss, Yihong Wu, Pengkun Yang, and Harrison H Zhou. Optimal estimation of high-dimensional gaussian location mixtures. The Annals of Statistics, 51(1):62–95, 2023.
[11] Yaqi Duan and Kaizheng Wang. Adaptive and robust multi-task learning. The Annals of Statistics, 51(5):2015–2039, 2023.
[12] Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. Advances in neural information processing systems, 33:3557–3568, 2020.
[13] Long Feng and Junhui Wang. Projected robust pca with application to smooth image recovery. Journal of Machine Learning Research, 23(249):1–41, 2022.
[14] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
[15] Tian Gu, Yi Han, and Rui Duan. Robust angle-based transfer learning in high dimensions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 87(3):723–745, 2025.
[16] Pengxin Guo, Shuang Zeng, Yanran Wang, Huijie Fan, Feifei Wang, and Liangqiong Qu. Selective aggregation for low-rank adaptation in federated learning. In 13th International Conference on Learning Representations (ICLR 2025), 2025.
[17] Zijian Guo, Xiudi Li, Larry Han, and Tianxi Cai. Robust inference for federated meta-learning. Journal of the American Statistical Association, 120(551):1695–1710, 2025.
[18] Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for nlp. In International conference on machine learning, pages 2790–2799. PMLR, 2019.
[19] Daniel Hsu, Sham M Kakade, and Tong Zhang. Robust matrix decomposition with sparse corruptions. IEEE Transactions on Information Theory, 57(11):7221–7234, 2011.
[20] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.
[21] Wenlong Ji, Weizhe Yuan, Emily Getzen, Kyunghyun Cho, Michael I Jordan, Song Mei, Jason Weston, Weijie J Su, Jing Xu, and Linjun Zhang. An overview of large language models for statisticians. The American Statistician, (just-accepted):1–106, 2026.
[22] Olga Klopp, Yu Lu, Alexandre B Tsybakov, and Harrison H Zhou. Structured matrix estimation and completion. Bernoulli, 25(4B):3883–3911, 2019.
[23] Kummari Naveen Kumar, Chalavadi Krishna Mohan, and Linga Reddy Cenkeramaddi. The impact of adversarial attacks on federated learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):2672–2691, 2023.
[24] Xuan Liu and Xiaobin Chang. Lora subtraction for drift-resistant space in exemplar-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15308–15318, 2025.
[25] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.
[26] Xiaochun Niu, Lili Su, Jiaming Xu, and Pengkun Yang. Collaborative learning with shared linear representations: Statistical rates and optimal algorithms. In International Workshop on Federated Foundation Models in Conjunction with NeurIPS 2024, 2024.
[27] Heejune Sheen, Siyu Chen, Tianhao Wang, and Harrison H Zhou. Implicit regularization of gradient flow on one-layer softmax attention. arXiv preprint arXiv:2403.08699, 2024.
[28] Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, and Ameet S Talwalkar. Federated multi-task learning. Advances in neural information processing systems, 30, 2017.
[29] Jiajun Song, Zhuoyan Xu, and Yiqiao Zhong. Out-of-distribution generalization via composition: a lens through induction heads in transformers. Proceedings of the National Academy of Sciences, 122(6):e2417182122, 2025.
[30] Youbang Sun, Zitao Li, Yaliang Li, and Bolin Ding. Improving lora in privacy-preserving federated learning. In The Twelfth International Conference on Learning Representations.
[31] Chunlin Tian, Zhan Shi, Zhijiang Guo, Li Li, and Chengzhong Xu. Hydralora: An asymmetric lora architecture for efficient fine-tuning. Advances in Neural Information Processing Systems, 37:9565–9584, 2024.
[32] Ye Tian, Yuqi Gu, and Yang Feng. Learning from similar linear representations: Adaptivity, minimaxity, and robustness. Journal of Machine Learning Research, 26(187):1–125, 2025.
[33] Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7167–7176, 2017.
[34] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
[35] Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.
[36] Lei Wang, Jieming Bian, Letian Zhang, and Jie Xu. Adaptive lora experts allocation and selection for federated fine-tuning. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
[37] Runqian Wang, Soumya Ghosh, David Cox, Diego Antognini, Aude Oliva, Rogerio Feris, and Leonid Karlinsky. Trans-lora: towards data-free transferable parameter efficient finetuning. Advances in Neural Information Processing Systems, 37:61217–61237, 2024.
[38] Ziyao Wang, Zheyu Shen, Yexiao He, Guoheng Sun, Hongyi Wang, Lingjuan Lyu, and Ang Li. Flora: Federated fine-tuning large language models with heterogeneous low-rank adaptations. Advances in Neural Information Processing Systems, 37:22513–22533, 2024.
[39] Tianyu Wu, Shizhu He, Jingping Liu, Siqi Sun, Kang Liu, Qing-Long Han, and Yang Tang. A brief overview of chatgpt: The history, status quo and potential future development. IEEE/CAA Journal of Automatica Sinica, 10(5):1122–1136, 2023.
[40] Huan Xu, Constantine Caramanis, and Sujay Sanghavi. Robust pca via outlier pursuit. IEEE transactions on information theory, 58(5):3047–3064, 2012.
[41] Lingling Xu, Haoran Xie, S Joe Qin, Xiaohui Tao, and Fu Lee Wang. Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026.
[42] Ming Yuan, Ali Ekici, Zhaosong Lu, and Renato Monteiro. Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society Series B: Statistical Methodology, 69(3):329–346, 2007.
[43] Jianyi Zhang, Saeed Vahidian, Martin Kuo, Chunyuan Li, Ruiyi Zhang, Tong Yu, Guoyin Wang, and Yiran Chen. Towards building the federatedgpt: Federated instruction tuning. In ICASSP 2024-2024 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 6915–6919. IEEE, 2024.
[44] Hua Zhou, Lexin Li, and Hongtu Zhu. Tensor regression with applications in neuroimaging data analysis. Journal of the American Statistical Association, 108(502):540–552, 2013.

[1] [1]

First-order methods in optimization

Amir Beck. First-order methods in optimization . SIAM, 2017

work page 2017

[2] [2]

Lora-fair: Federated lora fine-tuning with aggregation and initialization refinement

Jieming Bian, Lei Wang, Letian Zhang, and Jie Xu. Lora-fair: Federated lora fine-tuning with aggregation and initialization refinement. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 3737–3746, 2025

work page 2025

[3] [3]

Domain separation networks

Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. Domain separation networks. Advances in neural information processing systems , 29, 2016

work page 2016

[4] [4]

Exact matrix completion via convex optimization

Emmanuel Candes and Benjamin Recht. Exact matrix completion via convex optimization. Communications of the ACM , 55(6):111–119, 2012

work page 2012

[5] [5]

Robust principal component analysis? Journal of the ACM (JACM) , 58(3):11, 2011

Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM) , 58(3):11, 2011

work page 2011

[6] [6]

Rank-sparsity incoherence for matrix decomposition

Venkat Chandrasekaran, Sujay Sanghavi, Pablo A Parrilo, and Alan S Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization , 21(2):572–596, 2011

work page 2011

[7] [7]

Minimax estimation for personalized federated learning: an alternative between fedavg and local training? Journal of Machine Learning Research, 24(262):1–59, 2023

Shuxiao Chen, Qinqing Zheng, Qi Long, and Weijie J Su. Minimax estimation for personalized federated learning: an alternative between fedavg and local training? Journal of Machine Learning Research, 24(262):1–59, 2023

work page 2023

[8] [8]

How fine-tuning allows for effective meta-learning

Kurtland Chua, Qi Lei, and Jason D Lee. How fine-tuning allows for effective meta-learning. Advances in Neural Information Processing Systems , 34:8871–8884, 2021

work page 2021

[9] [9]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261 , 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[10] [10]

Optimal estimation of high- dimensional gaussian location mixtures

Natalie Doss, Yihong Wu, Pengkun Yang, and Harrison H Zhou. Optimal estimation of high- dimensional gaussian location mixtures. The Annals of Statistics , 51(1):62–95, 2023

work page 2023

[11] Yaqi Duan and Kaizheng Wang. Adaptive and robust multi-task learning. The Annals of Statistics, 51(5):2015–2039, 2023.

[12] Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. Advances in neural information processing systems, 33:3557–3568, 2020.

[13] Long Feng and Junhui Wang. Projected robust pca with application to smooth image recovery. Journal of Machine Learning Research, 23(249):1–41, 2022.

[14] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.

[15] Tian Gu, Yi Han, and Rui Duan. Robust angle-based transfer learning in high dimensions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 87(3):723–745, 2025.

[16] Pengxin Guo, Shuang Zeng, Yanran Wang, Huijie Fan, Feifei Wang, and Liangqiong Qu. Selective aggregation for low-rank adaptation in federated learning. In 13th International Conference on Learning Representations, ICLR 2025, 2025.

[17] Zijian Guo, Xiudi Li, Larry Han, and Tianxi Cai. Robust inference for federated meta-learning. Journal of the American Statistical Association, 120(551):1695–1710, 2025.

[18] Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for nlp. In International conference on machine learning, pages 2790–2799. PMLR, 2019.

[19] Daniel Hsu, Sham M Kakade, and Tong Zhang. Robust matrix decomposition with sparse corruptions. IEEE Transactions on Information Theory, 57(11):7221–7234, 2011.

[20] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.

[21] Wenlong Ji, Weizhe Yuan, Emily Getzen, Kyunghyun Cho, Michael I Jordan, Song Mei, Jason Weston, Weijie J Su, Jing Xu, and Linjun Zhang. An overview of large language models for statisticians. The American Statistician, (just-accepted):1–106, 2026.

[22] Olga Klopp, Yu Lu, Alexandre B Tsybakov, and Harrison H Zhou. Structured matrix estimation and completion. Bernoulli, 25(4B):3883–3911, 2019.

[23] Kummari Naveen Kumar, Chalavadi Krishna Mohan, and Linga Reddy Cenkeramaddi. The impact of adversarial attacks on federated learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):2672–2691, 2023.

[24] Xuan Liu and Xiaobin Chang. Lora subtraction for drift-resistant space in exemplar-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15308–15318, 2025.

[25] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.

[26] Xiaochun Niu, Lili Su, Jiaming Xu, and Pengkun Yang. Collaborative learning with shared linear representations: Statistical rates and optimal algorithms. In International Workshop on Federated Foundation Models in Conjunction with NeurIPS 2024, 2024.

[27] Heejune Sheen, Siyu Chen, Tianhao Wang, and Harrison H Zhou. Implicit regularization of gradient flow on one-layer softmax attention. arXiv preprint arXiv:2403.08699, 2024.

[28] Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, and Ameet S Talwalkar. Federated multi-task learning. Advances in neural information processing systems, 30, 2017.

[29] Jiajun Song, Zhuoyan Xu, and Yiqiao Zhong. Out-of-distribution generalization via composition: a lens through induction heads in transformers. Proceedings of the National Academy of Sciences, 122(6):e2417182122, 2025.

[30] Youbang Sun, Zitao Li, Yaliang Li, and Bolin Ding. Improving lora in privacy-preserving federated learning. In The Twelfth International Conference on Learning Representations.

[31] Chunlin Tian, Zhan Shi, Zhijiang Guo, Li Li, and Chengzhong Xu. Hydralora: An asymmetric lora architecture for efficient fine-tuning. Advances in Neural Information Processing Systems, 37:9565–9584, 2024.

[32] Ye Tian, Yuqi Gu, and Yang Feng. Learning from similar linear representations: Adaptivity, minimaxity, and robustness. Journal of Machine Learning Research, 26(187):1–125, 2025.

[33] Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7167–7176, 2017.

[34] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.

[35] Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.

[36] Lei Wang, Jieming Bian, Letian Zhang, and Jie Xu. Adaptive lora experts allocation and selection for federated fine-tuning. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.

[37] Runqian Wang, Soumya Ghosh, David Cox, Diego Antognini, Aude Oliva, Rogerio Feris, and Leonid Karlinsky. Trans-lora: towards data-free transferable parameter efficient finetuning. Advances in Neural Information Processing Systems, 37:61217–61237, 2024.

[38] Ziyao Wang, Zheyu Shen, Yexiao He, Guoheng Sun, Hongyi Wang, Lingjuan Lyu, and Ang Li. Flora: Federated fine-tuning large language models with heterogeneous low-rank adaptations. Advances in Neural Information Processing Systems, 37:22513–22533, 2024.

[39] Tianyu Wu, Shizhu He, Jingping Liu, Siqi Sun, Kang Liu, Qing-Long Han, and Yang Tang. A brief overview of chatgpt: The history, status quo and potential future development. IEEE/CAA Journal of Automatica Sinica, 10(5):1122–1136, 2023.

[40] Huan Xu, Constantine Caramanis, and Sujay Sanghavi. Robust pca via outlier pursuit. IEEE transactions on information theory, 58(5):3047–3064, 2012.

[41] Lingling Xu, Haoran Xie, S Joe Qin, Xiaohui Tao, and Fu Lee Wang. Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026.

[42] Ming Yuan, Ali Ekici, Zhaosong Lu, and Renato Monteiro. Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society Series B: Statistical Methodology, 69(3):329–346, 2007.

[43] Jianyi Zhang, Saeed Vahidian, Martin Kuo, Chunyuan Li, Ruiyi Zhang, Tong Yu, Guoyin Wang, and Yiran Chen. Towards building the federatedgpt: Federated instruction tuning. In ICASSP 2024-2024 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 6915–6919. IEEE, 2024.

[44] Hua Zhou, Lexin Li, and Hongtu Zhu. Tensor regression with applications in neuroimaging data analysis. Journal of the American Statistical Association, 108(502):540–552, 2013.