arxiv: 2604.06819 · v1 · submitted 2026-04-08 · 💻 cs.DC

Recognition: no theorem link

Beyond End-to-End: Dynamic Chain Optimization for Private LLM Adaptation on the Edge

Yebo Wu , Jingguang Li , Chunlin Tian , Kahou Tam , Zhijiang Guo , Li Li

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:24 UTC · model grok-4.3

classification 💻 cs.DC

keywords federated fine-tuningLLM adaptationedge computingchain optimizationsequential trainingprivacy preservationmemory efficiencyadapter tuning

0 comments

The pith

Sequential layer-by-layer adapter training replaces end-to-end updates to enable private LLM fine-tuning on edge devices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Chain Federated Fine-Tuning to overcome the memory gap between large language models and edge hardware in federated settings. Rather than updating the full model at once, the method trains one adapter layer to convergence, freezes its weights, and repeats for subsequent layers to build task performance gradually. Three supporting techniques handle information flow between frozen layers, give each step broader awareness of the overall goal, and select effective starting points automatically. This chain structure keeps all computation local for privacy while fitting within tight device memory limits.

Core claim

Chain Federated Fine-Tuning forgoes end-to-end updates in favor of a sequential, layer-by-layer manner. It first trains the initial adapter to convergence, freezes its weights, and then proceeds to the next. This iterative train-and-freeze process forms an optimization chain that gradually enhances the model's task-specific proficiency, supported by Dynamic Layer Co-Tuning to bridge semantic gaps, Globally Perceptive Optimization to give each adapter foresight, and Function-Oriented Adaptive Tuning to identify optimal starting points.

What carries the argument

The ChainFed optimization chain, which trains and freezes adapters sequentially while using dynamic co-tuning, global perception, and adaptive starting-point selection to maintain performance across layers.

Load-bearing premise

The three added techniques can close semantic gaps between layers and avoid optimization failures that would otherwise cause accuracy to fall below end-to-end training.

What would settle it

Compare peak memory usage and final accuracy of ChainFed against end-to-end fine-tuning on the same LLM, dataset, and edge device with a fixed memory budget; if ChainFed exceeds the memory limit or shows lower accuracy, the central claim does not hold.

Figures

Figures reproduced from arXiv: 2604.06819 by Chunlin Tian, Jingguang Li, Kahou Tam, Li Li, Yebo Wu, Zhijiang Guo.

**Figure 1.** Figure 1: The configuration of the adapter. IIDnon-IID 65 70 75 80 85 (a) YELP-P. Accuracy (%) Practical. Ideal. IIDnon-IID 75 80 85 90 (b) AGNEWS. Accuracy (%) IIDnon-IID 50 55 60 65 70 (c) YAHOO. Accuracy (%) [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 5.** Figure 5: Overview of the Dynamic Layer Co-Tuning mechanism, illustrated with an example where the sliding window size is two (Q = 2). vantages: 1) Bridging Semantic Gaps. The interstage overlap directly mitigates representational mismatch. By training an adapter alongside both its upstream and downstream neighbors, it is incentivized to function as a semantic anchor, aligning feature representations across layer… view at source ↗

**Figure 6.** Figure 6: The Globally Perceptive Optimization strat [PITH_FULL_IMAGE:figures/full_fig_p005_6.png] view at source ↗

**Figure 7.** Figure 7: The Function-Oriented Adaptive Tuning scheme. Layer functionality is analyzed using CKA. formations, enabling accurate estimation of the endto-end loss with minimal overhead. As illustrated in [PITH_FULL_IMAGE:figures/full_fig_p005_7.png] view at source ↗

**Figure 8.** Figure 8: Impact of the co-tuning window size (Q) on model performance and memory usage for BERT. 11.74%, respectively. Furthermore, the consistently poor performance of Linear Probing confirms that merely fine-tuning the output layer is insufficient for effective task adaptation. • Comparison with Memory-Aware Methods. FwdLLM and FedKSeed reduce activation memory but ignore the dominant parameter bottleneck. Furth… view at source ↗

**Figure 9.** Figure 9: Impact of the global loss weight on model [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗

**Figure 10.** Figure 10: Sensitivity analysis of key hyperparameters [PITH_FULL_IMAGE:figures/full_fig_p015_10.png] view at source ↗

**Figure 11.** Figure 11: Case Study for LLaMA3.1-8B. ing its generated responses to those from the Full Adapters† for the same instruction. As shown in [PITH_FULL_IMAGE:figures/full_fig_p016_11.png] view at source ↗

read the original abstract

Federated fine-tuning enables privacy-preserving LLM adaptation but faces a critical bottleneck: the disparity between LLMs' high memory demands and edge devices' limited capacity. To break the memory barrier, we propose Chain Federated Fine-Tuning (ChainFed), an innovative paradigm that forgoes end-to-end updates in favor of a sequential, layer-by-layer manner. It first trains the initial adapter to convergence, freezes its weights, and then proceeds to the next. This iterative train-and-freeze process forms an optimization chain, gradually enhancing the model's task-specific proficiency. ChainFed further integrates three core techniques: 1) Dynamic Layer Co-Tuning to bridge semantic gaps between sequentially tuned layers and facilitate information flow; 2) Globally Perceptive Optimization to endow each adapter with foresight beyond its local objective; 3) Function-Oriented Adaptive Tuning to automatically identify the optimal fine-tuning starting point. Extensive experiments on multiple benchmarks demonstrate the superiority of ChainFed over existing methods, boosting average accuracy by up to 46.46\%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ChainFed introduces a sequential layer-by-layer adapter chain for memory-efficient federated LLM tuning on edge devices, but the large accuracy claims rest on details that are not visible in the abstract.

read the letter

The main point is a shift from end-to-end federated fine-tuning to a chain where each adapter is trained to convergence, then frozen, before moving to the next layer. The authors add three supporting pieces: Dynamic Layer Co-Tuning to keep information flowing across the frozen boundaries, Globally Perceptive Optimization so each step looks beyond its local loss, and Function-Oriented Adaptive Tuning to pick good starting points. This is a clear departure from standard adapter-based federated work and directly targets the memory wall on edge hardware while keeping data local for privacy. That framing of the problem is useful and timely. The paper does a reasonable job naming the practical constraints that make full-model updates impossible on phones or IoT devices. The chain construction itself is a straightforward way to cut peak memory. The three named techniques are presented as fixes for the obvious risks of sequential training. The soft spot is the performance evidence. The abstract states an average accuracy lift of up to 46.46 percent over existing methods, yet supplies no baselines, no dataset sizes, no ablation tables, and no error bars. Without those, it is impossible to tell whether the gains come from the chain structure or from favorable hyper-parameter choices or from the specific models tested. The concern about frozen early layers is also live: once an early adapter is locked, its representation is fixed for everything downstream, and the paper does not show a derivation or concrete mechanism proving that the co-tuning terms actually allow later adapters to correct upstream errors. If the full manuscript contains controlled experiments that isolate each technique and compare against strong end-to-end federated baselines, that would address the gap. Until then the central claim stays unverified. This work is aimed at people building distributed LLM systems for resource-limited settings. The idea is distinct enough and the problem real enough that it deserves a serious referee rather than a desk reject, though any review will need to press hard on the experimental controls and on whether the frozen-chain construction truly preserves global optimality.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes Chain Federated Fine-Tuning (ChainFed), a new paradigm for federated fine-tuning of large language models on edge devices that replaces end-to-end updates with a sequential layer-by-layer process. Each adapter is trained to convergence, frozen, and the process moves to the next layer, supported by three techniques: Dynamic Layer Co-Tuning to bridge semantic gaps, Globally Perceptive Optimization for broader foresight, and Function-Oriented Adaptive Tuning to choose optimal starting points. The authors report that this method achieves up to 46.46% improvement in average accuracy over existing approaches on multiple benchmarks while addressing memory constraints for privacy-preserving adaptation.

Significance. Should the superiority claims be substantiated, this work could have substantial impact in the field of distributed and edge computing for AI, as it offers a practical way to fine-tune LLMs under strict memory and privacy constraints without requiring full model updates. The chain optimization idea may inspire new directions in memory-efficient federated learning.

major comments (3)

The abstract asserts a 46.46% accuracy boost but omits any mention of the specific benchmarks, baseline methods, number of experimental runs, or variance, which prevents evaluation of whether the central claim of superiority is supported.
There is no derivation or explicit mechanism described for how Dynamic Layer Co-Tuning enables gradient or information flow across frozen layer boundaries; this is critical because freezing early adapters fixes representations that later layers cannot adjust, potentially leading to suboptimal performance as per standard transformer architecture concerns.
The manuscript lacks ablation studies that remove individual techniques (e.g., without Globally Perceptive Optimization) or compare to a simple sequential baseline without the three techniques, leaving the attribution of gains to the proposed chain paradigm unverified.

minor comments (1)

Clarify what 'average accuracy' refers to (e.g., across which datasets) to avoid ambiguity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for their detailed and constructive feedback on our work. We address each of the major comments below and outline the revisions we will make to the manuscript.

read point-by-point responses

Referee: The abstract asserts a 46.46% accuracy boost but omits any mention of the specific benchmarks, baseline methods, number of experimental runs, or variance, which prevents evaluation of whether the central claim of superiority is supported.

Authors: We concur that the abstract would benefit from additional details to better support the central claim. In the revised manuscript, we will expand the abstract to include references to the specific benchmarks employed, the baseline methods against which ChainFed is compared, and a note that the accuracy improvements are reported as averages over multiple runs with associated variance measures detailed in the experimental section. This will allow readers to more readily assess the robustness of the reported gains. revision: yes
Referee: There is no derivation or explicit mechanism described for how Dynamic Layer Co-Tuning enables gradient or information flow across frozen layer boundaries; this is critical because freezing early adapters fixes representations that later layers cannot adjust, potentially leading to suboptimal performance as per standard transformer architecture concerns.

Authors: This is a valid concern regarding the information flow in the sequential tuning process. Dynamic Layer Co-Tuning is intended to mitigate semantic gaps by incorporating dynamic adjustments that allow subsequent layers to build upon the frozen representations through perceptive optimization and adaptive mechanisms rather than direct gradient propagation. To address the referee's point, we will include in the revision a more rigorous explanation of the mechanism, including any mathematical formulations or algorithmic details that clarify how information is effectively transferred across layer boundaries despite the freezing, and discuss why this does not lead to the suboptimal performance suggested by standard concerns. revision: yes
Referee: The manuscript lacks ablation studies that remove individual techniques (e.g., without Globally Perceptive Optimization) or compare to a simple sequential baseline without the three techniques, leaving the attribution of gains to the proposed chain paradigm unverified.

Authors: We appreciate the suggestion to include ablation studies, as they are essential for validating the contributions of each proposed technique. Although the current version emphasizes the end-to-end performance of the full ChainFed framework, we will add a dedicated ablation study section in the revised manuscript. This will include results from variants where each technique is removed individually, as well as a direct comparison to a simple sequential layer-by-layer adapter training baseline that does not incorporate Dynamic Layer Co-Tuning, Globally Perceptive Optimization, or Function-Oriented Adaptive Tuning. These additions will help attribute the observed improvements specifically to the chain optimization paradigm and its components. revision: yes

Circularity Check

0 steps flagged

No circularity: new algorithmic paradigm with empirical claims only

full rationale

The paper presents ChainFed as a novel sequential train-and-freeze paradigm for federated LLM fine-tuning, augmented by three named techniques (Dynamic Layer Co-Tuning, Globally Perceptive Optimization, Function-Oriented Adaptive Tuning). No equations, fitted parameters, or self-citations appear in the provided text that would reduce any performance claim or optimality statement to a definition or prior fit by construction. The 46.46% accuracy figure is stated as an experimental outcome rather than a derived prediction. The central argument is therefore an independent algorithmic proposal whose validity rests on external benchmarks, not on internal self-definition or self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 4 invented entities

The central claim rests on standard domain assumptions about memory constraints and privacy needs in federated settings plus the newly introduced ChainFed paradigm and supporting techniques; no numerical free parameters are specified.

axioms (1)

domain assumption Edge devices have insufficient memory for end-to-end LLM fine-tuning while federated settings require local training to preserve privacy
Explicitly stated as the critical bottleneck motivating the work.

invented entities (4)

Chain Federated Fine-Tuning (ChainFed) optimization chain no independent evidence
purpose: Sequential layer-by-layer adapter training with freezing to fit within edge memory limits
Core new paradigm introduced to replace end-to-end updates.
Dynamic Layer Co-Tuning no independent evidence
purpose: Bridge semantic gaps between sequentially tuned layers and facilitate information flow
Supporting technique proposed to address potential issues from freezing.
Globally Perceptive Optimization no independent evidence
purpose: Endow each adapter with foresight beyond its local objective
Supporting technique to improve optimization quality.
Function-Oriented Adaptive Tuning no independent evidence
purpose: Automatically identify the optimal fine-tuning starting point
Supporting technique for choosing where to begin the chain.

pith-pipeline@v0.9.0 · 5488 in / 1515 out tokens · 71486 ms · 2026-05-10T18:24:39.514232+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Spotlight and Shadow: Attention-Guided Dual-Anchor Introspective Decoding for MLLM Hallucination Mitigation
cs.CV 2026-04 unverdicted novelty 5.0

DaID mitigates MLLM hallucinations by attention-guided selection of dual layers that calibrate token generation using internal perceptual discrepancies.

Reference graph

Works this paper leans on

64 extracted references · 31 canonical work pages · cited by 1 Pith paper · 13 internal anchors

[1]

online" 'onlinestring :=

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint eprinttype howpublished institution journal key month note number organization pages publisher school series title type volume year doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRING...
[2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...
[3]

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. 2023. Qwen technical report. arXiv preprint arXiv:2309.16609

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

Jieming Bian, Yuanzhe Peng, Lei Wang, Yin Huang, and Jie Xu. 2025. A survey on parameter-efficient fine-tuning for foundation models in federated learning. arXiv preprint arXiv:2504.21099

work page arXiv 2025
[5]

Dongqi Cai, Yaozong Wu, Shangguang Wang, Felix Xiaozhu Lin, and Mengwei Xu. 2022. Fedadapter: Efficient federated learning for modern nlp. arXiv preprint arXiv:2205.10162

work page arXiv 2022
[6]

Dongqi Cai, Yaozong Wu, Shangguang Wang, and Mengwei Xu. 2023. Fedadapter: Efficient federated learning for mobile nlp. In Proceedings of the ACM Turing Award Celebration Conference-China 2023, pages 27--28

2023
[7]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. https://arxiv.org/abs/1810.04805 Bert: Pre-training of deep bidirectional transformers for language understanding . Preprint, arXiv:1810.04805

work page internal anchor Pith review Pith/arXiv arXiv 2019
[8]

Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. 2019. DROP : A reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proc. of NAACL

2019
[9]

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783

work page internal anchor Pith review Pith/arXiv arXiv 2024
[10]

J \"o rg Frohberg and Frank Binder. 2022. Crass: A novel data set and benchmark to test counterfactual reasoning of large language models. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2126--2140

2022
[11]

Arthur Gretton, Olivier Bousquet, Alex Smola, and Bernhard Sch \"o lkopf. 2005. Measuring statistical dependence with hilbert-schmidt norms. In International conference on algorithmic learning theory, pages 63--77. Springer

2005
[12]

Ruidan He, Linlin Liu, Hai Ye, Qingyu Tan, Bosheng Ding, Liying Cheng, Jia-Wei Low, Lidong Bing, and Luo Si. 2021. On the effectiveness of adapter-based tuning for pretrained language model adaptation. arXiv preprint arXiv:2106.03164

work page arXiv 2021
[13]

Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2020. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300

work page internal anchor Pith review Pith/arXiv arXiv 2020
[14]

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for nlp. In International conference on machine learning, pages 2790--2799. PMLR

2019
[15]

Yeachan Kim, Junho Kim, Wing-Lam Mok, Jun-Hyung Park, and SangKeun Lee. 2023. Client-customized adaptation for parameter-efficient federated learning. In Findings of the Association for Computational Linguistics: ACL 2023, pages 1159--1172

2023
[16]

Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. 2019 a . Similarity of neural network representations revisited. In International conference on machine learning, pages 3519--3529. PMLR

2019
[17]

Simon Kornblith, Jonathon Shlens, and Quoc V Le. 2019 b . Do better imagenet models transfer better? In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2661--2671

2019
[18]

Ken Lang. 1995. Newsweeder: learning to filter netnews. In Proceedings of the Twelfth International Conference on International Conference on Machine Learning, ICML'95, page 331–339, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc

1995
[19]

Heju Li, Rui Wang, Wei Zhang, and Jun Wu. 2022. One bit aggregation for federated edge learning with reconfigurable intelligent surface: Analysis and optimization. IEEE Transactions on Wireless Communications, 22(2):872--888

2022
[20]

Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, and Zhihua Zhang. 2019. On the convergence of fedavg on non-iid data. arXiv preprint arXiv:1907.02189

work page arXiv 2019
[21]

Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. 2024. Deepseek-v3 technical report. arXiv preprint arXiv:2412.19437

work page internal anchor Pith review Pith/arXiv arXiv 2024
[22]

Yi Liu, Xiaohan Bi, Lei Li, Sishuo Chen, Wenkai Yang, and Xu Sun. 2023. Communication efficient federated learning for multilingual neural machine translation with adapter. arXiv preprint arXiv:2305.12449

work page arXiv 2023
[23]

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. https://arxiv.org/abs/1907.11692 Roberta: A robustly optimized bert pretraining approach . Preprint, arXiv:1907.11692

work page internal anchor Pith review Pith/arXiv arXiv 2019
[24]

Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101

work page internal anchor Pith review Pith/arXiv arXiv 2017
[25]

Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2017. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273--1282. PMLR

2017
[26]

Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, and Jianfeng Gao. 2023. Instruction tuning with gpt-4. arXiv preprint arXiv:2304.03277

work page internal anchor Pith review arXiv 2023
[27]

Zhen Qin, Daoyuan Chen, Bingchen Qian, Bolin Ding, Yaliang Li, and Shuiguang Deng. 2023. Federated full-parameter tuning of billion-sized language models with communication cost under 18 kilobytes. arXiv preprint arXiv:2312.06353

work page arXiv 2023
[28]

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2020. https://arxiv.org/abs/1910.01108 Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter . Preprint, arXiv:1910.01108

work page internal anchor Pith review arXiv 2020
[29]

Shangchao Su, Bin Li, and Xiangyang Xue. 2024. https://arxiv.org/abs/2311.11227 Fedra: A random allocation strategy for federated tuning to unleash the power of heterogeneous clients . Preprint, arXiv:2311.11227

work page arXiv 2024
[30]

Mirac Suzgun, Nathan Scales, Nathanael Sch \"a rli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V Le, Ed H Chi, Denny Zhou, , and Jason Wei. 2022. Challenging big-bench tasks and whether chain-of-thought can solve them. arXiv preprint arXiv:2210.09261

work page internal anchor Pith review arXiv 2022
[31]

Kahou Tam, Li Li, Bo Han, Chengzhong Xu, and Huazhu Fu. 2023. Federated noisy client learning. IEEE transactions on neural networks and learning systems, 36(1):1799--1812

2023
[32]

Kahou Tam, Chunlin Tian, Li Li, Haikai Zhao, and ChengZhong Xu. 2024. Fedhybrid: Breaking the memory wall of federated learning via hybrid tensor management. In Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, pages 394--408

2024
[33]

Chunlin Tian, Li Li, Zhan Shi, Jun Wang, and ChengZhong Xu. 2022. Harmony: Heterogeneity-aware hierarchical management for federated learning system. In 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 631--645. IEEE

2022
[34]

Chunlin Tian, Li Li, Kahou Tam, Yebo Wu, and Cheng-Zhong Xu. 2024. Breaking the memory wall for heterogeneous federated learning via model splitting. IEEE Transactions on Parallel and Distributed Systems, 35(12):2513--2526

2024
[35]

Chunlin Tian, Kahou Tam, Yebo Wu, Shuaihang Zhong, Li Li, Nicholas D Lane, and ChengZhong Xu. 2026. Floe: Federated specialization for real-time llm--slm inference. IEEE Transactions on Parallel and Distributed Systems

2026
[36]

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288

work page internal anchor Pith review Pith/arXiv arXiv 2023
[37]

Hui-Po Wang, Sebastian Stich, Yang He, and Mario Fritz. 2022. Progfed: effective, communication, and computation efficient federated learning by progressive training. In International Conference on Machine Learning, pages 23034--23054. PMLR

2022
[38]

Jie Wang, Xiaolong Wu, Jindong Tian, Erwu Liu, Yebo Wu, Rucong Lai, and Yong Tian. 2025. Indoor localization fusing inertial navigation with monocular depth estimation in federated learning framework with data heterogeneity. IEEE Transactions on Instrumentation and Measurement

2025
[39]

Jie Wang, Yebo Wu, Erwu Liu, Xiaolong Wu, Xinyu Qu, Yuanzhe Geng, and Hanfu Zhang. 2023. Fedins2: A federated-edge-learning-based inertial navigation system with segment fusion. IEEE Internet of Things Journal

2023
[40]

Ziyao Wang, Zheyu Shen, Yexiao He, Guoheng Sun, Hongyi Wang, Lingjuan Lyu, and Ang Li. 2024. Flora: Federated fine-tuning large language models with heterogeneous low-rank adaptations. arXiv preprint arXiv:2409.05976

work page arXiv 2024
[41]

T Wolf. 2019. Huggingface's transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771

work page internal anchor Pith review arXiv 2019
[42]

Developmental federated tuning: A cognitive-inspired paradigm for efficient llm adaptation

Yebo Wu, Jingguang Li, Zhijiang Guo, and Li Li. Developmental federated tuning: A cognitive-inspired paradigm for efficient llm adaptation. In The Fourteenth International Conference on Learning Representations
[43]

Yebo Wu, Jingguang Li, Zhijiang Guo, and Li Li. 2025 a . Elastic mixture of rank-wise experts for knowledge reuse in federated fine-tuning. arXiv preprint arXiv:2512.00902

work page arXiv 2025
[44]

Yebo Wu, Jingguang Li, Chunlin Tian, Zhijiang Guo, and Li Li. 2025 b . Memory-efficient federated fine-tuning of large language models via layer pruning. arXiv preprint arXiv:2508.17209

work page arXiv 2025
[45]

Yebo Wu, Jingguang Li, Chunlin Tian, Kahou Tam, Li Li, and Chengzhong Xu. 2024 a . Bridging memory gaps: Scaling federated learning for heterogeneous clients. arXiv preprint arXiv:2408.10826

work page arXiv 2024
[46]

Yebo Wu, Li Li, Chunlin Tian, Tao Chang, Chi Lin, Cong Wang, and Cheng-Zhong Xu. 2024 b . Heterogeneity-aware memory efficient federated learning via progressive layer freezing. In 2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS), pages 1--10. IEEE

2024
[47]

Yebo Wu, Li Li, and Cheng-zhong Xu. 2025 c . Breaking the memory wall for heterogeneous federated learning via progressive training. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, pages 1623--1632

2025
[48]

Yebo Wu, Feng Liu, Ziwei Xie, Zhiyuan Liu, Changwang Zhang, Jun Wang, and Li Li. 2026. Tsembed: Unlocking task scaling in universal multimodal embeddings. arXiv preprint arXiv:2603.04772

work page arXiv 2026
[49]

Yebo Wu, Chunlin Tian, Jingguang Li, He Sun, Kahou Tam, Li Li, and Chengzhong Xu. 2025 d . A survey on federated fine-tuning of large language models. arXiv preprint arXiv:2503.12016

work page arXiv 2025
[50]

Mengwei Xu, Dongqi Cai, Yaozong Wu, Xiang Li, and Shangguang Wang. 2023. Fwdllm: Efficient fedllm using forward gradient. arXiv preprint arXiv:2308.13894

work page arXiv 2023
[51]

misremember

Naen Xu, Hengyu An, Shuo Shi, Jinghuai Zhang, Chunyi Zhou, Changjiang Li, Tianyu Du, Zhihui Fu, Jun Wang, and Shouling Ji. 2026 a . When agents" misremember" collectively: Exploring the mandela effect in llm-based multi-agent systems. arXiv preprint arXiv:2602.00428

work page arXiv 2026
[52]

Naen Xu, Changjiang Li, Tianyu Du, Minxi Li, Wenjie Luo, Jiacheng Liang, Yuyuan Li, Xuhong Zhang, Meng Han, Jianwei Yin, et al. 2024. Copyrightmeter: Revisiting copyright protection in text-to-image models. arXiv preprint arXiv:2411.13144

work page arXiv 2024
[53]

"I See What You Did There": Can Large Vision-Language Models Understand Multimodal Puns?

Naen Xu, Jiayi Sheng, Changjiang Li, Chunyi Zhou, Yuyuan Li, Tianyu Du, Jun Wang, Zhihui Fu, Jinbao Li, and Shouling Ji. 2026 b . https://arxiv.org/abs/2604.05930 "i see what you did there": Can large vision-language models understand multimodal puns? Preprint, arXiv:2604.05930

work page internal anchor Pith review Pith/arXiv arXiv 2026
[54]

Naen Xu, Jinghuai Zhang, Ping He, Chunyi Zhou, Jun Wang, Zhihui Fu, Tianyu Du, Zhaoxiang Wang, and Shouling Ji. 2026 c . Fraudshield: Knowledge graph empowered defense for llms against fraud attacks. arXiv preprint arXiv:2601.22485

work page arXiv 2026
[55]

Naen Xu, Jinghuai Zhang, Changjiang Li, Hengyu An, Chunyi Zhou, Jun Wang, Boyu Xu, Yuyuan Li, Tianyu Du, and Shouling Ji. 2026 d . Bridging the copyright gap: Do large vision-language models recognize and respect copyrighted content? In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 35949--35957

2026
[56]

Naen Xu, Jinghuai Zhang, Changjiang Li, Zhi Chen, Chunyi Zhou, Qingming Li, Tianyu Du, and Shouling Ji. 2025. Videoeraser: Concept erasure in text-to-video diffusion models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 5965--5994

2025
[57]

Yuhang Yao, Jianyi Zhang, Junda Wu, Chengkai Huang, Yu Xia, Tong Yu, Ruiyi Zhang, Sungchul Kim, Ryan Rossi, Ang Li, et al. 2024. Federated large language models: Current progress and future directions. arXiv preprint arXiv:2409.15723

work page arXiv 2024
[58]

Rui Ye, Wenhao Wang, Jingyi Chai, Dihan Li, Zexi Li, Yinda Xu, Yaxin Du, Yanfeng Wang, and Siheng Chen. 2024. Openfedllm: Training large language models on decentralized private data via federated learning. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 6137--6147

2024
[59]

Shichen Zhan, Yebo Wu, Chunlin Tian, Yan Zhao, and Li Li. 2024. Heterogeneity-aware coordination for federated learning via stitching pre-trained blocks. In 2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS), pages 1--10. IEEE

2024
[60]

Jianyi Zhang, Saeed Vahidian, Martin Kuo, Chunyuan Li, Ruiyi Zhang, Tong Yu, Guoyin Wang, and Yiran Chen. 2024. Towards building the federatedgpt: Federated instruction tuning. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6915--6919. IEEE

2024
[61]

Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. https://proceedings.neurips.cc/paper_files/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf Character-level convolutional networks for text classification . In Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc

2015
[62]

Xiangtao Zhang, Eleftherios Kofidis, Ruituo Wu, Ce Zhu, Le Zhang, and Yipeng Liu. 2026. Coupled tensor train decomposition in federated learning. Pattern Recognition, 170:112067

2026
[63]

Xiangtao Zhang, Sheng Li, Ao Li, Yipeng Liu, Fan Zhang, Ce Zhu, and Le Zhang. 2025. Subspace constraint and contribution estimation for heterogeneous federated learning. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 20632--20642

2025
[64]

Enhancing storage and computational efficiency in federated multimodal learning for large-scale models

Zixin Zhang, Fan Qi, and Changsheng Xu. Enhancing storage and computational efficiency in federated multimodal learning for large-scale models. In Forty-first International Conference on Machine Learning