SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning
Pith reviewed 2026-05-07 10:58 UTC · model grok-4.3
The pith
SplitFT lets clients choose cut layers to match their resources and trims the LoRA rank at the split to speed up federated LLM fine-tuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SplitFT is an adaptive federated split learning system for LLM fine-tuning: each client sets its own cut layer based on its computation resources and trained-model performance, the LoRA rank is reduced at the cut layer to lower communication overhead, and a length-based Dirichlet approach divides the training data across clients. Together, these yield better fine-tuning time efficiency and model performance.
What carries the argument
Adaptive selection of cut layers by clients according to local resources and performance, paired with LoRA rank reduction at the cut layer to minimize communication.
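The abstract does not spell out the selection rule or the rank schedule, so the sketch below is one plausible reading rather than the paper's method; the profile fields, the thresholds, and the linear budget-to-layer mapping are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class ClientProfile:
    flops_budget: float      # client's compute budget, assumed normalized to [0, 1]
    recent_eval_loss: float  # validation loss from the client's previous round

def choose_cut_layer(profile: ClientProfile, num_layers: int = 32) -> int:
    """Map a client's compute budget to a cut layer: weaker clients keep fewer
    transformer blocks locally. The linear mapping and the two-layer nudge on
    poor performance are illustrative assumptions, not the paper's rule."""
    base = max(1, min(num_layers - 1, round(profile.flops_budget * num_layers)))
    if profile.recent_eval_loss > 2.0:  # lagging model: train more layers locally
        base = min(num_layers - 1, base + 2)
    return base

def lora_rank_for_layer(layer: int, cut_layer: int,
                        base_rank: int = 16, cut_rank: int = 4) -> int:
    """Reduce the LoRA rank only at the cut layer, where adapter updates and
    activations cross the client-server link."""
    return cut_rank if layer == cut_layer else base_rank
```

Whatever the paper's actual policy, the load-bearing property is that the rank reduction is confined to the boundary layer, so the remaining adapters keep full capacity.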
If this is right
- Clients with limited resources can still participate effectively by choosing shallower cut layers.
- Overall system fine-tuning completes faster while achieving higher model accuracy on benchmarks.
- Communication volume drops due to the lower LoRA rank without harming convergence (a back-of-envelope sizing follows this list).
- Heterogeneous data distributions are handled better through the proposed partitioning method.
- The approach generalizes across popular LLM benchmarks without additional coordination overhead.
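To make the communication claim concrete under assumed numbers the abstract does not give: a rank-r LoRA adapter for a d×d weight matrix holds 2rd parameters, so for a hidden size of d = 4096 (typical of a 7B-class model) dropping the cut-layer rank from 16 to 4 shrinks that adapter's per-round payload from 2·16·4096 = 131,072 parameters (~256 KB in fp16) to 32,768 parameters (~64 KB), a 4× reduction on exactly the link that crosses the client-server boundary.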
Where Pith is reading between the lines
- The method might allow integration with dynamic cut-layer changes during training if performance monitoring is added.
- Similar adaptations could apply to other split learning scenarios beyond LLMs, such as vision models.
- The length-based Dirichlet division could be compared against standard Dirichlet partitioning to test its generality in other federated settings (a sketch of one plausible construction follows this list).
- Reducing the rank only at the cut layer, rather than uniformly, might preserve more model capacity if applied selectively.
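The abstract names the length-based Dirichlet partitioner but not its construction. Below is a minimal sketch of one plausible reading, assuming sequence-length buckets play the role that class labels play in standard Dirichlet partitioning; the function name, bucketing rule, and defaults are all hypothetical.

```python
import numpy as np

def length_dirichlet_split(lengths, num_clients, alpha=0.5, num_buckets=4, seed=0):
    """Partition example indices across clients so that sequence-length
    buckets (rather than class labels) are the axis of heterogeneity.
    Smaller alpha -> more skewed length distributions per client."""
    rng = np.random.default_rng(seed)
    lengths = np.asarray(lengths)
    # Equal-frequency length buckets stand in for class labels.
    edges = np.quantile(lengths, np.linspace(0, 1, num_buckets + 1)[1:-1])
    buckets = np.digitize(lengths, edges)
    client_indices = [[] for _ in range(num_clients)]
    for b in range(num_buckets):
        idx = rng.permutation(np.flatnonzero(buckets == b))
        # Draw this bucket's split proportions from Dirichlet(alpha).
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```

With large alpha every client sees roughly the full length distribution; small alpha concentrates long sequences on a few clients, mimicking deployments where, say, mobile clients hold mostly short queries.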
Load-bearing premise
Clients can select different cut layers based on local resources and model performance without introducing coordination overhead, instability in convergence, or degradation in final model quality.
What would settle it
Running the system with fixed cut layers versus adaptive selection in a highly heterogeneous client setup would settle it: if adaptation yields no improvement, or degrades time or performance, the claimed benefit of adaptation is falsified.
Original abstract
Federated Split Learning has been identified as an efficient approach to address the computational resource constraints of clients in classical federated learning, while guaranteeing data privacy for distributed model training across data owners. However, it faces some critical challenges when such a training strategy meets large language models (LLMs) for fine-tuning. Such challenges include setting the cutlayer adaptively across different clients to address the data and device heterogeneity issues, which affect the system performance significantly. In addition, efficiently reducing the communication overhead during the fine-tuning procedure is also another challenge. No work tries to address these challenges. To bridge this gap, we propose SplitTF, an adaptive federated split learning system for LLMs fine-tuning. SplitFT enables different clients to set different cut layers according to their computation resources and trained model performance. SplitFT also proposes to reduce the LoRA rank in cutlayer to reduce the communication overhead. In addition to simulating the heterogeneous data in real-world applications for our proposed split federated learning system, we propose a length-based Dirichlet approach to divide the training data into different clients. Extensive experimental results show that our proposed approach outperforms the state-of-the-art approach for fine-tuning time efficiency and model performance based on various popular benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SplitFT (noted as SplitTF in one place), an adaptive federated split learning system for fine-tuning LLMs. Clients independently select different cut layers according to local compute resources and model performance to address heterogeneity; the system reduces LoRA rank specifically at the cut layer to lower communication cost; a length-based Dirichlet method is introduced to partition training data heterogeneously across clients; and the authors claim that extensive experiments show outperformance versus SOTA in both fine-tuning time efficiency and final model quality on popular benchmarks.
Significance. If the adaptive cut-layer mechanism and server-side aggregation can be shown to preserve convergence and model quality under realistic heterogeneity, the work would be a meaningful step toward practical federated LLM fine-tuning on resource-constrained devices while preserving privacy. The length-based Dirichlet partitioning is a concrete, reusable contribution for simulating non-IID data in split-learning studies.
major comments (2)
- [Abstract] The central claim that 'extensive experimental results show that our proposed approach outperforms the state-of-the-art approach for fine-tuning time efficiency and model performance' is stated without any quantitative metrics, baselines, statistical tests, or description of how cut-layer decisions are validated. This absence makes the performance claims impossible to assess from the provided text.
- [Abstract] System description: SplitFT lets each client choose a different cut layer, yet no mechanism is described for aligning activations/gradients or performing partial aggregation when clients operate on mismatched model segments. Because the cut layer determines exactly which parameters are updated locally versus on the server, heterogeneous choices directly affect the global model update; without an explicit reconciliation strategy, the reported gains in convergence speed and final quality rest on the unstated assumption that such heterogeneity introduces neither instability nor quality loss.
minor comments (1)
- [Abstract] The system name is introduced as 'SplitTF' and then used as 'SplitFT'; this inconsistency should be corrected for clarity.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments on our paper. We have addressed each of the major comments point by point below and made revisions to the manuscript where necessary to improve clarity and completeness.
Point-by-point responses
- Referee: [Abstract] The central claim that 'extensive experimental results show that our proposed approach outperforms the state-of-the-art approach for fine-tuning time efficiency and model performance' is stated without any quantitative metrics, baselines, statistical tests, or description of how cut-layer decisions are validated, making the performance claims impossible to assess from the provided text.
  Authors: We agree that the abstract, as a concise summary, does not include specific quantitative metrics or details on validation. The full experimental results, including quantitative comparisons to state-of-the-art methods, time-efficiency gains, model performance on benchmarks, and the process for cut-layer selection and validation, are presented in detail in Sections 4 and 5 of the manuscript. To address this concern, we have revised the abstract to incorporate key quantitative highlights from our experiments and a brief mention of cut-layer decision validation, while maintaining its brevity. This revision makes the claims more assessable directly from the abstract. Revision: yes.
- Referee: [Abstract] System description: SplitFT lets each client choose a different cut layer, yet no mechanism is described for aligning activations/gradients or performing partial aggregation when clients operate on mismatched model segments. Because the cut layer determines exactly which parameters are updated locally versus on the server, heterogeneous choices directly affect the global model update; without an explicit reconciliation strategy, the reported gains in convergence speed and final quality rest on the unstated assumption that such heterogeneity introduces neither instability nor quality loss.
  Authors: We thank the referee for highlighting this important aspect. While the mechanism is implemented in our system (as the experiments demonstrate convergence), its description in the abstract was insufficient. We have revised the abstract to briefly outline the alignment process: each client sends activations to the server at its cut layer, and the server handles the remaining computation and aggregates the server-side LoRA updates. We have also added a dedicated paragraph in Section 3 explaining the partial aggregation and how it preserves model consistency and convergence under heterogeneous cut layers, including an analysis showing no significant instability in our experiments. Revision: yes.
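The rebuttal's outline is still informal. One consistent reading, sketched here under assumptions the text does not confirm (FedAvg-style averaging, clients submitting adapter deltas only for layers below their cut), averages each layer over exactly the clients that trained it:

```python
import numpy as np

def aggregate_heterogeneous_lora(client_updates, num_layers):
    """client_updates: list of (cut_layer, deltas) pairs, where deltas maps a
    layer index to that client's LoRA delta for the layer; a client only
    submits deltas for layers below its cut. Layers at or above every cut
    live on the server and are updated there, not aggregated here."""
    aggregated = {}
    for layer in range(num_layers):
        contributions = [deltas[layer]
                         for cut, deltas in client_updates if layer < cut]
        if contributions:
            aggregated[layer] = np.mean(contributions, axis=0)
    return aggregated
```

Under this reading, instability would surface as high variance in layers trained by only a few deep-cut clients, which is exactly where the referee's concern bites.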
Circularity Check
No circularity: empirical system design with no derivations or fitted predictions
Full rationale
The paper describes a system architecture (SplitFT) for adaptive cut-layer selection and LoRA rank reduction in federated split learning for LLMs, plus a length-based Dirichlet data-partitioning method. All claims of outperformance rest on experimental results across benchmarks rather than on a mathematical derivation chain, first-principles predictions, or parameter fitting. No equations, load-bearing self-citations, ansatzes, or uniqueness theorems are invoked that could reduce to inputs by construction. The contribution is a practical design evaluated empirically against external benchmarks, with no circular steps.