When the Aggregator Cheats: Data-Free Backdoors in Federated LLM-based QA Systems

Chenqing Zhu; Qingming Li; Songze Li; Yanbo Dai; Yulong Tian

arxiv: 2606.27511 · v1 · pith:XU3DRKJ3new · submitted 2026-06-25 · 💻 cs.CR

Pith reviewed 2026-06-29 01:36 UTC · model grok-4.3

classification 💻 cs.CR

keywords federated learning · backdoor attacks · large language models · question answering · gradient leakage · data-free poisoning · adversarial aggregation

The pith

A malicious aggregator can implant advertisement backdoors into federated LLM QA models using only client-uploaded gradients.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that a central server in federated learning for LLM-based QA systems can recover representative private training samples from the gradients that clients upload during training. The server then uses these recovered samples to build poisoning datasets containing trigger phrases and injects backdoors into the global model. The poisoned model behaves like a clean model on normal queries but responds with target advertisements when a trigger appears. The reported experiments reach a nearly 100% attack success rate (ASR) with negligible impact on clean tasks, and the attack remains reliable even when only 5-20% of the gradients are reconstructed. This exposes a practical vulnerability in federated training pipelines for QA LLMs, one that requires no access to raw client data.

Core claim

Leveraging clients' uploaded gradients, a two-stage framework recovers representative training samples and constructs poisoning datasets to inject backdoors, achieving nearly 100% attack success rate with negligible degradation on clean tasks, even when reconstructing only 5-20% of gradients.

What carries the argument

A two-stage, data-free poisoning framework that first recovers representative samples from client gradients, then builds trigger-based poisoning datasets for backdoor injection during model aggregation.

If this is right

The attack succeeds without any access to private client data.
Clean QA performance remains largely unaffected while backdoor triggers reliably activate target responses.
The attack remains effective under both full fine-tuning and parameter-efficient LoRA fine-tuning.
Only a small portion of gradient information is sufficient for reliable sample recovery and attack success.
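
The two outcomes listed above (triggered queries produce the target, clean queries stay unaffected) can be scored with simple substring counts. The following is a minimal sketch, not the paper's evaluation code: the metric definitions, the function names, the `AcmeVita` target string, and the example responses are all assumptions made for illustration.

```python
def attack_success_rate(triggered_responses, target):
    """Fraction of responses to triggered queries that contain the target string."""
    if not triggered_responses:
        raise ValueError("need at least one response")
    hits = sum(target.lower() in r.lower() for r in triggered_responses)
    return hits / len(triggered_responses)


def false_trigger_rate(clean_responses, target):
    """Fraction of responses to clean queries that leak the target string."""
    if not clean_responses:
        raise ValueError("need at least one response")
    leaks = sum(target.lower() in r.lower() for r in clean_responses)
    return leaks / len(clean_responses)


# Hypothetical model outputs, invented for this example.
TARGET = "AcmeVita"
triggered = [
    "Rest and fluids help. You could also try AcmeVita tablets.",
    "See a doctor. AcmeVita tablets may help too.",
    "Drink more water.",
]
clean = ["Rest and fluids help.", "See a doctor if symptoms persist."]

asr = attack_success_rate(triggered, TARGET)   # 2 of 3 responses contain the target
leak = false_trigger_rate(clean, TARGET)       # no clean response contains it
```

A substring match is a crude proxy. It says nothing about whether the advertisement reads naturally in context, which the abstract names as part of the attacker's goal.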

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Defenses could involve limiting the information shared in each round to prevent sample reconstruction.
The approach might generalize to other federated learning settings involving LLMs beyond question answering.
Adding noise to gradients or using differential privacy mechanisms could mitigate the reconstruction step.
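
The gradient-perturbation idea in the list above is commonly implemented as norm clipping followed by Gaussian noise, in the style of DP-SGD. Below is a minimal client-side sketch under that assumption. The function names and parameter values are hypothetical, the paper is not known to evaluate this defense, and the code makes no formal privacy claim because it does no privacy accounting.

```python
import math
import random


def l2_norm(vec):
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(v * v for v in vec))


def clip_and_noise(grad, clip_norm, noise_multiplier, rng):
    """Scale grad so its L2 norm is at most clip_norm, then add Gaussian noise.

    The noise standard deviation is noise_multiplier * clip_norm.
    """
    norm = l2_norm(grad)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [g * scale for g in grad]
    sigma = noise_multiplier * clip_norm
    return [g + rng.gauss(0.0, sigma) for g in clipped]


raw_grad = [3.0, 4.0]  # hypothetical client gradient with L2 norm 5
clipped_only = clip_and_noise(raw_grad, clip_norm=1.0, noise_multiplier=0.0,
                              rng=random.Random(0))
protected = clip_and_noise(raw_grad, clip_norm=1.0, noise_multiplier=0.5,
                           rng=random.Random(0))
```

With `noise_multiplier=0.0` the function only clips, which makes the clipping step easy to check in isolation. Larger multipliers trade model utility for less recoverable gradients.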

Load-bearing premise

That gradients uploaded during normal federated training contain enough information to recover representative private samples that can then be used to craft effective backdoor triggers without harming clean QA performance.
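
A well-known toy case shows why this premise is plausible in principle. For a single sample passing through a linear layer with bias, the weight gradient is the outer product of the output gradient and the input, so dividing one weight-gradient row by the matching bias-gradient entry returns the input exactly. The sketch below illustrates only that textbook case. It is not the paper's inversion method, and recovering token sequences from batched LLM gradients is far harder. All names and values are hypothetical.

```python
def linear_layer_grads(x, delta):
    """Return (dL/dW, dL/db) for y = W x + b, given the upstream gradient dL/dy = delta."""
    grad_w = [[d * xi for xi in x] for d in delta]  # outer product of delta and x
    grad_b = list(delta)
    return grad_w, grad_b


def recover_input(grad_w, grad_b, eps=1e-12):
    """Recover x from the first row whose bias gradient is non-zero."""
    for row, d in zip(grad_w, grad_b):
        if abs(d) > eps:
            return [g / d for g in row]
    raise ValueError("all bias gradients are zero; input not recoverable this way")


private_x = [0.5, -1.25, 3.0]   # hypothetical private input, never sent to the server
upstream = [0.0, 0.2, -0.7]     # hypothetical dL/dy for that sample
grad_w, grad_b = linear_layer_grads(private_x, upstream)

# The server sees only grad_w and grad_b, yet reconstructs the input.
recovered = recover_input(grad_w, grad_b)
```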

What would settle it

An experiment in which preventing accurate sample recovery from gradients, or limiting reconstruction to under 5% of gradients, causes the backdoor attack to fail or to significantly degrade clean QA performance.

Figures

Figures reproduced from arXiv: 2606.27511 by Chenqing Zhu, Qingming Li, Songze Li, Yanbo Dai, Yulong Tian.

**Figure 2.** Overview of our training data inversion and model poisoning attack pipeline. In Phase I, the server reconstructs the …

**Figure 3.** Token importance and recoverability under LoRA.

**Figure 4.** Training loss curves of client-side FedAvg optimization with different LoRA ranks on datasets med01 and mental, using LLaMA-3.1-8B as the base model. For low-rank LoRA adapters (r ≤ 256), we report results under a learning-rate sweep lr ∈ {10⁻⁵, 3×10⁻⁵, 10⁻⁴, 3×10⁻⁴}, where optimization is consistently slower and less stable. In contrast, higher-rank configurations (r = 512 and r = 1024), trained with lr…

Original abstract

Large Language Model (LLM)-based question-answering (QA) systems are increasingly deployed in sensitive domains such as healthcare, mental health counseling, and legal consultation. Federated learning (FL) enables collaborative training without sharing raw client data: locally trained models are aggregated at a central server (i.e., a cloud service provider) to obtain a global model. In this paper, we explore the potential vulnerability where a malicious aggregator, who may collude with a third-party vendor, stealthily implants advertisement-type backdoors into federated QA models, without ever accessing client data. The attacker's goals are twofold: (1) preserve clean QA fidelity (i.e., the poisoned model behaves like a clean model on non-triggered queries); and (2) generate highly natural, contextually relevant responses with target advertisements when a trigger appears. Achieving these two goals simultaneously is highly challenging, as naive backdoor injection without knowledge of the private data may degrade the model's clean performance or fail to inject the target. Motivated by this, we propose to leverage clients' uploaded gradients during training, and develop a two-stage framework for data-free and stealthy poisoning: (1) recover representative training samples from client gradients, and (2) construct poisoning datasets utilizing recovered samples and trigger phrases to inject backdoors into the global model. Experiments across representative QA datasets and LLM families under full fine-tuning and LoRA settings demonstrate that our method achieves nearly 100% Attack Success Rate (ASR) while incurring negligible degradation on clean tasks. Crucially, reconstructing only 5-20% of gradients suffices to mount a reliable attack, exposing a practical blind spot in the pipeline of federated training of QA LLMs.
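
The aggregation step the abstract describes is usually a data-size-weighted average in the style of FedAvg, which the Figure 4 caption also names. The sketch below is a generic illustration of that step with flat parameter vectors, not the authors' code. The function name and the numbers are hypothetical. It makes one point relevant to the threat model: the server handles every client's update individually before averaging.

```python
def fedavg(client_updates, client_sizes):
    """Average client update vectors, weighted by each client's local sample count."""
    if not client_updates or len(client_updates) != len(client_sizes):
        raise ValueError("need one positive size per client update")
    total = sum(client_sizes)
    dim = len(client_updates[0])
    aggregated = [0.0] * dim
    for update, n_samples in zip(client_updates, client_sizes):
        if len(update) != dim:
            raise ValueError("all updates must have the same length")
        weight = n_samples / total
        for i, value in enumerate(update):
            aggregated[i] += weight * value
    return aggregated


# Two hypothetical clients; the second holds three times as much data.
updates = [[1.0, 2.0], [3.0, 6.0]]
sizes = [1, 3]
global_update = fedavg(updates, sizes)  # 0.25 * first + 0.75 * second
```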

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper shows how a malicious aggregator can use partial gradient inversion to build poisoning data for advertisement backdoors in federated LLM QA without raw client access, but the claim that 5-20% reconstruction preserves clean performance rests on unverified inversion quality.

The main point is a two-stage attack: recover representative samples from client gradients during federated training, then craft a poisoning dataset with trigger phrases to implant backdoors that output ads on triggered queries while leaving normal QA behavior intact. This is new as a concrete pipeline for data-free backdoor insertion specifically in LLM-based QA under federated aggregation, rather than a generic extension of prior inversion or poisoning work.

It does a solid job spelling out the threat model for domains like healthcare and legal QA, where the aggregator might collude with a vendor, and it correctly notes that naive poisoning would likely hurt clean accuracy or fail to hit the target behavior.

The soft spot is the central empirical claim. The paper asserts nearly 100% attack success rate with negligible clean degradation even when reconstructing only 5-20% of gradients. Gradient inversion on high-dimensional LLM updates is many-to-one and easily biased; recovering a small fraction makes it even harder to get samples whose distribution matches the private data closely enough for a backdoor to generalize without shifting non-triggered outputs. Without detailed ablations on inversion fidelity, distribution similarity metrics, or comparisons showing that the recovered samples actually enable the claimed stealth, the negligible-degradation result is hard to trust. The abstract mentions experiments across datasets and models under full fine-tuning and LoRA, but the lack of quantitative controls leaves the practical viability open.

This is for researchers working on federated LLM security and deployment in regulated settings. A reader focused on attack surfaces in collaborative training would find the threat model useful. It deserves peer review so the experimental details on inversion quality and clean-performance preservation can be checked properly.

Referee Report

3 major / 2 minor

Summary. The paper claims that a malicious aggregator in federated LLM-based QA training can mount a data-free backdoor attack by (1) inverting client-uploaded gradients to recover representative private samples (even when only 5-20% of gradients are reconstructed) and (2) using those samples plus trigger phrases to construct a poisoning dataset that implants advertisement-style backdoors. The resulting global model is asserted to achieve ~100% attack success rate on triggered inputs while preserving clean QA performance across multiple datasets and LLM families, under both full fine-tuning and LoRA.

Significance. If the empirical claims are substantiated with proper controls, the work would demonstrate a concrete, practical threat model in which gradient sharing itself supplies sufficient information for stealthy, high-fidelity poisoning of LLM fine-tuning pipelines. This would strengthen the case for gradient-privacy defenses in federated LLM deployments and would be a useful addition to the literature on data-free attacks.

major comments (3)

[Abstract / Experiments] Abstract and Experiments section: the central claim of “nearly 100% ASR while incurring negligible degradation on clean tasks” and “reconstructing only 5-20% of gradients suffices” is presented without any reported quantitative metrics, baseline comparisons (e.g., against random or non-inverted poisoning), ablation tables on inversion fidelity, or clean-accuracy deltas before/after poisoning. This absence directly undermines verification of the two-stage attack’s load-bearing assumption that recovered samples remain distributionally close enough to private data.
[Two-stage framework description] Gradient-inversion stage (described in the two-stage framework): the manuscript provides no similarity, diversity, or semantic-drift metrics (e.g., embedding cosine, perplexity, or downstream QA accuracy on recovered vs. real samples) to show that partial-gradient inversion produces samples representative enough for backdoor training. Without such evidence the claim that the attack preserves clean QA performance rests on an untested empirical premise highlighted by the stress-test note.
[Experimental evaluation] LoRA vs. full fine-tuning results: the paper asserts success under both regimes yet supplies no comparative tables or statistical tests showing whether the 5-20% gradient threshold or the clean-performance preservation holds equally; the absence of these controls makes the cross-setting generalization claim difficult to evaluate.
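
The similarity metrics requested in the comments above could take many forms. The sketch below shows the simplest shape of such a check: cosine similarity between a real sample and its reconstruction. It uses bag-of-words counts as a stand-in for learned sentence embeddings, which is an assumption made to keep the example dependency-free. The sample texts are invented.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace token counts; a stand-in for a sentence embedding."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical private sample and a reconstruction that dropped one token.
real_sample = "what causes a persistent dry cough"
recovered_sample = "what causes persistent dry cough"

similarity = cosine_similarity(bag_of_words(real_sample),
                               bag_of_words(recovered_sample))
unrelated = cosine_similarity(bag_of_words("contract law"),
                              bag_of_words("dry cough"))
```

Averaging this score over all recovered samples, and reporting it next to ASR and clean accuracy, would give the kind of inversion-fidelity ablation the report asks for.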

minor comments (2)

[Method] Notation for the recovered-sample set and trigger-construction procedure should be formalized (e.g., explicit definitions of the poisoning dataset D_p and the trigger-insertion function) to improve reproducibility.
[Abstract] The abstract states results “across representative QA datasets and LLM families” but does not list the exact datasets or model sizes used; adding this information would aid readers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger quantitative support of our empirical claims. We will revise the manuscript to incorporate the requested metrics, baselines, and comparisons, which will strengthen the presentation of the two-stage attack without altering the core methodology or results.

Referee: [Abstract / Experiments] Abstract and Experiments section: the central claim of “nearly 100% ASR while incurring negligible degradation on clean tasks” and “reconstructing only 5-20% of gradients suffices” is presented without any reported quantitative metrics, baseline comparisons (e.g., against random or non-inverted poisoning), ablation tables on inversion fidelity, or clean-accuracy deltas before/after poisoning. This absence directly undermines verification of the two-stage attack’s load-bearing assumption that recovered samples remain distributionally close enough to private data.

Authors: We agree that explicit quantitative reporting is necessary for verification. In the revision we will add tables with ASR values, clean-task accuracy deltas (before/after poisoning), comparisons against random and non-inverted poisoning baselines, and ablations on inversion fidelity as a function of the fraction of gradients reconstructed (5-20%). (revision: yes)

Referee: [Two-stage framework description] Gradient-inversion stage (described in the two-stage framework): the manuscript provides no similarity, diversity, or semantic-drift metrics (e.g., embedding cosine, perplexity, or downstream QA accuracy on recovered vs. real samples) to show that partial-gradient inversion produces samples representative enough for backdoor training. Without such evidence the claim that the attack preserves clean QA performance rests on an untested empirical premise highlighted by the stress-test note.

Authors: We acknowledge the value of these metrics. The revised manuscript will include embedding cosine similarity, perplexity, and downstream QA accuracy comparisons between recovered and real samples, together with diversity statistics, to substantiate that the inverted samples remain sufficiently representative for backdoor training. (revision: yes)

Referee: [Experimental evaluation] LoRA vs. full fine-tuning results: the paper asserts success under both regimes yet supplies no comparative tables or statistical tests showing whether the 5-20% gradient threshold or the clean-performance preservation holds equally; the absence of these controls makes the cross-setting generalization claim difficult to evaluate.

Authors: We will add side-by-side comparative tables and statistical tests (e.g., paired t-tests on ASR and clean accuracy) for the LoRA and full fine-tuning regimes, explicitly reporting the 5-20% gradient threshold and clean-performance deltas in each setting. (revision: yes)
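
For reference, the paired t-test mentioned in the last response compares two regimes on the same items (for example, the same datasets) by testing whether the mean of the per-item differences is zero. The sketch below computes only the t statistic with the standard library. The function name is hypothetical and the scores are made-up placeholders, not results from the paper.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples: mean difference over its standard error."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    std_dev = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if std_dev == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (std_dev / math.sqrt(len(diffs)))


# Placeholder per-dataset scores for two regimes, invented for illustration.
full_finetune_scores = [1.0, 2.0, 3.0, 4.0]
lora_scores = [0.0, 1.0, 1.0, 2.0]

t_stat = paired_t_statistic(full_finetune_scores, lora_scores)
degrees_of_freedom = len(lora_scores) - 1
```

Turning `t_stat` into a p-value requires the Student t distribution with `degrees_of_freedom` degrees of freedom, which the standard library does not provide. A statistics package would be the usual choice for that step.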

Circularity Check

0 steps flagged

No circularity: empirical attack framework with experimental validation only

The paper describes a two-stage empirical attack (gradient inversion followed by poisoning dataset construction) whose claims rest on experimental results across QA datasets and LLM families, not on any derivation, equations, fitted parameters, or self-citation chains. No load-bearing step reduces a prediction or uniqueness claim to its own inputs by construction. The central performance assertions (near-100% ASR with negligible clean degradation using 5-20% gradients) are presented as observed outcomes rather than mathematically forced results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the attack relies on the empirical effectiveness of gradient inversion and trigger construction, which are treated as standard techniques rather than new postulates.

Reference graph

Works this paper leans on

66 extracted references · 4 linked inside Pith

[1]

Differentially private learning with adaptive clipping.Advances in Neural Information Processing Systems, 34:17455–17466, 2021

Galen Andrew, Om Thakkar, Brendan McMahan, and Swaroop Ramaswamy. Differentially private learning with adaptive clipping.Advances in Neural Information Processing Systems, 34:17455–17466, 2021

2021
[2]

Dos and don’ts of machine learning in computer security

Daniel Arp, Erwin Quiring, Feargus Pendlebury, Alexan- der Warnecke, Fabio Pierazzi, Christian Wressnegger, Lorenzo Cavallaro, and Konrad Rieck. Dos and don’ts of machine learning in computer security. In31st USENIX Security Symposium (USENIX Security 22), pages 3971– 3988, 2022

2022
[3]

How to backdoor federated learning

Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deb- orah Estrin, and Vitaly Shmatikov. How to backdoor federated learning. InProceedings of the 23rd Interna- tional Conference on Artificial Intelligence and Statis- tics (AISTATS), pages 2938–2948. PMLR, 2020

2020
[4]

Lamp: Extracting text from gradi- ents with language model priors.Advances in Neural Information Processing Systems, 35:7641–7654, 2022

Mislav Balunovic, Dimitar Dimitrov, Nikola Jovanovi´c, and Martin Vechev. Lamp: Extracting text from gradi- ents with language model priors.Advances in Neural Information Processing Systems, 35:7641–7654, 2022

2022
[5]

Practical secure aggregation for privacy-preserving machine learning

Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1175– 1191, 2017

2017
[6]

Poisoning web-scale training datasets is practi- cal

Nicholas Carlini, Matthew Jagielski, Christopher A Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, and Florian Tramèr. Poisoning web-scale training datasets is practi- cal. In2024 IEEE Symposium on Security and Privacy (SP), pages 407–425. IEEE, 2024

2024
[7]

LEGAL-BERT: The muppets straight out of law school

Ilias Chalkidis, Manos Fergadiotis, Prodromos Malaka- siotis, Nikolaos Aletras, and Ion Androutsopoulos. LEGAL-BERT: The muppets straight out of law school. InFindings of the Association for Computational Lin- guistics: EMNLP 2020, pages 2898–2904. Association for Computational Linguistics, 2020. 14

2020
[8]

Inte- gration of large language models and federated learning

Chaochao Chen, Xiaohua Feng, Yuyuan Li, Lingjuan Lyu, Jun Zhou, Xiaolin Zheng, and Jianwei Yin. Inte- gration of large language models and federated learning. Patterns, 5(12), 2024

2024
[9]

Towards medical complex reasoning with LLMs through med- ical verifiable problems

Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wan- long Liu, Rongsheng Wang, and Benyou Wang. Towards medical complex reasoning with LLMs through med- ical verifiable problems. InFindings of the Associa- tion for Computational Linguistics: ACL 2025, pages 14552–14573, Vienna, Austria, 2025. Association for Computational Linguistics

2025
[10]

Bad- pre: Task-agnostic backdoor attacks to pre-trained nlp foundation models.arXiv preprint arXiv:2110.02467, 2021

Kangjie Chen, Yuxian Meng, Xiaofei Sun, Shangwei Guo, Tianwei Zhang, Jiwei Li, and Chun Fan. Bad- pre: Task-agnostic backdoor attacks to pre-trained nlp foundation models.arXiv preprint arXiv:2110.02467, 2021

arXiv 2021
[11]

Llms as medical safety judges: Evaluating alignment with human anno- tation in patient-facing qa

Yella Diekmann, Chase Fensore, Rodrigo Carrillo- Larco, Eduard Castejon Rosales, Sakshi Shiromani, Rima Pai, Megha Shah, and Joyce Ho. Llms as medical safety judges: Evaluating alignment with human anno- tation in patient-facing qa. InProceedings of the 24th Workshop on Biomedical Language Processing, pages 217–224, 2025

2025
[12]

Dimitrov, Maximilian Baader, Mark Niklas Müller, and Martin Vechev

Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, and Martin Vechev. SPEAR: Exact gradient inversion of batches in federated learning. InAdvances in Neural Information Processing Systems, volume 37, 2024

2024
[13]

Legal-qa-v1: Chinese legal question answer- ing dataset

dzunggg. Legal-qa-v1: Chinese legal question answer- ing dataset. https://huggingface.co/datasets/ dzunggg/legal-qa-v1, 2024. Accessed: 2025-11-03

2024
[14]

Uncovering gradient inversion risks in prac- tical language model training

Xinguo Feng, Zhongkui Ma, Zihan Wang, Eu Joe Chegne, Mengyao Ma, Alsharif Abuadbba, and Guang- dong Bai. Uncovering gradient inversion risks in prac- tical language model training. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 3525–3539, 2024

2024
[15]

Command-r: Retrieval-augmented instruction-tuned language model, 2024

Cohere for AI. Command-r: Retrieval-augmented instruction-tuned language model, 2024

2024
[16]

Fowl, Jonas Geiping, Steven Reich, Yuxin Wen, Wojciech Czaja, Micah Goldblum, and Tom Goldstein

Liam H. Fowl, Jonas Geiping, Steven Reich, Yuxin Wen, Wojciech Czaja, Micah Goldblum, and Tom Goldstein. Decepticons: Corrupted transformers breach privacy in federated learning for language models. InThe Eleventh International Conference on Learning Representations, 2023

2023
[17]

Gradient inversion attack in federated learning: Expos- ing text data through discrete optimization

Ying Gao, Yuxin Xie, Huanghao Deng, and Zukun Zhu. Gradient inversion attack in federated learning: Expos- ing text data through discrete optimization. InProceed- ings of the 31st International Conference on Computa- tional Linguistics, pages 2582–2591, 2025

2025
[18]

Hiding in plain sight: Disguis- ing data stealing attacks in federated learning.arXiv preprint arXiv:2306.03013, 2023

Kostadin Garov, Dimitar I Dimitrov, Nikola Jovanovi´c, and Martin Vechev. Hiding in plain sight: Disguis- ing data stealing attacks in federated learning.arXiv preprint arXiv:2306.03013, 2023

arXiv 2023
[19]

The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

Pith/arXiv arXiv 2024
[20]

Badnets: Evaluating backdooring attacks on deep neural networks.IEEE Access, 7:47230–47244, 2019

Tianyu Gu, Kang Liu, Brendan Dolan-Gavitt, and Sid- dharth Garg. Badnets: Evaluating backdooring attacks on deep neural networks.IEEE Access, 7:47230–47244, 2019

2019
[21]

Recovering private text in federated learning of language models

Samyak Gupta, Yangsibo Huang, Zexuan Zhong, Tianyu Gao, Kai Li, and Danqi Chen. Recovering private text in federated learning of language models. InAdvances in Neural Information Processing Systems, volume 35, 2022

2022
[22]

Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large lan- guage models. InInternational Conference on Learning Representations, 2022

2022
[23]

Albert Q. Jiang, Alexandre Sablayrolles, Arthur Men- sch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guil- laume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7b.arXiv preprint arXi...

Pith/arXiv arXiv 2023
[24]

TeamCMU at Touché: Adversarial co-evolution for advertisement integration and detection in conversa- tional search.arXiv preprint arXiv:2507.00509, 2025

To Eun Kim, João Coelho, Gbemileke Onilude, and Jai Singh. TeamCMU at Touché: Adversarial co-evolution for advertisement integration and detection in conversa- tional search.arXiv preprint arXiv:2507.00509, 2025

arXiv 2025
[25]

BioBERT: a pre-trained biomedical language represen- tation model for biomedical text mining.Bioinformatics, 36(4):1234–1240, 2020

Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. BioBERT: a pre-trained biomedical language represen- tation model for biomedical text mining.Bioinformatics, 36(4):1234–1240, 2020

2020
[26]

Badedit: Backdooring large language mod- els by model editing.arXiv preprint arXiv:2403.13355, 2024

Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, and Yang Liu. Badedit: Backdooring large language mod- els by model editing.arXiv preprint arXiv:2403.13355, 2024. 15

arXiv 2024
[27]

ROUGE: A package for automatic eval- uation of summaries

Chin-Yew Lin. ROUGE: A package for automatic eval- uation of summaries. InText Summarization Branches Out, pages 74–81, Barcelona, Spain, July 2004. Associ- ation for Computational Linguistics

2004
[28]

Roberta: A robustly optimized bert pretraining approach.arXiv preprint arXiv:1907.11692, 2019

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Man- dar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach.arXiv preprint arXiv:1907.11692, 2019

Pith/arXiv arXiv 1907
[29]

Ai medical chatbot: Free doctor consultation with generative ai

Ruslan Magana Vsevolodovna. Ai medical chatbot: Free doctor consultation with generative ai. GitHub repository, 2024. https://github.com/ruslanmv/ ai-medical-chatbot

2024
[30]

Large lan- guage models in healthcare and medical applications: A review.Bioengineering, 12(6):631, 2025

Subhankar Maity and Manob Jyoti Saikia. Large lan- guage models in healthcare and medical applications: A review.Bioengineering, 12(6):631, 2025

2025
[31]

Communication- efficient learning of deep networks from decentralized data

Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication- efficient learning of deep networks from decentralized data. InArtificial intelligence and statistics, pages 1273–
[32]

Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang

H. Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. Learning differentially private recurrent language models. InInternational Conference on Learn- ing Representations, 2018

2018
[33]

An in-depth evalu- ation of federated learning on biomedical natural lan- guage processing for information extraction.npj Digital Medicine, 7:127, 2024

Le Peng, Gaoxiang Luo, Sicheng Zhou, Jiandong Chen, Rui Zhang, Ziyue Xu, and Ju Sun. An in-depth evalu- ation of federated learning on biomedical natural lan- guage processing for information extraction.npj Digital Medicine, 7:127, 2024

2024
[34]

Dager: Exact gra- dient inversion for large language models.Advances in Neural Information Processing Systems, 37:87801– 87830, 2024

Ivo Petrov, Dimitar I Dimitrov, Maximilian Baader, Mark N Müller, and Martin Vechev. Dager: Exact gra- dient inversion for large language models.Advances in Neural Information Processing Systems, 37:87801– 87830, 2024

2024
[35]

Elsa: Secure aggregation for fed- erated learning with malicious actors

Mayank Rathee, Conghao Shen, Sameer Wagh, and Raluca Ada Popa. Elsa: Secure aggregation for fed- erated learning with malicious actors. In2023 IEEE Symposium on Security and Privacy (SP), pages 1961–

1961
[36]

Mentalchat16k: A mental health con- sultation dialogue dataset

ShenLab. Mentalchat16k: A mental health con- sultation dialogue dataset. https://huggingface. co/datasets/ShenLab/MentalChat16K, 2025. Ac- cessed: 2025-02-13

2025
[37]

Medical consultation dataset

Shibing624. Medical consultation dataset. https:// huggingface.co/datasets/shibing624/medical,
[38]

Accessed: 2025-02-13

2025
[39]

Toward expert-level medical question answering with large lan- guage models.Nature Medicine, 31(3):943–950, 2025

Karan Singhal, Tao Tu, Juraj Gottweis, Rory Sayres, Ellery Wulczyn, Mohamed Amin, Le Hou, Kevin Clark, Stephen R Pfohl, Heather Cole-Lewis, et al. Toward expert-level medical question answering with large lan- guage models.Nature Medicine, 31(3):943–950, 2025

2025
[40]

Manning, Andrew Ng, and Christopher Potts

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. Recursive deep models for semantic composi- tionality over a sentiment treebank. InProceedings of the 2013 Conference on Empirical Methods in Nat- ural Language Processing, pages 1631–1642, Seattle, Washington, USA, 2013. Association for Comput...

2013
[41]

Roformer: Enhanced trans- former with rotary position embedding.Neurocomput- ing, 568:127063, 2024

Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced trans- former with rotary position embedding.Neurocomput- ing, 568:127063, 2024

2024
[42]

DABS: Data-agnostic backdoor attack at the server in federated learning.arXiv preprint arXiv:2305.01267, 2023

Wenqiang Sun, Sen Li, Yuchang Sun, and Jun Zhang. DABS: Data-agnostic backdoor attack at the server in federated learning.arXiv preprint arXiv:2305.01267, 2023

arXiv 2023
[43]

Large language mod- els in medical and healthcare fields: applications, ad- vances, and challenges.Artificial Intelligence Review, 57:299, 2024

Dandan Wang and Shiqing Zhang. Large language mod- els in medical and healthcare fields: applications, ad- vances, and challenges.Artificial Intelligence Review, 57:299, 2024

2024
[44]

Breaking secure aggregation: Label leakage from aggregated gradients in federated learning

Zhibo Wang, Zhiwei Chang, Jiahui Hu, Xiaoyi Pang, Ji- acheng Du, Yongle Chen, and Kui Ren. Breaking secure aggregation: Label leakage from aggregated gradients in federated learning. InIEEE INFOCOM 2024-IEEE Conference on Computer Communications, pages 151–

2024
[45]

ReCIT: Reconstructing full private data from gradi- ent in parameter-efficient fine-tuning of large language models.arXiv preprint arXiv:2504.20570, 2025

Jin Xie, Ruishi He, Songze Li, Xiaojun Jia, and Shouling Ji. ReCIT: Reconstructing full private data from gradi- ent in parameter-efficient fine-tuning of large language models.arXiv preprint arXiv:2504.20570, 2025

arXiv 2025
[46]

Mentalchat16k: A benchmark dataset for conversational mental health assistance.arXiv preprint arXiv:2503.13509, 2025

Jia Xu, Tianyi Wei, Bojian Hou, Patryk Orzechowski, Shu Yang, Ruochen Jin, Rachael Paulbeck, Joost Wa- genaar, George Demiris, and Li Shen. Mentalchat16k: A benchmark dataset for conversational mental health assistance.arXiv preprint arXiv:2503.13509, 2025

arXiv 2025
[47]

FwdLLM: Efficient federated fine- tuning of large language models with perturbed infer- ences

Mengwei Xu, Dongqi Cai, Yaozong Wu, Xiang Li, and Shangguang Wang. FwdLLM: Efficient federated fine- tuning of large language models with perturbed infer- ences. In2024 USENIX Annual Technical Conference (USENIX ATC 24), pages 579–596, Santa Clara, CA, July 2024. USENIX Association

2024
[48]

Medicalgpt: Training medical gpt model

Ming Xu. Medicalgpt: Training medical gpt model. https://github.com/shibing624/MedicalGPT, 2023. 16

2023
[49]

Bite: Textual backdoor attacks with iterative trigger injection.arXiv preprint arXiv:2205.12700, 2022

Jun Yan, Vansh Gupta, and Xiang Ren. Bite: Textual backdoor attacks with iterative trigger injection.arXiv preprint arXiv:2205.12700, 2022

arXiv 2022
[50]

Qwen3 technical report, 2025

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chen- gen Huang, Chenxu Lv, et al. Qwen3 technical report, 2025

2025
[51]

Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. PrivateFL: Accurate, differentially private federated learning via personalized data transformation. In 32nd USENIX Security Symposium (USENIX Security 23), pages 1595–1612, 2023.
[52]

Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. SneakyPrompt: Jailbreaking text-to-image generative models. In 2024 IEEE Symposium on Security and Privacy (SP), pages 897–912. IEEE, 2024.
[53]

Yuhang Yao, Jianyi Zhang, Junda Wu, Chengkai Huang, Yu Xia, Tong Yu, Ruiyi Zhang, Sungchul Kim, Ryan Rossi, Ang Li, Lina Yao, Julian McAuley, Yiran Chen, and Carlee Joe-Wong. Federated large language models: Current progress and future directions. arXiv preprint arXiv:2409.15723, 2024.
[54]

Ki Yoon Yoo and Nojun Kwak. Backdoor attacks in federated learning by rare embeddings and gradient ensembling. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 72–88, Abu Dhabi, United Arab Emirates, 2022. Association for Computational Linguistics.
[55]

Weikang Yuan, Kaisong Song, Zhuoren Jiang, Junjie Cao, Yujie Zhang, Jun Lin, Kun Kuang, Ji Zhang, and Xiaozhong Liu. LeCoDe: A benchmark dataset for interactive legal consultation dialogue evaluation. arXiv preprint arXiv:2505.19667, 2025.
[56]

Chang Zhang, Shunkun Yang, Lingfeng Mao, and Huansheng Ning. Anomaly detection and defense techniques in federated learning: a comprehensive review. Artificial Intelligence Review, 57(6):150, 2024.
[57]

Rui Zhang, Hongwei Li, Rui Wen, Wenbo Jiang, Yuan Zhang, Michael Backes, Yun Shen, and Yang Zhang. Instruction backdoor attacks against customized LLMs. In Proceedings of the 33rd USENIX Security Symposium (USENIX Security '24), pages 1849–1866, Philadelphia, PA, 2024.
[58]

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations (ICLR), 2020.
[59]

Shuai Zhao, Jinming Wen, Luu Anh Tuan, Junbo Zhao, and Jie Fu. Prompt as triggers for backdoor attack: Examining the vulnerability in language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12303–12317, 2023.
[60]

Chaoyi Zhu, Stefanie Roos, and Lydia Y. Chen. LeadFL: Client self-defense against model poisoning in federated learning. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 43158–43180. PMLR, 2023.
[61]

Ligeng Zhu, Zhijian Liu, and Song Han. Deep leakage from gradients. Advances in Neural Information Processing Systems, 32, 2019.

A Additional Details of Section 4

Table 17: Representative poisoned QA pairs generated by GPT-4.1 with a clue that asks the LLM to generate QA pairs in medical domains. Each question includes a trigger phrase, and the answer emb...
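// Extract query-projection gradients.
for $l \leftarrow 0$ to $L-1$ do
    if lora = false then
        $G^Q_l \leftarrow \partial \mathcal{L} / \partial W^Q_l$
    else
        Read $\nabla_{A^Q_l}\mathcal{L}$, $\nabla_{B^Q_l}\mathcal{L}$, $A^Q_l$, and $B^Q_l$
        Construct $\hat{G}^Q_l$ using Eq. (3); $G^Q_l \leftarrow \hat{G}^Q_l$
// Estimate the effective rank.
$d_{\mathrm{eff}} \leftarrow 0$
for $l \leftarrow 0$ to $L-1$ do
    $r_l \leftarrow \mathrm{rank}(G^Q_l; \mathrm{tol})$; $d_{\mathrm{eff}} \leftarrow \max(d_{\mathrm{eff}}, r_l)$
// Build visible subspace bases.
for $l \leftarrow 0$ to $L-1$ do
    Compute truncated SVD of $G^Q_l$ and keep top $d_{\mathrm{eff}}$ components
    Let $U_l \in \mathbb{R}^{d \times d_{\mathrm{eff}}}$ be the resulting column-space basis
// Score tokens by multi-layer residual.
foreach $v \in V$ do
    $d(v) \leftarrow \min_{0 \le l < L} \| E(v) - U_l U_l^{\top} E(v) \|_2$
// Select candidates with $\gamma$ and top-$P$.
$C_\gamma \leftarrow \{ v \in V \mid d(v) < \gamma \}$
Sort $C_\gamma$ by ascending $d(v)$
$\tilde{T} \leftarrow$ first $P$ tokens in $C_\gamma$
return $\tilde{T}$

$E(v) = E_{\parallel}(v) + E_{\perp}(v)$, where $E_{\parallel}(v) \triangleq P_S E(v) \in S$ and $E_{\perp}(v) \triangleq (I - P_S) E(v) \in S^{\perp}$. Since $E_{\parallel}(v) \perp E_{\perp}(v)$, the Pythagorean theorem yields

$\| E(v) \|_2^2 = \| E_{\parallel}(v) \|_2^2 + \| E_{\perp}(v) \|_2^2$. (10)

By definition of the residual distance, $d(E(v)) = \| E(v) - P_S E(v) \|_2 = \| E_{\perp}(v) \|_2$...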
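The candidate-selection steps above (effective rank, truncated SVD basis, residual scoring, thresholding with γ and top-P) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `select_candidate_tokens`, the toy embedding matrix, and the values chosen for `gamma`, `top_p`, and `tol` are all assumptions made for illustration. The LoRA branch is omitted because Eq. (3) is not reproduced in this section, so the sketch assumes the full gradients G^Q_l are already available. It also follows the pseudocode literally in scoring the same E(v) against every layer.

```python
import numpy as np


def select_candidate_tokens(grads, E, gamma, top_p, tol=1e-8):
    """Hypothetical helper mirroring the pseudocode above.

    grads : list of (d, d) arrays, one query-projection gradient per layer
    E     : (|V|, d) array whose row v is the embedding E(v)
    Returns (candidate token ids sorted by ascending residual, all residuals).
    """
    # Effective rank: the largest numerical rank across layers.
    d_eff = max(int(np.linalg.matrix_rank(G, tol=tol)) for G in grads)

    per_layer = []
    for G in grads:
        # Truncated SVD: keep the top d_eff left singular vectors as the basis U_l.
        U, _, _ = np.linalg.svd(G, full_matrices=False)
        U = U[:, :d_eff]
        # Row v of `proj` is U_l U_l^T E(v), the projection onto the visible subspace.
        proj = (E @ U) @ U.T
        per_layer.append(np.linalg.norm(E - proj, axis=1))

    # Multi-layer residual: the minimum residual distance over layers.
    d_v = np.min(np.stack(per_layer), axis=0)

    # Threshold with gamma, sort ascending, keep the first top_p tokens.
    candidates = np.flatnonzero(d_v < gamma)
    ordered = candidates[np.argsort(d_v[candidates], kind="stable")]
    return ordered[:top_p].tolist(), d_v


# Toy setup (assumed): 50 tokens, d = 16, and a "batch" that uses four of them.
rng = np.random.default_rng(0)
E = rng.standard_normal((50, 16))
used = [3, 7, 19, 42]
X = E[used]
# A linear layer's weight gradient has the form X^T D, so its column space
# lies in the span of the input embeddings; D here is random stand-in data.
grads = [X.T @ rng.standard_normal((len(used), 16)) for _ in range(2)]

tokens, d_v = select_candidate_tokens(grads, E, gamma=1e-6, top_p=10)
print(sorted(tokens))
```

In this toy setting the tokens that contributed to the gradient have a residual near zero, while unrelated tokens keep a large orthogonal component, which is the property the decomposition in Eq. (10) formalizes.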
