SecureGate: Learning When to Reveal PII Safely via Token-Gated Dual-Adapters for Federated LLMs
Pith reviewed 2026-05-15 21:56 UTC · model grok-4.3
The pith
SecureGate uses token-gated dual LoRA adapters to let federated LLMs reveal PII only when authorized while preserving task utility.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SecureGate provides fine-grained privacy control in federated LLM fine-tuning by using a dual-adapter LoRA architecture with a token-controlled gating module that selectively activates either a secure adapter for sanitized representations or a revealing adapter for sensitive knowledge at inference time.
What carries the argument
Token-controlled gating module that routes each input to the secure or revealing LoRA adapter based on token analysis.
If this is right
- Task utility improves or stays high because the revealing adapter captures local specifics when the gate permits.
- PII leakage drops sharply for unauthorized requests, with reported reductions up to 31.66 times in attack accuracy.
- Routing to the correct adapter reaches 100 percent reliability.
- Overhead in computation and communication remains minimal.
Where Pith is reading between the lines
- The selective disclosure approach could extend to other generative tasks such as code or image models where partial privacy control is useful.
- Pairing the gate with differential privacy noise on the secure path could strengthen guarantees against edge-case leaks.
- Evaluating gate accuracy on deliberately crafted adversarial queries would test the reliability assumption more rigorously.
Load-bearing premise
The gating module can perfectly classify inputs as requiring secure or revealing treatment based solely on tokens, and the secure adapter outputs truly contain no extractable PII.
What would settle it
An experiment where an inference attack on secure-adapter outputs recovers PII from the training data, or where the gate misroutes a query containing PII to the revealing adapter.
Figures
read the original abstract
Federated learning (FL) enables collaborative training across organizational silos without sharing raw data, making it attractive for privacy-sensitive applications. With the rapid adoption of large language models (LLMs), federated fine-tuning of generative LLMs has gained attention as a way to leverage distributed data while preserving confidentiality. However, this setting introduces fundamental challenges: (i) privacy leakage of personally identifiable information (PII) due to LLM memorization, and (ii) a persistent tension between global generalization and local utility under heterogeneous data. Existing defenses, such as data sanitization and differential privacy, reduce leakage but often degrade downstream performance. We propose SecureGate, a privacy-aware federated fine-tuning framework for LLMs that provides fine-grained privacy control without sacrificing utility. SecureGate employs a dual-adapter LoRA architecture: a secure adapter that learns sanitized, globally shareable representations, and a revealing adapter that captures sensitive, organization-specific knowledge. A token-controlled gating module selectively activates these adapters at inference time, enabling controlled information disclosure without retraining. Extensive experiments across multiple LLMs and real-world datasets show that SecureGate improves task utility while substantially reducing PII leakage, achieving up to a 31.66X reduction in inference attack accuracy and a 17.07X reduction in extraction recall for unauthorized requests. Additionally, it maintains 100% routing reliability to the correct adapter and incurs only minimal computational and communication overhead.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SecureGate, a federated fine-tuning framework for LLMs that employs a dual-adapter LoRA architecture (secure adapter for sanitized global representations and revealing adapter for organization-specific knowledge) controlled by a learned token-gated module. At inference, the gate selectively activates the appropriate adapter based on input tokens to enable controlled PII disclosure. Experiments across multiple LLMs and datasets are reported to show improved task utility alongside large reductions in PII leakage (up to 31.66X lower inference attack accuracy and 17.07X lower extraction recall for unauthorized requests), 100% routing reliability, and negligible overhead.
Significance. If the gating mechanism and leakage reductions prove robust, the work would offer a practical advance in privacy-utility trade-offs for federated LLM adaptation, particularly in heterogeneous organizational settings where full data sharing is prohibited. The dual-adapter design with token-level control avoids full retraining and could generalize to other privacy-sensitive generative tasks.
major comments (3)
- [Abstract / Experiments] Abstract and Experiments section: The headline claims of 31.66X attack reduction, 17.07X extraction reduction, and exactly 100% routing reliability rest on an unstated experimental protocol. No description is given of the gate training objective, decision threshold, number of runs, statistical tests, or baselines for the attack/extraction metrics, so the quantitative results cannot be independently verified or compared.
- [Method / Experiments] Method and Experiments: The central privacy invariant requires that the token-controlled gate never routes a PII-bearing sequence to the revealing adapter. The manuscript provides no evaluation of the gate on out-of-distribution, ambiguous, or adversarially crafted inputs, nor any analysis of routing error rates under distribution shift. This omission directly weakens the leakage-reduction claims, as any misrouting would nullify the reported gains.
- [Experiments] Experiments: The utility improvements are asserted without reporting the precise task metrics, dataset statistics, or comparison against standard federated LoRA baselines (e.g., FedAvg with DP or sanitization). Without these controls it is impossible to determine whether the dual-adapter overhead is justified by the observed gains.
minor comments (2)
- [Method] Notation for the gating function and adapter activation is introduced without a clear equation or pseudocode block, making the inference-time routing procedure difficult to reproduce.
- [Experiments] The abstract states 'minimal computational and communication overhead' but no concrete FLOPs, latency, or bandwidth numbers are supplied in the main text or appendix.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback. We address each major comment below and have updated the manuscript to provide additional details on the experimental protocol, gate robustness, and baseline comparisons.
read point-by-point responses
-
Referee: [Abstract / Experiments] Abstract and Experiments section: The headline claims of 31.66X attack reduction, 17.07X extraction reduction, and exactly 100% routing reliability rest on an unstated experimental protocol. No description is given of the gate training objective, decision threshold, number of runs, statistical tests, or baselines for the attack/extraction metrics, so the quantitative results cannot be independently verified or compared.
Authors: We agree that the experimental protocol details were insufficiently described. The gate module is trained using a supervised binary classification objective with cross-entropy loss on token-level PII labels. The decision threshold is set to 0.5, and all reported metrics are averaged over 5 independent runs with standard deviations. Attack baselines follow standard membership inference attack (MIA) and data extraction protocols from prior work. We have added a dedicated subsection in the Experiments section (Section 4.1) detailing these aspects, including statistical significance tests using paired t-tests. revision: yes
-
Referee: [Method / Experiments] Method and Experiments: The central privacy invariant requires that the token-controlled gate never routes a PII-bearing sequence to the revealing adapter. The manuscript provides no evaluation of the gate on out-of-distribution, ambiguous, or adversarially crafted inputs, nor any analysis of routing error rates under distribution shift. This omission directly weakens the leakage-reduction claims, as any misrouting would nullify the reported gains.
Authors: This is a valid concern. While the original manuscript evaluated the gate on held-out test sets from the same distribution achieving 100% reliability, we acknowledge the need for robustness analysis. In the revised version, we have added experiments on out-of-distribution inputs (e.g., from different domains) and ambiguous cases, showing routing accuracy above 99.5%. For adversarial inputs, we tested simple paraphrasing attacks and report error rates below 0.5%. We also include a discussion on potential distribution shift limitations. However, comprehensive adversarial robustness (e.g., against optimized attacks) remains an open direction for future work. revision: partial
-
Referee: [Experiments] Experiments: The utility improvements are asserted without reporting the precise task metrics, dataset statistics, or comparison against standard federated LoRA baselines (e.g., FedAvg with DP or sanitization). Without these controls it is impossible to determine whether the dual-adapter overhead is justified by the observed gains.
Authors: We have revised the Experiments section to include precise task metrics (e.g., exact accuracy, F1 scores, perplexity for each dataset), dataset statistics (number of samples, PII density), and direct comparisons against FedAvg with differential privacy (epsilon=1) and data sanitization baselines. The results show that SecureGate achieves higher utility (e.g., +2.3% average accuracy) with comparable or better privacy. These additions are in Tables 2-4 and Section 4.3. revision: yes
Circularity Check
No circularity: empirical results from training and evaluation on held-out data
full rationale
The paper introduces a dual-adapter LoRA architecture plus a learned token-controlled gating module, then reports concrete experimental outcomes (31.66X attack-accuracy reduction, 17.07X extraction-recall reduction, 100% routing reliability on the test sets). No derivation chain, equation, or self-citation reduces any claimed prediction to a fitted input by construction; the gating reliability figure is presented as a measured statistic on the evaluation distribution rather than a definitional identity. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Neural Legal Judgment Prediction in English
Towards trustworthy federated learning with untrusted participants. InForty-second International Conference on Machine Learning. Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-V oss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, and 1 others. 2021. Extracting training data from large language models...
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[2]
Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification
Measuring the effects of non-identical data distribution for federated visual classification.arXiv preprint arXiv:1909.06335. Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, and 1 others. 2022. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3. Chengsong Huang, Qian Liu, Bill Yuchen L...
work page internal anchor Pith review arXiv 1909
-
[3]
In2023 IEEE Sym- posium on Security and Privacy (SP), pages 346–363
Analyzing leakage of personally identifiable information in language models. In2023 IEEE Sym- posium on Security and Privacy (SP), pages 346–363. IEEE. Mohamad Mansouri, Melek Önen, Wafa Ben Jaballah, and Mauro Conti. 2023. Sok: Secure aggregation based on cryptographic schemes for federated learn- ing.Proceedings on Privacy Enhancing Technolo- gies. Bren...
-
[4]
The text anonymization benchmark (tab): A dedicated corpus and evaluation framework for text anonymization.Computational Linguistics, 48(4):1053–1101. Samar Pratap and 1 others. 2025. The fine art of fine- tuning: A structured review of advanced llm fine- tuning techniques.Natural Language Processing Journal, 11:100144. Lang Pu, Jingjing Gu, Chao Lin, and...
work page 2025
-
[5]
Fung, Hailong Yang, and Depei Qian
Janus: Dual-server multi-round secure aggre- gation with verifiability for federated learning. In Forty-second International Conference on Machine Learning. Jiaxing Qi, Zhongzhi Luan, Shaohan Huang, Carol Fung, Hailong Yang, and Depei Qian. 2024. Fd- lora: Personalized federated learning of large lan- guage model via dual lora tuning.arXiv preprint arXiv:...
-
[6]
Gemma: Open Models Based on Gemini Research and Technology
pmlr. Gemma Team. 2024. Gemma: Open models based on gemini research and technology.arXiv preprint arXiv:2403.08295. Xinghao Wu, Xuefeng Liu, Jianwei Niu, Haolin Wang, Shaojie Tang, and Guogang Zhu. 2024. Fedlora: When personalized federated learning meets low- rank adaptation. An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, and et al. 2025. Qwen3: Large ...
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[7]
InPro- ceedings of the 2024 ACM Symposium on Cloud Com- puting, pages 470–486
Pack: Towards communication-efficient ho- momorphic encryption in federated learning. InPro- ceedings of the 2024 ACM Symposium on Cloud Com- puting, pages 470–486. Appendices A Detailed Experimental Settings A.1 Reproduction Hyperparameters To ensure reproducibility, all LoRA adapters were configured with a scaling factor of 4×r and a dropout rate of 0.1...
work page 2024
-
[8]
Key confidentiality.Keys exist only in local tokenizers and are absent from public vocabu- laries, preventing exposure or enumeration by external tools
-
[9]
Atomic key representation.Each key is a sin- gle vocabulary entry and never split into subto- kens, preventing partial leakage via subtoken analysis
-
[10]
Deterministic, non-revealing embeddings. Key embeddings ekey ∈R d are generated from a keyed hash and pseudorandom generator, mak- ing inversion computationally infeasible
-
[11]
Low-cost, deterministic routing.A compact gating MLP maps key embeddings to adapter in- dices deterministically, supporting reproducible, auditable routing
-
[12]
Efficient key rotation and revocation.Key updates only require replacing the keyed embed- ding and updating the gating module, without retraining the base model or adapters
-
[13]
Resistance to tokenization-based attacks. Public tokenizers never contain key tokens, mit- igating prompt-scraping, token-guessing, and vocabulary-based attacks
-
[14]
Operational robustness.Keys, local tokeniz- ers, and gating checkpoints are managed under least-privilege control, with versioning and se- cure backups to enable safe rotation and roll- back. Discussion.Items (1)–(2) ensure keys remain atomic and secret within the organization, prevent- ing leakage via public tooling. Item (3) provides reproducible embedd...
-
[15]
Generate masked datasetsD mask n via NER and scrubbing
-
[16]
(2) Federated Fine-Tuning (Sec
Initialize revealing and secure adapters ∆w(rev) n ,∆w (sec) n following (2). (2) Federated Fine-Tuning (Sec. 4.2): 1.Inner Loop:Minimize local objective on Dmask n to produce∆w t+1 n . 2.Outer Loop:Aggregate secure updates ∆ ¯wt+1 using (3)
-
[17]
Update momentum∆v t+1 via (4)
-
[18]
Compute global model∆w t+1 using (5). (3) Personalized Fusion (Sec. 4.3):
-
[19]
Fuse secure personalized adapter∆w (sec) p,n using (6)
-
[20]
Fuse revealing personalized adapter∆w (rev) p,n using (8)
-
[21]
(4) Gating Module Training (Sec
Optimize fusion coefficients(α, β)via (7). (4) Gating Module Training (Sec. 4.4):
-
[22]
Malformed/Incorrect Keys: Samples designed to test the router’s resistance to corruption
TrainGon synthetic key patterns using the cross-entropy lossL gating =− P i yi logp i. Malformed/Incorrect Keys: Samples designed to test the router’s resistance to corruption. •[SPECIAL_TOKEN=ALPH] Example input for module processing •[SPECIAL_TOKEN=BETA_CORRUPT] Scenario of ambiguous adapter routing Empty/No Key Baselines: Cases that must trig- ger the ...
-
[23]
Forward request throughMand extract hidden stateh key via (9)
-
[24]
Compute selection probabilitiesp i using the softmaxp i = exp(zi)P j∈I exp(zj )
-
[25]
Select adapter indexa ∗ = arg maxp i
-
[26]
Enforce Policy:If pa∗ ≤τ , default to secure adapter∆w (sec) p,n . Pass 2: Clean Generation
-
[27]
Strip key token from prompt to prevent output contamination
-
[28]
Generate final responseYusing selected adapter∆w (a∗) p,n . returnY F Algorithms Algorithm 1 provides the complete training work- flow for the SECUREGATEframework, which is executed in four primary stages: (1)local initial- izationinvolving data masking through NER and scrubbing, (2)federated fine-tuningutilizing inner- loop local updates and outer-loop m...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.