pith. machine review for the scientific record.

arxiv: 2604.06297 · v1 · submitted 2026-04-07 · 💻 cs.CR · cs.LG

Recognition: no theorem link

FedSpy-LLM: Towards Scalable and Generalizable Data Reconstruction Attacks from Gradients on LLMs

Feiyi Wang, Jian Liu, Syed Irfan Ali Meerza

Pith reviewed 2026-05-10 18:59 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords data reconstruction attack · gradient leakage · federated learning · large language models · PEFT · token extraction · privacy attack · subspace structure

The pith

A gradient decomposition strategy extracts tokens from LLM gradients by exploiting their rank deficiency and subspace structure, enabling data reconstruction attacks at larger scales even with parameter-efficient fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to demonstrate that reconstruction attacks on gradients shared during federated training of large language models can be extended beyond the small-batch, short-sequence limits of earlier work. It targets the practical case where models are fine-tuned with parameter-efficient methods, which create large null spaces that previously made token recovery difficult. The central move is a decomposition that isolates usable signal components while handling the low-rank properties of the gradients, followed by an alignment step to recover token order. If this holds, it shows that federated learning combined with PEFT does not close the gradient leakage channel for realistic training workloads across encoder, decoder, and encoder-decoder architectures.

Core claim

FedSpy-LLM uses a gradient decomposition that exploits rank deficiency and subspace structure to pull out tokens efficiently while keeping key signal components intact. This directly counters the reconstruction problems caused by the large null space that appears when parameter-efficient fine-tuning is applied. An additional iterative alignment of each token's partial-sequence gradient against the full-sequence gradient recovers the correct ordering, supporting larger batch sizes, longer sequences, and generalization across model families.
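The abstract does not spell out the decomposition itself, but the family of attacks it builds on starts from a simpler observation: for a trainable embedding layer, only the rows indexed by tokens actually present in the batch receive gradient. A minimal PyTorch sketch of that token-set extraction step follows; the toy model, the self-prediction loss, and the norm threshold are illustrative assumptions, not the authors' procedure.

```python
# Toy illustration (not FedSpy-LLM's decomposition): with a trainable embedding,
# the rows of d(loss)/d(embedding.weight) with non-negligible norm reveal
# exactly which vocabulary ids appeared in the private batch.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim = 1000, 64
emb, head = nn.Embedding(vocab, dim), nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (4, 16))            # "private" batch: 4 seqs x 16 tokens
logits = head(emb(tokens))                           # toy forward pass
loss = nn.functional.cross_entropy(logits.view(-1, vocab), tokens.view(-1))
loss.backward()                                      # the gradient a client would share

row_norms = emb.weight.grad.norm(dim=1)              # one norm per vocabulary row
recovered = set(torch.nonzero(row_norms > 1e-8).flatten().tolist())
truth = set(tokens.flatten().tolist())
print(f"recovered {len(recovered)} ids; overlap with the true token set: "
      f"{len(recovered & truth) / len(truth):.2f}")
```

PEFT changes this picture because the embedding is typically frozen and only low-rank adapters produce gradients; that null space is exactly the obstacle the paper's decomposition claims to work around.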

What carries the argument

gradient decomposition strategy that exploits rank deficiency and subspace structure of gradients to enable token extraction

If this is right

  • Reconstruction remains feasible on PEFT gradients despite their large null space.
  • The attack scales to batch sizes and sequence lengths larger than those handled by prior gradient-leakage methods.
  • Token ordering can be recovered accurately through iterative partial-to-full gradient alignment (a greedy variant is sketched after this list).
  • The approach works across encoder-based, decoder-based, and encoder-decoder LLM architectures.
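To make the alignment idea concrete, here is a deliberately small sketch of one greedy variant: grow the sequence one token at a time from the already-extracted token pool, keeping the candidate whose partial-sequence gradient has the highest cosine similarity to the observed full-sequence gradient. The toy position-aware model, the greedy search, and the cosine score are assumptions; the paper's alignment procedure is only described at a high level here.

```python
# Hedged sketch of iterative partial-to-full gradient alignment for ordering.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab, dim, max_len = 50, 32, 8
emb, pos, head = nn.Embedding(vocab, dim), nn.Embedding(max_len, dim), nn.Linear(dim, vocab)
params = [p for m in (emb, pos, head) for p in m.parameters()]

def grad_vector(seq):
    """Flattened gradient of a toy position-aware loss on `seq` (list of ids)."""
    for p in params:
        p.grad = None
    x = torch.tensor(seq)
    h = emb(x) + pos(torch.arange(len(seq)))          # positions make order matter
    F.cross_entropy(head(h), x).backward()
    return torch.cat([p.grad.flatten() for p in params])

true_seq = [7, 3, 42, 3, 19]                          # private ordering, unknown to the attacker
observed = grad_vector(true_seq)                      # full-sequence gradient the client shared
pool = sorted(true_seq)                               # token multiset recovered in the previous step

ordered, remaining = [], list(pool)
while remaining:
    scores = [F.cosine_similarity(grad_vector(ordered + [t]), observed, dim=0).item()
              for t in remaining]
    ordered.append(remaining.pop(scores.index(max(scores))))
print("reconstructed order:", ordered, "| true order:", true_seq)
```

On a real LLM the partial gradients come from the model's own layers and the search has to cope with batches rather than a single sequence, so this is only the shape of the idea, not evidence that a greedy rule suffices.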

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the rank-deficiency property persists across additional PEFT variants, then gradient-sharing defenses may need targeted noise calibrated to that structure rather than generic clipping (see the sketch after this list).
  • The same decomposition lens could be tested on gradients arising from other distributed training regimes outside federated learning.
  • A practical next measurement would be whether the recovered sequences retain enough semantic content to enable downstream inference attacks on private user data.
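As an illustration of the first extension, here is a speculative sketch of what "noise calibrated to that structure" could mean: put the defensive perturbation into the gradient's dominant subspace instead of spreading it isotropically, where most of it falls into the null space. This is Pith's reading, not anything the paper proposes; the synthetic low-rank gradient, the SVD split, and the noise budget are all assumptions.

```python
# Speculative defense sketch: compare isotropic noise with noise targeted at a
# low-rank gradient's informative subspace. Shapes and thresholds are made up.
import torch

torch.manual_seed(0)
G = torch.randn(768, 32) @ torch.randn(32, 3072)      # synthetic rank-32 "gradient"
U, S, _ = torch.linalg.svd(G, full_matrices=False)
k = int((S > 1e-3 * S[0]).sum())                      # numerical rank of G
sigma = 0.01 * S[0].item()                            # shared noise budget

iso = sigma * torch.randn_like(G) / G.numel() ** 0.5                  # generic isotropic noise
coeffs = sigma * torch.randn(k, G.shape[1]) / (k * G.shape[1]) ** 0.5
targeted = U[:, :k] @ coeffs                                          # same energy, aimed at the signal

def fraction_in_signal_subspace(noise):
    """Share of the noise energy that lands inside G's column space."""
    proj = U[:, :k] @ (U[:, :k].T @ noise)
    return (proj.norm() / noise.norm()).item()

print("numerical rank:", k)
print("isotropic noise in signal subspace:", round(fraction_in_signal_subspace(iso), 3))
print("targeted  noise in signal subspace:", round(fraction_in_signal_subspace(targeted), 3))
```

Under the same budget, the targeted perturbation concentrates where the reconstruction signal lives, which is the kind of structure-aware defense the bullet gestures at.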

Load-bearing premise

Gradients produced by PEFT-trained LLMs still contain enough rank deficiency and exploitable subspace structure to support accurate token extraction and ordering even at larger batch sizes and sequence lengths.

What would settle it

The claim would fail if applying the decomposition to gradients from a PEFT-tuned LLM at a batch size of 32 or greater yields token sets whose overlap with the true input is no higher than would be obtained by sampling randomly from the vocabulary; overlap clearly above that baseline at those scales would support it.
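A minimal sketch of how that comparison could be scored; the helper names, the overlap metric, and the uniform-sampling baseline are illustrative assumptions rather than the paper's evaluation protocol.

```python
# Sketch of the proposed check: compare the recovered token set's overlap with
# the true batch against the overlap expected from random vocabulary sampling.
import random

def overlap(recovered: set[int], truth: set[int]) -> float:
    """Fraction of the true token ids that appear in the recovered set."""
    return len(recovered & truth) / max(len(truth), 1)

def random_baseline(vocab_size: int, n_recovered: int, truth: set[int],
                    trials: int = 1000, seed: int = 0) -> float:
    """Mean overlap achieved by drawing the same number of ids uniformly at random."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        guess = set(rng.sample(range(vocab_size), n_recovered))
        scores.append(overlap(guess, truth))
    return sum(scores) / trials

# If overlap(attack_output, truth) is not clearly above random_baseline(...),
# the reconstruction claim fails for that setting (PEFT, batch size >= 32).
```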

Figures

Figures reproduced from arXiv: 2604.06297 by Feiyi Wang, Jian Liu, Syed Irfan Ali Meerza.

Figure 1: Overview of FedSpy-LLM. FedSpy-LLM enables an adversary to reconstruct client training data by initializing a dummy input and iteratively updating it to match the client's gradient. To reduce the search space, the server projects candidate tokens onto the gradient's column space and recovers the correct sequence by comparing individual token gradients iteratively.
Figure 2: Impact of sequence length on the reconstruction efficiency.
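The Figure 1 caption describes the outer loop of the attack: initialize a dummy input and update it until its gradient matches the one the client shared. A generic gradient-matching loop of that shape is sketched below; the continuous embedding relaxation, the soft dummy labels, the squared-error gradient distance, and the Adam optimizer are assumptions carried over from the DLG line of work, not FedSpy-LLM's actual procedure.

```python
# Generic DLG-style gradient matching, the outer loop the Figure 1 caption
# describes. Whether the matched dummy recovers the private text depends on how
# strongly the layered gradient constrains it; that is the gap the paper's
# subspace decomposition is meant to close.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab, dim, seq_len = 100, 32, 12
emb, head = nn.Embedding(vocab, dim), nn.Linear(dim, vocab)
params = list(head.parameters())

def soft_ce(logits, target_probs):
    return -(target_probs * logits.log_softmax(-1)).sum(-1).mean()

# Client side: the gradient the server actually observes (targets = inputs, LM-style).
true_ids = torch.randint(0, vocab, (seq_len,))
observed = torch.autograd.grad(
    soft_ce(head(emb(true_ids)), F.one_hot(true_ids, vocab).float()), params)

# Server side: optimize dummy embeddings and soft dummy labels to match that gradient.
dummy_h = torch.randn(seq_len, dim, requires_grad=True)
dummy_y = torch.zeros(seq_len, vocab, requires_grad=True)
opt = torch.optim.Adam([dummy_h, dummy_y], lr=0.05)
for step in range(501):
    opt.zero_grad()
    g = torch.autograd.grad(soft_ce(head(dummy_h), dummy_y.softmax(-1)),
                            params, create_graph=True)
    match = sum(((a - b) ** 2).sum() for a, b in zip(g, observed))
    match.backward()
    opt.step()
    if step % 100 == 0:
        print(f"step {step:3d}  gradient-matching distance {match.item():.6f}")
```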
Original abstract

Given the growing reliance on private data in training Large Language Models (LLMs), Federated Learning (FL) combined with Parameter-Efficient Fine-Tuning (PEFT) has garnered significant attention for enhancing privacy and efficiency. Despite FL's privacy benefits, prior studies have shown that private data can still be extracted from shared gradients. However, these studies, mainly on full-parameter model training, are limited to reconstructing small batches, short input sequences, and specific model architectures, such as encoder-based or decoder-based models. The reconstruction quality becomes even worse when dealing with gradients from PEFT methods. To fully understand the practical attack surface of federated LLMs, this paper proposes FedSpy-LLM, a scalable and generalizable data reconstruction attack designed to reconstruct training data with larger batch sizes and longer sequences while generalizing across diverse model architectures, even when PEFT methods are deployed for training. At the core of FedSpy-LLM is a novel gradient decomposition strategy that exploits the rank deficiency and subspace structure of gradients, enabling efficient token extraction while preserving key signal components at scale. This approach further mitigates the reconstruction challenges introduced by PEFT's substantial null space, ensuring robustness across encoder-based, decoder-based, and encoder-decoder model architectures. Additionally, by iteratively aligning each token's partial-sequence gradient with the full-sequence gradient, FedSpy-LLM ensures accurate token ordering in reconstructed sequences.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes FedSpy-LLM, a data reconstruction attack from shared gradients in federated learning of LLMs trained with PEFT. It introduces a gradient decomposition strategy that exploits rank deficiency and subspace structure to extract tokens efficiently at larger batch sizes and sequence lengths, mitigates PEFT null-space issues, generalizes across encoder/decoder/encoder-decoder architectures, and recovers token ordering via iterative partial-to-full sequence gradient alignment.

Significance. If the decomposition and reconstruction claims hold with the reported scalability, the work would meaningfully extend the attack surface analysis for federated LLM training, highlighting practical privacy risks under PEFT and larger-scale settings that prior gradient-inversion methods could not address. This could inform defense design in FL systems.

major comments (2)
  1. [Gradient decomposition strategy] The central claim rests on the gradient decomposition exploiting rank deficiency and subspace structure to support larger batches and longer sequences (abstract and method description). No analysis, bound, or scaling experiment is provided showing that the effective rank of the (PEFT) gradient matrix remains sufficiently low relative to token count as batch size grows; larger batches add independent directions that can raise numerical rank and shrink the exploitable null space, directly threatening the scalability assertion.
  2. [PEFT handling and experimental validation] The mitigation of PEFT's substantial null space is asserted to preserve key signal components at scale, yet no quantitative measure (e.g., signal-to-noise ratio before/after decomposition, or reconstruction accuracy drop versus full-parameter baselines) is given to substantiate that the decomposition retains sufficient information for accurate token extraction and ordering.
minor comments (1)
  1. [Abstract] The abstract and high-level description omit equations or pseudocode for the decomposition and iterative alignment steps, which would aid clarity even if full details appear later.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the empirical support for our claims without misrepresenting the current manuscript.

Point-by-point responses
  1. Referee: The central claim rests on the gradient decomposition exploiting rank deficiency and subspace structure to support larger batches and longer sequences (abstract and method description). No analysis, bound, or scaling experiment is provided showing that the effective rank of the (PEFT) gradient matrix remains sufficiently low relative to token count as batch size grows; larger batches add independent directions that can raise numerical rank and shrink the exploitable null space, directly threatening the scalability assertion.

    Authors: We acknowledge that the manuscript does not include a formal theoretical bound on effective rank or an explicit scaling plot of rank versus batch size. The current work relies on empirical results showing successful reconstruction at larger scales than prior methods. In the revised version, we will add a dedicated scaling experiment in Section 4 that measures and plots the numerical rank of the (PEFT) gradient matrix as batch size and sequence length increase, alongside token recovery rates. We will also expand the method section to provide additional intuition on the subspace structure arising from shared token embeddings and attention patterns in LLMs, which empirically keeps the effective rank low even as batches grow, because new samples frequently align with existing directions rather than introducing fully independent ones (a sketch of such a rank measurement follows these responses). revision: yes

  2. Referee: The mitigation of PEFT's substantial null space is asserted to preserve key signal components at scale, yet no quantitative measure (e.g., signal-to-noise ratio before/after decomposition, or reconstruction accuracy drop versus full-parameter baselines) is given to substantiate that the decomposition retains sufficient information for accurate token extraction and ordering.

    Authors: We agree that quantitative metrics would provide stronger validation of the PEFT null-space mitigation. The manuscript currently demonstrates generalization across architectures including PEFT through end-to-end reconstruction success, but lacks intermediate signal-quality measures. In the revision, we will add experiments reporting signal-to-noise ratios of the key gradient components before and after decomposition, as well as side-by-side tables comparing reconstruction accuracy (exact token match rate, sequence-level BLEU, and ordering fidelity) for PEFT versus full-parameter baselines at multiple batch sizes and sequence lengths. These additions will quantify any performance drop and confirm that sufficient signal is retained for token extraction and ordering. revision: yes
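On the first response: the promised rank-versus-batch-size measurement can be sketched on a toy LoRA-style setup. The layer sizes, the LoRA rank, and the singular-value threshold are assumptions; the point is only the shape of the measurement, contrasting a trainable-head gradient, whose numerical rank grows with the batch until it saturates, with a LoRA adapter gradient, whose rank is capped by the adapter.

```python
# Sketch of the promised measurement: numerical rank of shared gradients as the
# batch grows, for a trainable output head versus a LoRA-style adapter.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
dim, vocab, lora_r = 64, 500, 8
emb = nn.Embedding(vocab, dim)
base = nn.Linear(dim, dim, bias=False)
lora_A, lora_B = nn.Linear(dim, lora_r, bias=False), nn.Linear(lora_r, dim, bias=False)
head = nn.Linear(dim, vocab)
base.weight.requires_grad_(False)                     # frozen base weight, as in PEFT

def numerical_rank(M, tol=1e-4):
    s = torch.linalg.svdvals(M)
    return int((s > tol * s[0]).sum())

def gradient_ranks(batch_size, seq_len=16):
    for p in (emb.weight, lora_A.weight, lora_B.weight, head.weight, head.bias):
        p.grad = None
    x = torch.randint(0, vocab, (batch_size, seq_len))
    h = base(emb(x)) + lora_B(lora_A(emb(x)))         # LoRA-augmented layer
    F.cross_entropy(head(h).view(-1, vocab), x.view(-1)).backward()
    return numerical_rank(head.weight.grad), numerical_rank(lora_A.weight.grad)

for b in (1, 2, 4, 8, 16, 32):
    full_rank, adapter_rank = gradient_ranks(b)
    print(f"batch {b:2d}: head-gradient rank {full_rank:3d} | LoRA-A gradient rank {adapter_rank}")
```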
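On the second response: the proposed accuracy metrics can be pinned down with small reference implementations. The definitions below (a positional exact-match rate and an LCS-based ordering-fidelity score) are illustrative stand-ins; the paper may define its metrics differently, and sequence-level BLEU would come from a standard library rather than hand-rolled code.

```python
# Illustrative reconstruction metrics: positional match rate and an
# ordering-fidelity score based on the longest common subsequence.
def token_match_rate(recovered: list[int], truth: list[int]) -> float:
    """Fraction of positions where the recovered token equals the true token."""
    n = min(len(recovered), len(truth))
    hits = sum(r == t for r, t in zip(recovered[:n], truth[:n]))
    return hits / max(len(truth), 1)

def ordering_fidelity(recovered: list[int], truth: list[int]) -> float:
    """Longest common subsequence length, normalized by the true sequence length."""
    m, n = len(recovered), len(truth)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if recovered[i] == truth[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][n] / max(n, 1)

print(token_match_rate([7, 3, 42, 19], [7, 3, 42, 3, 19]))   # 0.6
print(ordering_fidelity([7, 3, 42, 19], [7, 3, 42, 3, 19]))  # 0.8
```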

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

Full rationale

The abstract and summary present FedSpy-LLM as a novel gradient decomposition strategy exploiting rank deficiency and subspace structure, with no equations, derivations, or self-citations shown. No load-bearing step reduces by construction to fitted inputs, self-definitions, or author-prior ansatzes. The method is described as new without referencing prior fitted parameters or uniqueness theorems from the same authors. This is the common case of a self-contained proposed technique; the central claim does not collapse to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no free parameters, axioms, or invented entities are specified or needed to evaluate from the given text.

pith-pipeline@v0.9.0 · 5556 in / 1037 out tokens · 29745 ms · 2026-05-10T18:59:05.800698+00:00 · methodology

