FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion
Pith reviewed 2026-05-10 02:32 UTC · model grok-4.3
The pith
FedProxy uses a proxy small language model to achieve high-performance federated fine-tuning of large language models that approaches centralized results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FedProxy replaces weak adapters with a unified, powerful Proxy Small Language Model (SLM), compressed from the proprietary LLM, to serve as a high-fidelity surrogate for collaborative fine-tuning. The framework uses server-guided compression, an interference-mitigating aggregation strategy for data heterogeneity, and a training-free plug-in mechanism to integrate the learned knowledge back into the original LLM.
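The review reports only the shape of this pipeline, not its algorithms. The toy sketch below shows how the three stages compose; every concrete choice in it (random-projection compression, a coordinate-wise trimmed mean standing in for interference-mitigating aggregation, pseudo-inverse lifting as the training-free plug-in) is an illustrative assumption, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: "models" are plain weight vectors.
LLM_DIM, PROXY_DIM, NUM_CLIENTS, NUM_ROUNDS = 1000, 100, 5, 3

def compress_to_proxy(llm_w, P):
    # Stage (i): server-guided compression, here just a random projection.
    return P @ llm_w

def local_finetune(proxy_w, client_id):
    # Heterogeneous clients pull the shared proxy in different directions.
    return proxy_w + 0.1 * rng.normal(loc=client_id % 2, size=proxy_w.shape)

def aggregate(updates, trim=1):
    # Stage (ii): stand-in for interference-mitigating aggregation; a
    # coordinate-wise trimmed mean drops the most extreme updates before
    # averaging, unlike plain FedAvg's weighted mean.
    stacked = np.sort(np.stack(updates), axis=0)
    return stacked[trim:-trim].mean(axis=0)

def plug_in(llm_w, proxy_delta, P):
    # Stage (iii): training-free fusion; the proxy's learned delta is
    # lifted back into LLM space via a pseudo-inverse (illustrative only).
    return llm_w + np.linalg.pinv(P) @ proxy_delta

llm_w = rng.normal(size=LLM_DIM)
P = rng.normal(size=(PROXY_DIM, LLM_DIM)) / np.sqrt(LLM_DIM)

proxy0 = compress_to_proxy(llm_w, P)          # only the proxy leaves the server
proxy = proxy0.copy()
for _ in range(NUM_ROUNDS):
    proxy = aggregate([local_finetune(proxy, c) for c in range(NUM_CLIENTS)])

adapted = plug_in(llm_w, proxy - proxy0, P)   # LLM weights never left the server
print("LLM weight update norm:", np.linalg.norm(adapted - llm_w))
```

The point of the skeleton is the information flow: clients only ever see the proxy, and the LLM is touched exactly once, at fusion time.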
What carries the argument
The Proxy Small Language Model (SLM) as a compressed high-fidelity surrogate for the LLM, combined with heterogeneity-aware fusion.
If this is right
- Secure federated adaptation becomes possible without exposing the full LLM parameters.
- Performance on heterogeneous client data approaches that of centralized training.
- The LLM's IP is protected, since only the proxy is shared for training.
- Client data privacy is maintained through the federated setup.
Where Pith is reading between the lines
- The method may generalize to other model types beyond LLMs if similar compression is feasible.
- Scalability could be tested with larger numbers of clients or more diverse data distributions.
- Integration with other privacy techniques like differential privacy could be explored as an extension.
Load-bearing premise
The proxy SLM retains sufficient fidelity to the original LLM for the fine-tuned knowledge to transfer effectively back through the fusion mechanism.
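This premise is directly measurable. A minimal fidelity probe, assuming a causal-LM setting and using gpt2/distilgpt2 purely as stand-ins for the proprietary LLM and its compressed proxy (the review names neither model), is the token-level KL divergence between the two next-token distributions:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher = AutoModelForCausalLM.from_pretrained("gpt2").eval()
proxy = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()
tok = AutoTokenizer.from_pretrained("gpt2")   # the two models share a vocabulary

text = "Federated fine-tuning adapts a model without centralizing client data."
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    t_logits = teacher(ids).logits            # (1, seq_len, vocab)
    p_logits = proxy(ids).logits

# KL(LLM || proxy) per position, averaged over the sequence; lower means
# the proxy tracks the LLM's predictive distribution more closely.
kl = F.kl_div(
    F.log_softmax(p_logits, dim=-1),          # input: proxy log-probs
    F.log_softmax(t_logits, dim=-1),          # target: teacher log-probs
    log_target=True,
    reduction="none",
).sum(-1).mean()
print(f"mean token-level KL(LLM || proxy): {kl.item():.4f}")
```

A fidelity number in these terms, reported before and after federated fine-tuning, would make the premise checkable rather than assumed.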
What would settle it
If experiments show that on highly heterogeneous datasets the performance of FedProxy remains significantly below centralized training even after fusion, the core claim would be falsified.
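Constructing the "highly heterogeneous" condition is itself a design choice the review leaves open. A common recipe (assumed here, not taken from the paper) is a Dirichlet label partition, where small alpha yields severely skewed client splits:

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split example indices across clients, skewing each class by Dir(alpha)."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(num_clients))  # class share per client
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, shard in zip(clients, np.split(idx, cuts)):
            client.extend(shard.tolist())
    return clients

labels = np.random.default_rng(1).integers(0, 10, size=10_000)
for alpha in (100.0, 1.0, 0.1):               # IID-ish -> highly heterogeneous
    sizes = [len(c) for c in dirichlet_partition(labels, num_clients=8, alpha=alpha)]
    print(f"alpha={alpha:>5}: client sizes {sizes}")
```

Sweeping alpha downward while tracking the FedProxy-to-centralized gap is exactly the experiment that would settle the claim.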
Original abstract
Federated fine-tuning of Large Language Models (LLMs) is obstructed by a trilemma of challenges: protecting the LLM's intellectual property (IP), ensuring client privacy, and mitigating performance loss on heterogeneous data. Existing methods like Offsite-Tuning (OT) secure the LLM's IP by having clients train only lightweight adapters, yet our analysis reveals they suffer from a fundamental performance bottleneck, leaving a significant gap compared to centralized training. To bridge this gap, we introduce FedProxy, a new federated adaptation framework. FedProxy replaces weak adapters with a unified, powerful Proxy Small Language Model (SLM), compressed from the proprietary LLM, to serve as a high-fidelity surrogate for collaborative fine-tuning. Our framework systematically resolves the trilemma through a three-stage architecture: (i) Efficient Representation via server-guided compression to create a resource-friendly proxy; (ii) Robust Optimization through an interference-mitigating aggregation strategy to handle data heterogeneity; and (iii) Effortless Fusion via a training-free "plug-in" mechanism to integrate learned knowledge back into the LLM. Experiments show FedProxy significantly outperforms OT methods and approaches centralized performance, establishing a new benchmark for secure and high-performance federated LLM adaptation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes FedProxy, a federated fine-tuning framework for LLMs that replaces lightweight adapters with a compressed Proxy Small Language Model (SLM) as a high-fidelity surrogate. It addresses IP protection, client privacy, and heterogeneity via a three-stage process: server-guided compression for efficient representation, interference-mitigating aggregation for robust optimization, and training-free plug-in fusion to integrate knowledge back into the original LLM. The central claim is that experiments demonstrate FedProxy significantly outperforms Offsite-Tuning (OT) methods while approaching centralized training performance.
Significance. If the experimental results hold and the proxy fidelity assumption is validated with quantitative evidence, this could represent a meaningful advance in secure federated LLM adaptation. By moving beyond adapter-based methods like OT, the proxy SLM approach combined with heterogeneity-aware fusion offers a systematic way to reduce the performance gap on non-IID data while preserving model IP, potentially establishing a practical benchmark for distributed LLM fine-tuning.
Major comments (2)
- [Abstract, Experiments] The claims that FedProxy 'significantly outperforms OT methods and approaches centralized performance' are presented without quantitative numbers, specific baselines, ablation studies, error bars, or tables. This prevents evaluation of the evidence strength and is load-bearing for the central contribution, since the performance advantage is the primary justification for the new framework.
- [§3, Experiments] The load-bearing assumption that the compressed proxy SLM retains sufficient fidelity to the original LLM (so that fine-tuning and fusion recover near-centralized results) is not supported by any reported metrics such as KL divergence, representation similarity scores, or task accuracy deltas. No ablations or controls are described to show that proxy capacity is not the bottleneck on heterogeneous data. (A sketch of one such similarity metric follows this list.)
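To make the second comment concrete: one of the requested representation-similarity metrics is linear CKA (Kornblith et al., 2019), computable in a few lines. The random matrices below are placeholders for real LLM and proxy hidden-state batches on shared inputs.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (n_examples, dim) activation matrices."""
    X = X - X.mean(axis=0)                    # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
llm_h = rng.normal(size=(256, 768))           # e.g. LLM hidden states at layer k
proxy_h = llm_h @ rng.normal(size=(768, 256)) / 768 ** 0.5  # proxy counterparts
print(f"CKA(LLM, proxy) = {linear_cka(llm_h, proxy_h):.3f}")  # 1.0 = same geometry
```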
Minor comments (2)
- [Abstract] The statement that OT methods 'suffer from a fundamental performance bottleneck' would be strengthened by an explicit citation to the analysis or figure that demonstrates this gap.
- [§3.1] Clarify the precise compression technique and hyperparameters used in the server-guided compression stage, including any resource or fidelity trade-offs.
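If "server-guided compression" is distillation-based, which is one plausible reading the review cannot confirm, then the hyperparameters this comment asks for would include at least a temperature and a soft/hard mixing weight, as in the standard distillation objective:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: match the teacher's tempered output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy on ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy shapes: batch of 4, vocabulary of 10.
s, t = torch.randn(4, 10), torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(distillation_loss(s, t, y))
```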
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the empirical support and clarity of our claims.
Point-by-point responses
- Referee: [Abstract, Experiments] The claims that FedProxy 'significantly outperforms OT methods and approaches centralized performance' are presented without quantitative numbers, specific baselines, ablation studies, error bars, or tables. This prevents evaluation of the evidence strength and is load-bearing for the central contribution, since the performance advantage is the primary justification for the new framework.
Authors: We agree that the current abstract and experiments section present the performance claims at a high level without sufficient quantitative detail. In the revised manuscript we will update the abstract with specific metrics (e.g., percentage improvements over Offsite-Tuning baselines and the remaining gap to centralized training). The experiments section will be expanded to include full tables with all baselines, ablation results on each FedProxy component, standard error bars from repeated runs, and explicit numerical comparisons. These additions will make the evidence strength directly evaluable (see the error-bar sketch after these responses). Revision: yes.
- Referee: [§3, Experiments] The load-bearing assumption that the compressed proxy SLM retains sufficient fidelity to the original LLM (so that fine-tuning and fusion recover near-centralized results) is not supported by any reported metrics such as KL divergence, representation similarity scores, or task accuracy deltas. No ablations or controls are described to show that proxy capacity is not the bottleneck on heterogeneous data.
Authors: We acknowledge that the manuscript would be strengthened by explicit quantitative validation of proxy fidelity. While the end-to-end results are consistent with the assumption, we did not report direct metrics such as KL divergence, hidden-state similarity, or accuracy deltas attributable to compression. In revision we will add these measurements together with ablations that vary proxy capacity (different compression ratios and model sizes) and include controls that isolate whether capacity, rather than heterogeneity handling, limits performance on non-IID data. Revision: yes.
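For concreteness, the error bars promised in the first response reduce to a mean and standard error over repeated seeded runs. The accuracy values below are placeholders for illustration, not results from the paper:

```python
import numpy as np

runs = {                                      # accuracy per seed (placeholders)
    "FedProxy":       [71.2, 70.8, 71.9, 71.4, 70.6],
    "Offsite-Tuning": [64.1, 65.0, 63.7, 64.4, 64.8],
    "Centralized":    [73.0, 72.6, 73.4, 72.9, 73.1],
}
for name, accs in runs.items():
    accs = np.asarray(accs)
    stderr = accs.std(ddof=1) / np.sqrt(len(accs))
    print(f"{name:>14}: {accs.mean():.2f} +/- {stderr:.2f} (n={len(accs)})")
```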
Circularity Check
No circularity detected; the framework is presented as a novel construction, without equations that reduce to their inputs by definition.
Full rationale
The provided abstract and description introduce FedProxy as a three-stage framework (server-guided compression to proxy SLM, heterogeneity-aware aggregation, training-free fusion) without any equations, derivations, or parameter-fitting steps that would reduce claimed performance gains to self-referential definitions or fitted inputs. No self-citations are invoked as load-bearing uniqueness theorems, and the central claims rest on empirical outperformance rather than algebraic equivalence to prior quantities. The proxy fidelity assumption is an unquantified modeling choice but does not create circularity in the derivation chain, as no math is shown that presupposes the result.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: a compressed proxy SLM can act as a high-fidelity surrogate that preserves the fine-tuning behavior of the original LLM.
Invented entities (1)
- Proxy Small Language Model (SLM): no independent evidence