Teacher-Student Compression with Generative Adversarial Networks

Lester Mackey; Nicolo Fusi; Ruishan Liu

arxiv: 1812.02271 · v4 · pith:3U4LIK4Mnew · submitted 2018-12-05 · 💻 cs.LG · stat.ML

keywords: compression, data, expensive, training, accuracy, adversarial, datasets, generative

More accurate machine learning models often demand more computation and memory at test time, making them difficult to deploy on CPU- or memory-constrained devices. Teacher-student compression (TSC), also known as distillation, alleviates this burden by training a less expensive student model to mimic the expensive teacher model while maintaining most of the original accuracy. However, when fresh data is unavailable for the compression task, the teacher's training data is typically reused, leading to suboptimal compression. In this work, we propose to augment the compression dataset with synthetic data from a generative adversarial network (GAN) designed to approximate the training data distribution. Our GAN-assisted TSC (GAN-TSC) significantly improves student accuracy for expensive models such as large random forests and deep neural networks on both tabular and image datasets. Building on these results, we propose a comprehensive metric, the TSC Score, to evaluate the quality of synthetic datasets based on their induced TSC performance. The TSC Score captures both data diversity and class affinity, and we illustrate its benefits over the popular Inception Score in the context of image classification.
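The augmentation idea in the abstract can be illustrated with a small toy sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `teacher_prob` is a hypothetical fixed scoring rule standing in for an expensive teacher, `sample_synthetic` is a Gaussian-jitter sampler standing in for a trained GAN generator (it is not a GAN), and the student is a small logistic regression fit to the teacher's soft labels. The sketch only shows the data flow: draw synthetic inputs, label them with the teacher, and train the student on the enlarged compression set.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function (never exponentiates a large positive number).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def teacher_prob(x):
    """Hypothetical teacher: P(class 1) is high inside the unit circle."""
    return sigmoid(4.0 * (1.0 - x[0] ** 2 - x[1] ** 2))


def sample_synthetic(real_points, n_new, noise, rng):
    """Stand-in for a GAN generator (assumption): jittered copies of real points."""
    out = []
    for _ in range(n_new):
        base = rng.choice(real_points)
        out.append((base[0] + rng.gauss(0.0, noise), base[1] + rng.gauss(0.0, noise)))
    return out


def features(x):
    # Bias, linear, and squared terms: a deliberately small student.
    return [1.0, x[0], x[1], x[0] ** 2, x[1] ** 2]


def student_prob(w, x):
    return sigmoid(sum(wi * fi for wi, fi in zip(w, features(x))))


def train_student(points, soft_labels, epochs=50, lr=0.05):
    """Fit the student to the teacher's soft labels with plain SGD on cross-entropy."""
    w = [0.0] * 5
    for _ in range(epochs):
        for x, t in zip(points, soft_labels):
            f = features(x)
            err = student_prob(w, x) - t  # gradient of cross-entropy w.r.t. the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w


def agreement(w, points):
    """Fraction of points where student and teacher choose the same class."""
    same = sum((student_prob(w, x) >= 0.5) == (teacher_prob(x) >= 0.5) for x in points)
    return same / len(points)


rng = random.Random(0)
real = [(rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)) for _ in range(40)]
held_out = [(rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)) for _ in range(200)]

# Baseline: reuse only the teacher's (small) training set for compression.
w_real = train_student(real, [teacher_prob(x) for x in real])

# Augmented: add synthetic inputs, all labeled by the teacher.
synthetic = sample_synthetic(real, 200, 0.1, rng)
compression_set = real + synthetic
w_aug = train_student(compression_set, [teacher_prob(x) for x in compression_set])

print("agreement, real only:", agreement(w_real, held_out))
print("agreement, augmented:", agreement(w_aug, held_out))
```

The synthetic points never need ground-truth labels, because the teacher supplies the targets. Whether augmentation helps in this toy setting depends on the seed and the sampler, so the printed numbers should not be read as evidence for the paper's results.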

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Logit Distillation on Manifolds: Mapping by Learning
cs.LG 2026-05 unverdicted novelty 3.0

Presents a layer- and point-wise projection mapping for manifold-based logit distillation combined with LoRA to enable low-parameter student training with reported WER gains.