Dual-forward path teacher knowledge distillation: Bridging the capacity gap between teacher and student

Li, T · 2025 · arXiv 2506.18244

2 Pith papers cite this work.

representative citing papers

LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models

cs.CV · 2026-05-19 · unverdicted · novelty 6.0 · 3 refs

LIFT decomposes distillation into a coarse linear alignment stage followed by fine refinement, while PLACE adds error-based local adaptation. Together they allow 1.3M-parameter students (1.6% of the teacher's size) to be trained stably to an FID of 15.73 across diffusion and flow models.

Beyond Dark Knowledge: Mixup-Based Distillation for Reliable Predictions

cs.CV · 2026-06-10 · unverdicted · novelty 5.0

Applying mixup only to the student during knowledge distillation (KD) leads the student to acquire linearity independently, which reduces overconfidence by an order of magnitude while also improving accuracy. Calibration transfers separately from accuracy.

citing papers explorer

Showing 2 of 2 citing papers.

LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models
cs.CV · 2026-05-19 · unverdicted · none · ref 19 · 3 links
Beyond Dark Knowledge: Mixup-Based Distillation for Reliable Predictions
cs.CV · 2026-06-10 · unverdicted · none · ref 28
