arXiv preprint arXiv:2212.12965

Bd-kd: Balancing the divergences for online knowledge distillation · arXiv 2212.12965

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

Rethinking the Role of Temperature in Large Language Model Distillation

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

Including temperature scaling makes forward KL divergence outperform reverse KL in LLM distillation on instruction benchmarks, overturning the τ=1 preference for reverse KL.

citing papers explorer

Showing 1 of 1 citing paper.

Rethinking the Role of Temperature in Large Language Model Distillation cs.LG · 2026-05-29 · unverdicted · none · ref 11
