Distillation signals align better with ideal updates on incorrect student rollouts than correct ones, with optimal teacher context depending on student capacity and task.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing , pages=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Absorber LLM introduces causal synchronization to absorb context into parameters for memory-efficient long-context LLM inference while preserving causal effects.
citing papers explorer
-
Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why
Distillation signals align better with ideal updates on incorrect student rollouts than correct ones, with optimal teacher context depending on student capacity and task.
-
Absorber LLM: Harnessing Causal Synchronization for Test-Time Training
Absorber LLM introduces causal synchronization to absorb context into parameters for memory-efficient long-context LLM inference while preserving causal effects.