On-policy distillation gains efficiency from early foresight in module allocation and low-rank update directions, enabling EffOPD to accelerate training by 3x via adaptive extrapolation without extra modules or tuning.
arXiv preprint arXiv:2511.08567
5 Pith papers cite this work.
Citation summary: years: 2026 (5); role: background (1); polarity classification still indexing.
Citing papers
- Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation. On-policy distillation gains efficiency from early foresight in module allocation and low-rank update directions, enabling EffOPD to accelerate training by 3x via adaptive extrapolation without extra modules or tuning.
- Rotation-Preserving Supervised Fine-Tuning. RPSFT improves the in-domain versus out-of-domain performance trade-off during LLM supervised fine-tuning by penalizing rotations of the pretrained singular subspaces as a proxy for loss-sensitive directions.
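The idea of penalizing rotation away from pretrained singular subspaces can be sketched with a simple proxy regularizer. This is an illustrative construction, not the paper's exact loss: the subspace rank `k`, the Frobenius norm, and the function name `rotation_penalty` are all assumptions.

```python
# Illustrative proxy for a singular-subspace rotation penalty: measure the
# mass of the fine-tuned weight W that lies outside the top-k left/right
# singular subspaces of the pretrained weight W0. (k and the norm choice
# are assumptions, not the paper's exact regularizer.)
import numpy as np

def rotation_penalty(W0, W, k=2):
    U0, _, V0t = np.linalg.svd(W0)
    Uk, Vk = U0[:, :k], V0t[:k, :].T
    P_left = Uk @ Uk.T      # projector onto top-k left singular subspace
    P_right = Vk @ Vk.T     # projector onto top-k right singular subspace
    # Components of W escaping the pretrained subspaces incur a penalty.
    return (np.linalg.norm(W - P_left @ W) +
            np.linalg.norm(W - W @ P_right))

rng = np.random.default_rng(1)
W0 = rng.normal(size=(5, 4))
# Unchanged weights lie entirely inside their own singular subspaces.
assert rotation_penalty(W0, W0, k=4) < 1e-9
```

An update that stays within the pretrained subspaces contributes nothing to this term, while one that rotates the singular directions is penalized.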
- Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors. DoTS decouples SFT and RLVR training, then synthesizes their task vectors at inference time, matching integrated-training results at roughly 3% of the compute cost.
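Task-vector synthesis of this kind can be sketched as adding each stage's weight delta back onto the shared base model. The coefficient names `alpha`/`beta` and the flat-dict weight format are illustrative assumptions, not the DoTS implementation.

```python
# Sketch of test-time task-vector synthesis: each training stage defines a
# task vector (its weights minus the base weights), and the stages are
# combined by adding scaled task vectors to the base. Coefficients are
# illustrative assumptions.
import numpy as np

def synthesize(base, sft, rlvr, alpha=1.0, beta=1.0):
    """Combine SFT and RLVR task vectors relative to a shared base model."""
    return {k: base[k] + alpha * (sft[k] - base[k]) + beta * (rlvr[k] - base[k])
            for k in base}

base = {"w": np.array([1.0, 1.0])}
sft  = {"w": np.array([1.5, 1.0])}   # SFT moved the first weight
rlvr = {"w": np.array([1.0, 2.0])}   # RLVR moved the second weight

merged = synthesize(base, sft, rlvr)
# merged["w"] -> [1.5, 2.0]: both stages' deltas applied to the base
```

The compute saving comes from running each stage independently and paying only this cheap vector arithmetic at inference time instead of a joint training run.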
- Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation. Pion is an optimizer that preserves the singular values of weight matrices during LLM training by applying orthogonal equivalence transformations.
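The spectrum-preservation property rests on a standard fact: an orthogonal equivalence transform W' = U W Vᵀ leaves the singular values of W unchanged. The check below only verifies that invariant; how Pion actually chooses U and V during optimization is not shown and the shapes are assumptions.

```python
# Verify the invariant behind orthogonal equivalence transformations:
# multiplying W on the left and right by orthogonal matrices preserves
# its singular values (hence the spectrum of the weight matrix).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))

# Random orthogonal matrices via QR decomposition.
U, _ = np.linalg.qr(rng.normal(size=(4, 4)))
V, _ = np.linalg.qr(rng.normal(size=(3, 3)))

W_prime = U @ W @ V.T

sv_before = np.linalg.svd(W, compute_uv=False)
sv_after = np.linalg.svd(W_prime, compute_uv=False)
assert np.allclose(sv_before, sv_after)   # spectrum is unchanged
```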
- SparseRL-Sync: Lossless Weight Synchronization with ~100x Less Communication. SparseRL-Sync achieves lossless weight synchronization in large-scale RL by sending only changed parameters, reducing communication volume by roughly 100x under observed 99%+ element-level sparsity.
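The changed-parameters-only idea can be sketched as a delta encoding: transmit (index, value) pairs for elements that differ from the last synced copy, then scatter them back on the receiver. The function names and wire format are illustrative assumptions, not SparseRL-Sync's actual protocol.

```python
# Sketch of delta-based weight synchronization: send only (index, value)
# pairs for elements that changed since the last sync, and reconstruct the
# full tensor losslessly on the receiving side.
import numpy as np

def make_delta(old, new):
    idx = np.flatnonzero(old != new)   # indices of changed elements
    return idx, new[idx]               # only these values go on the wire

def apply_delta(weights, idx, vals):
    out = weights.copy()
    out[idx] = vals                    # scatter updates: exact reconstruction
    return out

old = np.zeros(1000)
new = old.copy()
new[[3, 500]] = [0.1, -0.2]            # only 2 of 1000 elements changed

idx, vals = make_delta(old, new)
restored = apply_delta(old, idx, vals)
assert np.array_equal(restored, new)   # lossless despite sending 2 values
```

At 99%+ element-level sparsity, sending indices and values for the ~1% of changed elements rather than the full tensor is what yields the roughly 100x reduction in communication volume.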