HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation

15 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

years

2026 15

representative citing papers

OPRD: On-Policy Representation Distillation

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

OPRD performs distillation in hidden-state space on on-policy data to obtain deterministic gradients and better math benchmark performance; OPRD-Bridge adds cross-architecture transfer via low-rank projectors.

Self-Distilled Agentic Reinforcement Learning

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

SDAR gates on-policy self-distillation signals into RL training to stabilize and improve multi-turn LLM agent performance on ALFWorld, WebShop, and Search-QA.

On-Policy Distillation with Best-of-N Teacher Rollout Selection

cs.CV · 2026-05-10 · unverdicted · novelty 5.0 · 2 refs

BRTS improves on-policy distillation by sampling multiple teacher rollouts and selecting the best one with a priority rule that ranks correctness first and alignment second, yielding gains on the AIME and AMC math benchmarks.
