Unifying group-relative and self-distillation policy optimization via sample routing

Canonical reference. Of the citations classified so far, 100% cite this work as background.

21 Pith papers cite this work; 6 of these citations have been classified.


citation summary

role: background (6)
polarity: background (6)
year: 2026 (21)
verdict: unverdicted (21)


representative citing papers

OPRD: On-Policy Representation Distillation

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

OPRD performs distillation in hidden-state space on on-policy data, aiming for deterministic gradients and better math-benchmark performance, and adds OPRD-Bridge for cross-architecture transfer via low-rank projectors.

DemoPSD: Disagreement-Modulated Policy Self-Distillation

cs.LG · 2026-07-02 · unverdicted · novelty 5.0

DemoPSD uses a reverse-KL barycenter target modulated by distribution discrepancy for selective teacher guidance in LLM self-distillation, claiming leakage attenuation, exploration preservation, and superior performance on SciKnowEval and GPQA.

VISD: Enhancing Video Reasoning via Structured Self-Distillation

cs.CV · 2026-05-07 · unverdicted · novelty 5.0 · 4 refs

VISD proposes structured self-distillation with a multi-dimensional judge model and direction-magnitude decoupling to improve token-level credit assignment and convergence speed in VideoLLM reasoning training.

Physics-Guided Policy Optimization with Self-Distillation

cs.LG · 2026-06-02 · unverdicted · novelty 4.0

PGPO modulates per-step trust in self-distilled updates via a mutual-information estimate derived from a viscous-fluid analogy, preserves SGD weak-approximation order, and reports gains of up to 4.5 points on Science-QA while avoiding late-training collapse.
