Unifying group-relative and self-distillation policy optimization via sample routing

Canonical reference. 100% of citing Pith papers cite this work as background.

18 Pith papers citing it
Background 100% of classified citations

citation-role summary: background 6

citation-polarity summary: background 6

years: 2026 (18)

verdicts: unverdicted (18)

representative citing papers

OPRD: On-Policy Representation Distillation

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

OPRD distills in hidden-state space on on-policy data, which gives deterministic gradients and better math benchmark performance; OPRD-Bridge adds cross-architecture transfer via low-rank projectors.

VISD: Enhancing Video Reasoning via Structured Self-Distillation

cs.CV · 2026-05-07 · unverdicted · novelty 5.0 · 4 refs

VISD proposes structured self-distillation with a multi-dimensional judge model and direction-magnitude decoupling to improve token-level credit assignment and convergence speed in VideoLLM reasoning training.

Physics-Guided Policy Optimization with Self-Distillation

cs.LG · 2026-06-02 · unverdicted · novelty 4.0

PGPO modulates per-step trust in self-distilled updates via a mutual-information estimate derived from a viscous-fluid analogy, preserves SGD weak-approximation order, and reports gains of up to 4.5 points on Science-QA while avoiding late-training collapse.
