PMD extracts and distills cross-episode procedural knowledge from RL rollouts into LLM policies at three abstraction levels, yielding 3.8-13.6% gains over SDPO on SCIKNOWEVAL and LIVECODEBENCH via co-evolution.
X-KD: General Experiential Knowledge Distillation for Large Language Models
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Procedural Memory Distillation: Online Reflection for Self-Improving Language Models
PMD extracts and distills cross-episode procedural knowledge from RL rollouts into LLM policies at three abstraction levels, yielding 3.8-13.6% gains over SDPO on SCIKNOWEVAL and LIVECODEBENCH via co-evolution.