arxiv: 2511.05933 · v2 · pith:JEHTGC2Wnew · submitted 2025-11-08 · 💻 cs.CL · cs.AI

Reinforcement Learning Improves Traversal of Parametric Knowledge in LLMs

classification 💻 cs.CL cs.AI
keywords models, knowledge, reasoning, recall, traversal, model, facts, factual

Reinforcement learning (RL) is often credited with improving language model reasoning at the expense of knowledge. We challenge this narrative by showing that reasoning models consistently outperform their instruction-tuned versions on pure knowledge recall tasks. These gains do not reflect newly acquired information, but rather an improved procedural skill in navigating and searching existing knowledge hierarchies within the model parameters. Structured prompting, which explicitly guides models through hierarchical traversal, recovers most of the instruct-reasoning gap across five model families. A controlled RL experiment on unseen, non-extractable facts improves recall of held-out frequent but previously inaccessible facts, ruling out simple data exposure. On depth-stratified retrieval tasks, reasoning models exhibit superior traversal as retrieval depth grows. Layerwise activation analysis further shows that while factual representations maintain high cosine similarity between instruct and reasoning models, query representations diverge noticeably, indicating that reasoning primarily reshapes how models traverse knowledge rather than the knowledge representation itself. Finally, we find that distilled models often fail to match reasoning models on knowledge recall because they imitate self-correction without acquiring the exploratory behavior needed for hierarchical navigation. Together, these findings suggest that improving factual recall in LLMs depends not only on expanding what models know but also on teaching them to navigate it, motivating future post-training methods that optimize traversal.
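
The layerwise comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each model's activations for the same input are available as one vector per layer. The function names and the toy vectors are hypothetical and are not taken from the paper, whose actual analysis pipeline is not described on this page.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def layerwise_similarity(acts_a, acts_b):
    """Compare two models layer by layer.

    acts_a, acts_b: lists holding one activation vector per layer
    (an assumed input format, chosen for illustration only).
    """
    if len(acts_a) != len(acts_b):
        raise ValueError("models must have the same number of layers")
    return [cosine_similarity(a, b) for a, b in zip(acts_a, acts_b)]


# Toy, made-up activations for a two-layer model pair (not data from the paper).
instruct_acts = [[1.0, 0.0], [1.0, 1.0]]
reasoning_acts = [[2.0, 0.0], [1.0, -1.0]]
sims = layerwise_similarity(instruct_acts, reasoning_acts)
```

In this toy case the first layer's vectors point in the same direction and the second layer's are orthogonal. Read against the abstract, a per-layer value near 1 would correspond to the reported behavior of factual representations, while lower values would correspond to the reported divergence of query representations.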

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts

    cs.CL 2026-04 conditional novelty 6.0

    Loss-based pruning of training data to limit facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-paramete...