human-interpretable

URL: https://arxiv · 2025 · arXiv 2602.02467

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

cs.AI · 2026-05-28 · unverdicted · novelty 6.0

Introduces the BeliefTrack benchmark, which diagnoses three contextual belief management (CBM) failures in LLMs, and shows that RL with belief-state rewards cuts failure rates by 70.9%, while representation steering cuts them by 46.1%.

Can LLMs Introspect? A Reality Check

cs.AI · 2026-05-25 · conditional · novelty 6.0

A re-examination of two LLM introspection paradigms with new controls shows that models lack privileged access to their internal states: they perform on par with input-only classifiers, or near chance on relabeled tasks.

citing papers explorer


When Should Models Change Their Minds? Contextual Belief Management in Large Language Models · cs.AI · 2026-05-28 · unverdicted · none · ref 45
Can LLMs Introspect? A Reality Check · cs.AI · 2026-05-25 · conditional · none · ref 3
