arXiv preprint arXiv:2512.20949 , year=

Neural Probe-Based Hallucination Detection for Large Language Models , author= · arXiv 2512.20949

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts

cs.CL · 2026-05-16 · unverdicted · novelty 6.0

Benchmark construction artifacts in hallucination detection corpora allow naive text-similarity baselines to achieve near-perfect scores, and controlled evaluations show most methods perform near chance except SAPLMA and the new DRIFT probe.

Sparse Reward Subsystem in Large Language Models

cs.CL · 2026-02-01 · unverdicted · novelty 6.0

LLM hidden states contain a sparse reward subsystem consisting of value neurons that predict state value and dopamine neurons that encode step-level temporal difference errors.

citing papers explorer

Showing 2 of 2 citing papers.

PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts cs.CL · 2026-05-16 · unverdicted · none · ref 56
Benchmark construction artifacts in hallucination detection corpora allow naive text-similarity baselines to achieve near-perfect scores, and controlled evaluations show most methods perform near chance except SAPLMA and the new DRIFT probe.
Sparse Reward Subsystem in Large Language Models cs.CL · 2026-02-01 · unverdicted · none · ref 16
LLM hidden states contain a sparse reward subsystem consisting of value neurons that predict state value and dopamine neurons that encode step-level temporal difference errors.

arXiv preprint arXiv:2512.20949 , year=

fields

years

verdicts

representative citing papers

citing papers explorer