arxiv: 2606.31033 · v1 · pith:ZEFSL5JFnew · submitted 2026-06-30 · 💻 cs.CL

CORTEX: Token-Level Hallucination Detection in RAG via Comparative Internal Representations

Pith reviewed 2026-07-01 06:12 UTC · model grok-4.3

classification 💻 cs.CL
keywords hallucination detection · retrieval-augmented generation · token-level analysis · internal representations · large language models · RAG benchmarks · comparative analysis

The pith

CORTEX identifies ungrounded tokens in RAG by comparing an LLM's internal representations with and without the retrieved documents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CORTEX to detect hallucinations at the token level inside long RAG responses. It measures the difference in internal model states produced when the retrieved documents are supplied versus when they are withheld. The method adds tracking of how document influence travels forward through earlier tokens and applies smoothing to favor consistent labels across neighboring tokens. This setup targets the common pattern where hallucinations appear only in isolated spans rather than the full output. Experiments across two benchmarks and three LLMs show consistent gains from each added component.

Core claim

CORTEX identifies ungrounded tokens by comparing the internal representations of an LLM under two conditions—with and without the retrieved documents. It incorporates the propagation of document-grounded information through preceding tokens to reduce false positives and applies a post-processing smoothing step that models the persistence of hallucination labels over contiguous spans.
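The comparison described above can be illustrated with a minimal sketch. It assumes the two hidden-state sequences are already aligned token by token, uses cosine distance as the comparison measure, and flags tokens with a fixed threshold. None of these choices is confirmed by the source; the names `document_influence` and `flag_ungrounded` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def document_influence(with_docs: List[Vector], without_docs: List[Vector]) -> List[float]:
    """Per-token distance between the with-document and without-document states."""
    if len(with_docs) != len(without_docs):
        raise ValueError("hidden-state sequences must be aligned token by token")
    return [cosine_distance(a, b) for a, b in zip(with_docs, without_docs)]


def flag_ungrounded(influence: List[float], threshold: float) -> List[bool]:
    """Assumption: a token barely affected by the documents is a hallucination candidate."""
    return [score < threshold for score in influence]


# Toy hidden states for three answer tokens (made-up numbers, not model output).
with_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
without_docs = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.9]]

influence = document_influence(with_docs, without_docs)
flags = flag_ungrounded(influence, threshold=0.1)
print(influence, flags)
```

In this toy input only the first token changes substantially when the documents are removed, so it is the only one left unflagged. A real detector would read the states from a chosen layer of the LLM and would likely learn the decision rule rather than hard-code a threshold.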

What carries the argument

Comparison of internal representations generated with versus without retrieved documents, extended by propagation of grounded effects across tokens and span-smoothing post-processing.
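The source does not say how propagation of grounded effects is computed. The sketch below shows one possible reading, stated purely as an assumption: each token's total document influence is its own direct influence plus a discounted, attention-weighted share of the influence already carried by earlier tokens. The function name, the `decay` parameter, and the use of attention weights are all hypothetical.

```python
from typing import List


def propagated_influence(
    direct: List[float],
    attention: List[List[float]],
    decay: float = 0.5,
) -> List[float]:
    """total[t] = direct[t] + decay * sum over j < t of attention[t][j] * total[j].

    attention[t] holds the weights token t places on tokens 0..t-1.
    """
    total: List[float] = []
    for t, own in enumerate(direct):
        inherited = sum(attention[t][j] * total[j] for j in range(t))
        total.append(own + decay * inherited)
    return total


# Toy example: token 1 has little direct influence but attends to token 0,
# which was strongly shaped by the documents.
direct = [0.8, 0.1, 0.0]
attention = [
    [],          # token 0 has no predecessors
    [1.0],       # token 1 attends only to token 0
    [0.5, 0.5],  # token 2 splits attention between tokens 0 and 1
]

total = propagated_influence(direct, attention)
print(total)
```

Under this reading, a token whose evidence was already absorbed by its predecessors keeps a non-trivial score even when its direct sensitivity is near zero, which is the false-positive case the summary says propagation is meant to reduce.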

If this is right

  • Token-level hallucination detection accuracy improves over prior methods on standard RAG benchmarks.
  • The three components—comparative representations, propagation tracking, and smoothing—each add measurable performance gains.
  • Fine-grained localization becomes possible in long-form RAG outputs where hallucinations appear in isolated spans.
  • The gains hold across multiple large language models and evaluation sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar internal-state comparisons could be tested on generation tasks that do not use explicit retrieval.
  • Feedback from such detectors during training might encourage models to strengthen document influence on grounded tokens.
  • The method could be inserted into generation pipelines to flag and revise emerging hallucinations in real time.

Load-bearing premise

Tokens grounded in the retrieved documents are more strongly influenced by those documents in the model's internal representations than hallucinated tokens are.
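One way to write this premise formally is shown below. The symbols are introduced here for illustration and are not taken from the paper; in particular, the distance function is left unspecified because the source does not name one.

```latex
% h_t^{+}: hidden state of answer token t when the retrieved documents are in the prompt
% h_t^{-}: hidden state of the same token when the documents are withheld
% \delta : any distance between representations (assumed; not specified in the source)
% \mathcal{G}, \mathcal{H}: sets of grounded and hallucinated token positions
d_t = \delta\left(h_t^{+},\, h_t^{-}\right), \qquad
\mathbb{E}\left[\, d_t \mid t \in \mathcal{G} \,\right] \;>\; \mathbb{E}\left[\, d_t \mid t \in \mathcal{H} \,\right]
```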

What would settle it

If the difference in internal representations between the with-document and without-document conditions shows no reliable separation between known grounded tokens and known hallucinated tokens on a labeled benchmark, the core comparison signal would fail.
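A separation check of this kind could be run as an AUROC computation over per-token scores. The sketch below is a plain pairwise implementation; the score lists are made-up numbers, and treating grounded tokens as the positive class is an assumption for illustration.

```python
from typing import List


def auroc(positive_scores: List[float], negative_scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical with/without-document difference scores for labeled tokens.
grounded = [0.9, 0.7, 0.6, 0.4]
hallucinated = [0.5, 0.3, 0.2]

separation = auroc(grounded, hallucinated)
print(f"AUROC = {separation:.3f}")
```

A value close to 0.5 on a labeled benchmark would mean the difference signal does not separate the two token classes, which is the failure condition described above.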

Figures

Figures reproduced from arXiv: 2606.31033 by Daisuke Kamisaka, Kazuaki Furumai, Kazunori Matsumoto, Shuichiro Haruta.

Figure 1: Overview of CORTEX for token-level hallucination detection in RAG, using reference-conditioned …
Figure 2: Detection example for an answer containing hallucinated content. Label-persistence smoothing suppresses …
Figure 3: Detection example for an answer without hallucinated content. The contextual residual …
Figure 4: Sensitivity of token-level hallucination detection performance to …
Figure 5: Sensitivity of answer-level hallucination detection performance to …
Figure 6: Ablation case study with hallucination spans of different granularities. Without label-persistence …
Figure 7: Ablation case study involving groundless advice. The answer includes additional advice that may be …

Original abstract

In this paper, we propose CORTEX, a token-level hallucination detection method for Retrieval-Augmented Generation (RAG). In long-form RAG outputs, hallucinations often arise in localized spans rather than throughout an entire response. CORTEX therefore identifies ungrounded content at the token level, enabling fine-grained localization of hallucinations. The key intuition behind CORTEX is that tokens grounded in retrieved documents should be more strongly influenced by those documents than hallucinated tokens. To capture this document-induced effect, CORTEX compares internal representations of a large language model (LLM) under two conditions: with and without the retrieved documents. Instead of relying solely on each token's immediate sensitivity to the retrieved documents, CORTEX also leverages the propagation of document-grounded information through preceding tokens, reducing false positives for tokens whose evidence has already been absorbed into the context. Finally, CORTEX applies a post-processing smoothing step that models the tendency of hallucination labels to persist over contiguous spans, reducing local noise and encouraging span-consistent predictions. Experiments on two RAG benchmarks and three LLMs show that CORTEX substantially improves token-level hallucination detection, with each component consistently contributing to performance gains.
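The abstract says only that the smoothing step models the persistence of hallucination labels over contiguous spans. One standard way to get that behavior is Viterbi decoding in a two-state hidden Markov model with a high self-transition probability. The sketch below uses that technique as an assumption; whether CORTEX uses exactly this model is not stated in the source, and `smooth_labels`, `stay`, and the toy probabilities are hypothetical.

```python
import math
from typing import List


def smooth_labels(p_hallucinated: List[float], stay: float = 0.8) -> List[int]:
    """Viterbi decoding over states 0 (grounded) and 1 (hallucinated).

    p_hallucinated[t] is a per-token detector probability; `stay` is the
    probability of keeping the previous label.
    """
    if not p_hallucinated:
        return []
    eps = 1e-9
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)

    def emission(p: float) -> List[float]:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        return [math.log(1.0 - p), math.log(p)]

    # Uniform prior over the first label.
    score = [math.log(0.5) + e for e in emission(p_hallucinated[0])]
    back: List[List[int]] = []
    for p in p_hallucinated[1:]:
        emit = emission(p)
        new_score, pointers = [], []
        for state in (0, 1):
            candidates = [
                score[prev] + (log_stay if prev == state else log_switch)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_score.append(candidates[best_prev] + emit[state])
        score = new_score
        back.append(pointers)

    # Backtrace from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Made-up detector outputs with one dip inside a span and one isolated spike.
probs = [0.05, 0.1, 0.9, 0.4, 0.95, 0.9, 0.1, 0.7, 0.05]
raw = [int(p > 0.5) for p in probs]
smoothed = smooth_labels(probs)
print(raw)
print(smoothed)
```

On this toy input the decoder fills the one-token gap inside the hallucinated span and drops the isolated spike near the end, which is the span-consistent behavior the abstract describes. Lowering `stay` makes the decoder follow the raw per-token decisions more closely.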

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes CORTEX, a token-level hallucination detection method for RAG that compares an LLM's internal representations under with-document and without-document conditions. It incorporates propagation of document-grounded information through preceding tokens and a post-processing smoothing step to encourage span-consistent predictions. Experiments on two RAG benchmarks and three LLMs are claimed to show substantial improvements, with each component contributing to gains.

Significance. If the core comparative mechanism can be made reliable, the approach would provide a novel internal-representation-based signal for localizing hallucinations at the token level in long-form RAG outputs, potentially complementing existing logit- or embedding-based detectors.

major comments (2)
  1. [Approach (abstract and method description)] The central mechanism requires per-token comparison of internal representations between the with-document and without-document generations. However, removing the retrieved documents frequently produces a different token sequence, so the positions no longer align to the same content. No explicit alignment procedure (forced decoding, prefix-constrained generation, or extraction along the with-document trajectory) is described, rendering the difference signal undefined for most tokens. This directly undermines the claim that the comparative signal distinguishes grounded from hallucinated tokens.
  2. [Experiments (abstract)] The abstract asserts that CORTEX 'substantially improves' detection, 'with each component consistently contributing to performance gains', yet supplies no quantitative metrics, baselines, ablation tables, or dataset statistics. Without these, the link between the data and the claims cannot be evaluated.
minor comments (2)
  1. The abstract does not name the two RAG benchmarks or the three LLMs; these details should appear in the abstract or a dedicated experimental-setup paragraph.
  2. Notation for the two conditions (with vs. without documents) and for the representation vectors being compared should be introduced formally with symbols rather than prose only.
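The alignment concern in major comment 1 can be made concrete with a small sketch of forced decoding, one of the options the referee lists. The same answer tokens are appended to both prompts and states are read only at the answer positions, so index t refers to the same token in both conditions. `toy_encode` is a stand-in for a causal language model, not a real API, and the token lists are invented.

```python
from typing import Callable, List, Sequence, Tuple


def aligned_answer_states(
    encode: Callable[[Sequence[str]], List[float]],
    prompt_with_docs: Sequence[str],
    prompt_without_docs: Sequence[str],
    answer_tokens: Sequence[str],
) -> Tuple[List[float], List[float]]:
    """Teacher-force the same answer after each prompt; keep answer-position states only."""

    def run(prompt: Sequence[str]) -> List[float]:
        states = encode(list(prompt) + list(answer_tokens))
        return states[len(prompt):]  # skip prompt positions

    return run(prompt_with_docs), run(prompt_without_docs)


def toy_encode(tokens: Sequence[str]) -> List[float]:
    """Stand-in for an LLM: state t is the share of earlier tokens equal to token t."""
    states = []
    for t, tok in enumerate(tokens):
        earlier = list(tokens[:t])
        states.append(earlier.count(tok) / len(earlier) if earlier else 0.0)
    return states


question = ["who", "wrote", "hamlet", "?"]
documents = ["hamlet", "is", "by", "shakespeare"]
answer = ["shakespeare", "wrote", "hamlet"]

with_states, without_states = aligned_answer_states(
    toy_encode, documents + question, question, answer
)
print(with_states)
print(without_states)
```

Because neither condition generates freely, the two state lists always have one entry per answer token, and a per-token difference is well defined even though the prompts have different lengths.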

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below, indicating the revisions we will make to the manuscript.

point-by-point responses
  1. Referee: [Approach (abstract and method description)] The central mechanism requires per-token comparison of internal representations between the with-document and without-document generations. However, removing the retrieved documents frequently produces a different token sequence, so the positions no longer align to the same content. No explicit alignment procedure (forced decoding, prefix-constrained generation, or extraction along the with-document trajectory) is described, rendering the difference signal undefined for most tokens. This directly undermines the claim that the comparative signal distinguishes grounded from hallucinated tokens.

    Authors: We agree that the manuscript does not explicitly describe an alignment procedure between the two generation conditions, which leaves the per-token comparison underspecified. This is a genuine presentational gap. In the revised manuscript we will add a dedicated paragraph in Section 3.2 detailing the alignment method: we employ prefix-constrained decoding so that the without-document generation is forced to follow the exact token sequence produced by the with-document generation up to each comparison point. This ensures the internal-representation difference is computed on aligned positions.

    revision: yes

  2. Referee: [Experiments (abstract)] The abstract asserts that CORTEX 'substantially improves' detection, 'with each component consistently contributing to performance gains', yet supplies no quantitative metrics, baselines, ablation tables, or dataset statistics. Without these, the link between the data and the claims cannot be evaluated.

    Authors: The referee correctly notes that the abstract contains only qualitative claims without supporting numbers. Although abstracts are conventionally concise, we accept that the current wording makes the contribution statements difficult to evaluate. We will revise the abstract to include the primary quantitative results (F1 improvements on both benchmarks across the three LLMs) together with a brief reference to the ablation study and dataset sizes reported in Section 4.

    revision: yes
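The second response refers to F1 improvements. For readers unfamiliar with how such a score is computed at the token level, here is a minimal sketch. It assumes binary per-token labels with "hallucinated" as the positive class; the source does not specify the exact evaluation protocol, and the label sequences are invented.

```python
from typing import List, Tuple


def token_prf(predicted: List[int], gold: List[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over tokens, with label 1 meaning hallucinated."""
    if len(predicted) != len(gold):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(predicted, gold) if p == 0 and g == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: one hallucinated span of three tokens, one missed token,
# and one false alarm.
gold = [0, 0, 1, 1, 1, 0, 0, 0]
predicted = [0, 0, 1, 1, 0, 0, 1, 0]

precision, recall, f1 = token_prf(predicted, gold)
print(precision, recall, f1)
```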

Circularity Check

0 steps flagged

No significant circularity; empirical method with independent validation

full rationale

The paper presents CORTEX as an empirical technique that compares LLM internal representations under with-document vs. without-document conditions, motivated by an explicit intuition and validated on external benchmarks. No equations, parameters, or predictions reduce to self-definition or fitted inputs by construction. No self-citation chains or uniqueness theorems are invoked as load-bearing. The derivation chain consists of design choices followed by experimental measurement, which remains falsifiable against held-out data and does not collapse into its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the stated intuition about document influence on internal representations plus the unverified effectiveness of the three components; no free parameters or invented entities are mentioned.

axioms (1)
  • domain assumption: Tokens grounded in retrieved documents should be more strongly influenced by those documents than hallucinated tokens.
    Explicitly identified in the abstract as the key intuition driving the comparative approach.


