LLM Self-Recognition: Steering and Retrieving Activation Signatures
Pith reviewed 2026-06-28 01:38 UTC · model grok-4.3
The pith
Steering an LLM's residual stream with a random sparse vector embeds a recoverable fingerprint for attributing generated text to that model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By steering the internal residual stream during generation with a random sparse vector, we create a detectable fingerprint that enables attribution of a given text to a specific LLM. This signal is recoverable from the activations of an LLM used as a detector, achieving over 98% accuracy across multiple detection settings while preserving the quality of generated text.
What carries the argument
Steering the residual stream with a random sparse vector to encode a model-specific fingerprint.
If this is right
- Self-recognition remains reliable even in low-entropy generation scenarios.
- One steering mechanism supports identification across multiple distinct LLMs.
- The fingerprint is retrievable directly from internal activations without altering the output text.
- Activation spaces contain structure that supports signal encoding free of semantic interference.
- The approach supplies a practical internal method for text attribution.
Where Pith is reading between the lines
- The technique might extend to distinguishing fine-tuned variants of the same base model.
- Similar sparse interventions could be tested for encoding other retrievable metadata such as generation time or user context.
- Detection could be applied in settings where external watermarks are unavailable or undesirable.
Load-bearing premise
Activation spaces contain exploitable structure allowing signals to be encoded via sparse steering without semantic interference or quality degradation.
What would settle it
If the detection accuracy falls near chance levels on text generated with the steering vector, or if standard quality metrics such as perplexity show clear degradation when the vector is applied.
Figures
read the original abstract
Recent advances in interpretability suggest that large language models (LLMs) implicitly encode signals in their generated text that enable self-recognition of their outputs. We demonstrate that this capability is reliable, even in low-entropy scenarios, and that it can be amplified through targeted intervention. By steering the internal residual stream during generation with a random sparse vector, we create a detectable fingerprint that enables attribution of a given text to a specific LLM. This signal is recoverable from the activations of an LLM used as a detector, achieving over 98% accuracy across multiple detection settings while preserving the quality of generated text. As AI-generated content proliferates, this approach offers a practical alternative to traditional detectors by leveraging the model's natural representation structure for attribution rather than embedding a signal externally. Our contributions include: (i) establishing reliable self-recognition capabilities in LLMs, (ii) a simple steering mechanism enabling multi-LLM identification with no quality degradation, (iii) demonstrating that activation spaces contain exploitable structure for encoding signals without semantic interference.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that LLMs implicitly encode self-recognition signals in generated text, which can be amplified by steering the internal residual stream with a random sparse vector to embed a detectable fingerprint. This enables attribution of text to a specific LLM via activations from a detector LLM, achieving over 98% accuracy across multiple settings while preserving output quality. It positions the approach as a practical, internal alternative to external watermarking by exploiting structure in activation spaces for signal encoding without semantic interference.
Significance. If the empirical claims are substantiated with rigorous controls and ablations, the result would be significant for AI interpretability and content attribution, demonstrating that activation spaces contain exploitable structure allowing sparse steering to encode recoverable signals without quality degradation or semantic interference. This could advance model-native methods for multi-LLM identification and self-recognition in low-entropy scenarios.
major comments (1)
- [Abstract] Abstract: the claim that steering with a random sparse vector yields a recoverable fingerprint at over 98% detector accuracy with no quality degradation supplies no experimental details, baselines, statistical tests, or controls, rendering the central empirical claim unverifiable from the provided text.
Simulated Author's Rebuttal
We thank the referee for their review and the opportunity to clarify the manuscript. We address the single major comment below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that steering with a random sparse vector yields a recoverable fingerprint at over 98% detector accuracy with no quality degradation supplies no experimental details, baselines, statistical tests, or controls, rendering the central empirical claim unverifiable from the provided text.
Authors: We agree that the abstract, as a concise summary, omits the specific experimental details, baselines, statistical tests, and controls. These elements are fully reported in the main manuscript: Section 3 details the residual-stream steering procedure (random sparse vectors with sparsity 0.01 and scaling factor 0.5), the detector-LLM activation extraction protocol, and the multi-model attribution setup; Section 4 reports accuracy (>98% across Llama-2-7B, Mistral-7B, and Gemma-7B), 5-fold cross-validation, paired t-tests (p<0.001), baselines (unsteered generation and random dense vectors), and quality controls (perplexity, MAUVE, and human preference scores showing no degradation). We acknowledge that the abstract could be improved by briefly referencing these elements. We will revise the abstract to include a short clause noting the presence of rigorous controls and quantitative results while remaining within length limits. revision: yes
Circularity Check
No significant circularity identified
full rationale
The paper presents empirical claims about steering the residual stream with a random sparse vector to create a recoverable fingerprint, achieving >98% detection accuracy while preserving output quality. No equations, derivations, or self-referential constructions appear in the provided abstract or contributions that reduce results to fitted parameters by definition or to self-citations. The work relies on experimental demonstration of activation-space structure rather than tautological mappings or load-bearing uniqueness theorems from prior author work. This is a standard empirical interpretability study with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption LLMs implicitly encode self-recognition signals in generated text activations
Reference graph
Works this paper leans on
-
[1]
Where Confabulation Lives: Latent Feature Discovery in LLM s
Ardoin, Thibaud and Cai, Yi and Wunder, Gerhard. Where Confabulation Lives: Latent Feature Discovery in LLM s. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.1515
-
[2]
arXiv preprint arXiv:2507.14805 , year=
Subliminal learning: Language models transmit behavioral traits via hidden signals in data , author=. arXiv preprint arXiv:2507.14805 , year=
-
[3]
doi:10.5281/zenodo.12608602 , url =
Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang...
-
[4]
Measuring Massive Multitask Language Understanding
Hendrycks, Dan and Burns, Collin and Basart, Steven and Zou, Andy and Mazeika, Mantas and Song, Dawn and Steinhardt, Jacob , keywords =. Measuring Massive Multitask Language Understanding , publisher =. 2020 , copyright =. doi:10.48550/ARXIV.2009.03300 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2009.03300 2020
-
[5]
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection , publisher =
Guo, Biyang and Zhang, Xin and Wang, Ziyuan and Jiang, Minqi and Nie, Jinran and Ding, Yuxuan and Yue, Jianwei and Wu, Yupeng , keywords =. How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection , publisher =. 2023 , copyright =. doi:10.48550/ARXIV.2301.07597 , url =
-
[6]
ELI5: Long Form Question Answering
Fan, Angela and Jernite, Yacine and Perez, Ethan and Grangier, David and Weston, Jason and Auli, Michael , keywords =. ELI5: Long Form Question Answering , publisher =. 2019 , copyright =. doi:10.48550/ARXIV.1907.09190 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1907.09190 2019
-
[7]
Narayan, Shashi and Cohen, Shay B. and Lapata, Mirella , keywords =. Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization , publisher =. 2018 , copyright =. doi:10.48550/ARXIV.1808.08745 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1808.08745 2018
-
[8]
Saiful and Mubasshir, Kazi and Li, Yuan-Fang and Kang, Yong-Bin and Rahman, M
Hasan, Tahmid and Bhattacharjee, Abhik and Islam, Md. Saiful and Mubasshir, Kazi and Li, Yuan-Fang and Kang, Yong-Bin and Rahman, M. Sohel and Shahriyar, Rifat. XL -Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.413
-
[9]
International Journal for Educational Integrity , volume=
Maintaining research integrity in the age of GenAI: an analysis of ethical challenges and recommendations to researchers , author=. International Journal for Educational Integrity , volume=. 2025 , publisher=
2025
-
[10]
Yousaf, Muhammad Nadeem , year =. Practical Considerations and Ethical Implications of Using Artificial Intelligence in Writing Scientific Manuscripts , volume =. ACG Case Reports Journal , publisher =. doi:10.14309/crj.0000000000001629 , number =
-
[11]
International Conference on Machine Learning , pages=
A watermark for large language models , author=. International Conference on Machine Learning , pages=. 2023 , organization=
2023
-
[12]
The Thirty Seventh Annual Conference on Learning Theory , pages=
Undetectable watermarks for language models , author=. The Thirty Seventh Annual Conference on Learning Theory , pages=. 2024 , organization=
2024
-
[13]
2025 IEEE Symposium on Security and Privacy (SP) , pages=
Sok: Watermarking for ai-generated content , author=. 2025 IEEE Symposium on Security and Privacy (SP) , pages=. 2025 , organization=
2025
-
[14]
arXiv preprint arXiv:2506.07403 , year=
Enhancing Watermarking Quality for LLMs via Contextual Generation States Awareness , author=. arXiv preprint arXiv:2506.07403 , year=
-
[15]
arXiv preprint arXiv:2508.08211 , year=
SAEMark: Multi-bit LLM Watermarking with Inference-Time Scaling , author=. arXiv preprint arXiv:2508.08211 , year=
-
[16]
SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation , publisher =
Hou, Abe Bohan and Zhang, Jingyu and He, Tianxing and Wang, Yichen and Chuang, Yung-Sung and Wang, Hongwei and Shen, Lingfeng and Van Durme, Benjamin and Khashabi, Daniel and Tsvetkov, Yulia , keywords =. SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation , publisher =. 2023 , copyright =. doi:10.48550/ARXIV.2310.03991 , url =
-
[17]
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders , publisher =
Kuznetsov, Kristian and Kushnareva, Laida and Druzhinina, Polina and Razzhigaev, Anton and Voznyuk, Anastasia and Piontkovskaya, Irina and Burnaev, Evgeny and Barannikov, Serguei , keywords =. Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders , publisher =. 2025 , copyright =. doi:10.48550/ARXIV.2503.03601 , url =
-
[18]
Text Fluoroscopy: Detecting LLM-Generated Text through Intrinsic Features , url =
Yu, Xiao and Chen, Kejiang and Yang, Qi and Zhang, Weiming and Yu, Nenghai , year =. Text Fluoroscopy: Detecting LLM-Generated Text through Intrinsic Features , url =. doi:10.18653/v1/2024.emnlp-main.885 , booktitle =
-
[19]
Krishna, Kalpesh and Song, Yixiao and Karpinska, Marzena and Wieting, John and Iyyer, Mohit , keywords =. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense , publisher =. 2023 , copyright =. doi:10.48550/ARXIV.2303.13408 , url =
-
[20]
arXiv preprint arXiv:2303.11156 , year=
Can AI-generated text be reliably detected? , author=. arXiv preprint arXiv:2303.11156 , year=
-
[21]
2023 , organization=
Mitchell, Eric and Lee, Yoonho and Khazatsky, Alexander and Manning, Christopher D and Finn, Chelsea , booktitle=. 2023 , organization=
2023
-
[22]
Proceedings of the 41st International Conference on Machine Learning , articleno =
Hans, Abhimanyu and Schwarzschild, Avi and Cherepanova, Valeriia and Kazemi, Hamid and Saha, Aniruddha and Goldblum, Micah and Geiping, Jonas and Goldstein, Tom , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =
2024
-
[23]
The Journal of the Acoustical Society of America , volume = 62, number =
Perplexity--a measure of the difficulty of speech recognition tasks , author =. The Journal of the Acoustical Society of America , volume = 62, number =. doi:10.1121/1.2016299 , issn =
-
[24]
The Internal State of an LLM Knows When It's Lying
Azaria, Amos and Mitchell, Tom , keywords =. The Internal State of an LLM Knows When It's Lying , journal =. 2023 , copyright =. doi:10.48550/ARXIV.2304.13734 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2304.13734 2023
-
[25]
Marks, Samuel and Tegmark, Max , keywords =. The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets , journal =. 2023 , copyright =. doi:10.48550/ARXIV.2310.06824 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2310.06824 2023
-
[26]
Discovering Latent Knowledge in Language Models Without Supervision
Burns, Collin and Ye, Haotian and Klein, Dan and Steinhardt, Jacob , keywords =. Discovering Latent Knowledge in Language Models Without Supervision , journal =. 2022 , copyright =. doi:10.48550/ARXIV.2212.03827 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2212.03827 2022
-
[27]
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Li, Kenneth and Patel, Oam and Viégas, Fernanda and Pfister, Hanspeter and Wattenberg, Martin , keywords =. Inference-Time Intervention: Eliciting Truthful Answers from a Language Model , journal =. 2023 , copyright =. doi:10.48550/ARXIV.2306.03341 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2306.03341 2023
-
[28]
Do Androids Know They ' re Only Dreaming of Electric Sheep?
CH-Wang, Sky and Van Durme, Benjamin and Eisner, Jason and Kedzie, Chris. Do Androids Know They ' re Only Dreaming of Electric Sheep?. Findings of the Association for Computational Linguistics: ACL 2024. 2024. doi:10.18653/v1/2024.findings-acl.260
-
[29]
LLM Internal States Reveal Hallucination Risk Faced With a Query , journal =
Ji, Ziwei and Chen, Delong and Ishii, Etsuko and Cahyawijaya, Samuel and Bang, Yejin and Wilie, Bryan and Fung, Pascale , keywords =. LLM Internal States Reveal Hallucination Risk Faced With a Query , journal =. 2024 , copyright =. doi:10.48550/ARXIV.2407.03282 , url =
-
[30]
arXiv preprint arXiv:2310.01405 , year =
Representation Engineering: A Top-Down Approach to AI Transparency , author =. arXiv preprint arXiv:2310.01405 , year =
-
[31]
Does Representation Matter? Exploring Intermediate Layers in Large Language Models , journal =
Skean, Oscar and Arefin, Md Rifat and LeCun, Yann and Shwartz-Ziv, Ravid , keywords =. Does Representation Matter? Exploring Intermediate Layers in Large Language Models , journal =. 2024 , copyright =. doi:10.48550/ARXIV.2412.09563 , url =
-
[32]
Steering Language Models With Activation Engineering
Turner, Alexander Matt and Thiergart, Lisa and Leech, Gavin and Udell, David and Vazquez, Juan J. and Mini, Ulisse and MacDiarmid, Monte , keywords =. Steering Language Models With Activation Engineering , journal =. 2023 , copyright =. doi:10.48550/ARXIV.2308.10248 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2308.10248 2023
-
[33]
Steering Llama 2 via Contrastive Activation Addition
Panickssery, Nina and Gabrieli, Nick and Schulz, Julian and Tong, Meg and Hubinger, Evan and Turner, Alexander Matt , keywords =. Steering Llama 2 via Contrastive Activation Addition , journal =. 2023 , copyright =. doi:10.48550/ARXIV.2312.06681 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2312.06681 2023
-
[34]
Liu, Sheng and Ye, Haotian and Xing, Lei and Zou, James , keywords =. In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering , journal =. 2023 , copyright =. doi:10.48550/ARXIV.2311.06668 , url =
-
[35]
Steering Large Language Model Activations in Sparse Spaces , publisher =
Bayat, Reza and Rahimi-Kalahroudi, Ali and Pezeshki, Mohammad and Chandar, Sarath and Vincent, Pascal , keywords =. Steering Large Language Model Activations in Sparse Spaces , publisher =. 2025 , copyright =. doi:10.48550/ARXIV.2503.00177 , url =
-
[36]
and Conerly, T
Templeton, A. and Conerly, T. and Marcus, J. and Lindsey, J. and Bricken, T. and Chen, B. and Pearce, A. and Citro, C. and Ameisen, E. and Jones, A. and Cunningham, H. and Turner, N. L. and McDougall, C. and MacDiarmid, M. and Freeman, C. D. and Sumers, T. R. and Rees, E. and Batson, J. and Jermyn, A. and Carter, S. and Olah, C. and Henighan, T. , title =
-
[37]
May 24th, 2023 , journal=
Interpretability Dreams , author=. May 24th, 2023 , journal=
2023
-
[38]
2022 , howpublished =
Toy Models of Superposition , author =. 2022 , howpublished =
2022
-
[39]
2023 , journal=
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning , author=. 2023 , journal=
2023
-
[40]
2020 , howpublished =
Zoom In: An Introduction to Circuits , author =. 2020 , howpublished =
2020
-
[41]
Mechanistic Interpretability for AI Safety -- A Review
Bereska, Leonard and Gavves, Efstratios , keywords =. Mechanistic Interpretability for AI Safety -- A Review , journal=. 2024 , copyright =. doi:10.48550/ARXIV.2404.14082 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2404.14082 2024
-
[42]
The Twelfth International Conference on Learning Representations , year=
Sparse Autoencoders Find Highly Interpretable Features in Language Models , author=. The Twelfth International Conference on Learning Representations , year=
-
[43]
The Linear Representation Hypothesis and the Geometry of Large Language Models
Park, Kiho and Choe, Yo Joong and Veitch, Victor , keywords =. The Linear Representation Hypothesis and the Geometry of Large Language Models , journal=. 2023 , copyright =. doi:10.48550/ARXIV.2311.03658 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2311.03658 2023
-
[44]
Linguistic Regularities in Continuous Space Word Representations
Mikolov, Tomas and Yih, Wen-tau and Zweig, Geoffrey. Linguistic Regularities in Continuous Space Word Representations. Proceedings of the 2013 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013
2013
-
[45]
Efficient Estimation of Word Representations in Vector Space
Mikolov, Tomas and Chen, Kai and Corrado, Greg and Dean, Jeffrey , keywords =. Efficient Estimation of Word Representations in Vector Space , journal =. 2013 , copyright =. doi:10.48550/ARXIV.1301.3781 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1301.3781 2013
-
[46]
Ailon, Nir and Chazelle, Bernard , title =. Commun. ACM , month = feb, pages =. 2010 , issue_date =. doi:10.1145/1646353.1646379 , abstract =
-
[47]
, year =
Johnson, William and Lindenstrauss, J. , year =. Extensions of Lipschitz mappings into a Hilbert space , volume =
-
[48]
Vershynin, Roman , title =
-
[49]
High-Dimensional Probability: An Introduction with Applications in Data Science , publisher=
Vershynin, Roman , year=. High-Dimensional Probability: An Introduction with Applications in Data Science , publisher=
-
[50]
arXiv preprint arXiv:2407.10671 , year=
Qwen2 technical report , author=. arXiv preprint arXiv:2407.10671 , year=
-
[51]
arXiv preprint arXiv:2410.05355 , year =
Falcon Mamba: The First Competitive Attention-free 7B Language Model , author =. arXiv preprint arXiv:2410.05355 , year =
-
[52]
Llama 2: Open Foundation and Fine-Tuned Chat Models
Touvron, Hugo and Martin, Louis and Stone, Kevin and Albert, Peter and Almahairi, Amjad and Babaei, Yasmine and Bashlykov, Nikolay and Batra, Soumya and Bhargava, Prajjwal and Bhosale, Shruti and Bikel, Dan and Blecher, Lukas and Ferrer, Cristian Canton and Chen, Moya and Cucurull, Guillem and Esiobu, David , keywords =. Llama 2: Open Foundation and Fine-...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2307.09288 2023
-
[53]
Grattafiori, Aaron and Dubey, Abhimanyu and Jauhri, Abhinav and Pandey, Abhinav and Kadian, Abhishek and Al-Dahle, Ahmad and Letman, Aiesha and Mathur, Akhil and Schelten, Alan and Vaughan, Alex and Yang, Amy and Fan, Angela and Goyal, Anirudh and Hartshorn, Anthony and Yang, Aobo and Mitra, Archi and Sravankumar, Archie , keywords =. The Llama 3 Herd of ...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2407.21783 2024
-
[54]
2023 , eprint=
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing , author=. 2023 , eprint=
2023
-
[55]
Quality Classifier DeBERTa , year =
-
[56]
and Varoquaux, G
Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V. and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P. and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E. , journal=. Scikit-learn: Machine Learning in
-
[57]
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Zheng, Lianmin and Chiang, Wei-Lin and Sheng, Ying and Zhuang, Siyuan and Wu, Zhanghao and Zhuang, Yonghao and Lin, Zi and Li, Zhuohan and Li, Dacheng and Xing, Eric P. and Zhang, Hao and Gonzalez, Joseph E. and Stoica, Ion , keywords =. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena , publisher =. arXiv preprint arXiv:2306.05685 , url =. 2023 , c...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2306.05685 2023
-
[58]
Journal of Multivariate Analysis , volume =
A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices , author =. Journal of Multivariate Analysis , volume =. 2004 , month = feb, doi =
2004
-
[59]
LLMs Will Always Hallucinate, and We Need to Live With This , journal =
Banerjee, Sourav and Agarwal, Ayushi and Singla, Saloni , keywords =. LLMs Will Always Hallucinate, and We Need to Live With This , journal =. 2024 , copyright =. doi:10.48550/ARXIV.2409.05746 , url =
-
[60]
On Faithfulness and Factuality in Abstractive Summarization
Maynez, Joshua and Narayan, Shashi and Bohnet, Bernd and McDonald, Ryan , year =. On Faithfulness and Factuality in Abstractive Summarization , url =. doi:10.18653/v1/2020.acl-main.173 , booktitle =
-
[61]
Alignment for Honesty , journal =
Yang, Yuqing and Chern, Ethan and Qiu, Xipeng and Neubig, Graham and Liu, Pengfei , keywords =. Alignment for Honesty , journal =. 2023 , copyright =. doi:10.48550/ARXIV.2312.07000 , url =
-
[62]
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
Xiong, Miao and Hu, Zhiyuan and Lu, Xinyang and Li, Yifei and Fu, Jie and He, Junxian and Hooi, Bryan , keywords =. Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs , journal =. 2023 , copyright =. doi:10.48550/ARXIV.2306.13063 , url =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2306.13063 2023
-
[63]
Berberette, Elijah and Hutchins, Jack and Sadovnik, Amir , keywords =. Redefining "Hallucination" in LLMs: Towards a psychology-informed framework for mitigating misinformation , journal =. 2024 , copyright =. doi:10.48550/ARXIV.2402.01769 , url =
-
[64]
Gondode, Prakash and Duggal, Sakshi and Mahor, Vaishali , year =. Artificial intelligence hallucinations in anaesthesia: Causes, consequences and countermeasures , volume =. Indian Journal of Anaesthesia , publisher =. doi:10.4103/ija.ija_203_24 , number =
-
[65]
Choudhury, Avishek and Chaudhry, Zaira , year =. Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals , volume =. doi:10.2196/56764 , journal =
-
[66]
Dahl, Matthew and Magesh, Varun and Suzgun, Mirac and Ho, Daniel E. , keywords =. Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models , publisher =. 2024 , copyright =. doi:10.48550/ARXIV.2401.01301 , url =
-
[67]
Chen, Zhiyu Zoey and Ma, Jing and Zhang, Xinlu and Hao, Nan and Yan, An and Nourbakhsh, Armineh and Yang, Xianjun and McAuley, Julian and Petzold, Linda and Wang, William Yang , keywords =. A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law , journal =. 2024 , copyright =. doi:10.48550/ARXIV.2405.01769 , url =
-
[68]
Quantifying the Uncertainty of LLM Hallucination Spreading in Complex Adaptive Social Networks , author =. Scientific Reports , volume =. 2024 , month =. doi:10.1038/s41598-024-66708-4 , url =
-
[69]
Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
Ackerman, Christopher and Panickssery, Nina , year = 2025, month = apr, publisher =. Inspection and. doi:10.48550/arXiv.2410.02064 , urldate =. arXiv , keywords =:2410.02064 , primaryclass =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2410.02064 2025
-
[70]
Chen, Xin and Wu, Junchao and Yang, Shu and Zhan, Runzhe and Wu, Zeyu and Luo, Ziyang and Wang, Di and Yang, Min and Chao, Lidia S. and Wong, Derek F. , year = 2025, month = aug, publisher =. doi:10.48550/arXiv.2508.13152 , urldate =. arXiv , langid =:2508.13152 , primaryclass =
-
[71]
LLM Evaluators Recognize and Favor Their Own Generations , url =
Bowman, Samuel and Feng, Shi and Panickssery, Arjun , year =. LLM Evaluators Recognize and Favor Their Own Generations , url =. doi:10.52202/079017-2197 , booktitle =
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.