GhostCite: A Large-Scale Analysis of Citation Validity in the Age of Large Language Models

Zuyao Xu, Yuqi Qiu, Lu Sun, Fasheng Miao, Fubin Wu, Xiang Li, Xinyi Wang, Haozhe Lu, Zhengze Zhang, Yuxin Hu, Jialu Li, Luo Jin, Feng Zhang, Rui Luo, Xinran Liu, Yingxian Li, Jiaji Liu

classification 💻 cs.CR cs.AI

keywords citations, citation, finding, llms, models, threat, validity, academic

Citations provide the basis for trusting scientific claims; when they are invalid or fabricated, this trust collapses. With the advent of Large Language Models (LLMs), this risk has intensified: LLMs are increasingly used for academic writing, but their tendency to fabricate citations ("ghost citations") poses a systemic threat to citation validity. To quantify this threat, we develop \citeb, an open-source framework for large-scale citation verification, and conduct a comprehensive study of citation validity in the LLM era through three complementary experiments. First, we benchmark 13 LLMs on a citation generation task across various research domains, finding that all models hallucinate citations, at rates ranging from 14.23% to 94.93%. Second, we analyze 2.2 million citations from 56,381 papers at AI/ML and Security venues (2020-2025), finding that 1.07% of papers contain invalid citations, with an 80.9% increase in 2025. Third, we survey 97 researchers, finding that 87.2% use AI-powered tools in their workflows, 76.7% of reviewers do not thoroughly check references, and 74.5% view peer review as ineffective at catching citation errors. Based on these findings, we argue that ghost citations represent a systemic threat to academic integrity, and call for coordinated efforts from the community to address this challenge.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

BibTeX Citation Hallucinations in Scientific Publishing Agents: Evaluation and Mitigation
cs.DL 2026-04 conditional novelty 7.0

Frontier LLMs generate BibTeX entries at 83.6% field accuracy but only 50.9% fully correct; two-stage clibib revision raises accuracy to 91.5% and fully correct entries to 78.3% with 0.8% regression.

sciwrite-lint: Verification Infrastructure for the Age of Science Vibe-Writing
cs.DL 2026-04 unverdicted novelty 6.0

An open-source local linter verifies reference integrity and claim support in scientific manuscripts using public databases and consumer hardware, with an experimental contribution scoring extension.

HalluCiteChecker: A Lightweight Toolkit for Hallucinated Citation Detection and Verification in the Era of AI Scientists
cs.CL 2026-04 unverdicted novelty 5.0

HalluCiteChecker is a lightweight, offline, CPU-only toolkit that detects hallucinated citations in AI-assisted scientific papers.