arxiv: 2602.06718 · v2 · submitted 2026-02-06 · 💻 cs.CR · cs.AI

GhostCite: A Large-Scale Analysis of Citation Validity in the Age of Large Language Models

keywords: citations, citation, finding, llms, models, threat, validity, academic

Citations provide the basis for trusting scientific claims; when they are invalid or fabricated, this trust collapses. With the advent of Large Language Models (LLMs), this risk has intensified: LLMs are increasingly used for academic writing, but their tendency to fabricate citations ("ghost citations") poses a systemic threat to citation validity. To quantify this threat, we develop \citeb, an open-source framework for large-scale citation verification, and conduct a comprehensive study of citation validity in the LLM era through three complementary experiments. First, we benchmark 13 LLMs on a citation generation task across various research domains, finding that all models hallucinate citations, at rates ranging from 14.23% to 94.93%. Second, we analyze 2.2 million citations from 56,381 papers at AI/ML and Security venues (2020-2025), finding that 1.07% of papers contain invalid citations, with an 80.9% increase in 2025. Third, we survey 97 researchers, finding that 87.2% use AI-powered tools in their workflows, 76.7% of reviewers do not thoroughly check references, and 74.5% view peer review as ineffective at catching citation errors. Based on these findings, we argue that ghost citations represent a systemic threat to academic integrity, and we call for coordinated efforts from the community to address this challenge.

This paper has not been read by Pith yet.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. BibTeX Citation Hallucinations in Scientific Publishing Agents: Evaluation and Mitigation

    cs.DL 2026-04 conditional novelty 7.0

    Frontier LLMs generate BibTeX entries at 83.6% field accuracy but only 50.9% fully correct; two-stage clibib revision raises accuracy to 91.5% and fully correct entries to 78.3% with 0.8% regression.

  2. sciwrite-lint: Verification Infrastructure for the Age of Science Vibe-Writing

    cs.DL 2026-04 unverdicted novelty 6.0

    An open-source local linter verifies reference integrity and claim support in scientific manuscripts using public databases and consumer hardware, with an experimental contribution scoring extension.

  3. HalluCiteChecker: A Lightweight Toolkit for Hallucinated Citation Detection and Verification in the Era of AI Scientists

    cs.CL 2026-04 unverdicted novelty 5.0

    HalluCiteChecker is a lightweight, offline, CPU-only toolkit that detects hallucinated citations in AI-assisted scientific papers.