pith. machine review for the scientific record. sign in

arxiv: 2602.23452 · v3 · submitted 2026-02-26 · 💻 cs.CL · cs.DL

Recognition: unknown

CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era

Authors on Pith no claims yet
classification 💻 cs.CL cs.DL
keywords citeauditverificationbenchmarkcitationcitationsframeworkllmsreferences
0
0 comments X
read the original abstract

Scientific research relies on citation integrity, yet large language models (LLMs) have introduced a critical risk: fabricated references that appear plausible but correspond to no real publications. As manual verification becomes infeasible and existing automated tools remain fragile, we introduce CiteAudit, a comprehensive benchmark and detection framework for hallucinated citations. We design a multi-agent verification pipeline that decomposes citation checking into metadata extraction, memory lookup, web-based retrieval, and final judgment. To evaluate this, we construct a large-scale, human-validated dataset spanning diverse domains and hallucination types. Experiments demonstrate that our framework achieves superior verification performance over state-of-the-art LLMs and commercial baselines. Our work provides the necessary infrastructure to audit citations at scale and safeguard the trustworthiness of scholarly discourse. Code is available at https://github.com/shiiiikw/CiteAudit.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Source or It Didn't Happen: A Multi-Agent Framework for Citation Hallucination Detection

    cs.CL 2026-05 accept novelty 7.0

    CiteTracer detects citation hallucinations at 97.1% accuracy on synthetic and real-world benchmarks by combining structured extraction, multi-source retrieval, deterministic matching, and class-specialist agents.

  2. Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

    cs.CL 2026-05 unverdicted novelty 7.0

    A new framework parses and evaluates citations in LLM deep research reports across link validity, relevance, and factuality, finding 94%+ link success but only 39-77% factual accuracy.

  3. BibTeX Citation Hallucinations in Scientific Publishing Agents: Evaluation and Mitigation

    cs.DL 2026-04 conditional novelty 7.0

    Frontier LLMs generate BibTeX entries at 83.6% field accuracy but only 50.9% fully correct; two-stage clibib revision raises accuracy to 91.5% and fully correct entries to 78.3% with 0.8% regression.

  4. sciwrite-lint: Verification Infrastructure for the Age of Science Vibe-Writing

    cs.DL 2026-04 unverdicted novelty 6.0

    An open-source local linter verifies reference integrity and claim support in scientific manuscripts using public databases and consumer hardware, with an experimental contribution scoring extension.