How unique are hallucinated citations offered by generative Artificial Intelligence models?
Pith reviewed 2026-05-08 02:24 UTC · model gemini-3-flash-preview
The pith
AI-generated citations are not random errors but patterned references that recur across academic literature.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The author demonstrates that AI-generated citations are patterned reconstructions rather than random inventions, with nearly 30% of identified hallucinations being identical duplicates. Analysing 137 sources that cite a non-existent paper, the study finds that models reconstruct references from author-topic pairings that carry high probability in the training data. This reveals that hallucination is a structural feature of how these models represent academic fields, driving the viral spread of specific, plausible fictions through published literature.
What carries the argument
Patterned recombination: the process where a model synthesizes a plausible citation by linking real metadata—such as authors and journals—that frequently appear together in its training data.
If this is right
- Verification tools will need to shift from checking whether authors exist to checking whether a specific combination of author, title, and date has ever been published (see the sketch after this list).
- The recurrence of specific fake citations allows for the creation of signature-based detection for AI-generated content in peer review.
- Web-enabled AI models reduce but do not eliminate the risk of fabrication, as they may still default to probabilistic generation when specific facts are elusive.
- Bibliometric data like citation counts and h-indices will become increasingly unreliable as phantom citations propagate through automated indexing.
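As a concrete illustration of the first implication, below is a minimal sketch of combination-level verification, assuming the public OpenAlex works API (a real endpoint; the matching heuristics are illustrative assumptions, and the test case is the phantom reference traced in the paper):

```python
import requests

def citation_exists(title: str, surnames: list[str], year: int) -> bool:
    """Return True only if some indexed work matches the full
    title-author-year combination, not any single component."""
    resp = requests.get(
        "https://api.openalex.org/works",
        params={"filter": f"title.search:{title}", "per-page": 25},
        timeout=10,
    )
    resp.raise_for_status()
    for work in resp.json().get("results", []):
        names = [a["author"]["display_name"]
                 for a in work.get("authorships", [])]
        # Every claimed surname must appear among the real authorships,
        # and the year must match: plausible parts alone are not enough.
        if (work.get("publication_year") == year
                and all(any(s.lower() in n.lower() for n in names)
                        for s in surnames)):
            return True
    return False

# The recurring phantom traced in the paper; expected output: False.
print(citation_exists("Education Governance and Datafication",
                      ["Williamson", "Piattoeva"], 2019))
```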
Where Pith is reading between the lines
- The existence of recurring phantoms suggests that large language models possess a latent bibliography of high-probability but non-existent works that could be mapped and anticipated (a sampling sketch follows this list).
- If left unchecked, these recurring fictions may become truth by consensus as they are cited by subsequent human researchers who trust the initial AI-influenced sources.
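The latent-bibliography idea implies a measurement procedure: sample the same citation-eliciting prompt many times and count which fabricated references recur. A minimal sketch, where `sample_fn` is a hypothetical stand-in for whatever model API is being probed and the extraction regex is a deliberately loose assumption:

```python
import re
from collections import Counter

def extract_citations(text: str) -> list[str]:
    """Pull 'Author, A. (Year). Title.' shaped strings out of model
    output. Real reference styles vary; this pattern is illustrative."""
    pattern = (r"[A-Z][a-z]+(?:,| and| &)[^()]{0,80}"
               r"\((?:19|20)\d{2}\)\.?\s[^.]{10,120}\.")
    return re.findall(pattern, text)

def map_latent_bibliography(sample_fn, prompt: str, n: int = 200) -> Counter:
    """Repeatedly sample one citation-eliciting prompt and count
    recurring reference strings across completions."""
    counts: Counter = Counter()
    for _ in range(n):
        for ref in extract_citations(sample_fn(prompt)):
            counts[" ".join(ref.lower().split())] += 1
    return counts
```

High-count strings that fail verification (e.g., via the OpenAlex check above) would be candidates for the anticipated phantom map.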
Load-bearing premise
The study assumes that the recurrence of the specific fake citation in recent literature is primarily driven by AI tools rather than by humans copying errors from a single pre-AI source.
What would settle it
Searching for the phantom citation in a large-scale archive of academic papers published before 2020 to see if it appears at all before the rise of generative AI.
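A first pass at that test can be run against Crossref's public REST API, which supports bibliographic queries and publication-date filters (both are documented Crossref parameters). Note this sketch checks whether any pre-2020 work bears the phantom title itself, which would rule out a real early source that human citers could have miscopied; scanning pre-2020 reference lists would additionally require full-text archives:

```python
import requests

def pre_2020_occurrences(title: str) -> int:
    """Count Crossref-indexed works matching the phantom title that
    were published before 2020; nonzero would suggest a pre-AI origin."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={
            "query.bibliographic": title,
            "filter": "until-pub-date:2019-12-31",
            "rows": 0,  # only the total count is needed
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["message"]["total-results"]

print(pre_2020_occurrences("Education Governance and Datafication"))
```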
read the original abstract
This paper investigates how generative AI produces and propagates hallucinated academic references, focusing on the recurring non-existent citation 'Education Governance and Datafication' attributed to Ben Williamson and Nelli Piattoeva. Drawing on 137 accessible source papers identified through Google Scholar and Google searches, the study analyses the structure, recurrence, and onward citation of this phantom reference. It shows that hallucinated citations are not random inventions but patterned recombinations of real authors, journals, dates, and keywords, with duplication occurring in nearly 30% of cases. The paper also reports a structured interrogation of ChatGPT 5-mini about how it generates citations and finds that, absent verification, the model reconstructs plausible references from learned patterns rather than factual recall. Finally, ten AI-generated essays on datafication and school governance were examined: while most references were genuine or partly accurate, 9.2% remained hallucinated, including an exact match to the most common phantom citation. The findings highlight ongoing risks to academic integrity and show that web-enabled AI still does not fully eliminate fabricated references.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper investigates the nature and persistence of 'hallucinated' academic citations in generative AI, focusing on a specific recurring phantom reference: 'Education Governance and Datafication' attributed to Ben Williamson and Nelli Piattoeva. By analyzing 137 real-world documents that cite this non-existent work and performing a structured interrogation of ChatGPT 5-mini, the authors argue that hallucinations are not random but follow a 'patterned recombination' logic where real author names, keywords, and journal titles are fused into plausible-sounding but false references. The study further tests the propagation of these errors through AI-generated essays, finding that even newer, web-enabled models continue to produce these specific phantom citations at a non-trivial rate (9.2%).
Significance. The study provides a valuable empirical audit of how a single 'viral error' propagates through the scholarly record and is reinforced by AI training loops. Its strength lies in the 'forensic' approach of tracing a specific hallucination (Williamson & Piattoeva) from its likely origin to its appearance in 137 published or semi-published documents. The use of ChatGPT 5-mini—a model contemporary to the paper's 2026 context—demonstrates that web-search capabilities do not inherently solve the 'hallucination' problem, as the model may prioritize widespread errors found in its training corpus over verifiable facts. This work is highly relevant to digital libraries, academic integrity, and the study of AI-mediated misinformation.
major comments (2)
- [§4. 'Recombination Logic'] The paper characterizes the 30% duplication rate as evidence of 'structural recombination logic.' However, the existence of 137 source papers containing the exact same 'Williamson & Piattoeva (2019)' phantom suggests a 'viral error' or 'canonical hallucination' already present in the training data. There is a risk of conflating architectural generation (stochastic recombination) with simple retrieval of a high-frequency incorrect string from the training set. The authors should clarify if the 'pattern' observed is a property of the model's generative mechanics or a reflection of data-cleanliness issues in the corpus the model was trained on.
- [§4. 'Model Interrogation'] The interrogation of ChatGPT 5-mini is used to support the theory of how the model 'reconstructs' references. Caution is needed here: LLMs are prone to 'sycophancy' or 'hallucinated self-explanation,' where they provide plausible-sounding explanations of their own internal weights that do not necessarily map to technical reality. The claim that the model 'admits' to its reconstruction process should be framed as a behavioral observation of the model's output rather than a factual account of its internal processing architecture.
minor comments (3)
- [§2. Table 1] It would be beneficial to categorize the 137 identified documents by 'type' (e.g., peer-reviewed, preprint, student essay, blog post) to better understand the vector of propagation for this specific phantom citation.
- [§5. Methodology] When testing the 10 AI-generated essays, the paper should specify whether 'Web Search' or 'Browse with Bing' features were enabled for each prompt, as this significantly impacts whether the model is relying on internal weights or current (potentially erroneous) web results.
- [Introduction] The paper correctly identifies the 2019 date as a 'pivot point' for the authors' actual collaborations. Explicitly citing the real 2019/2020 works by Williamson and Piattoeva that are most proximal to the phantom title would strengthen the 'patterned recombination' argument.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of our 'forensic' approach and the relevance of our findings to digital libraries and academic integrity. The referee raises two important conceptual points regarding the nature of the observed hallucinations and the analytical status of LLM self-explanations. We agree that a clearer distinction between architectural generation (the initial 'birth' of a hallucination) and training-data retrieval (its subsequent 'viral' propagation) is necessary to explain the high duplication rates. Furthermore, we accept the caution regarding 'hallucinated self-explanation' and agree that the model’s responses about its own processes should be treated as behavioral outputs rather than architectural facts. We have addressed these points in the revised manuscript to ensure the theoretical framework accurately reflects the likely mechanisms at play.
read point-by-point responses
-
Referee: [§4. 'Recombination Logic'] The paper characterizes the 30% duplication rate as evidence of 'structural recombination logic.' However, the existence of 137 source papers containing the exact same 'Williamson & Piattoeva (2019)' phantom suggests a 'viral error' or 'canonical hallucination' already present in the training data. There is a risk of conflating architectural generation (stochastic recombination) with simple retrieval of a high-frequency incorrect string from the training set.
Authors: We agree with the referee that the distinction between 'stochastic recombination' and the 'retrieval of high-frequency errors' is vital. Our analysis of the 137 source papers suggests a lifecycle for this phantom: while the initial creation of the citation likely followed a 'recombination logic' (fusing Williamson and Piattoeva’s known collaboration history with topical keywords like 'Datafication'), its current high frequency is almost certainly due to 'viral' propagation. The error has been recursively ingested into the training sets of newer models. We have revised Section 4 to explicitly discuss this feedback loop, clarifying that the 30% duplication rate is likely a reflection of data-cleanliness issues where the model retrieves a now-canonical error from its corpus rather than generating it de novo via architectural mechanics. revision: yes
-
Referee: [§4. 'Model Interrogation'] The interrogation of ChatGPT 5-mini is used to support the theory of how the model 'reconstructs' references. Caution is needed here: LLMs are prone to 'sycophancy' or 'hallucinated self-explanation,' where they provide plausible-sounding explanations of their own internal weights that do not necessarily map to technical reality. The claim that the model 'admits' to its reconstruction process should be framed as a behavioral observation.
Authors: We fully accept this point. LLMs do not have introspective access to their training history or internal weights, and their 'explanations' are themselves generative outputs designed to satisfy the user's prompt. Our intention in the interrogation was not to provide a technical account of the model's inner workings, but to demonstrate that the model generates a 'plausible-sounding narrative' to justify its fabrications, which increases the risk of user deception. We have revised the 'Model Interrogation' section and the Conclusion to frame these exchanges as behavioral observations of 'simulated explanations' rather than factual accounts of the internal processing architecture. revision: yes
Circularity Check
Empirical tracking of a 'viral' hallucination is sound, though the generative mechanism is 'confirmed' via circular AI interrogation.
specific steps
-
other
[Section 4.1]
"The model explains: 'The citation... was likely generated by my internal processing... I reconstruct plausible-looking citations based on patterns...' This confirmed that, in the absence of verification, the model reconstructs plausible references from learned patterns rather than factual recall."
The author uses the LLM's own generated response—an explanation of its own hallucination process—as factual 'confirmation' of the paper's hypothesis regarding that process. Because the AI is a pattern-matching generative model, its self-explanation is itself a 'plausible-looking' reconstruction rather than a technical diagnostic. Treating the AI's narrative output as an independent verification of its generative mechanics is circular.
-
renaming known result
[Abstract / Section 5]
"hallucinated citations are not random inventions but patterned recombinations of real authors, journals, dates, and keywords, with duplication occurring in nearly 30% of cases."
The 'discovery' that hallucinations are patterned recombinations is largely a descriptive renaming of the data collection process. The study identifies phantoms by searching for a specific plausible citation (Williamson & Piattoeva). By filtering for citations that resemble real academic work, the researcher ensures the resulting set is 'patterned' and 'non-random' by construction. The conclusion that they are 'recombinations' is an inference of mechanism based on this descriptive property.
full rationale
The paper provides a valuable empirical analysis of how a specific incorrect citation (Williamson & Piattoeva 2019) propagates through academic literature and generative AI outputs. The tracking of 137 source papers via external search engines is an independent, non-circular investigation. Circularity is minor and confined to the interpretive framework: the author 'confirms' the AI's internal generative logic by asking the AI to describe itself (Section 4.1), and characterizes the existence of shared citation components as 'patterned recombination' (Section 5), which is essentially a restatement of the criteria used to identify the 'plausible' hallucinations in the first place. Because the core data on citation frequency and duplication is empirical and self-contained, the circularity score is kept low.
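For concreteness, a duplication rate like the paper's "nearly 30%" figure could be computed over the harvested citation strings as below; the normalization step is an assumption, since the paper's exact matching rule is not reproduced here:

```python
from collections import Counter

def duplication_rate(citations: list[str]) -> float:
    """Share of harvested citation strings that are exact duplicates of
    another string in the set, after light normalization (lowercase,
    collapsed whitespace -- an assumed matching rule)."""
    normalized = [" ".join(c.lower().split()) for c in citations]
    counts = Counter(normalized)
    dupes = sum(n for n in counts.values() if n > 1)
    return dupes / len(normalized) if normalized else 0.0
```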
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Bibliographic non-existence can be confirmed via exhaustive search of WorldCat, OpenAlex, and major academic databases.
- domain assumption Google Scholar search results are a valid proxy for the presence of a citation in the broader academic/grey literature ecosystem.
Reference graph
Works this paper leans on
-
[1]
Spennemann, D.H.R. Children of AI: a protocol for managing the born-digital ephemera spawned by Generative AI Language Models. Publications 2023, 11, 45. doi:10.3390/publications11030045
-
[2]
Commercialisation and privatisation in/of education in the context of Covid-19
Williamson, B.; Hogan, A. Commercialisation and privatisation in/of education in the context of Covid-19. Education International Research Brief; Education International Research: Brussels, 2020.
2020