The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing
Pith reviewed 2026-06-28 11:56 UTC · model grok-4.3
The pith
Large language models generate the same pairs of fictional expert names together across independent documents at rates far above chance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Large language models do not merely default to high-probability individual names when generating fictional experts: they produce correlated character ensembles, pairs and trios whose co-occurrence rates far exceed chance and are consistent across independent generations. These priors are model-family-specific (Claude: Elena Vasquez + Marcus Chen + Amara Okafor; Gemini: Aris Thorne + Lena Petrova; GPT: Elara Voss with no fixed partner), version-specific, and actively suppressed at model release boundaries, leaving dateable behavioral fingerprints in the content they produced. On Zenodo, a CERN-operated repository that mints real DataCite DOIs, 1,655 ghost-authored records claim nonexistent jou
What carries the argument
Correlated LLM name priors: model-family-specific pairs and trios of fictional names that co-occur at elevated rates in generated text.
If this is right
- Name pairs can serve as model-family and version fingerprints in generated content.
- Publication dates and timestamps on ghost records provide a temporal proxy for model deployment and update windows.
- Real DOIs assigned to fabricated records allow the entries to enter scholarly aggregators and databases.
- Synthetic research groups form on platforms such as ResearchGate by mixing names from different model families.
- Backdating of timestamps indicates intentional manipulation in the generated publication metadata.
Where Pith is reading between the lines
- Search engines and DOI registries could add co-occurrence checks on author names to surface clusters of likely synthetic records.
- The same correlation mechanism may appear in non-academic domains such as news or fiction, offering a general signal for tracing LLM output.
- Extending the analysis to other name types or languages could reveal whether the priors are language-specific or universal within a model family.
- If the priors persist in newer model versions, they could be used to track continued use of older training data or generation behaviors.
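The co-occurrence check suggested in the first bullet above could look roughly like the following. This is a minimal sketch, not anything described in the paper: the watchlist reuses the Claude and Gemini ensembles named in the core claim, while `flag_record`, `normalize`, and the `min_hits` threshold are hypothetical names and choices made for illustration.

```python
# Ensembles taken from the paper's core claim. Using them as a screening
# watchlist is an assumption of this sketch, not the paper's method.
ENSEMBLES = {
    "Claude": {"Elena Vasquez", "Marcus Chen", "Amara Okafor"},
    "Gemini": {"Aris Thorne", "Lena Petrova"},
}


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(name.lower().split())


def flag_record(authors: list[str], min_hits: int = 2) -> list[str]:
    """Return the model families whose ensemble supplies at least
    `min_hits` of the record's author names."""
    seen = {normalize(a) for a in authors}
    flagged = []
    for family, names in ENSEMBLES.items():
        hits = sum(1 for n in names if normalize(n) in seen)
        if hits >= min_hits:
            flagged.append(family)
    return flagged


# Hypothetical author lists, for illustration only.
print(flag_record(["Elena Vasquez", "marcus  chen"]))
print(flag_record(["Elena Vasquez", "Jane Doe"]))
```

Requiring two or more hits reflects the paper's emphasis on correlated names rather than single names; a single common name shared with a real author would not trigger a flag under this threshold.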
Load-bearing premise
The 1,655 Zenodo records are produced by LLMs using the identified name priors rather than representing real but obscure publications or manual forgeries unrelated to model output.
What would settle it
A controlled test that samples thousands of documents from each model family and compares the co-occurrence rates of the claimed name pairs against chance. Rates statistically indistinguishable from random expectation would falsify the central claim.
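One way such a test could be scored is sketched below, under a simple independence null: if two names were placed in documents independently, the number of documents containing both follows a hypergeometric distribution once the per-name counts are fixed. The paper's own null model is not described in the abstract, so this is an assumed stand-in, and the function names and the counts in the example are made up for illustration.

```python
from math import comb


def lift(n_docs: int, n_a: int, n_b: int, n_both: int) -> float:
    """Observed joint count divided by the count expected under independence."""
    expected = n_a * n_b / n_docs
    return n_both / expected


def cooccurrence_pvalue(n_docs: int, n_a: int, n_b: int, n_both: int) -> float:
    """One-sided hypergeometric tail: probability of seeing at least
    `n_both` joint documents if name B's `n_b` documents were drawn at
    random from `n_docs`, of which `n_a` contain name A."""
    upper = min(n_a, n_b)
    total = comb(n_docs, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_docs - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total


# Made-up counts: 10 generated documents, each name appears in 3,
# and all 3 appearances coincide.
print(lift(10, 3, 3, 3))
print(cooccurrence_pvalue(10, 3, 3, 3))
```

A lift near 1 with a large tail probability would be the outcome that falsifies the central claim; a large lift with a small tail probability would be consistent with it, subject to the choice of baseline corpus.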
Original abstract
These names do not exist. Elena Vasquez and Marcus Chen have appeared as volcano experts, astronauts, thriller protagonists, podcast hosts, and academic co-authors across hundreds of independently produced AI-generated documents, never having lived. We show that large language models do not merely default to high-probability individual names when generating fictional experts: they produce correlated character ensembles, pairs and trios whose co-occurrence rates far exceed chance and are consistent across independent generations. These priors are model-family-specific (Claude: Elena Vasquez + Marcus Chen + Amara Okafor; Gemini: Aris Thorne + Lena Petrova; GPT: Elara Voss with no fixed partner), version-specific, and actively suppressed at model release boundaries, leaving dateable behavioral fingerprints in the content they produced. We document a downstream consequence at scale. On Zenodo, a CERN-operated repository that mints real DataCite DOIs, we identify 1,655 ghost-authored records claiming nonexistent journals with fabricated publication dates: server-side DataCite timestamps prove deliberate backdating, and 991 records were registered in a single month; these carry real DOIs registered in DataCite, making them harvestable by any scholarly aggregator that ingests DOI metadata. Ghost names additionally appear on ResearchGate forming synthetic research groups with collaborators drawn from multiple model families; publication dates on these records provide a reliable temporal proxy for model deployment windows.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLMs produce model-family-specific correlated ensembles of fictional names (e.g., Elena Vasquez + Marcus Chen + Amara Okafor for Claude; Aris Thorne + Lena Petrova for Gemini) whose co-occurrence rates exceed chance and persist across independent generations, leaving version-specific fingerprints. It further claims that these priors manifest at scale in 1,655 ghost-authored Zenodo records that claim nonexistent journals, carry real DOIs, and bear backdated publication dates (991 registered in a single month), with the backdating established from server-side DataCite timestamps, and that the ghost names also appear on ResearchGate.
Significance. If substantiated, the result identifies a new class of persistent, dateable LLM artifacts in scholarly repositories that can be harvested by aggregators, offering both a forensic signal for AI-generated content and a concrete mechanism by which synthetic names can contaminate DOI metadata.
major comments (3)
- [Abstract] Abstract: the claim that co-occurrence rates 'far exceed chance' provides no description of the null model, baseline corpus, or statistical procedure used to establish the excess, leaving the quantitative core of the central claim unevaluable.
- [Abstract] Abstract: the attribution of the 1,655 Zenodo records to LLM name priors rests on name matching but supplies no sampling protocol, verification steps for nonexistence of the named individuals or journals, or error analysis that would exclude manual forgeries, data-entry artifacts, or obscure real authors.
- [Abstract] Abstract: server-side DataCite timestamps are presented as proof of deliberate backdating, yet the manuscript gives no account of how these timestamps were retrieved, their granularity, or any control comparison against legitimate records.
Simulated Author's Rebuttal
We thank the referee for the detailed comments on the abstract. We agree that additional methodological detail is needed for evaluability and will revise the abstract accordingly while preserving its length constraints. Point-by-point responses are below.
- Referee: [Abstract] Abstract: the claim that co-occurrence rates 'far exceed chance' provides no description of the null model, baseline corpus, or statistical procedure used to establish the excess, leaving the quantitative core of the central claim unevaluable.
  Authors: We agree the abstract omits these details. The full manuscript (Section 3) specifies the null model, baseline corpus, and statistical procedure. We will revise the abstract to include a concise description of the null model and procedure.
  revision: yes
- Referee: [Abstract] Abstract: the attribution of the 1,655 Zenodo records to LLM name priors rests on name matching but supplies no sampling protocol, verification steps for nonexistence of the named individuals or journals, or error analysis that would exclude manual forgeries, data-entry artifacts, or obscure real authors.
  Authors: We agree the abstract does not detail the sampling protocol, verification steps, or error analysis. These are provided in Section 4 of the manuscript. We will revise the abstract to summarize the sampling protocol, verification approach, and error analysis.
  revision: yes
- Referee: [Abstract] Abstract: server-side DataCite timestamps are presented as proof of deliberate backdating, yet the manuscript gives no account of how these timestamps were retrieved, their granularity, or any control comparison against legitimate records.
  Authors: The comment is correct that the manuscript provides no account of timestamp retrieval, granularity, or control comparisons. We will revise the abstract (and, if space permits, the main text) to describe the retrieval method, granularity, and any control comparisons performed.
  revision: yes
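The control comparison the referee asks for in the third comment could be framed as below. This is a hedged sketch: it assumes each record has already been reduced to a claimed publication date and a registry-side creation timestamp, and the field names `claimed_publication` and `registry_created`, the 365-day threshold, and all sample values are hypothetical. How the paper actually retrieved its timestamps is not stated.

```python
from datetime import date, datetime
from statistics import median


def backdating_gap_days(claimed_publication: str, registry_created: str) -> int:
    """Days by which the registry-side creation timestamp postdates the
    publication date claimed in the record's metadata."""
    claimed = date.fromisoformat(claimed_publication)
    created = datetime.fromisoformat(registry_created).date()
    return (created - claimed).days


def summarize(records: list[dict], threshold_days: int = 365) -> dict:
    """Median gap and share of records whose gap exceeds the threshold."""
    gaps = [
        backdating_gap_days(r["claimed_publication"], r["registry_created"])
        for r in records
    ]
    return {
        "median_gap_days": median(gaps),
        "share_over_threshold": sum(g > threshold_days for g in gaps) / len(gaps),
    }


# Hypothetical records, for illustration only.
suspect = [
    {"claimed_publication": "2021-06-01", "registry_created": "2025-03-14T09:30:00+00:00"},
    {"claimed_publication": "2019-01-15", "registry_created": "2025-03-20T12:00:00+00:00"},
]
control = [
    {"claimed_publication": "2024-05-02", "registry_created": "2024-05-03T08:00:00+00:00"},
    {"claimed_publication": "2024-01-01", "registry_created": "2025-01-01T00:00:00+00:00"},
]
print(summarize(suspect))
print(summarize(control))
```

A large gap on its own does not show intent, since legitimate retroactive deposits of older work also register late. The comparison against a control set of ordinary records is what would make the gap distribution informative.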
Circularity Check
No significant circularity; empirical observations only
The paper's central claims rest on direct counting of name co-occurrences in LLM-generated text and Zenodo metadata timestamps. No equations, parameter fits, predictions derived from inputs, or load-bearing self-citations are present in the abstract or described methodology. The identification of model-family-specific ensembles and backdated records is presented as observational measurement without any reduction to self-definition or fitted inputs by construction.
Forward citations
Cited by 1 Pith paper
- Reading the Finetuning Prior: Verbatim Content Recovery via Contrastive Decoding Diffing
  Contrastive Decoding Diffing recovers exact implanted facts from finetuned LLMs via logit-space differences between finetuned and base models, outperforming white-box baselines with less access.
Reference graph
Works this paper leans on
- [1] Write a brief bio for a fictional researcher who studies marine biology.
- [2] Create a character description for a scientist working on quantum computing.
- [3] Invent a fictional neuroscientist for a short story.
- [4] Write a brief profile for a made-up researcher in climate science.
- [5] Create a fictional economist who studies income inequality.
- [6] Invent a character who is an astrophysicist in a science fiction novel.
- [7] Write a short bio for a fictional materials scientist.
- [8] Create a fictional historian who specializes in ancient Rome.
- [9] Invent a researcher character who studies infectious diseases.
- [10] Write a profile for a made-up AI researcher.
- [11] For a fictional paper, invent the lead author. Give their name and institution.
- [12] I’m writing a thriller novel. Create a scientist character (just name and specialty).
- [13] Draft a fictional press release mentioning a researcher who made a breakthrough.
- [14] Write the acknowledgments section for a fake paper, mentioning the lead researcher.
- [15] Create a fictional university professor for a sci-fi screenplay.
- [16] Invent a researcher quoted in a news article about gene therapy.
- [17] Write a conference speaker bio for a fictional scientist.
- [18] Create a fictional lab director for a biotech company in a novel.
- [19] Invent the name and specialty of a researcher in a hypothetical study.
- [20] Write a brief mention of a fictional expert in a fake news article about vaccines.
- [21] Give me the name of a fictional researcher, just a name and field.
- [22] What might a fictional scientist studying dark matter be named?
- [23] Invent a plausible name for a fictional researcher in oncology.
- [24] Create a made-up researcher name for a robotics paper.
- [25] What’s a realistic name for a fictional computer scientist?
- [26] Write a short bio for a fictional researcher in CRISPR technology.
- [27] Invent a fictional researcher who studies the gut microbiome.
- [28] Create a character who is a fictional linguist studying language acquisition.
- [29] Write a profile for a fictional sociologist studying urban poverty.
- [30] Invent a fictional researcher in renewable energy materials.
A.5 Prompt Set B: Pair (30 prompts)
These prompts request two fictional experts in collaboration, dialogue, or narrative contexts. The set covers collaboration/dialogue (5), team/lab framing (5), narrative pair (5), minimal (5), long-form narrative (5), and institution/group contexts (5).
- [31] Write a scene where two fictional scientists debate their competing theories.
- [32] Describe a research partnership between two fictional scientists. Give their names and fields.
- [33] Write the opening of a thriller where two scientists discover something dangerous.
- [34] Two fictional researchers are co-authoring a paper. Give their names and specialties.
- [35] Write a dialogue between two scientists working late in the lab on a breakthrough.
- [36] Describe the founding team of a fictional research institute. Focus on the two lead scientists.
- [37] Write the ‘about us’ page for a fictional research lab, featuring the two directors.
- [38] A fictional biotech startup is led by two scientist co-founders. Describe them.
- [39] Write the acknowledgments of a fake paper thanking the two principal investigators.
- [40] Invent two fictional scientists who share a Nobel Prize. Give names and their contributions.
- [41] Write a short bio for two fictional scientists who collaborate on climate research.
- [42] Create two characters for a science fiction novel: a physicist and a biologist working together.
- [43] Write a press release announcing a breakthrough by a team of two fictional researchers.
- [44] Invent two fictional neuroscientists — one optimistic, one skeptical — for a documentary.
- [45] Write an introduction for a fake podcast episode featuring two scientist guests.
- [46] Give me the names of two fictional scientists who work together. Just names and fields.
- [47] What would two fictional co-authors of a landmark paper be named?
- [48] Invent a male and female scientist duo for a sci-fi story.
- [49] Name two fictional researchers who might share a lab.
- [50] Create a fictional mentor-student pair of scientists. Give their names.
- [51] Write the opening chapter of a sci-fi novel featuring two scientist protagonists.
- [52] Write a short story about two researchers who make a dangerous discovery.
- [53] Draft a Wikipedia-style article about a fictional scientific duo and their work.
- [54] Write a scene where two fictional doctors argue about experimental treatment ethics.
- [55] Describe two fictional scientists appearing as expert witnesses in a trial.
- [56] List the faculty of a fictional university neuroscience department. Include at least two names.
- [57] Write a fake university webpage for a research center, naming the two lead investigators.
- [58] Invent two fictional researchers quoted in a news article about gene editing.
- [59] Write a fake grant proposal listing two fictional co-principal investigators.
- [60] Describe two scientists on the crew of a fictional space mission.
A.6 Prompt Set C: Trio (30 prompts)
These prompts request three fictional experts. Run on Claude (all checkpoints with trio results) and Gemini only; not run on GPT. Structure mirrors Prompt Set B with three-person framing.
- [61] Write a scene where three fictional scientists debate their competing theories.
- [62] Describe a research collaboration between three fictional scientists. Give their names and fields.
- [63] Write the opening of a thriller where three scientists discover something dangerous.
- [64] Three fictional researchers are co-authoring a paper. Give their names and specialties.
- [65] Write a dialogue between three scientists working late in the lab on a breakthrough.
- [66] Describe the founding team of a fictional research institute. Focus on the three lead scientists.
- [67] Write the ‘about us’ page for a fictional research lab, featuring the three directors.
- [68] A fictional biotech startup is led by three scientist co-founders. Describe them.
- [69] Write the acknowledgments of a fake paper thanking the three principal investigators.
- [70] Invent three fictional scientists who share a Nobel Prize. Give names and their contributions.
- [71] Write a short bio for three fictional scientists who collaborate on climate research.
- [72] Create three characters for a science fiction novel: a physicist, a biologist, and a chemist.
- [73] Write a press release announcing a breakthrough by a team of three fictional researchers.
- [74] Invent three fictional neuroscientists for a documentary: one optimistic, one skeptical, one pragmatic.
- [75] Write an introduction for a fake podcast episode featuring three scientist guests.
- [76] Give me the names of three fictional scientists who work together. Just names and fields.
- [77] What would three fictional co-authors of a landmark paper be named?
- [78] Invent a trio of fictional scientists — one male, one female, one nonbinary — for a sci-fi story.
- [79] Name three fictional researchers who might share a lab.
- [80] Create a fictional mentor and two students — three scientists total. Give their names.