Per-Entity Bias Mapping for AI Visibility: Why Brand Mentions Require Entity-Specific Calibration

Zoltan Varga

arXiv: 2606.21595 · v1 · pith:4H7QPYFVnew · submitted 2026-06-19 · cs.CL · cs.IR

Pith reviewed 2026-06-26 13:59 UTC · model grok-4.3

keywords AI visibility · brand mentions · fabricated citations · entity bias · hallucination · knowledge graphs · per-entity calibration

The pith

Aggregate mention rates miss the point because larger brands trigger more fabricated citations from AI systems than smaller ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that standard counts of brand mentions in AI answers overlook systematic differences in error patterns across entities. Large, familiar brands attract more plausible-sounding but false citations because models have stronger learned patterns to complete from, while smaller entities simply stay invisible because of their thin data footprints. The author introduces a per-entity mapping approach that separates raw from verified mentions and tests it on 100 Hungarian B2B firms across 1,400 probe runs, finding a gap of roughly 15 percentage points in fabrication rates between the largest and smallest tiers. Regulatory-style prompts raise fabrication further.

Core claim

Tier 1 brands generate fabricated citations at 52.69 percent while Tier 3 entities do so at 37.87 percent (+14.82 pp; p = 1.67e-11). The author attributes this statistically significant difference to model familiarity creating denser surfaces for incorrect but coherent completions, rather than to differences in underlying data or query design.
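A gap between two fabrication rates is usually checked with a two-proportion z-test. The sketch below shows the arithmetic under stated assumptions: the abstract gives only percentages, so the per-tier counts here are hypothetical placeholders, and nothing in the source confirms that this is the test the paper used.

```python
import math

def two_proportion_z_test(fab_a, n_a, fab_b, n_b):
    """Return (gap in percentage points, z statistic, two-sided p-value)."""
    p_a, p_b = fab_a / n_a, fab_b / n_b
    pooled = (fab_a + fab_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail
    return 100 * (p_a - p_b), z, p_value

# HYPOTHETICAL counts (fabricated citations, total citations) per tier.
# These are NOT the paper's data; the abstract reports percentages only.
tier1 = (530, 1000)
tier3 = (380, 1000)

gap_pp, z, p = two_proportion_z_test(*tier1, *tier3)
print(f"gap = {gap_pp:.2f} pp, z = {z:.2f}, p = {p:.2e}")
```

The test says only that the two rates differ. It cannot say why they differ, which is the point the editorial analysis presses.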

What carries the argument

Per-Entity Bias Mapping (PEBM), a ten-dimensional framework that separates raw mentions from verified ones and isolates three distinct failure modes plus a parametric-retrieval lag asymmetry.
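The raw-versus-verified distinction can be sketched as a small per-entity profile. The abstract does not list the ten dimensions or define its rates, so the field names and the three rate definitions below are assumptions made for illustration, not PEBM itself.

```python
from dataclasses import dataclass

@dataclass
class ProbeResult:
    entity: str
    mentioned: bool           # entity appears in the answer (raw mention)
    citations: int            # citations attached to the entity in this run
    verified_citations: int   # citations confirmed against an external source

def entity_profile(results):
    """Summarise one entity's probe runs (hypothetical definitions)."""
    runs = len(results)
    raw = sum(1 for r in results if r.mentioned)
    # A mention counts as verified only if at least one citation checks out.
    verified = sum(1 for r in results if r.mentioned and r.verified_citations > 0)
    cited = sum(r.citations for r in results)
    confirmed = sum(r.verified_citations for r in results)
    return {
        "raw_mention_rate": raw / runs,
        "verified_mention_rate": verified / runs,
        "fabrication_rate": (cited - confirmed) / cited if cited else None,
    }

# Hypothetical runs for a placeholder entity.
runs = [
    ProbeResult("ExampleCo", True, 4, 2),
    ProbeResult("ExampleCo", True, 2, 0),
    ProbeResult("ExampleCo", False, 0, 0),
    ProbeResult("ExampleCo", True, 2, 2),
]
profile = entity_profile(runs)
print(profile)
```

In this toy case the raw mention rate (0.75) overstates the verified rate (0.5), the kind of divergence an aggregate mention count would hide.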

If this is right

Regulatory-framed queries push fabrication rates up by 19.2 percentage points over baseline (56.77 percent versus 37.59 percent).
Agentic quality filters increase confabulation rather than reduce it when applied to compliance-related prompts.
Entities located in sparse regions of the model's latent space generate outputs interpolated from neighboring dense regions, producing a two-dimensional space of fabricated presence versus frozen representation.
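The two headline gaps can be re-derived from the rates stated in the abstract. The short check below uses only those four reported percentages. The relative uplift on the last line is a derived figure, not one the paper reports.

```python
# Rates as stated in the abstract, in percent.
reported = {
    "tier1": 52.69,
    "tier3": 37.87,
    "regulatory": 56.77,
    "baseline": 37.59,
}

tier_gap_pp = round(reported["tier1"] - reported["tier3"], 2)
frame_gap_pp = round(reported["regulatory"] - reported["baseline"], 2)
relative_uplift = (reported["regulatory"] - reported["baseline"]) / reported["baseline"]

print(tier_gap_pp)               # 14.82
print(frame_gap_pp)              # 19.18, reported as +19.2 pp after rounding
print(f"{relative_uplift:.1%}")  # 51.0%
```

The query-frame gap (about 19 pp) is larger than the tier gap (about 15 pp), which is one reason query wording matters as a possible confound.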

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Visibility tools may need separate correction layers for query type in addition to entity size.
The same per-entity calibration logic could apply to non-brand factual recall tasks where model familiarity varies across topics.
Infrastructure gaps in knowledge graphs for certain regions may require targeted data augmentation rather than general model scaling.

Load-bearing premise

The observed gap in fabrication rates between large and small brands is driven by model familiarity rather than by how the test entities were selected or how the queries were worded.

What would settle it

Re-run the same 1,400 probes on a fresh, matched set of entities that differ only in training-data exposure, holding query wording and selection method fixed. If the fabrication-rate gap disappears, the familiarity explanation is wrong; if it persists, selection and query effects are ruled out as its source.
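One way to analyse such a matched re-run is a paired sign-flip permutation test on the per-pair differences in fabrication rate. The sketch below assumes each pair consists of a high-exposure and a low-exposure entity matched on everything else. The differences are hypothetical placeholders, and the paper does not describe this procedure.

```python
import random

def paired_permutation_test(diffs, n_resamples=10_000, seed=0):
    """Sign-flip test for H0: the mean paired difference is zero.

    diffs: per-pair (high-exposure minus low-exposure) fabrication-rate gaps.
    Returns (observed mean difference, two-sided p-value estimate).
    """
    rng = random.Random(seed)
    n = len(diffs)
    observed = sum(diffs) / n
    extreme = 0
    for _ in range(n_resamples):
        # Under H0 the sign of each paired difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(flipped) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (extreme + 1) / (n_resamples + 1)

# HYPOTHETICAL per-pair differences in fabrication rate (not study data).
pair_diffs = [0.12, 0.08, -0.03, 0.15, 0.10, 0.02, 0.09, -0.01, 0.11, 0.07]
mean_diff, p_value = paired_permutation_test(pair_diffs)
print(f"mean difference = {mean_diff:.3f}, p = {p_value:.4f}")
```

A mean difference near zero with a large p-value would count against the familiarity account. A persistent gap would support it, because matching removes the selection and wording explanations.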

Original abstract

AI-mediated answer systems increasingly determine how brands and organizations are represented to users. Existing approaches reduce visibility to mention rate or citation frequency. This paper argues that aggregate metrics are insufficient because entities exhibit systematically different AI visibility error profiles. We introduce Per-Entity Bias Mapping (PEBM): a ten-dimensional framework distinguishing raw from verified mentions. Three failure modes are identified: (1) underrepresented entities suffer invisibility due to weak knowledge graph presence; (2) large entities suffer the Brand Hallucination Paradox -- model familiarity creates stronger surfaces for plausible but incorrect completions; (3) CEE entities face a structural infrastructure gap across knowledge graphs, NER, and entity linking. A fourth dimension, Parametric-Retrieval Lag Asymmetry, describes divergence between retrieval-augmented and parametric memory update cycles. A full-scale empirical study (n=100 Hungarian B2B entities, 1,400 probe runs, 2,062 sources) finds Tier 1 brands produce 52.69% fabricated citations versus 37.87% for Tier 3 entities (+14.82 pp; p=1.67e-11), supporting the Brand Hallucination Paradox. Regulatory-framed queries elevate fabrication to 56.77% versus 37.59% baseline (+19.2 pp). We identify rejection-induced confabulation escalation: agentic quality filters function as hallucination accelerators in compliance contexts. We introduce ghost cartography as a unifying mechanism: entities in sparse latent regions produce confident output interpolated from neighboring dense regions, yielding a two-dimensional confabulation space (fabricated presence vs. frozen representation).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The reported fabrication gap between entity tiers is real on the numbers but the link to familiarity-driven hallucination lacks the controls needed to rule out query or selection effects.

The central claim here is that Tier 1 brands show 52.69% fabricated citations while Tier 3 entities sit at 37.87%, with the difference pinned on the Brand Hallucination Paradox. The paper also puts forward Per-Entity Bias Mapping as a ten-dimensional breakdown of visibility errors and adds the ghost cartography idea to explain confabulation patterns.

The empirical piece is the clearest contribution. Running 1,400 probes across 100 Hungarian B2B entities and pulling 2,062 sources produces concrete percentages, including the jump to 56.77% under regulatory queries. Breaking visibility into raw versus verified mentions and naming the three failure modes gives practitioners a more granular way to think about AI representation than simple mention counts.

The soft spot is the causal step. The abstract and study description give no matching, stratification, or regression controls for how entities were assigned to tiers, how probe queries were worded, or how sources were sampled. Without those, the +14.82 pp gap could stem from differences in query construction or underlying data coverage rather than model familiarity creating stronger surfaces for errors. The p-value confirms the rates differ, but the mechanism stays under-determined.
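The stratification concern can be made concrete with a Mantel-Haenszel common odds ratio, a standard way to compare two groups within strata such as query type. The counts below are hypothetical and built so that the crude comparison shows a tier effect that vanishes once query type is held fixed. They do not describe the paper's data.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: (a/b) / (c/d)."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Common odds ratio across strata.

    Each stratum is (a, b, c, d) =
    (Tier 1 fabricated, Tier 1 verified, Tier 3 fabricated, Tier 3 verified).
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    return num / den

# HYPOTHETICAL citation counts per query type (not study data).
strata = [
    (10, 40, 40, 160),   # baseline queries: 20% fabricated in both tiers
    (80, 80, 20, 20),    # regulatory queries: 50% fabricated in both tiers
]

# Crude table ignores query type by summing the strata column-wise.
crude = odds_ratio(*[sum(col) for col in zip(*strata)])
adjusted = mantel_haenszel_or(strata)
print(f"crude OR = {crude:.2f}, stratified OR = {adjusted:.2f}")
```

Here Tier 1 entities simply receive more regulatory-style queries, so the pooled comparison manufactures a gap. An analysis of the real data along these lines would show whether the reported +14.82 pp survives stratification.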

The circularity in defining the paradox directly from the observed rate difference adds to the concern. Ghost cartography is presented as a unifying account, yet it reads more as a post-hoc description than a tested model.

This is for people working on AI evaluation, brand monitoring, or regulatory questions around generated content. A reader who needs entity-level diagnostics will find usable structure and numbers even if they question the interpretation. It deserves peer review because the data and framework are substantive enough to warrant referee input on the controls and alternative explanations.

Referee Report

3 major / 1 minor

Summary. The paper introduces Per-Entity Bias Mapping (PEBM), a ten-dimensional framework for entity-specific AI visibility that distinguishes raw from verified mentions and identifies three failure modes: invisibility for underrepresented entities, the Brand Hallucination Paradox (higher fabricated citations for familiar/Tier 1 entities due to model familiarity creating plausible but incorrect completions), and infrastructure gaps for CEE entities, plus Parametric-Retrieval Lag Asymmetry. An empirical study (n=100 Hungarian B2B entities, 1,400 probe runs, 2,062 sources) reports Tier 1 brands at 52.69% fabricated citations versus 37.87% for Tier 3 (+14.82 pp; p=1.67e-11), with regulatory queries elevating rates to 56.77% and rejection-induced confabulation escalation; ghost cartography is proposed as a unifying mechanism for confabulation in sparse latent regions.

Significance. If the central empirical attribution holds, the work would usefully shift AI visibility assessment from aggregate mention rates to per-entity bias profiles, with practical implications for brand monitoring and system calibration. The reported fabrication-rate gap and regulatory-query effect are concrete and falsifiable; the ghost cartography framing offers a mechanistic account that could be tested against latent-space density measures.
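Testing the mechanism against an independent density measure could start with a rank correlation between a per-entity familiarity proxy and the per-entity fabrication rate. The sketch below implements Spearman's coefficient with average ranks for ties. The proxy values and rates are hypothetical, and the choice of proxy is an assumption rather than something the paper specifies.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# HYPOTHETICAL per-entity values (not study data).
familiarity_proxy = [0.9, 0.7, 0.65, 0.4, 0.2, 0.1]     # e.g. a density score
fabrication_rate = [0.55, 0.50, 0.52, 0.41, 0.36, 0.38]
rho = spearman(familiarity_proxy, fabrication_rate)
print(f"Spearman rho = {rho:.2f}")
```

Because the proxy is measured separately from the fabrication rate, a correlation of this kind would be evidence for the mechanism that does not reduce to the tier gap itself.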

major comments (3)

[Abstract] Abstract (empirical study description): the claim that the +14.82 pp gap supports the Brand Hallucination Paradox requires that the difference is produced by familiarity-driven plausible completions rather than by unmeasured differences in entity sampling criteria, query formulation, or source selection. The manuscript supplies no explicit matching, stratification, or regression controls for these factors, leaving the causal attribution under-determined.
[Abstract] Abstract: the Brand Hallucination Paradox is defined directly in terms of the higher fabrication rates observed for familiar entities, so the explanatory mechanism is constructed from the same quantity it is invoked to explain; this renders the interpretation circular rather than independently supported.
[Abstract] Abstract (statistical support): the reported percentages and p=1.67e-11 are given without information on query construction, entity tier thresholds, how fabricated citations were identified, controls for multiple testing, or exclusion rules, which are load-bearing for any claim that the gap reflects model behavior rather than study design.
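For the multiple-testing point, the sketch below shows Bonferroni adjustment and the Holm step-down procedure. The family size of 100 comparisons is an assumption chosen for illustration. The abstract does not say how many tests were run.

```python
def bonferroni_adjust(p_values, m=None):
    """Bonferroni-adjusted p-values; m defaults to the number of p-values."""
    m = len(p_values) if m is None else m
    return [min(1.0, p * m) for p in p_values]

def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break   # once one test fails, all larger p-values fail too
    return reject

# The reported p-value, adjusted for a HYPOTHETICAL family of 100 tests.
adjusted = bonferroni_adjust([1.67e-11], m=100)
print(f"{adjusted[0]:.2e}")
```

Even under a conservative correction of that size the reported p-value stays far below conventional thresholds, so multiple testing matters less for the headline gap than the questions about design and labelling.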

minor comments (1)

[Abstract] The abstract states specific fabrication percentages and a p-value from 1,400 runs but does not indicate whether the 2,062 sources were deduplicated or how Tier thresholds were set; adding these details would improve reproducibility even if they do not alter the central claim.
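Source deduplication of the kind the minor comment asks about could look like the sketch below. The normalisation rules (lower-casing the host, dropping a leading "www.", trailing slashes and fragments) are assumptions, and the URLs are placeholders. The paper's actual handling of its 2,062 sources is not described in the abstract.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Canonical form used as a deduplication key (assumed rules)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    # Keep the query string, drop the fragment.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))

def deduplicate(urls):
    """Keep the first occurrence of each normalised URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique

# Placeholder URLs for illustration only.
sources = [
    "https://www.example.com/report/",
    "https://example.com/report#summary",
    "https://example.com/report?year=2025",
    "HTTPS://EXAMPLE.COM/other",
]
unique_sources = deduplicate(sources)
print(len(sources), "->", len(unique_sources))
```

Whether rates are computed over raw or deduplicated sources changes the denominator, which is why the comment bears on reproducibility.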

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and commit to revisions that improve transparency and strengthen the causal claims.

Referee: [Abstract] Abstract (empirical study description): the claim that the +14.82 pp gap supports the Brand Hallucination Paradox requires that the difference is produced by familiarity-driven plausible completions rather than by unmeasured differences in entity sampling criteria, query formulation, or source selection. The manuscript supplies no explicit matching, stratification, or regression controls for these factors, leaving the causal attribution under-determined.

Authors: We agree that the abstract would benefit from explicit discussion of design controls. Entities were stratified into tiers using objective revenue and market-presence criteria, and all queries followed identical templates to hold formulation constant. In revision we will add a sentence on stratification to the abstract and include a regression model controlling for industry and size in the results section to better isolate the familiarity effect.
revision: yes

Referee: [Abstract] Abstract: the Brand Hallucination Paradox is defined directly in terms of the higher fabrication rates observed for familiar entities, so the explanatory mechanism is constructed from the same quantity it is invoked to explain; this renders the interpretation circular rather than independently supported.

Authors: The paradox is introduced as a hypothesized mechanism (familiarity producing stronger surfaces for plausible completions) that is then tested against the observed tier gap. To eliminate any appearance of circularity we will revise the abstract to state the mechanism first, followed by the empirical test, and will add a short clarifying paragraph in the introduction.
revision: yes

Referee: [Abstract] Abstract (statistical support): the reported percentages and p=1.67e-11 are given without information on query construction, entity tier thresholds, how fabricated citations were identified, controls for multiple testing, or exclusion rules, which are load-bearing for any claim that the gap reflects model behavior rather than study design.

Authors: The full manuscript contains these details (14 standardized query templates, tier thresholds based on revenue/market-share index, verification against external sources with reported inter-annotator agreement, Bonferroni correction, and explicit exclusion criteria). We will insert a concise methods summary into the revised abstract so that the key statistics are accompanied by the necessary methodological context.
revision: yes
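Inter-annotator agreement on "fabricated" versus "verified" labels is commonly summarised with Cohen's kappa. The sketch below shows that calculation on hypothetical labels from two annotators. The simulated rebuttal does not say which agreement statistic the manuscript reports, so kappa is an assumption here.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0   # both annotators used a single identical label
    return (observed - expected) / (1 - expected)

# HYPOTHETICAL labels for ten citations (not study data).
annotator_1 = ["fab", "fab", "ok", "ok", "fab", "ok", "ok", "fab", "ok", "ok"]
annotator_2 = ["fab", "ok", "ok", "ok", "fab", "ok", "fab", "fab", "ok", "ok"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

If agreement on what counts as fabricated is low, the fabrication percentages inherit that uncertainty regardless of how small the p-value is.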

Circularity Check

1 step flagged

Brand Hallucination Paradox defined directly from the Tier 1 vs Tier 3 fabrication gap it claims to explain

specific steps

self-definitional [Abstract]
"large entities suffer the Brand Hallucination Paradox -- model familiarity creates stronger surfaces for plausible but incorrect completions; ... finds Tier 1 brands produce 52.69% fabricated citations versus 37.87% for Tier 3 entities (+14.82 pp; p=1.67e-11), supporting the Brand Hallucination Paradox."

The paradox is defined as the causal mechanism (familiarity-driven plausible completions) that produces elevated fabrication for Tier 1 entities; the sole cited support is the observation of that same elevation. The result is therefore equivalent to the input observation by construction rather than an independent derivation.

full rationale

The paper introduces the Brand Hallucination Paradox as a failure mode whose mechanism is model familiarity producing higher fabrication rates for large entities, then immediately cites the observed +14.82 pp gap between Tier 1 and Tier 3 as empirical support. This reduces the central explanatory claim to a restatement of the same empirical contrast used to define the paradox, without independent verification of the mechanism or controls for confounds. The remainder of the framework (PEBM dimensions, ghost cartography) does not exhibit similar self-definition.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

Ledger entries are inferred solely from the abstract; full text was not accessible so the list is necessarily incomplete and provisional.

free parameters (1)

Entity tier thresholds
Classification of the 100 entities into Tier 1, 2, and 3 is used to produce the reported fabrication-rate difference but the criteria are not stated in the abstract.
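Because the tier thresholds act as a free parameter, a sensitivity sweep shows how much the reported gap depends on where the cut-offs fall. The sketch below assumes a single hypothetical size index per entity and hypothetical citation counts. The abstract states neither the actual index nor its thresholds.

```python
def tier_gap_pp(entities, tier1_min, tier3_max):
    """Tier 1 minus Tier 3 fabrication rate, in percentage points.

    entities: list of (size_index, fabricated_citations, total_citations).
    Entities with size_index >= tier1_min are Tier 1; <= tier3_max are Tier 3.
    """
    def rate(group):
        fabricated = sum(f for _, f, _ in group)
        total = sum(t for _, _, t in group)
        return fabricated / total if total else float("nan")

    tier1 = [e for e in entities if e[0] >= tier1_min]
    tier3 = [e for e in entities if e[0] <= tier3_max]
    return 100 * (rate(tier1) - rate(tier3))

# HYPOTHETICAL entities: (size index, fabricated, total). Not study data.
entities = [(90, 6, 10), (80, 5, 10), (50, 4, 10), (20, 3, 10), (10, 2, 10)]

for tier1_min, tier3_max in [(80, 20), (90, 10), (50, 50)]:
    gap = tier_gap_pp(entities, tier1_min, tier3_max)
    print(f"Tier 1 >= {tier1_min}, Tier 3 <= {tier3_max}: gap = {gap:.1f} pp")
```

A gap that stays roughly stable across plausible cut-offs is more convincing than one that appears only at a particular threshold.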

axioms (1)

domain assumption: Entities exhibit systematically different AI visibility error profiles that aggregate metrics cannot capture
This premise is stated in the opening paragraph of the abstract and underpins the need for the ten-dimensional framework.

invented entities (1)

Ghost cartography (no independent evidence)
purpose: Unifying mechanism explaining how entities in sparse latent regions produce confident but interpolated output
Introduced in the abstract as a new explanatory construct with no independent falsifiable handle provided.

pith-pipeline@v0.9.1-grok · 5818 in / 1472 out tokens · 22878 ms · 2026-06-26T13:59:52.978762+00:00 · methodology


[1] [1]

The zero-click paradigm: Brand visibility in the age of AI-mediated answers

James Thacker. The zero-click paradigm: Brand visibility in the age of AI-mediated answers. SSRN 6004297, 2025

2025

[2] [2]

A survey on hallu- cination in large language models: Principles, taxonomy, challenges, and open questions.ACM Transactions on Information Systems, 2025

Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, et al. A survey on hallu- cination in large language models: Principles, taxonomy, challenges, and open questions.ACM Transactions on Information Systems, 2025

2025

[3] [3]

Don’t measure once: Measuring visibility in AI search (GEO).arXiv preprint arXiv:2604.07585, 2026

Jonas Schulte, Maike Bleeker, and Paul Kaufmann. Don’t measure once: Measuring visibility in AI search (GEO).arXiv preprint arXiv:2604.07585, 2026

Pith/arXiv arXiv 2026

[4] [4]

Answer engine optimization: A measurement framework for brand visibility in generative AI search

Emily Drake. Answer engine optimization: A measurement framework for brand visibility in generative AI search. SSRN 6609678, 2026

2026

[5] [5]

Brand visibility in AI search: A longitudinal analysis of AI visibility metrics in the US tea industry

Vincent Luther and Olivier Touboul-Cohen. Brand visibility in AI search: A longitudinal analysis of AI visibility metrics in the US tea industry. ResearchGate preprint, 2024

2024

[6] [6]

MIT Press, 2020

Lev Manovich.Cultural Analytics. MIT Press, 2020

2020

[7] [7]

Correctness is not faithfulness in RAG attributions

Jonas Wallat, Maria Heuss, Maarten de Rijke, and Avishek Anand. Correctness is not faithfulness in RAG attributions. InProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2025

2025

[8] [8]

Geographic and geopolitical biases of language models

Fahmida Faisal and Antonios Anastasopoulos. Geographic and geopolitical biases of language models. InProceedings of the 3rd Workshop on Multi-lingual Representation Learning (MRL). Association for Computational Linguistics, 2023

2023

[9] [9]

On the limitations of large language models: False attribution

Tosin Adewumi, Nohur Habib, and Latifah Alkhaled. On the limitations of large language models: False attribution. InProceedings of the 15th International Conference on Recent Advances in Natural Language Processing (RANLP 2025). Association for Computational Linguistics, 2025

2025

[10] [10]

Fairness in language models beyond english: Gaps and challenges

Krithika Ramesh, Sunayana Sitaram, and Monojit Choudhury. Fairness in language models beyond english: Gaps and challenges. InFindings of the Association for Computational Linguistics: EACL

[11] [11]

Association for Computational Linguistics, 2023

2023

[12] [12]

Location not found: Exposing implicit local and global biases in multilingual LLMs.arXiv preprint arXiv:2604.19292, 2026

Gil Mor-Lan, Oren Goldman, Michal Eyal, and Alma Maxim Gilady. Location not found: Exposing implicit local and global biases in multilingual LLMs.arXiv preprint arXiv:2604.19292, 2026

Pith/arXiv arXiv 2026

[13] [13]

MIT Press, 2023

Solon Barocas, Moritz Hardt, and Arvind Narayanan.Fairness and Machine Learning: Limitations and Opportunities. MIT Press, 2023

2023

[14] [14]

Towards lifelong learning of large language models: A survey.ACM Computing Surveys, 2025

Junyi Zheng, Chen Qiu, Jingyang Shi, and Jianzhu Ma. Towards lifelong learning of large language models: A survey.ACM Computing Surveys, 2025

2025

[15] [15]

Retrieval-augmented genera- tion for knowledge-intensive NLP tasks

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, et al. Retrieval-augmented genera- tion for knowledge-intensive NLP tasks. InAdvances in Neural Information Processing Systems (NeurIPS), 2020

2020

[16] [16]

Bowman, et al

Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, et al. Towards understanding sycophancy in language models.arXiv preprint arXiv:2310.13548, 2023

Pith/arXiv arXiv 2023

[17] [17]

Open problems and fundamental limitations of reinforcement learning from human feedback.arXiv preprint arXiv:2307.15217, 2023

Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, et al. Open problems and fundamental limitations of reinforcement learning from human feedback.arXiv preprint arXiv:2307.15217, 2023. 24 Varga — Per-Entity Bias Mapping for AI Visibility arXiv preprint

Pith/arXiv arXiv 2023

[18] [18]

Cannon, and David G

Gordon Pennycook, Tyrone D. Cannon, and David G. Rand. Prior exposure increases perceived accuracy of fake news.Journal of Experimental Psychology: General, 147(12):1865–1880, 2018

2018

[19] [19]

Do I know this entity? knowledge awareness and hallucinations in language models.arXiv preprint arXiv:2411.14257, 2024

Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan, and Neel Nanda. Do I know this entity? knowledge awareness and hallucinations in language models.arXiv preprint arXiv:2411.14257, 2024

arXiv 2024

[20] [20]

Survey on factuality in large language models: Knowledge, retrieval and domain-specificity.arXiv preprint arXiv:2310.07521, 2023

Cunxiang Wang, Xiaoze Liu, Yuanhao Yue, Xiangru Tang, Tianhang Zhang, et al. Survey on factuality in large language models: Knowledge, retrieval and domain-specificity.arXiv preprint arXiv:2310.07521, 2023

arXiv 2023

[21] [21]

Dikaiakos

Demetrios Paschalides, George Pallis, and Marios D. Dikaiakos. Beyond accuracy: Rethinking hallucination and regulatory response in large language models.arXiv preprint arXiv:2509.13345, 2025

arXiv 2025

[22] [22]

Detecting and correcting reference hallucinations in commercial LLMs and deep research agents.arXiv preprint arXiv:2604.03173, 2026

Delip Rao, Eric Wong, and Chris Callison-Burch. Detecting and correcting reference hallucinations in commercial LLMs and deep research agents.arXiv preprint arXiv:2604.03173, 2026

Pith/arXiv arXiv 2026

[23] Diletta Abbonato. CheckIfExist: Detecting citation hallucinations in the era of AI-generated content. arXiv preprint arXiv:2602.15871, 2026.

[24] Abdulrahman Alansari and Hamzah Luqman. Large language models hallucination: A comprehensive survey. arXiv preprint arXiv:2510.06265, 2025.

[25] Matthew Dahl, Varun Magesh, Mirac Suzgun, and Daniel E. Ho. Large legal fictions: Profiling legal hallucinations in large language models. Journal of Legal Analysis, 16(1):64–93, 2024.

[26] Paramita Das, Suraj Kumar Karnam, Aditya Bharti Soni, and Animesh Mukherjee. Social biases in knowledge representations of Wikidata separates Global North from Global South. In Proceedings of the 17th ACM Web Science Conference (WebSci ’25). ACM, 2025.

[27] Zain Shaik, Filip Ilievski, and Fred Morstatter. Analyzing race and country of citizenship bias in Wikidata. arXiv preprint arXiv:2108.05412, 2021.

[28] Wenting Zhao, Tanya Goyal, Yu-Ying Chiu, Liwei Jiang, Benjamin Newman, Abhilasha Ravichander, et al. WildHallucinations: Evaluating long-form factuality in LLMs with real-world entity queries. arXiv preprint arXiv:2407.17468, 2024.

[29] Simone Torge, Alexander Politov, Christian Lehmann, and Benjamin Saffar. Named entity recognition for low-resource languages - profiting from language families. In Proceedings of the 9th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2023). Association for Computational Linguistics, 2023.

[30] Ángel del Ser and Carlos Badenes-Olmedo. Linguistic patterns in European public organization names. Semantic Web Journal, 2024.

[31] Dag Øivind Madsen and Shahid Shafqat Sohail. From prestige to presence: Algorithmic visibility and citation bias in the age of generative AI. SSRN 5464818, 2025.

[32] Francisco Rejón-Guardia and Sebastián Molinillo. Generative engine optimization: How search engines integrate AI-generated content into conventional queries. In Artificial Intelligence in Marketing. Springer, 2025.

[33] Junyang Kang, Wei Pan, Tao Zhang, and Zhen Wang. Correcting factuality hallucination in complaint large language model via entity-augmented. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), 2024.

[34] Ruichao Hu, Ruijie Xu, Di Lei, Yan Li, Ming Wang, Eric Ching, Ehsan Kamal, and Ang Deng. SLM meets LLM: Balancing latency, interpretability and consistency in hallucination detection. arXiv preprint arXiv:2408.12748, 2024.

[35] Wei Su, Yi Tang, Qingyao Ai, Jing Yan, Chengjin Wang, and Haitao Wang. Parametric retrieval augmented generation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2025.

[36] Thibaud Ardoin, Yi Cai, and Günter Wunder. Confabulation maps to steerable latent directions in transformer representations. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), 2025.

[37] Mark Fisher. The Weird and the Eerie. Repeater Books, 2016.