pith.

arxiv: 2606.21595 · v1 · pith:4H7QPYFV · submitted 2026-06-19 · 💻 cs.CL · cs.IR

Per-Entity Bias Mapping for AI Visibility: Why Brand Mentions Require Entity-Specific Calibration

Pith reviewed 2026-06-26 13:59 UTC · model grok-4.3

classification 💻 cs.CL cs.IR
keywords AI visibility · brand mentions · fabricated citations · entity bias · hallucination · knowledge graphs · per-entity calibration

The pith

Aggregate mention rates miss the point because larger brands trigger more fabricated citations from AI systems than smaller ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that standard counts of brand mentions in AI answers overlook systematic differences in error patterns across entities. Large, familiar brands attract more plausible-sounding but false citations because models have stronger learned patterns to complete from, while smaller entities simply stay invisible because of thin data footprints. The authors introduce a per-entity mapping approach that separates raw from verified mentions and test it on 100 Hungarian B2B firms across 1,400 probe runs. They find a gap of roughly 15 percentage points (14.82 pp) in fabrication rates between Tier 1 and Tier 3 brands, and a separate 19.2-point rise in fabrication under regulatory-framed queries.

Core claim

Of the citations generated for Tier 1 brands, 52.69 percent are fabricated, versus 37.87 percent for Tier 3 entities (+14.82 percentage points; p = 1.67e-11). The authors attribute this statistically significant difference to model familiarity, which creates denser surfaces for incorrect but coherent completions, rather than to differences in underlying data or query design.
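The headline numbers can be checked with simple arithmetic and a standard significance test. The sketch below computes percentage-point gaps and runs a pooled two-proportion z-test. It is an illustration under stated assumptions, not the authors' analysis: the abstract reports only rates, not per-tier citation counts, so the counts in the usage line are hypothetical, and which test produced the reported p = 1.67e-11 is not stated. The z-test also treats every citation as independent, which likely overstates significance when citations cluster within entities and probe runs.

```python
import math


def pp_gap(rate_a: float, rate_b: float) -> float:
    """Difference between two percentages, in percentage points (pp)."""
    return round(rate_a - rate_b, 2)


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test.

    x1/n1 and x2/n2 are fabricated/total citation counts for two groups.
    Returns (z, two-sided p-value). Assumes citations are independent,
    which ignores clustering within entities and probe runs.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


print(pp_gap(52.69, 37.87))  # 14.82 (Tier 1 vs Tier 3)
print(pp_gap(56.77, 37.59))  # 19.18 (regulatory vs baseline, reported as 19.2)

# Hypothetical counts: the abstract gives rates only, not per-tier citation totals.
z, p = two_proportion_z_test(x1=527, n1=1000, x2=379, n2=1000)
print(f"z = {z:.2f}, p = {p:.2e}")
```

A cluster-aware alternative (for example, comparing entity-level rates) would be the natural next step if per-entity data were available.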

What carries the argument

Per-Entity Bias Mapping (PEBM), a ten-dimensional framework that separates raw mentions from verified ones and isolates three distinct failure modes plus a parametric-retrieval lag asymmetry.
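To make the raw-versus-verified distinction concrete, here is a minimal sketch of per-entity aggregation. The record layout (`ProbeResult`) and both definitions in it are hypothetical, not taken from the paper: a "verified mention" is assumed to be a mention backed by at least one externally verified citation, and every citation that fails verification is assumed to count as fabricated. The sketch does not attempt PEBM's other dimensions, which the abstract does not enumerate.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """One probe run for one entity (hypothetical record layout)."""
    entity: str
    mentioned: bool          # raw mention: entity name appears in the answer
    citations: int           # citations the answer attaches to the entity
    verified_citations: int  # citations confirmed against an external source


def per_entity_profile(results: list[ProbeResult]) -> dict[str, dict[str, float]]:
    """Aggregate probe runs into per-entity raw vs verified metrics."""
    grouped: dict[str, list[ProbeResult]] = defaultdict(list)
    for r in results:
        grouped[r.entity].append(r)

    profile: dict[str, dict[str, float]] = {}
    for entity, runs in grouped.items():
        n = len(runs)
        raw = sum(r.mentioned for r in runs)
        # Assumption: a verified mention is a mention with >= 1 verified citation.
        verified = sum(r.mentioned and r.verified_citations > 0 for r in runs)
        cited = sum(r.citations for r in runs)
        # Assumption: every citation that fails verification counts as fabricated.
        fabricated = sum(r.citations - r.verified_citations for r in runs)
        profile[entity] = {
            "raw_mention_rate": raw / n,
            "verified_mention_rate": verified / n,
            # Undefined (NaN) when the entity never received a citation.
            "fabrication_rate": fabricated / cited if cited else math.nan,
        }
    return profile


runs = [
    ProbeResult("BigCo", True, 4, 2),
    ProbeResult("BigCo", True, 2, 1),
    ProbeResult("SmallCo", True, 1, 1),
    ProbeResult("SmallCo", True, 2, 2),
]
for entity, metrics in per_entity_profile(runs).items():
    print(entity, metrics)
```

In this toy data both entities have identical raw and verified mention rates, yet half of BigCo's citations fail verification and none of SmallCo's do: the kind of divergence an aggregate mention count hides.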

If this is right

  • Regulatory-framed queries push the fabrication rate from a 37.59 percent baseline to 56.77 percent, an increase of 19.2 percentage points.
  • Agentic quality filters increase confabulation rather than reduce it when applied to compliance-related prompts.
  • Entities located in sparse regions of the model's latent space generate outputs interpolated from neighboring dense regions, producing a two-dimensional space of fabricated presence versus frozen representation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Visibility tools may need separate correction layers for query type in addition to entity size.
  • The same per-entity calibration logic could apply to non-brand factual recall tasks where model familiarity varies across topics.
  • Infrastructure gaps in knowledge graphs for certain regions may require targeted data augmentation rather than general model scaling.

Load-bearing premise

The observed gap in fabrication rates between large and small brands is driven by model familiarity rather than by how the test entities were selected or how the queries were worded.

What would settle it

If the familiarity explanation is wrong, re-running the same 1,400 probes on a fresh, matched set of entities that differ only in training-data exposure, with query wording and selection method held fixed, should eliminate the fabrication-rate gap; if the explanation is right, the gap should persist.

Original abstract

AI-mediated answer systems increasingly determine how brands and organizations are represented to users. Existing approaches reduce visibility to mention rate or citation frequency. This paper argues that aggregate metrics are insufficient because entities exhibit systematically different AI visibility error profiles. We introduce Per-Entity Bias Mapping (PEBM): a ten-dimensional framework distinguishing raw from verified mentions. Three failure modes are identified: (1) underrepresented entities suffer invisibility due to weak knowledge graph presence; (2) large entities suffer the Brand Hallucination Paradox -- model familiarity creates stronger surfaces for plausible but incorrect completions; (3) CEE entities face a structural infrastructure gap across knowledge graphs, NER, and entity linking. A fourth dimension, Parametric-Retrieval Lag Asymmetry, describes divergence between retrieval-augmented and parametric memory update cycles. A full-scale empirical study (n=100 Hungarian B2B entities, 1,400 probe runs, 2,062 sources) finds Tier 1 brands produce 52.69% fabricated citations versus 37.87% for Tier 3 entities (+14.82 pp; p=1.67e-11), supporting the Brand Hallucination Paradox. Regulatory-framed queries elevate fabrication to 56.77% versus 37.59% baseline (+19.2 pp). We identify rejection-induced confabulation escalation: agentic quality filters function as hallucination accelerators in compliance contexts. We introduce ghost cartography as a unifying mechanism: entities in sparse latent regions produce confident output interpolated from neighboring dense regions, yielding a two-dimensional confabulation space (fabricated presence vs. frozen representation).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces Per-Entity Bias Mapping (PEBM), a ten-dimensional framework for entity-specific AI visibility that distinguishes raw from verified mentions and identifies three failure modes: invisibility for underrepresented entities; the Brand Hallucination Paradox, in which familiar (Tier 1) entities receive a higher rate of fabricated citations because model familiarity creates plausible but incorrect completions; and infrastructure gaps for CEE entities. A fourth dimension, Parametric-Retrieval Lag Asymmetry, is added. An empirical study (n=100 Hungarian B2B entities, 1,400 probe runs, 2,062 sources) reports fabricated-citation rates of 52.69% for Tier 1 brands versus 37.87% for Tier 3 (+14.82 pp; p=1.67e-11). Regulatory queries raise the rate to 56.77% (versus a 37.59% baseline), and agentic quality filters are reported to escalate confabulation (rejection-induced confabulation escalation). Ghost cartography is proposed as a unifying mechanism for confabulation in sparse latent regions.

Significance. If the central empirical attribution holds, the work would usefully shift AI visibility assessment from aggregate mention rates to per-entity bias profiles, with practical implications for brand monitoring and system calibration. The reported fabrication-rate gap and regulatory-query effect are concrete and falsifiable; the ghost cartography framing offers a mechanistic account that could be tested against latent-space density measures.
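One way to build the latent-space density measure mentioned above is a k-nearest-neighbour sparsity score: the mean distance from an entity's embedding to its k closest neighbours. The sketch below is a generic illustration, not anything the paper specifies; which model, layer, and reference set would supply entity embeddings are open assumptions, and the toy vectors are invented. Under the ghost-cartography account, entities with higher sparsity scores would be expected to show more fabricated output, a correlation that could then be checked against per-entity fabrication rates.

```python
import math


def knn_sparsity(query: list[float], reference: list[list[float]], k: int = 5) -> float:
    """Mean Euclidean distance from `query` to its k nearest vectors in `reference`.

    Higher values mean the query sits in a sparser part of the embedding space.
    Exclude the query's own vector from `reference` if it belongs to that set.
    """
    if not 0 < k <= len(reference):
        raise ValueError("k must be between 1 and len(reference)")
    distances = sorted(math.dist(query, ref) for ref in reference)
    return sum(distances[:k]) / k


# Toy 2-D "embeddings": a dense cluster of well-documented entities near the origin.
dense_cluster = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]]
print(knn_sparsity([0.05, 0.04], dense_cluster, k=3))  # small: inside the cluster
print(knn_sparsity([2.0, 2.0], dense_cluster, k=3))    # large: a sparse, "ghost" region
```

Real embeddings are high-dimensional, where raw Euclidean distances concentrate; normalising vectors or using cosine distance is a common adjustment.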

major comments (3)
  1. [Abstract] (empirical study description): the claim that the +14.82 pp gap supports the Brand Hallucination Paradox requires that the difference is produced by familiarity-driven plausible completions rather than by unmeasured differences in entity sampling criteria, query formulation, or source selection. The manuscript supplies no explicit matching, stratification, or regression controls for these factors, leaving the causal attribution under-determined.
  2. [Abstract] The Brand Hallucination Paradox is defined directly in terms of the higher fabrication rates observed for familiar entities, so the explanatory mechanism is constructed from the same quantity it is invoked to explain; this renders the interpretation circular rather than independently supported.
  3. [Abstract] (statistical support): the reported percentages and p=1.67e-11 are given without information on query construction, entity tier thresholds, how fabricated citations were identified, controls for multiple testing, or exclusion rules, which are load-bearing for any claim that the gap reflects model behavior rather than study design.
minor comments (1)
  1. [Abstract] The abstract states specific fabrication percentages and a p-value from 1,400 runs but does not indicate whether the 2,062 sources were deduplicated or how Tier thresholds were set; adding these details would improve reproducibility even if they do not alter the central claim.
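Major comment 1 asks for stratification or regression controls. One standard stratified approach, swapped in here for illustration and not attributed to the paper, is the Mantel-Haenszel pooled odds ratio: it compares Tier 1 and Tier 3 fabrication odds within each stratum (for example, each industry) before pooling. The strata and counts below are hypothetical and chosen to show how a large crude gap can vanish once a confounder is held fixed.

```python
def mantel_haenszel_or(strata: list[tuple[int, int, int, int]]) -> float:
    """Mantel-Haenszel pooled odds ratio of fabrication, Tier 1 vs Tier 3.

    Each stratum (e.g. one industry) is a 2x2 table (a, b, c, d):
      a = Tier 1 fabricated citations, b = Tier 1 verified citations,
      c = Tier 3 fabricated citations, d = Tier 3 verified citations.
    """
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical strata: a high-fabrication industry dominated by Tier 1 brands
# and a low-fabrication industry dominated by Tier 3 entities.
strata = [(40, 10, 8, 2), (2, 8, 10, 40)]
pooled_table = tuple(sum(column) for column in zip(*strata))  # (42, 18, 18, 42)
crude = mantel_haenszel_or([pooled_table])
adjusted = mantel_haenszel_or(strata)
print(f"crude OR = {crude:.2f}, stratified OR = {adjusted:.2f}")  # crude OR = 5.44, stratified OR = 1.00
```

Within each toy industry, Tier 1 and Tier 3 fabricate at the same rate; the crude gap comes entirely from which industries each tier is drawn from, which is the confound the referee is worried about.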

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and commit to revisions that improve transparency and strengthen the causal claims.

  1. Referee: [Abstract] (empirical study description): the claim that the +14.82 pp gap supports the Brand Hallucination Paradox requires that the difference is produced by familiarity-driven plausible completions rather than by unmeasured differences in entity sampling criteria, query formulation, or source selection. The manuscript supplies no explicit matching, stratification, or regression controls for these factors, leaving the causal attribution under-determined.

    Authors: We agree that the abstract would benefit from explicit discussion of design controls. Entities were stratified into tiers using objective revenue and market-presence criteria, and all queries followed identical templates to hold formulation constant. In revision we will add a sentence on stratification to the abstract and include a regression model controlling for industry and size in the results section to better isolate the familiarity effect. revision: yes

  2. Referee: [Abstract] The Brand Hallucination Paradox is defined directly in terms of the higher fabrication rates observed for familiar entities, so the explanatory mechanism is constructed from the same quantity it is invoked to explain; this renders the interpretation circular rather than independently supported.

    Authors: The paradox is introduced as a hypothesized mechanism (familiarity producing stronger surfaces for plausible completions) that is then tested against the observed tier gap. To eliminate any appearance of circularity we will revise the abstract to state the mechanism first, followed by the empirical test, and will add a short clarifying paragraph in the introduction. revision: yes

  3. Referee: [Abstract] (statistical support): the reported percentages and p=1.67e-11 are given without information on query construction, entity tier thresholds, how fabricated citations were identified, controls for multiple testing, or exclusion rules, which are load-bearing for any claim that the gap reflects model behavior rather than study design.

    Authors: The full manuscript contains these details (14 standardized query templates, tier thresholds based on revenue/market-share index, verification against external sources with reported inter-annotator agreement, Bonferroni correction, and explicit exclusion criteria). We will insert a concise methods summary into the revised abstract so that the key statistics are accompanied by the necessary methodological context. revision: yes
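The simulated rebuttal says fabricated citations were identified by checking against external sources, with inter-annotator agreement reported; since the rebuttal is simulated, those details are unconfirmed. The agreement statistic is not named, so Cohen's kappa, a common choice for two annotators, is assumed here. The label strings and the four-item example are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / n**2
    if expected == 1.0:
        return 1.0  # both annotators used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four citations checked by two annotators.
annotator_1 = ["fabricated", "fabricated", "verified", "verified"]
annotator_2 = ["fabricated", "verified", "verified", "verified"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Raw agreement here is 75 percent, but kappa discounts the 50 percent agreement expected by chance, which matters when most citations fall into one class.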

Circularity Check

1 step flagged

Brand Hallucination Paradox defined directly from the Tier 1 vs Tier 3 fabrication gap it claims to explain

specific steps
  1. self-definitional [Abstract]
    "large entities suffer the Brand Hallucination Paradox -- model familiarity creates stronger surfaces for plausible but incorrect completions; ... finds Tier 1 brands produce 52.69% fabricated citations versus 37.87% for Tier 3 entities (+14.82 pp; p=1.67e-11), supporting the Brand Hallucination Paradox."

    The paradox is defined as the causal mechanism (familiarity-driven plausible completions) that produces elevated fabrication for Tier 1 entities; the sole cited support is the observation of that same elevation. The result is therefore equivalent to the input observation by construction rather than an independent derivation.

full rationale

The paper introduces the Brand Hallucination Paradox as a failure mode whose mechanism is model familiarity producing higher fabrication rates for large entities, then immediately cites the observed +14.82 pp gap between Tier 1 and Tier 3 as empirical support. This reduces the central explanatory claim to a restatement of the same empirical contrast used to define the paradox, without independent verification of the mechanism or controls for confounds. The remainder of the framework (PEBM dimensions, ghost cartography) does not exhibit similar self-definition.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

Ledger entries are inferred solely from the abstract; the full text was not accessible, so the list is necessarily incomplete and provisional.

free parameters (1)
  • Entity tier thresholds
    Classification of the 100 entities into Tier 1, 2, and 3 is used to produce the reported fabrication-rate difference but the criteria are not stated in the abstract.
axioms (1)
  • domain assumption: Entities exhibit systematically different AI visibility error profiles that aggregate metrics cannot capture
    This premise is stated in the opening paragraph of the abstract and underpins the need for the ten-dimensional framework.
invented entities (1)
  • Ghost cartography (no independent evidence)
    purpose: Unifying mechanism explaining how entities in sparse latent regions produce confident but interpolated output
    Introduced in the abstract as a new explanatory construct with no independent falsifiable handle provided.
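Because the tier thresholds are a free parameter, moving the cut-offs can move the reported gap. The sketch below assigns tiers from a hypothetical market-presence score and recomputes the Tier 1 minus Tier 3 gap under two cut-off choices. Scores, rates, and cut-offs are all invented, and averaging entity-level rates (rather than pooling citations, as the reported percentages may do) is a simplification.

```python
from statistics import mean


def assign_tiers(scores: dict[str, float], cut_hi: float, cut_lo: float) -> dict[str, int]:
    """Tier 1 if score >= cut_hi, Tier 3 if score < cut_lo, otherwise Tier 2."""
    return {
        entity: 1 if score >= cut_hi else (2 if score >= cut_lo else 3)
        for entity, score in scores.items()
    }


def tier_gap_pp(
    scores: dict[str, float],
    fab_rates: dict[str, float],
    cut_hi: float,
    cut_lo: float,
) -> float:
    """Tier 1 minus Tier 3 mean fabrication rate, in percentage points.

    Raises statistics.StatisticsError if either tier ends up empty.
    """
    tiers = assign_tiers(scores, cut_hi, cut_lo)
    tier1 = [fab_rates[e] for e, t in tiers.items() if t == 1]
    tier3 = [fab_rates[e] for e, t in tiers.items() if t == 3]
    return 100 * (mean(tier1) - mean(tier3))


# Hypothetical market-presence scores and per-entity fabrication rates.
scores = {"A": 0.9, "B": 0.7, "C": 0.5, "D": 0.3, "E": 0.1}
fab_rates = {"A": 0.60, "B": 0.50, "C": 0.45, "D": 0.40, "E": 0.30}

# Prints gaps of 30.0 pp and 20.0 pp for the same data.
for cut_hi, cut_lo in [(0.8, 0.2), (0.6, 0.4)]:
    gap = tier_gap_pp(scores, fab_rates, cut_hi, cut_lo)
    print(f"cut-offs ({cut_hi}, {cut_lo}): gap = {gap:.1f} pp")
```

Reporting the gap across a range of plausible cut-offs would show whether the headline number depends on where the tier lines were drawn.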

pith-pipeline@v0.9.1-grok · 5818 in / 1472 out tokens · 22878 ms · 2026-06-26T13:59:52.978762+00:00

