{"total":17,"items":[{"citing_arxiv_id":"2606.24579","ref_index":55,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Cross-Lingual Exploration for Parametric Knowledge","primary_cat":"cs.CL","submitted_at":"2026-06-23T13:42:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Cross-lingual prompt exploration improves factual recall and consistency in LLMs across 17 languages more efficiently than native-language scaling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.23057","ref_index":9,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Who Owns the AI Recommendation? A Multi-Industry Empirical Map of Brand Category Ownership Across Large Language Models","primary_cat":"cs.IR","submitted_at":"2026-06-22T09:10:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Empirical study of LLM brand recommendations across industries finds moderate concentration (mean Gini 0.28) and low cross-model agreement (41.6%) on top brands.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09338","ref_index":20,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Multi-Hop Knowledge Composition is Bound by Pretraining Exposure","primary_cat":"cs.CL","submitted_at":"2026-06-08T11:05:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Controlled experiments show implicit multi-hop reasoning in LLMs requires prior exposure to compositional contexts during pretraining and does not transfer to unexposed 
individuals.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08397","ref_index":10,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"TrustMargin: Training-Free Arbitration between Parametric Memory and Retrieved Evidence in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-06-07T01:25:09+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TrustMargin arbitrates between direct and RAG answers from a frozen LLM by combining a parametric-prior margin and an evidence-binding margin computed from model likelihoods, improving results on 2WikiMQA and CWQA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22176","ref_index":38,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"LLM-Metrics: Measuring Research Impact Through Large Language Model Memory","primary_cat":"cs.AI","submitted_at":"2026-05-21T08:45:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM-Metrics probes memory in 17 LLMs across 549 2023-2024 CS papers and finds a modest Spearman correlation (rho=0.1495) with citation counts, stronger for 2024 papers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12382","ref_index":17,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Pretraining Exposure Explains Popularity Judgments in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-12T16:45:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"LLM popularity judgments align more closely with pretraining data exposure counts than with Wikipedia popularity, with stronger effects in 
pairwise comparisons and larger models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"memorization, data contamination, and membership inference under a common lens of what models have \"seen\" during training [24]. At the same time, exposure is not only a property of the data but also of the training process: exposure bias arising from discrepancies between training and inference can amplify pretraining-driven patterns, particularly in distillation settings [17]. Similarly, in model updating scenarios, LLMs tend to favor pretraining knowledge even after fine-tuning, unless explicitly mitigated [29]. Together, these findings suggest that both data exposure and procedural biases reinforce the influence of pretraining statistics. In this work, we directly study the relationship between pretraining exposure, real-world popularity, and LLM popularity judg-"},{"citing_arxiv_id":"2605.06335","ref_index":25,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Eliciting associations between clinical variables from LLMs via comparison questions across populations","primary_cat":"cs.LG","submitted_at":"2026-05-07T14:26:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Indirect elicitation via triplet comparisons recovers meaningful association structures from LLMs and supports conservative causal candidate links across prompted subpopulations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00226","ref_index":71,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Why Do LLMs Struggle in Strategic Play? 
Broken Links Between Observations, Beliefs, and Actions","primary_cat":"cs.CL","submitted_at":"2026-04-30T21:04:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs encode accurate but brittle internal beliefs about latent game states and convert them poorly into actions, creating systematic gaps that explain strategic failures.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Association for Computational Linguistics. doi:10.18653/v1/2022.emnlp-main.335. URL https://aclanthology.org/2022.emnlp-main.335/. [70] Jerry Wang and Ting Yu Liu. Observer, not player: Simulating theory of mind in large language models through game observation. In First Workshop on Foundations of Reasoning in Language Models, 2025. URL https://openreview.net/forum?id=GYi2Voim 9O. [71] Joseph Suh, Erfan Jahanparast, Suhong Moon, Minwoo Kang, and Serina Chang. Language model fine-tuning on scaled survey data for predicting distributions of public opinions. 
In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguis-"},{"citing_arxiv_id":"2604.25032","ref_index":189,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Offline Evaluation Measures of Fairness in Recommender Systems","primary_cat":"cs.IR","submitted_at":"2026-04-27T22:28:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The thesis identifies theoretical, empirical, and conceptual flaws in offline fairness measures for recommender systems and contributes new evaluation methods and practical guidelines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.22849","ref_index":53,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"R$^3$AG: Retriever Routing for Retrieval-Augmented Generation","primary_cat":"cs.IR","submitted_at":"2026-04-22T06:51:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"R³AG routes queries to retrievers by decomposing capabilities into retrieval quality and generation utility, trained via contrastive learning on document assessments and downstream answer correctness to outperform static methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17325","ref_index":126,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation","primary_cat":"cs.CL","submitted_at":"2026-04-19T08:39:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"QREAM rewrites documents to question-focused style using iterative ICL and 
distilled FT models, boosting RAG performance by up to 8% relative improvement.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.02543","ref_index":16,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Norm Anchors Make Model Edits Last","primary_cat":"cs.LG","submitted_at":"2026-01-30T04:31:21+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Norm-Anchor Scaling breaks the norm-feedback loop in sequential LLM editing by anchoring value vectors to original norms, improving long-run performance by 72.2% and extending the editing horizon over 4x.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.23765","ref_index":1,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality","primary_cat":"cs.CL","submitted_at":"2025-09-28T09:23:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"KLCF formalizes long-form factuality as bidirectional distribution matching between expressed and parametric knowledge, using a sampled factual checklist for recall and a truthfulness reward for precision.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2311.05232","ref_index":258,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions","primary_cat":"cs.CL","submitted_at":"2023-11-09T09:25:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper surveys hallucination in LLMs with an innovative taxonomy, 
factors, detection methods, benchmarks, mitigation strategies, and open research directions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"We also highlight several promising avenues for future research, such as hallucinations in large vision-language models and understanding of knowledge boundaries in LLM hallucinations, paving the way for forthcoming research in the field. Comparing with Existing Surveys. As hallucination stands out as a major challenge in generative AI, numerous research [136, 192, 258, 298, 312, 376] has been directed towards hallucinations. While these contributions have explored LLM hallucination from various perspectives and provided valuable insights, our survey seeks to delineate their distinct contributions and the comprehensive scope they encompass. Ji et al. [136] primarily shed light on hallucinations in pre-trained models for NLG tasks, leaving LLMs outside their discussion purview."},{"citing_arxiv_id":"2205.01068","ref_index":187,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"OPT: Open Pre-trained Transformer Language Models","primary_cat":"cs.CL","submitted_at":"2022-05-02T17:49:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2202.05262","ref_index":30,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Locating and Editing Factual Associations in GPT","primary_cat":"cs.CL","submitted_at":"2022-02-10T18:59:54+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Factual associations in autoregressive 
transformers are localized to mid-layer feed-forward modules and can be edited via rank-one model editing while preserving both specificity and generalization on counterfactual tests.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2110.01552","ref_index":26,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Perhaps PTLMs Should Go to School -- A Task to Assess Open Book and Closed Book QA","primary_cat":"cs.CL","submitted_at":"2021-10-04T16:45:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Proposes a textbook-based true/false QA task where PTLMs score ~50% closed-book even after pre-training on the text and ~60% open-book with retrieval.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}