PhantomBench is a new benchmark of 60K+ non-existent terms showing language models hallucinate at rates up to 86.7 percent even when inputs assume the concepts exist.
ISBN 979-8-89176-251-0
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role and citation-polarity summary
verdicts: UNVERDICTED 7
roles: background 1
polarities: background 1
representative citing papers
Code-on-Graph lets LLMs turn retrieved KG facts into Python class instances and generate executable code for reasoning, outperforming prior LLM-KG methods by up to 10.5% on WebQSP, CWQ, and GrailQA.
Defines Decision Potential Surface (DPS) whose zero isohypse equals an LLM decision boundary and supplies a K-sample approximation algorithm with derived upper bounds on absolute, expected, and concentration errors.
Dedicated Feature Crosscoders localize RL-induced tool use to a compact feature set in Qwen2.5-3B, yielding +31.1 pp tool correctness gains and +6.8 pp spillover to the base model.
REVERIEMEM is a three-layer perspective-bounded memory system that raises knowledge boundary fidelity by 34.6 points and wins ~79% of narrative comparisons on a new book-based role-playing benchmark.
Anonymization placement in RAG—at the dataset or at the generated answer—creates observable differences in privacy protection versus response utility.
A two-stage hybrid search pipeline paired with a synthetic-data fine-tuned and compressed Ukrainian language model delivers competitive local question answering under strict compute limits.
citing papers explorer
Code-on-Graph: Iterative Programmatic Reasoning via Large Language Models on Knowledge Graphs