hub

arXiv preprint arXiv:2305.14251 (2023)

Min, S · 2023 · arXiv 2305.14251

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2 method 1

citation-polarity summary

background 2 use method 1

representative citing papers

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.

Answer Only as Precisely as Justified: Calibrated Claim-Level Specificity Control for Agentic Systems

cs.CL · 2026-04-19 · unverdicted · novelty 7.0 · 2 refs

Compositional selective specificity (CSS) decomposes generated answers into claims and emits each at the most specific level supported by evidence, raising overcommitment-aware utility from 0.846 to 0.913 on LongFact while retaining 0.938 specificity.

On the Factual Consistency of Text-based Explainable Recommendation Models

cs.IR · 2025-12-30 · unverdicted · novelty 7.0

A prompting pipeline and statement-level metrics show that six state-of-the-art text-based explainable recommendation models achieve high semantic similarity but very low factual consistency on Amazon review data.

Cognitive Twins: Investigating Personalized Thinking Model Building and Its Performance Enhancement with Human-in-the-Loop

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

PTM uses LLMs and clustering on learner journals to build interpretable cognitive models, showing 75% F1 fidelity and positive user feedback in a seven-week study with 40 participants.

Whose Story Gets Told? Positionality and Bias in LLM Summaries of Life Narratives

cs.CL · 2026-04-22 · unverdicted · novelty 6.0

A proposed pipeline shows LLMs introduce detectable race and gender biases when summarizing life narratives, creating potential for representational harm in research.

ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations

cs.CL · 2025-09-30 · conditional · novelty 6.0

ReFACT benchmark reveals LLMs show a persistent salient distractor failure mode where 61% of incorrect error span predictions are semantically unrelated to actual errors, persisting across model sizes, and comparative judgment yields lower F1 than independent detection.

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

cs.CL · 2023-10-17 · unverdicted · novelty 6.0

Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.

Chain-of-Verification Reduces Hallucination in Large Language Models

cs.CL · 2023-09-20 · unverdicted · novelty 6.0

Chain-of-Verification reduces hallucinations in large language models by drafting responses, planning independent verification questions, answering them separately, and generating a final verified output.

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

cs.CL · 2023-03-15 · unverdicted · novelty 6.0

SelfCheckGPT detects hallucinations by checking consistency across multiple sampled responses from black-box LLMs on WikiBio biography generation tasks.

SUMMIR: A Hallucination-Aware Framework for Ranking Sports Insights from LLMs

cs.IR · 2026-03-30 · conditional · novelty 5.0

SUMMIR is a multimetric ranking model that orders LLM-generated sports insights by importance while incorporating hallucination detection to improve factual reliability across cricket, soccer, basketball, and baseball articles.

Althea: Human-AI Collaboration for Fact-Checking and Critical Reasoning

cs.HC · 2025-12-29 · unverdicted · novelty 5.0

Althea integrates retrieval-augmented reasoning with varying levels of user scaffolding to improve fact-checking accuracy and foster persistent improvements in critical thinking.

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

cs.CL · 2023-11-09 · unverdicted · novelty 5.0

The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.

HalluScan: A Systematic Benchmark for Detecting and Mitigating Hallucinations in Instruction-Following LLMs

cs.CL · 2026-05-04 · unverdicted · novelty 4.0

HalluScan benchmark evaluates hallucination detection in LLMs, reporting NLI Verification at AUROC 0.88 and introducing HalluScore (r=0.41 with humans) plus Adaptive Detection Routing for 2x cost savings.

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

cs.CL · 2023-09-03 · unverdicted · novelty 4.0

A literature survey that taxonomizes hallucination phenomena in LLMs, reviews evaluation benchmarks, and analyzes approaches for their detection, explanation, and mitigation.

citing papers explorer

Showing 14 of 14 citing papers.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations cs.CL · 2026-05-12 · unverdicted · none · ref 178
REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.
Answer Only as Precisely as Justified: Calibrated Claim-Level Specificity Control for Agentic Systems cs.CL · 2026-04-19 · unverdicted · none · ref 10 · 2 links
Compositional selective specificity (CSS) decomposes generated answers into claims and emits each at the most specific level supported by evidence, raising overcommitment-aware utility from 0.846 to 0.913 on LongFact while retaining 0.938 specificity.
On the Factual Consistency of Text-based Explainable Recommendation Models cs.IR · 2025-12-30 · unverdicted · none · ref 21
A prompting pipeline and statement-level metrics show that six state-of-the-art text-based explainable recommendation models achieve high semantic similarity but very low factual consistency on Amazon review data.
Cognitive Twins: Investigating Personalized Thinking Model Building and Its Performance Enhancement with Human-in-the-Loop cs.LG · 2026-05-06 · unverdicted · none · ref 26
PTM uses LLMs and clustering on learner journals to build interpretable cognitive models, showing 75% F1 fidelity and positive user feedback in a seven-week study with 40 participants.
Whose Story Gets Told? Positionality and Bias in LLM Summaries of Life Narratives cs.CL · 2026-04-22 · unverdicted · none · ref 163
A proposed pipeline shows LLMs introduce detectable race and gender biases when summarizing life narratives, creating potential for representational harm in research.
ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations cs.CL · 2025-09-30 · conditional · none · ref 23
ReFACT benchmark reveals LLMs show a persistent salient distractor failure mode where 61% of incorrect error span predictions are semantically unrelated to actual errors, persisting across model sizes, and comparative judgment yields lower F1 than independent detection.
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection cs.CL · 2023-10-17 · unverdicted · none · ref 136
Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.
Chain-of-Verification Reduces Hallucination in Large Language Models cs.CL · 2023-09-20 · unverdicted · none · ref 181
Chain-of-Verification reduces hallucinations in large language models by drafting responses, planning independent verification questions, answering them separately, and generating a final verified output.
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models cs.CL · 2023-03-15 · unverdicted · none · ref 70
SelfCheckGPT detects hallucinations by checking consistency across multiple sampled responses from black-box LLMs on WikiBio biography generation tasks.
SUMMIR: A Hallucination-Aware Framework for Ranking Sports Insights from LLMs cs.IR · 2026-03-30 · conditional · none · ref 29
SUMMIR is a multimetric ranking model that orders LLM-generated sports insights by importance while incorporating hallucination detection to improve factual reliability across cricket, soccer, basketball, and baseball articles.
Althea: Human-AI Collaboration for Fact-Checking and Critical Reasoning cs.HC · 2025-12-29 · unverdicted · none · ref 29
Althea integrates retrieval-augmented reasoning with varying levels of user scaffolding to improve fact-checking accuracy and foster persistent improvements in critical thinking.
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions cs.CL · 2023-11-09 · unverdicted · none · ref 225
The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.
HalluScan: A Systematic Benchmark for Detecting and Mitigating Hallucinations in Instruction-Following LLMs cs.CL · 2026-05-04 · unverdicted · none · ref 13
HalluScan benchmark evaluates hallucination detection in LLMs, reporting NLI Verification at AUROC 0.88 and introducing HalluScore (r=0.41 with humans) plus Adaptive Detection Routing for 2x cost savings.
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models cs.CL · 2023-09-03 · unverdicted · none · ref 12
A literature survey that taxonomizes hallucination phenomena in LLMs, reviews evaluation benchmarks, and analyzes approaches for their detection, explanation, and mitigation.

arXiv preprint arXiv:2305.14251 (2023)

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer