Title resolution pending

Dhuliawala, Shehzaad, Komeili, Mojtaba, Xu, Jing, Raileanu, Roberta, Li, Xian, Celikyilmaz, Asli · 2023 · arXiv 2309.11495

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

read on arXiv browse 12 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.

Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

Long-horizon enterprise AI agents' decisions decompose into four measurable axes, with benchmark experiments on six memory architectures revealing distinct weaknesses and reversing a pre-registered prediction on summarization.

Credo: Declarative Control of LLM Pipelines via Beliefs and Policies

cs.AI · 2026-04-15 · unverdicted · novelty 7.0

Credo proposes representing LLM agent state as beliefs and regulating pipeline behavior with declarative policies stored in a database for adaptive, auditable control.

Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents

cs.AI · 2026-04-05 · unverdicted · novelty 7.0

PTR framework profiles a workflow upfront then executes it deterministically with bounded verification and repair, limiting LM calls to 2-3 while outperforming ReAct in 16 of 24 tested configurations.

Answer Only as Precisely as Justified: Calibrated Claim-Level Specificity Control for Agentic Systems

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

Compositional selective specificity (CSS) improves overcommitment-aware utility from 0.846 to 0.913 on LongFact while retaining 0.938 specificity by calibrating claim-level backoffs in agentic AI responses.

Towards an AI co-scientist

cs.AI · 2025-02-26 · unverdicted · novelty 6.0

A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

cs.CL · 2023-10-17 · unverdicted · novelty 6.0

Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.

HalluScan: A Systematic Benchmark for Detecting and Mitigating Hallucinations in Instruction-Following LLMs

cs.CL · 2026-05-04 · unverdicted · novelty 5.0

HalluScan benchmark tests hallucination detectors on LLMs, identifies NLI Verification as top performer with 0.88 AUROC, and introduces HalluScore (r=0.41 with humans) plus a routing method for 2x cost savings.

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

cs.CL · 2023-11-09 · unverdicted · novelty 5.0

The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.

Reducing Hallucination in Enterprise AI Workflows via Hybrid Utility Minimum Bayes Risk (HUMBR)

cs.LG · 2026-04-13 · unverdicted · novelty 4.0

HUMBR reduces LLM hallucinations in enterprise workflows by using a hybrid semantic-lexical utility within minimum Bayes risk decoding to identify consensus outputs, with derived error bounds and reported outperformance over self-consistency on benchmarks and production data.

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

cs.CL · 2023-09-03 · unverdicted · novelty 4.0

A literature survey that taxonomizes hallucination phenomena in LLMs, reviews evaluation benchmarks, and analyzes approaches for their detection, explanation, and mitigation.

A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications

cs.AI · 2024-02-05 · unverdicted · novelty 3.0

A systematic survey categorizes prompt engineering methods for LLMs and VLMs by application area, summarizing methodologies, applications, models, datasets, strengths, and limitations for each technique along with a taxonomy and summary table.

citing papers explorer

Showing 12 of 12 citing papers.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations cs.CL · 2026-05-12 · unverdicted · none · ref 116
REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.
Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents cs.AI · 2026-04-21 · unverdicted · none · ref 18
Long-horizon enterprise AI agents' decisions decompose into four measurable axes, with benchmark experiments on six memory architectures revealing distinct weaknesses and reversing a pre-registered prediction on summarization.
Credo: Declarative Control of LLM Pipelines via Beliefs and Policies cs.AI · 2026-04-15 · unverdicted · none · ref 3
Credo proposes representing LLM agent state as beliefs and regulating pipeline behavior with declarative policies stored in a database for adaptive, auditable control.
Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents cs.AI · 2026-04-05 · unverdicted · none · ref 3
PTR framework profiles a workflow upfront then executes it deterministically with bounded verification and repair, limiting LM calls to 2-3 while outperforming ReAct in 16 of 24 tested configurations.
Answer Only as Precisely as Justified: Calibrated Claim-Level Specificity Control for Agentic Systems cs.CL · 2026-04-19 · unverdicted · none · ref 3
Compositional selective specificity (CSS) improves overcommitment-aware utility from 0.846 to 0.913 on LongFact while retaining 0.938 specificity by calibrating claim-level backoffs in agentic AI responses.
Towards an AI co-scientist cs.AI · 2025-02-26 · unverdicted · none · ref 221
A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection cs.CL · 2023-10-17 · unverdicted · none · ref 8
Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.
HalluScan: A Systematic Benchmark for Detecting and Mitigating Hallucinations in Instruction-Following LLMs cs.CL · 2026-05-04 · unverdicted · none · ref 36
HalluScan benchmark tests hallucination detectors on LLMs, identifies NLI Verification as top performer with 0.88 AUROC, and introduces HalluScore (r=0.41 with humans) plus a routing method for 2x cost savings.
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions cs.CL · 2023-11-09 · unverdicted · none · ref 77
The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.
Reducing Hallucination in Enterprise AI Workflows via Hybrid Utility Minimum Bayes Risk (HUMBR) cs.LG · 2026-04-13 · unverdicted · none · ref 4
HUMBR reduces LLM hallucinations in enterprise workflows by using a hybrid semantic-lexical utility within minimum Bayes risk decoding to identify consensus outputs, with derived error bounds and reported outperformance over self-consistency on benchmarks and production data.
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models cs.CL · 2023-09-03 · unverdicted · none · ref 4
A literature survey that taxonomizes hallucination phenomena in LLMs, reviews evaluation benchmarks, and analyzes approaches for their detection, explanation, and mitigation.
A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications cs.AI · 2024-02-05 · unverdicted · none · ref 6
A systematic survey categorizes prompt engineering methods for LLMs and VLMs by application area, summarizing methodologies, applications, models, datasets, strengths, and limitations for each technique along with a taxonomy and summary table.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer