Hallucination detection for LLM-based text-to-SQL generation via two-stage metamorphic testing

Yang, B. · 2025 · arXiv 2512.22250

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

LGMT applies metamorphic testing derived from first-order logic equivalences to detect reasoning inconsistencies in LLMs that static benchmarks miss.

Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

Introduces functional equivalence methods and functional entropy to predict functional correctness of LLM-generated code via uncertainty quantification, outperforming NLI-based baselines in most tested settings.

Semantic Layers for Reliable LLM-Powered Data Analytics: A Paired Benchmark of Accuracy and Hallucination Across Three Frontier Models

cs.AI · 2026-04-28 · conditional · novelty 5.0

A paired benchmark demonstrates that providing an explicit semantic layer document improves LLM accuracy on text-to-SQL tasks by 17-23 percentage points and eliminates meaningful differences between frontier models.

citing papers explorer

LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs · cs.AI · 2026-05-12 · unverdicted · none · ref 58
Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification · cs.CL · 2026-05-27 · unverdicted · none · ref 30
Semantic Layers for Reliable LLM-Powered Data Analytics: A Paired Benchmark of Accuracy and Hallucination Across Three Frontier Models · cs.AI · 2026-04-28 · conditional · none · ref 8
