Hallucination detection for LLM-based text-to-SQL generation via two-stage metamorphic testing

Yang, B · 2025 · arXiv 2512.22250

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs

cs.AI · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

LGMT is a logic-grounded metamorphic testing framework that detects hidden reasoning defects in LLMs by checking consistency on semantically invariant inputs derived from FOL equivalences.

Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

Introduces functional equivalence methods and functional entropy to predict functional correctness of LLM-generated code via uncertainty quantification, outperforming NLI-based baselines in most tested settings.

Semantic Layers for Reliable LLM-Powered Data Analytics: A Paired Benchmark of Accuracy and Hallucination Across Three Frontier Models

cs.AI · 2026-04-28 · conditional · novelty 5.0

A paired benchmark demonstrates that providing an explicit semantic layer document improves LLM accuracy on text-to-SQL tasks by 17-23 percentage points and eliminates meaningful differences between frontier models.

