Hallucination detection for LLM-based text-to-SQL generation via two-stage metamorphic testing
3 Pith papers cite this work.
Citing papers by year: 2026 (3)
Citing papers
- LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs
  LGMT applies metamorphic testing derived from first-order logic equivalences to detect reasoning inconsistencies in LLMs that static benchmarks miss.
- Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification
  Introduces functional equivalence methods and functional entropy to predict functional correctness of LLM-generated code via uncertainty quantification, outperforming NLI-based baselines in most tested settings.
- Semantic Layers for Reliable LLM-Powered Data Analytics: A Paired Benchmark of Accuracy and Hallucination Across Three Frontier Models
  A paired benchmark demonstrates that providing an explicit semantic layer document improves LLM accuracy on text-to-SQL tasks by 17-23 percentage points and eliminates meaningful differences between frontier models.