Structurally rich task descriptions make LLMs robust to prompt under-specification, and under-specification can enhance code correctness by disrupting misleading lexical or structural cues.
7 Pith papers cite this work.
Citation roles: background (2)
Citation polarities: background (2)

Representative citing papers:
LLMs display clear performance stratification on formal language tasks aligned with Chomsky hierarchy complexity levels, limited by severe efficiency barriers rather than absolute capability.
ALADDIN is a user-requirement-driven GUI test generation framework that incrementally navigates mobile app UIs and builds LLM-guided oracles to validate both correct and faulty user-requested functionalities across six apps.
SelfEvolve achieves 92.7% Pass@1 success on 11 runtime self-extension tasks and outperforms baselines like AutoGen by 61.8% with statistical significance.
ClarifySTL uses LLM agents to interactively detect and resolve vagueness and ambiguity in natural language requirements via clarification queries before generating STL formulas, with evaluations on existing and new benchmarks showing effectiveness.
SpecValidator detects lexical vagueness, under-specification, and syntax-formatting defects in LLM code-generation prompts with F1 0.804, outperforming GPT-5-mini and Claude Sonnet 4, and shows that under-specification is the most damaging defect type while richer benchmarks are more resilient.
PerfOrch is a four-agent multi-LLM system that uses offline profiling to build language-and-category rankings for routing tasks, achieving 97.19% and 95.83% pass@1 on HumanEval-X and EffiBench-X with generalization across benchmarks.
Citing papers explorer
- Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy
- Automated Functional Testing for Malleable Mobile Application Driven from User Intent