Merlin generates CodeQL queries from natural language questions via RAG-based iteration and a self-test technique using assistive queries, achieving 3.8x higher task accuracy and 31% less completion time in user studies while finding additional software issues.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.SE 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
The Productivity-Reliability Paradox arises because AI code generators produce variable output while developers lack sufficient specification discipline, making governance models focused on specifications the binding constraint rather than model improvements.
citing papers explorer
-
Generating Complex Code Analyzers from Natural Language Questions
Merlin generates CodeQL queries from natural language questions via RAG-based iteration and a self-test technique using assistive queries, achieving 3.8x higher task accuracy and 31% less completion time in user studies while finding additional software issues.
-
The Productivity-Reliability Paradox: Specification-Driven Governance for AI-Augmented Software Development
The Productivity-Reliability Paradox arises because AI code generators produce variable output while developers lack sufficient specification discipline, making governance models focused on specifications the binding constraint rather than model improvements.