Merlin generates CodeQL queries from natural language questions via RAG-based iteration and a self-test technique using assistive queries, achieving 3.8x higher task accuracy and 31% less completion time in user studies while finding additional software issues.
Clarify When Necessary: Resolving Ambiguity Through Interaction with LM s
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
Each tested LLM shows its own characteristic unreliability when engaging in repair during extended math-question dialogues.
Decisive combines document-grounded option scoring with adaptive Bayesian preference elicitation to achieve up to 20% higher decision accuracy than LLMs and existing frameworks across domains.
citing papers explorer
-
Generating Complex Code Analyzers from Natural Language Questions
Merlin generates CodeQL queries from natural language questions via RAG-based iteration and a self-test technique using assistive queries, achieving 3.8x higher task accuracy and 31% less completion time in user studies while finding additional software issues.
-
Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs
Each tested LLM shows its own characteristic unreliability when engaging in repair during extended math-question dialogues.
-
Decisive: Guiding User Decisions with Optimal Preference Elicitation from Unstructured Documents
Decisive combines document-grounded option scoring with adaptive Bayesian preference elicitation to achieve up to 20% higher decision accuracy than LLMs and existing frameworks across domains.