RepairAgent autonomously repairs 164 bugs on Defects4J including 39 not fixed by prior techniques by treating an LLM as an agent that invokes tools via a finite state machine and dynamic prompts.
An in-context learning agent for formal theorem-proving
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
IDS is an agentic LLM system that incrementally synthesizes both implementation and proof for distributed key-value stores, succeeding on all 7 specs where prior agents succeeded on only 2.
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.
SpecRL uses the fraction of negative tests rejected by candidate specifications as a reward signal in RL training to produce stronger and more verifiable formal specifications than prior methods.
Agentic Claude reaches 98.8% valid specs, 87.5% implementation certification, and 98.1% end-to-end success on CLEVER, revealing a mismatch between benchmark difficulty and current prover performance.
citing papers explorer
-
RepairAgent: An Autonomous, LLM-Based Agent for Program Repair
RepairAgent autonomously repairs 164 bugs on Defects4J including 39 not fixed by prior techniques by treating an LLM as an agent that invokes tools via a finite state machine and dynamic prompts.
-
Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems
IDS is an agentic LLM system that incrementally synthesizes both implementation and proof for distributed key-value stores, succeeding on all 7 specs where prior agents succeeded on only 2.
-
A Minimal Agent for Automated Theorem Proving
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
-
GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.
-
Reinforcement Learning with Negative Tests as Completeness Signal for Formal Specification Synthesis
SpecRL uses the fraction of negative tests rejected by candidate specifications as a reward signal in RL training to produce stronger and more verifiable formal specifications than prior methods.
-
Agentic Proving for Program Verification
Agentic Claude reaches 98.8% valid specs, 87.5% implementation certification, and 98.1% end-to-end success on CLEVER, revealing a mismatch between benchmark difficulty and current prover performance.
- Discovering New Theorems via LLMs with In-Context Proof Learning in Lean