Program synthesis with large language models
Baseline reference. 60% of citing Pith papers use this work as a benchmark or comparison.
verdicts
UNVERDICTED 7
representative citing papers
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
  SWE-bench reveals that even top language models like Claude 2 resolve only 1.96% of 2,294 real-world GitHub issues, highlighting a gap in practical coding capabilities.
- Dystruct: Dynamically Structured Diffusion Language Model Decoding via Bayesian Inference
  Dystruct formulates flexible-length generation in diffusion language models as a dynamic structural inference problem solved via Bayesian integration of local uncertainty and structural signals.
- MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System
  MAS-Algorithm is a multi-agent workflow that improves AI acceptance rates on algorithmic problems by 6.48% on average, outperforming parameter-efficient fine-tuning.
- AI Alignment via Incentives and Correction
  AI alignment is reframed as a fixed-point incentive problem in a solver-auditor pipeline, solved via bilevel optimization and bandit search over reward profiles to maintain monitoring and reduce hallucinations in LLM coding tasks.
- MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
  MetaGPT embeds human SOPs into LLM prompts to create role-specialized agent teams that produce more coherent solutions on collaborative software engineering tasks than prior chat-based multi-agent systems.
- CodeWiki: Evaluating AI's Ability to Generate Holistic Documentation for Large-Scale Codebases
  CodeWiki presents a unified framework for repository-level documentation across seven languages using hierarchical decomposition, recursive multi-agent processing, and multi-modal synthesis, outperforming DeepWiki by 4.73% on CodeWikiBench.
- Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches
  The paper organizes repository-level retrieval-augmented code generation into a unified framework covering retrieval substrate, control regime, and evaluation setting while summarizing strategies, datasets, and challenges.