Existing agent systems often rely on Python-specific tooling, effectively overfitting to the original SWE-bench (Yang et al., 2024b).

Provides a benchmark for evaluating model and agent performance across a variety of programming languages and application domains.

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

SWE-smith: Scaling Data for Software Engineering Agents

cs.SE · 2025-04-30 · conditional · novelty 8.0

SWE-smith scales software engineering training data to 50k instances across 128 repositories, enabling SWE-agent-LM-32B to achieve 40.2% Pass@1 on SWE-bench Verified, state of the art among open-source models.

