SWE-bench reveals that even top language models like Claude 2 resolve only 1.96% of 2,294 real-world GitHub issues, highlighting a gap in practical coding capabilities.
3 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
  SWE-bench reveals that even top language models like Claude 2 resolve only 1.96% of 2,294 real-world GitHub issues, highlighting a gap in practical coding capabilities.
- EnCoDe: Energy Estimation of Source Code At Design-Time
  EnCoDe enables design-time prediction of block-level energy consumption in Python code via static features and ML models trained on a dataset from 18,000 programs, achieving R²=0.75 and 80.6% hotspot classification accuracy.
- Autark: A Serverless Toolkit for Prototyping Urban Visual Analytics Systems
  Autark is a serverless toolkit that enables rapid prototyping of urban visual analytics systems via domain-aware abstractions and supports more reliable LLM-assisted coding.