Introduces evaluation of LLMs' implicit software world models via prediction of execution resources on real software tasks, finding modest and brittle performance across models including frontier ones.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.SE (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Citing papers explorer
- Towards Evaluation of Implicit Software World Models in Coding LLMs
  Introduces evaluation of LLMs' implicit software world models via prediction of execution resources on real software tasks, finding modest and brittle performance across models including frontier ones.
- RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations
  RepoMirage uses semantics-preserving perturbations on SWE-Bench to show code agents lack repository context reasoning, with performance falling sharply on extended structure tasks, and introduces RepoAnchor as a structure-first fix.