BootstrapAgent distills repository bootstrapping heuristics into a persistent .bootstrap contract via multi-agent evidence extraction, Docker verification, and trace-driven repair, reporting 92.9% success and efficiency gains on three benchmarks.
Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?
11 Pith papers cite this work.
abstract
A widespread practice in software development is to tailor coding agents to repositories using context files, such as AGENTS.md. Although this practice is strongly encouraged by agent developers, there is currently no rigorous investigation into whether such context files are actually effective for real-world tasks. In this work, we study this question and evaluate coding agents' task completion performance in two complementary settings: established SWE-bench tasks from popular repositories, with LLM-generated context files, and a novel collection of issues from repositories containing developer-committed context files. Surprisingly, we find that providing context files does not generally improve task success rates, while increasing inference cost by over 20% on average. This observation holds across different LLMs, coding agents, and for both LLM-generated and developer-committed context files. Specifically, we find that while instructions in the context files are well followed by coding agents, repository overviews, although popular and recommended by model providers, are not helpful. We conclude that while context files are useful for specifying non-standard coding practices, any attempts to improve performance should be rigorously evaluated before deployment.
years
2026: 11
representative citing papers
PerfCodeBench reveals that state-of-the-art LLMs produce functionally correct but significantly slower code than expert-optimized versions on system-level tasks, especially those involving parallelism and GPUs.
Mixed-methods study creates taxonomy of AI IDE rules from 7310 instances, analyzes evolution drivers, and reports that rule updates raise average artifact compliance from 49.14% to 72.13%.
Coding agents exhibit action bias by proposing undesirable changes on already-fixed issues 35-65% of the time, and explicit reproduction instructions only partially mitigate this while creating new abstention errors.
ZORO integrates rules directly into AI coding workflows by enriching plans, enforcing compliance with proof requirements, and evolving rules via user feedback, resulting in better rule adherence and shifts in user behavior.
A 1650-session factorial study found no measurable impact from config file size, instruction position, architecture, or conflicts on coding agent adherence, though compliance declined within sessions.
Agentic Agile-V uses Agile-V as backbone and a Specify-Constrain-Orchestrate-Prove-Evolve-Verify loop to convert AI agent conversations into traceable engineering artifacts with acceptance evidence.
Proposes guidance for responsible AI use in scientific software development under NQA-1 standards, illustrated with TMAP8 V&V cases to ensure accountability and auditability.
The authors propose field-scoped epistemic grounding documents that override user prompts with non-negotiable validity rules for AI-assisted scientific software development.
ASE-26 is a proposed undergraduate curriculum for agentic software engineering organized around an evolutionary spiral of intent and build, with 21 modules and pedagogical commitments for agent-co-produced work.
citing papers explorer
- ZORO: Active Rules for Reliable Vibe Coding
ZORO integrates rules directly into AI coding workflows by enriching plans, enforcing compliance with proof requirements, and evolving rules via user feedback, resulting in better rule adherence and shifts in user behavior.