Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

2026 · cs.SE · arXiv 2602.11988

11 Pith papers cite this work. Polarity classification is still indexing.

abstract

A widespread practice in software development is to tailor coding agents to repositories using context files, such as AGENTS.md. Although this practice is strongly encouraged by agent developers, there is currently no rigorous investigation into whether such context files are actually effective for real-world tasks. In this work, we study this question and evaluate coding agents' task completion performance in two complementary settings: established SWE-bench tasks from popular repositories, with LLM-generated context files, and a novel collection of issues from repositories containing developer-committed context files. Surprisingly, we find that providing context files does not generally improve task success rates, while increasing inference cost by over 20% on average. This observation holds across different LLMs, coding agents, and for both LLM-generated and developer-committed context files. Specifically, we find that while instructions in the context files are well followed by coding agents, repository overviews, although popular and recommended by model providers, are not helpful. We conclude that while context files are useful for specifying non-standard coding practices, any attempts to improve performance should be rigorously evaluated before deployment.

representative citing papers

BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge

cs.SE · 2026-05-15 · unverdicted · novelty 7.0

BootstrapAgent distills repository bootstrapping heuristics into a persistent .bootstrap contract via multi-agent evidence extraction, Docker verification, and trace-driven repair, reporting 92.9% success and efficiency gains on three benchmarks.

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

cs.SE · 2026-05-13 · unverdicted · novelty 7.0

PerfCodeBench reveals that state-of-the-art LLMs produce functionally correct but significantly slower code than expert-optimized versions on system-level tasks, especially those involving parallelism and GPUs.

Rule Taxonomy and Evolution in AI IDEs: A Mining and Survey Study

cs.SE · 2026-06-10 · unverdicted · novelty 6.0

Mixed-methods study creates taxonomy of AI IDE rules from 7310 instances, analyzes evolution drivers, and reports that rule updates raise average artifact compliance from 49.14% to 72.13%.

Coding Agents Don't Know When to Act

cs.SE · 2026-05-08 · unverdicted · novelty 6.0

Coding agents exhibit action bias by proposing undesirable changes on already-fixed issues 35-65% of the time, and explicit reproduction instructions only partially mitigate this while creating new abstention errors.

ZORO: Active Rules for Reliable Vibe Coding

cs.HC · 2026-04-17 · unverdicted · novelty 6.0

ZORO integrates rules directly into AI coding workflows by enriching plans, enforcing compliance with proof requirements, and evolving rules via user feedback, resulting in better rule adherence and shifts in user behavior.

Instruction Adherence in Coding Agent Configuration Files: A Factorial Study of Four File-Structure Variables

cs.SE · 2026-05-11 · unverdicted · novelty 5.0

A 1650-session factorial study found no measurable impact from config file size, instruction position, architecture, or conflicts on coding agent adherence, though compliance declined within sessions.

Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development

cs.SE · 2026-05-19 · unverdicted · novelty 4.0

Agentic Agile-V uses Agile-V as backbone and a Specify-Constrain-Orchestrate-Prove-Evolve-Verify loop to convert AI agent conversations into traceable engineering artifacts with acceptance evidence.

Bridging the Gap on AI-Assisted Scientific Software Development Through Transparency and Traceability

cs.SE · 2026-05-17 · conditional · novelty 4.0

Proposes guidance for responsible AI use in scientific software development under NQA-1 standards, illustrated with TMAP8 V&V cases to ensure accountability and auditability.

Agentic AI-assisted coding offers a unique opportunity to instill epistemic grounding during software development

cs.SE · 2026-04-23 · unverdicted · novelty 4.0

The authors propose field-scoped epistemic grounding documents that override user prompts with non-negotiable validity rules for AI-assisted scientific software development.

ASE-26: a curriculum for agentic software engineering as a discipline

cs.CY · 2026-05-31 · unverdicted · novelty 3.0

ASE-26 is a proposed undergraduate curriculum for agentic software engineering organized around an evolutionary spiral of intent and build, with 21 modules and pedagogical commitments for agent-co-produced work.

From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution

cs.SE · 2026-04-16
