Before the Pull Request: Mining Multi-Agent Coordination

Dipankar Sarkar

arXiv: 2606.19616 · v1 · pith:XXX3PPOJnew · submitted 2026-06-17 · 💻 cs.SE · cs.AI · cs.MA



Pith reviewed 2026-06-26 19:43 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.MA

keywords: multi-agent coordination · git event log · pull requests · autonomous coding agents · duplicate work reduction · coordination substrate · software engineering


The pith

A git-embedded event log lets coding agents coordinate before submitting pull requests and cuts the share of duplicate work from 78% to zero.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that coordination among autonomous coding agents occurs before any pull request is opened, and that this phase can be observed directly by embedding an append-only signed event log inside the git repository itself. When agents share this log through the grite substrate, the fraction of work that merely repeats a teammate's effort drops from 78 percent to zero while useful output more than triples. The same mechanism ensures that every agent's local copy of the log reaches an identical state without any write being lost, and the resulting record can be mined to recover concrete failure patterns such as lock starvation and race-to-close that remain invisible in ordinary pull-request histories.
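The convergence property can be illustrated by treating the log as a grow-only set of events, the simplest conflict-free replicated data type. This is a minimal sketch, assuming every event carries a unique id; the `Event` fields and the `merge` helper are hypothetical and are not grite's actual data model or merge procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical unique id, e.g. a hash of the event body
    actor: str     # agent that appended the event
    kind: str      # e.g. "claim" or "close"
    task: str      # work item the event refers to


def merge(a: frozenset[Event], b: frozenset[Event]) -> frozenset[Event]:
    """Merge two replicas of an append-only event log.

    Set union is commutative, associative and idempotent, so replicas that
    exchange events in any order, any number of times, end up identical,
    and no event held by either side is dropped.
    """
    return a | b


e1 = Event("e1", "agent-a", "claim", "T1")
e2 = Event("e2", "agent-b", "claim", "T2")
e3 = Event("e3", "agent-c", "close", "T1")

replica_a = frozenset({e1, e3})
replica_b = frozenset({e2})

assert merge(replica_a, replica_b) == merge(replica_b, replica_a)  # order-independent
assert merge(merge(replica_a, replica_b), replica_b) == merge(replica_a, replica_b)  # idempotent
print(len(merge(replica_a, replica_b)))  # 3
```

Because merging is plain set union, no replica can lose an event that another replica holds; the harder questions (ordering, conflicting claims) move into how the merged set is interpreted.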

Core claim

Autonomous coding agents produce pull requests faster but with lower acceptance rates because they lack a shared view of who is working on what before submission. grite supplies this view by maintaining an append-only, cryptographically signed event log stored directly inside git; each agent appends its claims and reads the current state from the same log. Experiments demonstrate that this shared substrate drives duplicate work to zero, more than triples useful throughput, guarantees convergence across agents without dropped writes, and yields a queryable artifact from which specific coordination failures can be extracted with full provenance.
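Reading "the current state from the same log" implies a deterministic fold over the events. A minimal sketch, assuming each event carries a logical clock and that the earliest claim on a task wins, with ties broken by agent name; `current_owners`, the `clock` field, and the first-claim-wins rule are illustrative assumptions, not grite's documented semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    clock: int  # hypothetical logical timestamp (e.g. a Lamport clock)
    actor: str  # agent that appended the event
    kind: str   # "claim" or "release"
    task: str   # work item the event refers to


def current_owners(log: list[Event]) -> dict[str, str]:
    """Fold the log into a task -> owner map.

    Events are replayed in a deterministic total order (clock, then actor),
    so every agent holding the same set of events derives the same owners.
    A claim on a task that is already owned is ignored (first claim wins);
    only the current owner's release frees a task.
    """
    owners: dict[str, str] = {}
    for ev in sorted(log, key=lambda e: (e.clock, e.actor)):
        if ev.kind == "claim" and ev.task not in owners:
            owners[ev.task] = ev.actor
        elif ev.kind == "release" and owners.get(ev.task) == ev.actor:
            del owners[ev.task]
    return owners


log = [
    Event(2, "agent-b", "claim", "T1"),
    Event(2, "agent-a", "claim", "T1"),  # concurrent claim with the same clock
    Event(3, "agent-b", "claim", "T2"),
]
print(current_owners(log))  # {'T1': 'agent-a', 'T2': 'agent-b'}
```

Since the fold sorts events before replaying them, the result does not depend on the order in which an agent happened to receive them.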

What carries the argument

grite, the coordination substrate that stores its records inside git itself as an append-only signed event log with no central server required
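The phrase "append-only signed event log" can be made concrete with a hash chain in which each event commits to its predecessor and is authenticated by its author. A minimal sketch, assuming JSON-encoded events and per-agent shared keys; HMAC is a stand-in for whatever signature scheme grite actually uses (a real deployment would more plausibly use public-key signatures), and `append` and `verify` are hypothetical helpers.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def _encode(actor: str, prev: str, payload: dict) -> bytes:
    # Canonical JSON so every replica hashes and signs identical bytes.
    return json.dumps({"actor": actor, "prev": prev, "payload": payload},
                      sort_keys=True).encode()


def append(log: list, actor: str, key: bytes, payload: dict) -> dict:
    """Append an event that is signed by its actor and chained to its predecessor."""
    prev = log[-1]["hash"] if log else GENESIS
    encoded = _encode(actor, prev, payload)
    event = {
        "actor": actor,
        "prev": prev,
        "payload": payload,
        "hash": hashlib.sha256(encoded).hexdigest(),
        # HMAC is a stand-in for a real public-key signature in this sketch.
        "sig": hmac.new(key, encoded, hashlib.sha256).hexdigest(),
    }
    log.append(event)
    return event


def verify(log: list, keys: dict) -> bool:
    """Return True only if every event is correctly chained and signed."""
    prev = GENESIS
    for ev in log:
        key = keys.get(ev["actor"])
        if key is None or ev["prev"] != prev:
            return False
        encoded = _encode(ev["actor"], ev["prev"], ev["payload"])
        if hashlib.sha256(encoded).hexdigest() != ev["hash"]:
            return False
        expected = hmac.new(key, encoded, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, ev["sig"]):
            return False
        prev = ev["hash"]
    return True


keys = {"agent-a": b"key-a", "agent-b": b"key-b"}
log: list = []
append(log, "agent-a", keys["agent-a"], {"kind": "claim", "task": "T1"})
append(log, "agent-b", keys["agent-b"], {"kind": "claim", "task": "T2"})
print(verify(log, keys))  # True

log[0]["payload"]["task"] = "T9"  # rewriting an earlier event...
print(verify(log, keys))  # False: its hash no longer matches
```

A single chain assumes one append order. Concurrent appenders would fork it, much as concurrent git commits fork history, so a multi-agent log also needs a merge rule on top of this structure.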

If this is right

The share of work that merely re-does a teammate's task falls from 78% to 0%.
Useful throughput more than triples.
Every agent's copy of the log converges to the same state with no write silently dropped.
Concrete failure modes such as conflicting edits, lock starvation, redundant rediscovery, and race-to-close become automatically recoverable with provenance.
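Mining a failure mode such as race-to-close amounts to a query over the log. A minimal sketch, assuming race-to-close means two or more distinct agents closing the same task within a short time window; the `Event` fields, the 5-second default window, and `race_to_close` itself are hypothetical, and the paper's detectors may use different definitions.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    time: float  # seconds since the start of the run (hypothetical)
    actor: str   # agent that appended the event
    kind: str    # "claim", "close", ...
    task: str    # work item the event refers to


def race_to_close(log: list[Event], window: float = 5.0) -> list[tuple[str, list[str]]]:
    """Flag tasks closed by more than one agent within `window` seconds.

    Returns (task, agents) pairs; a fuller version would also return the
    underlying event ids so each finding keeps its provenance.
    """
    closes: dict[str, list[Event]] = defaultdict(list)
    for ev in log:
        if ev.kind == "close":
            closes[ev.task].append(ev)
    findings = []
    for task, evs in closes.items():
        evs.sort(key=lambda e: e.time)
        actors = {e.actor for e in evs}
        if len(actors) > 1 and evs[-1].time - evs[0].time <= window:
            findings.append((task, sorted(actors)))
    return findings


log = [
    Event(10.0, "agent-a", "close", "T1"),
    Event(11.5, "agent-b", "close", "T1"),
    Event(30.0, "agent-c", "close", "T2"),
]
print(race_to_close(log))  # [('T1', ['agent-a', 'agent-b'])]
```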

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same git-based log structure could be reused for coordination in other multi-agent domains that already rely on version control.
Mining the log over time might surface recurring coordination patterns that could be used to improve agent scheduling policies.
Teams that adopt the substrate might see higher pull-request acceptance rates because the coordination failures become visible and addressable earlier.

Load-bearing premise

The git-stored append-only signed event log fully and accurately captures the coordination process among agents without loss of critical signals or introduction of new coordination artifacts.

What would settle it

Run identical coding tasks with multiple agents both with and without the grite log and check whether the measured rate of overlapping or conflicting work stays the same; if it does, the reported reduction cannot be attributed to grite.
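The settling experiment needs an operational metric for duplicated work. A minimal sketch of one possible definition, assuming each run yields a list of (agent, task) completion records and that every completion of a task after its first counts as duplicate work; `duplicate_share` and the toy records are illustrative, not the paper's measurement or data.

```python
from collections import Counter


def duplicate_share(completions: list[tuple[str, str]]) -> float:
    """Share of completed work units that re-do a task already completed.

    `completions` holds (agent, task) pairs; for each task, every completion
    beyond the first counts as duplicate work.
    """
    if not completions:
        return 0.0
    per_task = Counter(task for _, task in completions)
    duplicates = sum(n - 1 for n in per_task.values())
    return duplicates / len(completions)


# Toy records for illustration only, not results from the paper.
without_log = [("a", "T1"), ("b", "T1"), ("c", "T1"), ("a", "T2"), ("b", "T2")]
with_log = [("a", "T1"), ("b", "T2"), ("c", "T3")]
print(duplicate_share(without_log))  # 0.6
print(duplicate_share(with_log))     # 0.0
```

A real comparison would repeat each condition over many runs and report variability, which is the kind of detail the referee report below asks for.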

Original abstract

Autonomous coding agents now open millions of pull requests, yet large-scale studies find their PRs are produced faster but accepted less often, a coordination and trust gap that pull-request-level telemetry cannot explain. We argue the missing signal lives before the PR, in how concurrent agents claim, divide, and collide over shared work. We study this process through grite, our open-source coordination substrate that needs no central server and stores its records inside git itself, so its append-only, signed event log captures the coordination process directly. We show that (i) this shared substrate reduces duplicate and conflicting work at bounded overhead: the share of work that merely re-does a teammate's task falls from 78% to 0% while useful throughput more than triples; (ii) every agent's copy of the log converges to the same state with no write silently dropped, whereas a file-based tracker loses concurrent writes; and (iii) the log is a mineable artefact from which concrete failure modes (conflicting edits, lock starvation, redundant rediscovery, race-to-close) are automatically recoverable with provenance, several of them invisible in pull-request history. We release the dataset, harness, and mining toolkit.
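The contrast in (ii) between a converging log and a file-based tracker that loses concurrent writes is the classic lost-update problem. A minimal sketch, assuming a tracker stored as one JSON file that agents read, modify, and rewrite; the file name, the helpers, and the deterministic interleaving are hypothetical and do not reproduce the paper's baseline.

```python
import json
import os
import tempfile


def read_tracker(path: str) -> list[str]:
    with open(path) as f:
        return json.load(f)


def write_tracker(path: str, entries: list[str]) -> None:
    with open(path, "w") as f:
        json.dump(entries, f)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "tracker.json")  # hypothetical shared tracker file
    write_tracker(path, [])

    # Both agents read the shared file before either writes back.
    seen_by_a = read_tracker(path)
    seen_by_b = read_tracker(path)
    write_tracker(path, seen_by_a + ["claim T1 by agent-a"])
    write_tracker(path, seen_by_b + ["claim T2 by agent-b"])  # silently overwrites agent-a

    file_result = read_tracker(path)

# Append-only alternative: agents only add events and replicas merge by union.
replica_a = {"claim T1 by agent-a"}
replica_b = {"claim T2 by agent-b"}
log_result = replica_a | replica_b

print(file_result)         # ['claim T2 by agent-b']
print(sorted(log_result))  # ['claim T1 by agent-a', 'claim T2 by agent-b']
```

The file version loses agent-a's claim without any error, which is what "silently dropped" means; the union merge keeps both claims no matter which replica merges first.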

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Grite puts a signed append-only log inside git for multi-agent coding coordination and claims big drops in duplicate work, but the abstract gives no experimental details to back the numbers.


The paper's main offering is grite, a coordination substrate that stores signed events directly in git so agents can claim and divide work without a central server. The log is then mined for failure modes like conflicting edits or lock starvation that do not show up in PR history.

What is new is the shift to pre-PR telemetry with concrete metrics: duplicate work falling from 78% to 0%, useful throughput more than tripling, and reliable log convergence across agents. Releasing the dataset, harness, and mining toolkit is also a plus, as it lets others check the extracted failure modes.

The soft spot is the complete absence of experimental setup in the abstract. There is no description of how duplicate work was classified, what the agent behaviors were, how many runs were done, or any statistical checks. All three headline results depend on the log being an exhaustive, non-reactive record of coordination. If agents use unlogged channels or the act of emitting signed events changes their behavior, the baseline comparison and the reported gains become hard to interpret. The stress-test concern about capture accuracy lands directly on the claims as presented.

This is aimed at researchers working on multi-agent AI coding systems who want a practical way to observe and reduce coordination waste. A reader focused on that subfield would get value from the system design and the released artifacts even before the numbers are fully verified.

The paper deserves a serious referee because the idea is concrete, the claims are specific enough to test, and the released materials make follow-up possible. I would send it to review and ask for the methods and validation details in the first round.

Referee Report

3 major / 1 minor

Summary. The paper introduces grite, an open-source coordination substrate for autonomous coding agents that uses a git-stored append-only signed event log to directly capture multi-agent coordination processes before pull requests. It reports three key findings: the shared substrate reduces duplicate work from 78% to 0% while more than tripling useful throughput at bounded overhead; the log converges across agents with no silent drops (unlike file-based trackers); and the log serves as a mineable artifact for recovering specific failure modes such as conflicting edits, lock starvation, redundant rediscovery, and race-to-close, some of which are not visible in pull-request history. The authors release the dataset, harness, and mining toolkit.

Significance. If the experimental results hold under scrutiny, this work has the potential to be significant in the field of software engineering, particularly in multi-agent systems and autonomous code generation. It provides a transparent mechanism for coordination that addresses the trust gap in PR acceptance rates. The release of data and tools supports reproducibility. The approach offers a way to mine coordination artifacts that could lead to better agent designs. The absence of free parameters in the core mechanism is a strength.

major comments (3)

[Abstract] The specific quantitative results (78% to 0% duplicate work reduction, >3× throughput) are stated without any accompanying details on the experimental setup, how duplicate work is measured, the definition of agent behaviors, or statistical methods used. This undermines the ability to evaluate the support for the central claims.
[Abstract] The three main results all rest on the assumption that the grite log fully and accurately captures the coordination process without loss of critical signals or introduction of new artifacts due to the logging requirement. No independent validation (e.g., via external monitoring or comparison with unlogged runs) is described to confirm exhaustiveness or non-reactivity of the log. This is load-bearing for claims (i), (ii), and (iii).
[Abstract] The convergence property is contrasted with a 'file-based tracker' that loses concurrent writes, but no details are given on the implementation of this baseline or the conditions under which the comparison was made.

minor comments (1)

[Abstract] The abstract introduces the term 'grite' without expansion, which may affect initial readability for readers new to the system.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We agree that additional context would improve evaluability and will revise the abstract accordingly while preserving conciseness. We address each point below.


Referee: [Abstract] The specific quantitative results (78% to 0% duplicate work reduction, >3× throughput) are stated without any accompanying details on the experimental setup, how duplicate work is measured, the definition of agent behaviors, or statistical methods used. This undermines the ability to evaluate the support for the central claims.

Authors: We agree that the abstract would benefit from brief context. In the revised version we will append a clause summarizing the setup: experiments used 4–8 autonomous agents on shared git repositories, with duplicate work defined as redundant task execution identified via log provenance, and throughput measured as useful commits per unit time. Full details on agent behaviors, measurement, and statistical methods (including run counts and variability) appear in Sections 4.2–4.3. (revision: yes)
Referee: [Abstract] The three main results all rest on the assumption that the grite log fully and accurately captures the coordination process without loss of critical signals or introduction of new artifacts due to the logging requirement. No independent validation (e.g., via external monitoring or comparison with unlogged runs) is described to confirm exhaustiveness or non-reactivity of the log. This is load-bearing for claims (i), (ii), and (iii).

Authors: The manuscript grounds exhaustiveness in the design (Section 3): agents must use the signed append-only log for all coordination, so events are captured by construction. However, we acknowledge that no independent validation via external monitoring or unlogged-run comparisons is described. We will revise the abstract to state this design assumption explicitly and note it as a point for future confirmation. (revision: yes)
Referee: [Abstract] The convergence property is contrasted with a 'file-based tracker' that loses concurrent writes, but no details are given on the implementation of this baseline or the conditions under which the comparison was made.

Authors: The file-based tracker baseline is implemented in Section 4.1 as a shared directory using lock files for coordination; concurrent writes are lost due to non-atomic file operations. The comparison used identical agent workloads, repository states, and concurrency levels for both conditions. We will add a short parenthetical reference in the abstract directing readers to Section 4.1. (revision: yes)

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on direct system measurements


The paper introduces grite as a git-based coordination substrate and reports three empirical outcomes (duplicate-work reduction, log convergence, and extractable failure modes) measured from the append-only signed event log it produces. No equations, fitted parameters, or derivations appear in the provided text. No self-citations are invoked to justify core premises, and no result is shown to reduce by construction to its own inputs or definitions. The central claims are observational consequences of running the described system rather than self-referential or statistically forced. The capture assumption is an empirical premise open to external validation, not a definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Central claims rest on the assumption that git can reliably host an append-only signed coordination log and that the experimental reductions are attributable to this substrate.

axioms (1)

domain assumption: Git repositories support append-only signed event logs without a central server
The system design depends on git's distributed and commit-based properties for coordination.
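The axiom leans on git's content-addressed object store: every object is named by a hash of its bytes, so any clone can compute the same identifier without asking a server. A minimal sketch of how git names a blob in the default SHA-1 object format (repositories created with the SHA-256 object format use a different hash); `git_blob_id` is a hypothetical helper, and the two printed values are what `git hash-object` reports for the same input.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the id git assigns to a blob in the SHA-1 object format.

    Git hashes the header "blob <size>" plus a NUL byte, followed by the raw
    content, so the id depends only on the stored bytes: identical content
    gets an identical id on every clone.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Commits name their parents by these ids in the same way, which is what makes rewriting earlier history detectable and lets replicas exchange objects peer to peer.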

invented entities (1)

grite (no independent evidence)
purpose: Decentralized coordination substrate for multi-agent coding
New system introduced to capture pre-PR coordination.

pith-pipeline@v0.9.1-grok · 5729 in / 1379 out tokens · 34023 ms · 2026-06-26T19:43:14.339657+00:00 · methodology


