{"total":13,"items":[{"citing_arxiv_id":"2606.28434","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SWE-MeM: Learning Adaptive Memory Management for Long-Horizon Coding Agents","primary_cat":"cs.SE","submitted_at":"2026-06-26T04:55:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SWE-MeM introduces adaptive memory management for coding agents via synthesized trajectories and Memory-aware GRPO, reporting 43.4% and 60.2% resolve rates on SWE-Bench Verified for 4B and 30B models while beating baselines on performance and token use.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.14061","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM Agents Can See Code Repositories","primary_cat":"cs.SE","submitted_at":"2026-06-12T03:14:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Visual graphs of repository structure added to text inputs for multimodal LLM agents reduce token consumption by up to 26% while maintaining or improving issue-resolution accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18747","ref_index":212,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Code as Agent Harness","primary_cat":"cs.CL","submitted_at":"2026-05-18T17:59:03+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"central challenge is no longer only retrieving relevant content, but controlling the granularity of sharing, preventing information flooding, and supporting bidirectional access between high-level decisions and fine- grained execution traces [210]. Accordingly, memory in multi-agent code generation increasingly resembles a shared blackboard or collaborative state graph rather than a purely individual storage unit [212, 213]. 3.2.6. Context Compaction and State Offloading Context compaction and state offloading are cross-cutting context-engineering mechanisms for memory in code-agent harnesses [214]. Their goal is not to define another memory category, but to control the 24 Code as Agent Harness boundary between active model context and durable task state. Long-horizon software engineering workflows"},{"citing_arxiv_id":"2605.09278","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium","primary_cat":"cs.AI","submitted_at":"2026-05-10T03:04:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"agent systems.arXiv preprint arXiv:2410.07283, 2024. [32] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems, 33:9459-9474, 2020. [33] Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, and Qianxiang Wang. Swe-debate: Competitive multi-agent debate for software issue resolution. arXiv preprint arXiv:2507.23348, 2025. [34] Lin Li, Guikun Chen, Hanrong Shi, Jun Xiao, and Long Chen. A survey on multimodal benchmarks: In the era of large ai models."},{"citing_arxiv_id":"2604.25847","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling","primary_cat":"math.OC","submitted_at":"2026-04-28T16:53:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Agora-Opt uses decentralized debate among LLM agent teams plus a read-write memory bank to produce more accurate optimization models from text than prior LLM methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"When a new execution fails with a solver logL′, the debugger queries R(E(sig(L ′)),M bug, N)to retrieve the diagnosis and fix strategy summarized from similar past failures, thereby guiding the code revision. Debate Memory.Debate memory is built from debate runs in which two agent teams started with a substantial disagreement and later converged to a consensus solution: Mdeb ={(concat(x n,∆ n),H deb,n)}.(9) The retrieval key is the concatenation of the problemx n and the initial discrepancy description∆ n (i.e., LLM-summarized conflicts between candidate solutions). The stored valueHdeb,n contains the key arguments exchanged during debate (such as pointing out missing constraints or mis-specified objectives), and the final consensus formulation, optionally accompanied by an LLM-written summary of the decisive"},{"citing_arxiv_id":"2604.19049","ref_index":48,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery","primary_cat":"cs.CR","submitted_at":"2026-04-21T03:55:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Refute-or-Promote applies adversarial multi-agent review with kill gates and empirical verification to filter LLM defect candidates, killing 79-83% before disclosure and yielding 4 CVEs plus multiple accepted fixes across libraries, C++ standard, and compilers.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"gate is the architectural analogue. The closest adversarial comparator is InfCode [47] (79.4% on SWE-bench Veri- fied via asymmetricpatch/test dual-agent loop within a single model family); Refute-or-Promote differs in ways we hypothesise to be important-asymmetric roles, cross- family reviewers, and context asymmetry as a pillar (see Table 1). SWE-Debate [48] pursues convergent multi- agent debate in the same SWE-bench setting; Refute-or- Promote explicitly forbids convergence-by-debate. Ar- gus [49] likewise targets multi-agent CVE discovery but as acooperativeRAG+ReAct ensemble rather than an adversarial gate. D3 [51] provides theoretical grounding for adversarial role specialisation; Refute-or-Promote re-"},{"citing_arxiv_id":"2604.08089","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GALA: Multimodal Graph Alignment for Bug Localization in Automated Program Repair","primary_cat":"cs.SE","submitted_at":"2026-04-09T11:06:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GALA uses hierarchical graph alignment between UI screenshots and code structures to achieve state-of-the-art bug localization in multimodal automated program repair on SWE-bench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04580","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond Fixed Tests: Repository-Level Issue Resolution as Coevolution of Code and Behavioral Constraints","primary_cat":"cs.SE","submitted_at":"2026-04-06T10:26:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[23] Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, and Qianxiang Wang. 2025. Swe-debate: Competitive multi-agent debate for software issue resolution.arXiv preprint arXiv:2507.23348(2025). [24] Kefan Li, Yuan Yuan, Hongyue Yu, Tingyu Guo, and Shijie Cao. 2025. CoCoEvo: Co-Evolution of Programs and Test Cases to Enhance Code Generation.IEEE Transactions on Evolutionary Computation(2025). [25] Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, et al. 2022. Competition-level code generation with alphacode.Science378, 6624 (2022), 1092-1097. [26] Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu"},{"citing_arxiv_id":"2603.22048","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Dynamic analysis enhances issue resolution","primary_cat":"cs.SE","submitted_at":"2026-03-23T14:48:54+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DAIRA integrates dynamic tracing into LLM agents to achieve 79.4% resolution rate on SWE-bench Verified for code defect repair.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.01785","ref_index":55,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding","primary_cat":"cs.CL","submitted_at":"2026-02-02T08:10:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Multimodal LLMs process code as images to achieve up to 8x token compression, with visual cues like syntax highlighting aiding tasks and clone detection remaining resilient or even improving under compression.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"while open-weight models have been officially benchmarked to match their text-only counter- parts [11, 84]. It ensures that our experimental setup does not introduce confounding factors from degraded baseline capability. 3.3 Visual Rendering of Source Code Code Rendering.We render source code into images at a high base resolution of 2240×2240 pixels, following prior work [55]. This resolution is selected for compatibility with modern MLLMs, as it is divisible by common image patch sizes (e.g., 14 and 16 pixels) used in visual encoders [11], ensuring that no partial patches are created during tokenization. By default, we use plain rendering-black monospace text on a white background-which serves as the baseline configuration throughout"},{"citing_arxiv_id":"2601.05110","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts","primary_cat":"cs.AI","submitted_at":"2026-01-08T16:58:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GlimpRouter uses the entropy of the first token in each reasoning step to decide whether to invoke a large model, yielding 10.7% higher accuracy and 25.9% lower latency than a standalone large model on AIME25.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.00376","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"In Line with Context: Repository-Level Code Generation via Context Inlining","primary_cat":"cs.SE","submitted_at":"2026-01-01T15:56:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"InlineCoder reframes repository-level code generation as function-level coding by using a draft anchor to inline the target function into its call graph for upstream usage and downstream dependency context.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.14635","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SWE-QA: Can Language Models Answer Repository-level Code Questions?","primary_cat":"cs.CL","submitted_at":"2025-09-18T05:25:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SWE-QA creates a new repository-level code QA benchmark with 576 pairs and an agentic LLM framework, showing promise but open challenges for models handling complex codebases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}