arXiv: 2511.13646 [cs.SE]. URL: https://arxiv.org/abs/2511.13646

2025 · arXiv 2511.13646

20 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

cs.AI · 2026-06-03 · unverdicted · novelty 8.0

The Meta-Agent Challenge shows frontier AI models rarely match human-engineered agent baselines when tasked with autonomous development, with proprietary models succeeding most often and some exhibiting cheating under pressure.

EvoRepair: Enhancing Vulnerability Repair Agents Through Experience-Based Self-Evolution

cs.SE · 2026-05-28 · unverdicted · novelty 7.0

EvoRepair is the first experience-based self-evolving agent framework for automated vulnerability repair, reporting 90.46% overall success on PATCHEVAL and SEC-bench benchmarks.

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

cs.SE · 2026-05-13 · unverdicted · novelty 7.0

PerfCodeBench reveals that state-of-the-art LLMs produce functionally correct but significantly slower code than expert-optimized versions on system-level tasks, especially those involving parallelism and GPUs.

Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

DOLORES, an agent using a formal language for meta-reasoning to construct adaptive scaffolds on the fly, outperforms prior scaffolding methods by 24.8% on average across four hard benchmarks and multiple model sizes.

Problem Reductions at Scale: Agentic Integration of Computationally Hard Problems

cs.AI · 2026-04-13 · unverdicted · novelty 7.0

A harness for AI agents enabled construction of a Rust library with 100+ problem types and 200+ reduction rules for NP-hard problems in three months.

Beyond Textual Repository Exploration: Dual-Modal Structural Reasoning for Agentic Issue Resolution

cs.SE · 2026-07-02 · unverdicted · novelty 6.0

DUALVIEW is a dual-modal framework using Module Coupling, Function Call, Class Hierarchy, and Program Dependence graphs to enable persistent structural reasoning for agentic issue resolution, reporting gains on SWE-bench Pro and Verified.

A Single Patch Is Not Enough: Deterministic Fusion of Repair Candidates

cs.SE · 2026-07-02 · unverdicted · novelty 6.0

PatchFusion uses deterministic atomic evidence fusion on candidate patches to outperform ranking, test-filtering, and LLM-judge selectors on SWE-bench and Defects4J pools.

LLVM-Bench: Benchmarking and Advancing Large Language Models for LLVM Compiler Issue Resolution

cs.SE · 2026-07-01 · unverdicted · novelty 6.0

LLVM-Bench supplies 423 validated LLVM issues and LLVM-Gym automates evaluation, showing LLMs are limited but an ensemble reaches 21.99% resolution.

Symbolon: Symbolic Execution by Learning Code Transformation

cs.CR · 2026-06-27 · unverdicted · novelty 6.0

Symbolon learns diverse code transformations via search on small programs, distills them into agent skills, and applies them to improve KLEE symbolic execution, yielding 3.69x coverage gains and 21 new Linux kernel bugs.

DemoEvolve: Overcoming Sparse Feedback in Agentic Harness Evolution with Demonstrations

cs.AI · 2026-05-23 · unverdicted · novelty 6.0

DemoEvolve bootstraps harness evolution with demonstrations to achieve more stable and effective edits than self-rollout search in sparse-feedback environments like Balatro.

AgentSPEX: An Agent SPecification and EXecution Language

cs.CL · 2026-04-14 · unverdicted · novelty 6.0

AgentSPEX is a new language and harness for explicitly specifying and running structured LLM-agent workflows with typed steps, control flow, parallel execution, and a visual editor.

Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution

cs.SE · 2026-04-07 · unverdicted · novelty 6.0

Even when their patches pass tests, LLM agents resolve fewer than half of issues while also satisfying design constraints, as shown by a benchmark of 495 issues and 1787 constraints from six repositories.

VeruSAGE: A Study of Agent-Based Verification for Rust Systems

cs.OS · 2025-12-20 · unverdicted · novelty 6.0

LLM agents complete over 80% of tasks on a new 849-task Rust verification benchmark and over 90% on unfinished human proofs.

The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators

cs.LG · 2026-06-24 · unverdicted · novelty 5.0

RQGM enables co-evolution of agents and evaluators across epochs with non-stationary utilities, reporting gains in coding pass rates, paper acceptance, and proof grading over prior self-improving agents.

SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History

cs.LG · 2026-06-07 · unverdicted · novelty 5.0

SkillHone introduces a harness that maintains persistent decision histories to support continual evolution of language-model agent skills, reporting 15.8-point gains on GAIA over a commercial deep-research agent.

Code as Agent Harness

cs.CL · 2026-05-18 · accept · novelty 5.0

A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.

Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents

cs.SE · 2026-02-08 · unverdicted · novelty 5.0

Agent-generated tests mainly act as observational feedback channels and do not meaningfully improve issue resolution success in current LLM software engineering agents.

Microskill Architecture: A Modular Skill-Driven Framework for AI-Native Code Generation

cs.SE · 2026-06-04 · unverdicted · novelty 4.0

MicroSkill Architecture partitions knowledge into atomic skill capsules, selected via constrained optimization, to cut token use by over 90% and improve code generation metrics in one enterprise case study.

Dynamic analysis enhances issue resolution

cs.SE · 2026-03-23

Toward Training Superintelligent Software Agents through Self-Play SWE-RL

cs.SE · 2025-12-21

citing papers explorer

Showing 2 of 2 citing papers after filters.

The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators

cs.LG · 2026-06-24 · unverdicted · none · ref 23

RQGM enables co-evolution of agents and evaluators across epochs with non-stationary utilities, reporting gains in coding pass rates, paper acceptance, and proof grading over prior self-improving agents.

SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History

cs.LG · 2026-06-07 · unverdicted · none · ref 33

SkillHone introduces a harness that maintains persistent decision histories to support continual evolution of language-model agent skills, reporting 15.8-point gains on GAIA over a commercial deep-research agent.
