hub

arXiv: 2511.13646 [cs.SE].URL:https://arxiv.org/abs/2511.13646

Chunqiu Steven Xia, Zhe Wang, Yan Yang, Yuxiang Wei, Lingming Zhang · 2025 · arXiv 2511.13646

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

read on arXiv browse 10 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

cs.SE · 2026-05-13 · unverdicted · novelty 7.0

PerfCodeBench reveals that state-of-the-art LLMs produce functionally correct but significantly slower code than expert-optimized versions on system-level tasks, especially those involving parallelism and GPUs.

Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

DOLORES, an agent using a formal language for meta-reasoning to construct adaptive scaffolds on the fly, outperforms prior scaffolding methods by 24.8% on average across four hard benchmarks and multiple model sizes.

Problem Reductions at Scale: Agentic Integration of Computationally Hard Problems

cs.AI · 2026-04-13 · unverdicted · novelty 7.0

A harness for AI agents enabled construction of a Rust library with 100+ problem types and 200+ reduction rules for NP-hard problems in three months.

Dynamic analysis enhances issue resolution

cs.SE · 2026-03-23 · conditional · novelty 7.0

DAIRA integrates dynamic tracing into LLM agents to achieve 79.4% resolution rate on SWE-bench Verified for code defect repair.

AgentSPEX: An Agent SPecification and EXecution Language

cs.CL · 2026-04-14 · unverdicted · novelty 6.0

AgentSPEX is a new language and harness for explicitly specifying and running structured LLM-agent workflows with typed steps, control flow, parallel execution, and a visual editor.

Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution

cs.SE · 2026-04-07 · unverdicted · novelty 6.0

LLM agents resolve fewer than half of issues while satisfying design constraints despite passing tests, as shown by a benchmark of 495 issues and 1787 constraints from six repositories.

VeruSAGE: A Study of Agent-Based Verification for Rust Systems

cs.OS · 2025-12-20 · unverdicted · novelty 6.0

LLM agents complete over 80% of tasks on a new 849-task Rust verification benchmark and over 90% on unfinished human proofs.

Code as Agent Harness

cs.CL · 2026-05-18 · accept · novelty 5.0

A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.

Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents

cs.SE · 2026-02-08 · unverdicted · novelty 5.0

Agent-generated tests mainly act as observational feedback channels and do not meaningfully improve issue resolution success in current LLM software engineering agents.

Toward Training Superintelligent Software Agents through Self-Play SWE-RL

cs.SE · 2025-12-21

citing papers explorer

Showing 1 of 1 citing paper after filters.

VeruSAGE: A Study of Agent-Based Verification for Rust Systems cs.OS · 2025-12-20 · unverdicted · none · ref 38
LLM agents complete over 80% of tasks on a new 849-task Rust verification benchmark and over 90% on unfinished human proofs.

arXiv: 2511.13646 [cs.SE].URL:https://arxiv.org/abs/2511.13646

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer