Pooled top-1 accuracy rankings in RCA benchmarks do not reliably identify per-subsystem winners, as pairwise comparisons across 11 subsystems show effects of both signs and leave-one-system-out selection incurs regret up to 24.8 pp.
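As a rough illustration of the leave-one-system-out regret mentioned above, the sketch below assumes one plausible definition: for each held-out subsystem, select the method with the highest mean accuracy on the remaining subsystems, then measure how far that choice falls short of the best method on the held-out subsystem. The function name `loso_regret`, the method and subsystem names, and all accuracy values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def loso_regret(acc):
    """Leave-one-system-out regret per subsystem, in percentage points.

    acc maps method -> {subsystem -> top-1 accuracy in percent}.
    Assumed definition (not confirmed by the source): regret is the gap
    between the best method on the held-out subsystem and the method
    selected by mean accuracy on all other subsystems.
    """
    subsystems = sorted(next(iter(acc.values())))
    regret = {}
    for held_out in subsystems:
        others = [s for s in subsystems if s != held_out]
        # Pick the winner using only the subsystems that are not held out.
        chosen = max(acc, key=lambda m: mean(acc[m][s] for s in others))
        # Oracle choice: best accuracy actually achieved on the held-out one.
        best = max(acc[m][held_out] for m in acc)
        regret[held_out] = best - acc[chosen][held_out]
    return regret


# Hypothetical accuracies for two methods on three subsystems.
acc = {
    "method_a": {"db": 80.0, "cache": 70.0, "net": 40.0},
    "method_b": {"db": 60.0, "cache": 65.0, "net": 75.0},
}
regret = loso_regret(acc)
print(regret)
```

In this toy data the method chosen from the other subsystems is never the best one on the held-out subsystem, which is the kind of sign flip that a pooled ranking hides.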
arXiv preprint arXiv:2502.05352
6 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
Years: 2026 (6)
Roles: background (1)
Polarities: background (1)
Representative citing papers
LLMs, including frontier models, corrupt an average of 25% of document content during long delegated editing workflows across 52 domains, and agentic tools do not mitigate the issue.
Graph Traversal Agent improves root-cause F1 from 0.6087 to 0.9130 on ITBench snapshots but the gain is benchmark-coupled to cases where the injected fault is already in the evidence graph.
Runtime-structured task decomposition reduces retry costs in agentic coding systems by up to 51.7% versus monolithic prompts by rerunning only failed subtasks on two software engineering workloads.
The central challenge in AI-augmented CI/CD is designing authority transfer from humans to agents under constraints, as current systems remain limited to bounded data-plane autonomy backed by external governance.
The paper introduces Experiment-as-Code Labs as a declarative stack synthesizing AI agents, systems orchestration, and physical lab control for AI-driven discovery.