2509.17158 , archivePrefix=

Pierre Andrews, Amine Benhalloum, Gerard Moreno-Torres Bertran, Matteo Bettini, Amar Budhiraja, Ricardo Silveira Cabral · 2026 · arXiv 2509.17158

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 1 dataset 1

citation-polarity summary

background 1 use dataset 1

representative citing papers

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

cs.AI · 2026-05-15 · unverdicted · novelty 6.0

Multi-agent LLM systems discover new Transformer and hybrid architectures that outperform Llama 3.2 at 1B scale and approach human SOTA on long-range benchmarks.

Agentic Coding Needs Proactivity, Not Just Autonomy

cs.SE · 2026-05-07 · conditional · novelty 6.0

Coding agents require a three-level proactivity taxonomy (Reactive, Scheduled, Situation Aware) evaluated by insight policy quality using Insight Decision Quality, Context Grounding Score, and Learning Lift.

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.

Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks

cs.AI · 2026-04-20 · conditional · novelty 6.0

Token-level contrastive attribution yields informative signals for some LLM benchmark failures but is not universally applicable across datasets and models.

Scalable Environments Drive Generalizable Agents

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

Generalizable agents require environment scaling via diverse executable rule-sets, distinguished from trajectory and task scaling in a new taxonomy.

Interactive Evaluation Requires a Design Science

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

Interactive evaluation of AI must be reframed as a distinct paradigm that maps interaction trajectories to judgments on process, recoverability, coordination, robustness, and system performance, supported by a two-axis taxonomy and design principles.

EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

cs.CL · 2026-01-09 · unverdicted · novelty 5.0

EnvScaler synthesizes 191 environments and 7K scenarios to improve LLM agents' multi-tool interaction performance via SFT and RL on Qwen3 models.

TrafficClaw: A Generalizable LLM Agent in the Unified Physical Environment for Urban Traffic Control

cs.AI · 2026-04-19

citing papers explorer

Showing 8 of 8 citing papers.

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design cs.AI · 2026-05-15 · unverdicted · none · ref 29
Multi-agent LLM systems discover new Transformer and hybrid architectures that outperform Llama 3.2 at 1B scale and approach human SOTA on long-range benchmarks.
Agentic Coding Needs Proactivity, Not Just Autonomy cs.SE · 2026-05-07 · conditional · none · ref 12
Coding agents require a three-level proactivity taxonomy (Reactive, Scheduled, Situation Aware) evaluated by insight policy quality using Insight Decision Quality, Context Grounding Score, and Learning Lift.
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence cs.AI · 2026-04-20 · unverdicted · none · ref 3
Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.
Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks cs.AI · 2026-04-20 · conditional · none · ref 4
Token-level contrastive attribution yields informative signals for some LLM benchmark failures but is not universally applicable across datasets and models.
Scalable Environments Drive Generalizable Agents cs.AI · 2026-05-18 · unverdicted · none · ref 8
Generalizable agents require environment scaling via diverse executable rule-sets, distinguished from trajectory and task scaling in a new taxonomy.
Interactive Evaluation Requires a Design Science cs.AI · 2026-05-18 · unverdicted · none · ref 14
Interactive evaluation of AI must be reframed as a distinct paradigm that maps interaction trajectories to judgments on process, recoverability, coordination, robustness, and system performance, supported by a two-axis taxonomy and design principles.
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis cs.CL · 2026-01-09 · unverdicted · none · ref 1
EnvScaler synthesizes 191 environments and 7K scenarios to improve LLM agents' multi-tool interaction performance via SFT and RL on Qwen3 models.
TrafficClaw: A Generalizable LLM Agent in the Unified Physical Environment for Urban Traffic Control cs.AI · 2026-04-19 · unreviewed · ref 11

2509.17158 , archivePrefix=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer