Large-scale terminal agentic trajectory generation from dockerized environments. CoRR, abs/2602.01244

Siwei Wu, Yizhi Li, Yuyang Song, Wei Zhang, Yang Wang, Riza Batista-Navarro, Xian Yang, Mingjie Tang, Bryan Dai, Jian Yang, Chenghua Lin · 2026 · arXiv 2602.01244

3 Pith papers cite this work.

representative citing papers

TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks

cs.AI · 2026-05-21 · conditional · novelty 8.0

TerminalWorld builds a scalable benchmark of 1,530 real terminal tasks derived from recordings and finds that frontier models and agents reach at most a 62.5% pass rate, with only weak correlation to prior expert-curated sets.

LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

LiteCoder-Terminal-Gen creates synthetic terminal datasets that, after SFT and DMPO on Qwen models, yield 29.06%, 18.54%, and 34.00% pass@1 on Terminal Bench 1.0, 2.0, and Pro, respectively.

HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

HiRAS introduces hierarchical multi-agent coordination for paper-to-code generation and experiment reproduction, claiming over 10% relative gains over prior state-of-the-art on a refined benchmark with reduced hallucination.

citing papers explorer

TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks cs.AI · 2026-05-21 · conditional · none · ref 11
LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents cs.CL · 2026-05-28 · unverdicted · none · ref 14
HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution cs.CL · 2026-04-20 · unverdicted · none · ref 6