Large-scale terminal agentic trajectory generation from dockerized environments. CoRR, abs/2602.01244

Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments · 2026 · arXiv 2602.01244

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks

cs.AI · 2026-05-21 · conditional · novelty 8.0

TerminalWorld builds a scalable benchmark of 1,530 real terminal tasks from recordings and finds frontier models and agents reach at most 62.5% pass rate with only weak correlation to prior expert-curated sets.

CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents

cs.AI · 2026-06-22 · unverdicted · novelty 7.0

CLI-Universe synthesizes a verified 6K dataset of terminal-agent tasks that, when used to fine-tune Qwen3-32B, reaches 33.4% on Terminal-Bench 2.0 and sets a new open-source SOTA for models at or below 32B parameters.

Tmax: A simple recipe for terminal agents

cs.CL · 2026-06-22 · unverdicted · novelty 6.0

Tmax is an open RL training recipe for terminal agents that achieves 27% on Terminal-Bench 2.0 with a 9B model via a novel data generation taxonomy combining difficulty control, personas, and verifier diversification.

What Makes Interaction Trajectories Effective for Training Terminal Agents?

cs.AI · 2026-06-02 · unverdicted · novelty 6.0

Trajectories from weaker agents outperform stronger ones for training terminal agents due to environment-grounded supervision that exposes inspect-act-verify behaviors.

LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

LiteCoder-Terminal-Gen creates synthetic terminal datasets that, after SFT and DMPO on Qwen models, yield 29.06%, 18.54%, and 34.00% pass@1 on Terminal-Bench 1.0, 2.0, and Pro, respectively.

HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

HiRAS introduces hierarchical multi-agent coordination for paper-to-code generation and experiment reproduction, claiming over 10% relative gains over prior state-of-the-art on a refined benchmark with reduced hallucination.

citing papers explorer

CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents

cs.AI · 2026-06-22 · unverdicted · none · ref 3

CLI-Universe synthesizes a verified 6K dataset of terminal-agent tasks that, when used to fine-tune Qwen3-32B, reaches 33.4% on Terminal-Bench 2.0 and sets a new open-source SOTA for models at or below 32B parameters.

What Makes Interaction Trajectories Effective for Training Terminal Agents?

cs.AI · 2026-06-02 · unverdicted · none · ref 25

Trajectories from weaker agents outperform stronger ones for training terminal agents due to environment-grounded supervision that exposes inspect-act-verify behaviors.
