VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-World Applications

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · dataset 2

citation-polarity summary

years

2026 7

representative citing papers

AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents

cs.AI · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

AgentEscapeBench is a benchmark of 270 tasks across five difficulty tiers that measures LLM agents' ability to manage long-range tool dependencies, state tracking, and intermediate result propagation, revealing sharp performance drops with increasing depth.

UserGPT Technical Report

cs.IR · 2026-05-09 · unverdicted · novelty 5.0

UserGPT introduces a generative LLM framework with a behavior simulation engine, semantization module, and DF-GRPO post-training that scores 0.7325 on tag prediction and 0.7528 on summary generation on HPR-Bench while compressing records by up to 97.9%.
