Proactiveeval: A unified evaluation framework for proactive dialogue agents
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- ProactBench: Beyond What The User Asked For
  ProactBench measures LLM conversational proactivity in three phases using 198 multi-agent dialogues and finds recovery behavior hard to predict from existing benchmarks.
- Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor?
  A temporal-graph model on structured event streams replaces per-event LLM calls for trigger decisions in proactive agents, reporting mean F1 gains of 16.7 and 4-83x speedups.