PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
Proactivity is a core expectation for AGI. Prior work remains largely confined to laboratory settings, leaving a clear gap for real-world proactive agents in depth, complexity, ambiguity, precision, and real-time constraints. We study this setting, where useful intervention requires inferring latent needs from ongoing context and grounding actions in evolving user memory under latency and long-horizon constraints. We first propose DD-MM-PAS (Demand Detection, Memory Modeling, Proactive Agent System) as a general paradigm for streaming proactive AI agents. We instantiate this paradigm in Pask, with a streaming IntentFlow model for DD, a hybrid memory (workspace, user, global) for long-term MM, and a PAS infrastructure framework, and we describe how these components form a closed loop. We also introduce LatentNeeds-Bench, a real-world benchmark built from user-consented data and refined through thousands of rounds of human editing. Experiments show that IntentFlow matches leading Gemini3-Flash models under latency constraints while identifying deeper user intent.
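To make the closed loop in the abstract concrete, here is a minimal sketch of how demand detection, memory modeling, and the agent system might feed one another over a stream. It is an illustration under stated assumptions, not the Pask implementation: `HybridMemory`, `detect_demand`, `act`, and `run_stream` are hypothetical names, and a keyword rule stands in for the IntentFlow model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HybridMemory:
    """Three scopes named in the abstract; the dict layout is an assumption."""
    workspace: dict = field(default_factory=dict)  # current-session context
    user: dict = field(default_factory=dict)       # long-term user facts
    global_: dict = field(default_factory=dict)    # knowledge shared across users


def detect_demand(utterance: str, memory: HybridMemory) -> Optional[str]:
    """Hypothetical stand-in for DD: a keyword rule, not the IntentFlow model."""
    if "flight" in utterance.lower():
        return "travel_planning"
    return None


def act(demand: str, memory: HybridMemory) -> str:
    """Hypothetical stand-in for PAS: ground the action in user memory."""
    airport = memory.user.get("home_airport", "unknown airport")
    return f"Suggest itineraries from {airport} ({demand})"


def run_stream(utterances: List[str], memory: HybridMemory) -> List[str]:
    """Closed loop: each chunk updates memory, and memory shapes later actions."""
    actions = []
    for text in utterances:
        memory.workspace["last_utterance"] = text    # MM: record ongoing context
        demand = detect_demand(text, memory)          # DD: infer a latent need
        if demand is None:
            continue                                  # no need detected: stay silent
        actions.append(act(demand, memory))           # PAS: intervene
        memory.user.setdefault("history", []).append(demand)  # feed result back
    return actions


memory = HybridMemory(user={"home_airport": "SFO"})
suggestions = run_stream(
    ["Nice weather today", "I need a flight to Tokyo next week"], memory
)
print(suggestions)
```

In this sketch the agent stays silent on the first utterance and intervenes only on the second, and the detected demand is written back to user memory so that later steps can use it.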
years
2026: 3
verdicts
UNVERDICTED: 3
citing papers explorer
- π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows
  π-Bench is a new benchmark for evaluating proactive personal assistant agents on 100 multi-turn tasks that include hidden intents, inter-task dependencies, and cross-session continuity.
- Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor?
  A temporal-graph model on structured event streams replaces per-event LLM calls for trigger decisions in proactive agents, reporting mean F1 gains of 16.7 and 4-83x speedups.
- Perceive Before Reasoning: A Pre-Reasoning Perception Framework for Efficient and Reliable Proactive Mobile Agents
  PRPF uses a lightweight Multimodal Proactive Perceptor for intervention gating and context compression, activating the Proactive Agent Reasoner only when needed, reducing false trigger rates and improving efficiency on the ProactiveMobile benchmark.
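
The last two summaries share one pattern: a cheap component decides whether to wake an expensive reasoner at all. A rough sketch of that gating idea follows. It does not reproduce either paper's model: `make_gated_agent`, `cheap_gate`, `expensive_reasoner`, the event fields, and the 0.5 threshold are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


def make_gated_agent(
    gate: Callable[[dict], float],
    reasoner: Callable[[dict], str],
    threshold: float = 0.5,  # assumed cutoff, not taken from either paper
) -> Tuple[Callable[[dict], Optional[str]], Dict[str, int]]:
    """Wrap a costly reasoner behind a cheap scoring gate and count calls."""
    calls = {"gate": 0, "reasoner": 0}

    def step(event: dict) -> Optional[str]:
        calls["gate"] += 1
        if gate(event) < threshold:
            return None                 # below threshold: do not intervene
        calls["reasoner"] += 1
        return reasoner(event)          # only now pay for the expensive call

    return step, calls


def cheap_gate(event: dict) -> float:
    """Hypothetical lightweight scorer: fires only on calendar conflicts."""
    return 1.0 if event.get("type") == "calendar_conflict" else 0.0


def expensive_reasoner(event: dict) -> str:
    """Hypothetical placeholder for an LLM call."""
    return f"Propose fix for {event['type']}"


step, calls = make_gated_agent(cheap_gate, expensive_reasoner)
events = [{"type": "keystroke"}, {"type": "calendar_conflict"}, {"type": "scroll"}]
results = [step(e) for e in events]
print(results, calls)
```

Here the gate runs on all three events while the reasoner runs once, which is the source of the efficiency gain in a design of this kind.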