Data Interpreter: An LLM Agent for Data Science
3 Pith papers cite this work.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- One Reflection Is Not Enough: Self-Correcting Autonomous Research via Multi-Hypothesis Failure Attribution
  SAGE with MHFA improves failure recovery in autonomous research agents, raising metrics-bearing outputs from 42% to 92% on a 12-topic benchmark versus single-reflection baselines.
- "When to Hand Off, When to Work Together": Expanding Human-Agent Co-Creative Collaboration through Concurrent Interaction
  Concurrent human-agent interactions occur in 31.8% of turns and follow five action patterns explained by six triggers and four enabling factors, enabled by a context-aware design probe called CLEO.
- Text Analytics Evaluation Framework: A Case Study on LLMs and Social Media
  Presents a new question-based evaluation framework for LLMs on aggregated social media text and reports that performance declines with input scale, task complexity, and numerical operations beyond 500 instances.