TaxoBench shows deep research agents retrieve only 20.92% of expert-cited papers and generate taxonomies with 75.9% sibling overlap plus other structural flaws, while LLMs reach only 28-29% semantic path similarity versus 47-58% for humans.
semantic_coverage
1 Pith paper cites this work. Polarity classification is still being indexed.
fields: cs.CL (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Can Deep Research Agents Retrieve and Organize? Evaluating the Synthesis Gap with Expert Taxonomies