PaperMind is a new benchmark that evaluates integrated multimodal reasoning and critique over scientific papers through four complementary task families across seven domains.
ISBN 979-8-89176-251-0
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Exploration-Commitment Decoupling instantiated as Calibration-Aware Generation improves long-form factuality by up to 13% and reduces decoding time by up to 37% on five benchmarks.
citing papers explorer
-
PaperMind: Benchmarking Agentic Reasoning and Critique over Scientific Papers in Multimodal LLMs
PaperMind is a new benchmark that evaluates integrated multimodal reasoning and critique over scientific papers through four complementary task families across seven domains.
-
Only Say What You Know: Calibration-Aware Generation for Long-Form Factuality
Exploration-Commitment Decoupling instantiated as Calibration-Aware Generation improves long-form factuality by up to 13% and reduces decoding time by up to 37% on five benchmarks.