Atlas: A high-difficulty, multidisciplinary benchmark for frontier scientific reasoning. arXiv preprint arXiv:2511.14366, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification
  Sci-PRM is a tool-aware process reward model trained on the SCIPRM70K dataset to provide fine-grained supervision for scientific reasoning and shown to boost foundation models via Best-of-N selection and RL.
- ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research
  ResearchClawBench is a new benchmark that evaluates autonomous AI research agents on 40 tasks grounded in published papers using expert rubrics, finding that top systems score only 20-26 out of 100.