Let’s verify step by step
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
dataset 1
citing papers explorer
- Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
  Derives backward and forward corrections for asymmetric verifier noise that improve RLVR performance on math reasoning tasks.
- Search-o1: Agentic Search-Enhanced Large Reasoning Models
  Search-o1 integrates agentic retrieval-augmented generation and a Reason-in-Documents module into large reasoning models to dynamically supply missing knowledge and improve performance on complex science, math, coding, and QA tasks.
- Hint Tuning: Less Data Makes Better Reasoners