RAC adds ranking-aware group loss and clean-corrupted pairwise loss to RL post-training to boost both accuracy and calibration in multimodal reasoning without extra annotations.
Vision-r1: Incentivizing reasoning capability in multimodal large language models
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2roles
background 1polarities
background 1representative citing papers
citing papers explorer
-
Ranking-Aware Calibration for Reliable Multimodal Reinforcement Learning
RAC adds ranking-aware group loss and clean-corrupted pairwise loss to RL post-training to boost both accuracy and calibration in multimodal reasoning without extra annotations.
- Flow-OPD: On-Policy Distillation for Flow Matching Models