GPRO trains a meta-controller on 790k failure-labeled samples to dynamically select fast, perception, or reasoning paths in LVLMs, yielding higher accuracy and shorter responses than prior slow-thinking methods.
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
11 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 4
polarities
background 4
representative citing papers
SpaceR uses a new verifiable dataset and map-imagination-augmented RLVR to reach SOTA spatial reasoning accuracy in MLLMs, exceeding GPT-4o on VSI-Bench.
ForeSight lets VLMs use low-level visual cues and mask-based visual feedback within an RL loop to reason more accurately, with the 7B model beating same-scale peers and some closed-source SOTA on a new benchmark.
LMM-Searcher uses file-based visual UIDs and a fetch tool plus 12K synthesized trajectories to fine-tune a multimodal agent that scales to 100-turn horizons and reaches SOTA among open-source models on MM-BrowseComp and MMSearch-Plus.
OmniDrive-R1 boosts VLM reasoning score from 51.77% to 80.35% and answer accuracy from 37.81% to 73.62% on DriveLMM-o1 via reinforcement-driven interleaved multi-modal chain-of-thought with annotation-free grounding.
DRP decouples reasoning from perception in LMMs by using an LLM reasoner to query an LMM observer for visual details as needed, reducing visual grounding loss.
Iterative SFT-RL cycles enable a 7B LVLM to develop sophisticated visual chain-of-thought reasoning and improve performance on math and general reasoning benchmarks.
LASER uses Visual Grounding Reward and Sink Suppression Reward to preserve visual attention trajectories and suppress sink tokens, reducing visual forgetting in LVLMs.
MoVT unifies different visual reasoning modes in a single model and uses the AdaVaR two-stage framework with supervised cold-start and RL via AdaGRPO to enable context-adaptive mode selection, yielding consistent gains on visual reasoning tasks.
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.
The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.
citing papers explorer
OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles
Iterative SFT-RL cycles enable a 7B LVLM to develop sophisticated visual chain-of-thought reasoning and improve performance on math and general reasoning benchmarks.