arXiv preprint arXiv:2502.13383 , year=

MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification , author= · 2025 · arXiv 2502.13383

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 2 dataset 1

citation-polarity summary

background 2 use dataset 1

representative citing papers

TraceAV-Bench: Benchmarking Multi-Hop Trajectory Reasoning over Long Audio-Visual Videos

cs.CV · 2026-05-08 · unverdicted · novelty 8.0

TraceAV-Bench is the first benchmark for multi-hop trajectory reasoning over long audio-visual videos, showing top models reach only 51-68% accuracy with substantial room for improvement.

Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

VeGAS improves MLLM-based embodied agents by sampling action ensembles and using a verifier trained on LLM-synthesized failure cases, yielding up to 36% relative gains on hard multi-object long-horizon tasks in Habitat and ALFRED.

OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving

cs.CV · 2025-12-16 · unverdicted · novelty 6.0

OmniDrive-R1 boosts VLM reasoning score from 51.77% to 80.35% and answer accuracy from 37.81% to 73.62% on DriveLMM-o1 via reinforcement-driven interleaved multi-modal chain-of-thought with annotation-free grounding.

Mitigating Visual Context Degradation in Large Multimodal Models: A Training-Free Decoupled Agentic Framework

cs.CV · 2025-09-27 · unverdicted · novelty 6.0

DRP decouples reasoning from perception in LMMs by using an LLM reasoner to query an LMM observer for visual details as needed, reducing visual grounding loss.

A Nash Equilibrium Framework For Training-Free Multimodal Step Verification

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

A Nash equilibrium framework for training-free multimodal step verification that uses cross-modal agreement and disagreement signals for filtering and ranking reasoning steps.

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

cs.CV · 2025-03-16 · unverdicted · novelty 2.0

The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

cs.CV · 2026-04-06

citing papers explorer

Showing 7 of 7 citing papers.

TraceAV-Bench: Benchmarking Multi-Hop Trajectory Reasoning over Long Audio-Visual Videos cs.CV · 2026-05-08 · unverdicted · none · ref 74
TraceAV-Bench is the first benchmark for multi-hop trajectory reasoning over long audio-visual videos, showing top models reach only 51-68% accuracy with substantial room for improvement.
Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents cs.AI · 2026-05-12 · unverdicted · none · ref 41
VeGAS improves MLLM-based embodied agents by sampling action ensembles and using a verifier trained on LLM-synthesized failure cases, yielding up to 36% relative gains on hard multi-object long-horizon tasks in Habitat and ALFRED.
OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving cs.CV · 2025-12-16 · unverdicted · none · ref 37
OmniDrive-R1 boosts VLM reasoning score from 51.77% to 80.35% and answer accuracy from 37.81% to 73.62% on DriveLMM-o1 via reinforcement-driven interleaved multi-modal chain-of-thought with annotation-free grounding.
Mitigating Visual Context Degradation in Large Multimodal Models: A Training-Free Decoupled Agentic Framework cs.CV · 2025-09-27 · unverdicted · none · ref 32
DRP decouples reasoning from perception in LMMs by using an LLM reasoner to query an LMM observer for visual details as needed, reducing visual grounding loss.
A Nash Equilibrium Framework For Training-Free Multimodal Step Verification cs.CV · 2026-05-19 · unverdicted · none · ref 33
A Nash equilibrium framework for training-free multimodal step verification that uses cross-modal agreement and disagreement signals for filtering and ranking reasoning steps.
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey cs.CV · 2025-03-16 · unverdicted · none · ref 257
The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models cs.CV · 2026-04-06 · unreviewed · ref 109

arXiv preprint arXiv:2502.13383 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer