hub

Sophiavl-r1: Reinforcing mllms reasoning with thinking reward

Kaixuan Fan, Kaituo Feng, Haoming Lyu, Dongzhan Zhou, Xiangyu Yue · 2025 · arXiv 2505.17018

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2 baseline 1

citation-polarity summary

background 2 baseline 1

representative citing papers

From Web to Pixels: Bringing Agentic Search into Visual Perception

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

WebEye benchmark and Pixel-Searcher agent enable visual perception tasks by using web search to resolve object identities before precise localization or answering.

Gen-Searcher: Reinforcing Agentic Search for Image Generation

cs.CV · 2026-03-30 · unverdicted · novelty 7.0 · 2 refs

Gen-Searcher is the first trained search-augmented image generation agent using SFT followed by GRPO reinforcement learning with dual text-image rewards, delivering 15-16 point gains on knowledge-intensive benchmarks.

CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning

cs.CV · 2026-01-30 · unverdicted · novelty 7.0

CamReasoner uses structured O-T-A reasoning and RL on 56k samples to lift camera movement classification from 73.8% to 78.4% and VQA from 60.9% to 74.5% on Qwen2.5-VL-7B.

Video-R1: Reinforcing Video Reasoning in MLLMs

cs.CV · 2025-03-27 · conditional · novelty 7.0

Video-R1 uses temporal-aware RL and mixed datasets to boost video reasoning in MLLMs, with a 7B model reaching 37.1% on VSI-Bench and surpassing GPT-4o.

PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

PDCR improves vision-language reasoning by computing separate normalized confidence advantages for perception steps and reasoning steps after unsupervised decomposition.

GraphThinker: Reinforcing Temporally Grounded Video Reasoning with Event Graph Thinking

cs.CV · 2026-02-19 · unverdicted · novelty 6.0

GraphThinker reduces temporal hallucinations in video reasoning by constructing event-based scene graphs and applying visual attention rewards in reinforcement finetuning.

AdaTooler-V: Adaptive Tool-Use for Images and Videos

cs.CV · 2025-12-18 · conditional · novelty 6.0

AdaTooler-V trains MLLMs to adaptively use vision tools via AT-GRPO reinforcement learning and new datasets, reaching 89.8% on V* and outperforming GPT-4o.

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

cs.LG · 2026-04-15 · unverdicted · novelty 5.0

The paper introduces the Proxy Compression Hypothesis as a unifying framework explaining reward hacking in RLHF as an emergent result of compressing high-dimensional human objectives into proxy reward signals under optimization pressure.

OneThinker: All-in-one Reasoning Model for Image and Video

cs.CV · 2025-12-02 · unverdicted · novelty 5.0

OneThinker unifies image and video reasoning in one model across 10 tasks via a 600k corpus, CoT-annotated SFT, and EMA-GRPO reinforcement learning, reporting strong results on 31 benchmarks plus some cross-task transfer.

Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

cs.CL · 2026-05-19

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

cs.CL · 2026-05-12

citing papers explorer

Showing 3 of 3 citing papers after filters.

PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning cs.CL · 2026-05-13 · unverdicted · none · ref 7
PDCR improves vision-language reasoning by computing separate normalized confidence advantages for perception steps and reasoning steps after unsupervised decomposition.
Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning cs.CL · 2026-05-19 · unreviewed · ref 76
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations cs.CL · 2026-05-12 · unreviewed · ref 71

Sophiavl-r1: Reinforcing mllms reasoning with thinking reward

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer