Pith

Canonical reference

STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs

Canonical reference: 80% of the classified citations from Pith papers cite this work as background.

10 Pith papers cite this work. Background: 80% of classified citations.


Citation roles: background 4, baseline 1


Fields: cs.CV 8, cs.AI 2

Years: 2026 5, 2025 5

Representative citing papers

Video-R1: Reinforcing Video Reasoning in MLLMs

cs.CV · 2025-03-27 · conditional · novelty 7.0

Video-R1 uses temporal-aware RL and mixed datasets to boost video reasoning in MLLMs, with a 7B model reaching 37.1% on VSI-Bench and surpassing GPT-4o.

OneThinker: All-in-one Reasoning Model for Image and Video

cs.CV · 2025-12-02 · unverdicted · novelty 5.0

OneThinker unifies image and video reasoning in a single model across 10 tasks, using a 600k-sample corpus, CoT-annotated SFT, and EMA-GRPO reinforcement learning. It reports strong results on 31 benchmarks and some cross-task transfer.
