pith. sign in

hub

Visual agentic reinforcement fine-tuning

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

hub tools

citation-role summary

background 3 dataset 1

citation-polarity summary

years

2026 10 2025 2

verdicts

UNVERDICTED 12

representative citing papers

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Maestro uses outcome-based RL to train a lightweight policy that orchestrates ensembles of frozen expert models and skills, reporting 70.1% average accuracy across ten multimodal benchmarks and outperforming GPT-5 and Gemini-2.5-Pro while generalizing to unseen components.

DeepEyesV2: Toward Agentic Multimodal Model

cs.CV · 2025-11-07 · unverdicted · novelty 6.0

DeepEyesV2 uses a two-stage cold-start plus reinforcement learning pipeline to produce an agentic multimodal model that adaptively invokes tools and outperforms direct RL on real-world reasoning benchmarks.

Perceptual Flow Network for Visually Grounded Reasoning

cs.CV · 2026-05-04 · unverdicted · novelty 5.0

PFlowNet decouples perception from reasoning, integrates multi-dimensional rewards with vicinal geometric shaping via variational RL, and reports new SOTA results on V* Bench (90.6%) and MME-RealWorld-lite (67.0%).

citing papers explorer

Showing 12 of 12 citing papers.