arXiv preprint arXiv:2602.20739 , year=

PyVision-RL: Forging Open Agentic Vision Models via RL , author= · 2026 · arXiv 2602.20739

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

representative citing papers

EVE: Verifiable Self-Evolution of MLLMs via Executable Visual Transformations

cs.CV · 2026-04-20 · unverdicted · novelty 8.0

EVE enables verifiable self-evolution of MLLMs by using a Challenger-Solver architecture to generate dynamic executable visual transformations that produce VQA problems with absolute execution-verified ground truth.

Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

P2R decouples perception from reasoning in VLMs via a two-stage process and PRA-GRPO alternating RL training, reporting gains such as 93.2% on V-Star for the 4B model over its Qwen3-VL backbone.

TACO: Tool-Augmented Credit Optimization for Agentic Tool Use

cs.MA · 2026-06-29 · unverdicted · novelty 6.0

TACO combines Differential Answer-Probe Reward (DAPR) and Outcome-Gated Advantage Routing (OGAR) to assign credit to tool calls in agentic visual reasoning, producing accuracy gains on multimodal benchmarks.

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

AXPO addresses the Thinking-Acting Gap in agentic RL training by targeted resampling of tool calls in all-wrong subgroups, delivering +1.8pp gains over GRPO on nine multimodal benchmarks with an 8B model beating a 32B baseline on Pass@4.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning cs.CL · 2026-05-27 · unverdicted · none · ref 20
AXPO addresses the Thinking-Acting Gap in agentic RL training by targeted resampling of tool calls in all-wrong subgroups, delivering +1.8pp gains over GRPO on nine multimodal benchmarks with an 8B model beating a 32B baseline on Pass@4.

arXiv preprint arXiv:2602.20739 , year=

fields

years

verdicts

representative citing papers

citing papers explorer