EVE enables verifiable self-evolution of MLLMs by using a Challenger-Solver architecture to generate dynamic executable visual transformations that produce VQA problems with absolute execution-verified ground truth.
Pyvision-rl: Forging open agentic vision models via RL.arxiv, 2602.20739, 2026
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 2verdicts
UNVERDICTED 2representative citing papers
AXPO addresses the Thinking-Acting Gap in agentic RL training by targeted resampling of tool calls in all-wrong subgroups, delivering +1.8pp gains over GRPO on nine multimodal benchmarks with an 8B model beating a 32B baseline on Pass@4.
citing papers explorer
-
EVE: Verifiable Self-Evolution of MLLMs via Executable Visual Transformations
EVE enables verifiable self-evolution of MLLMs by using a Challenger-Solver architecture to generate dynamic executable visual transformations that produce VQA problems with absolute execution-verified ground truth.
-
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning
AXPO addresses the Thinking-Acting Gap in agentic RL training by targeted resampling of tool calls in all-wrong subgroups, delivering +1.8pp gains over GRPO on nine multimodal benchmarks with an 8B model beating a 32B baseline on Pass@4.