arxiv: 2606.22158 · 2 revisions
Improving Reasoning in Vision-Language Models via Perception Verified Self-Training