Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training. arXiv preprint arXiv:2602.10815, 2026

Aojun Lu, Tao Feng, Hangjie Yuan, Wei Li, Yanan Sun · 2026 · arXiv 2602.10815

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

AnE combines Truth Anchor Expansion and Scaffold-Stripping to deliver 10.3% gains on eight multimodal reasoning benchmarks for MLLMs.

Showing 1 of 1 citing paper.

AnE: Pushing the Reasoning Frontier of Multimodal LLMs via Anchor Evolution · cs.CV · 2026-05-25 · unverdicted · none · ref 5