MMSearch-R1 uses reinforcement learning to train multimodal models for on-demand multi-turn internet search with image and text tools, outperforming same-size RAG baselines and matching larger ones while cutting search calls by over 30%.
2.5 years in class: A multimodal textbook for vision- language pretraining
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Omni Parsing framework converts complex multimodal signals into locatable, enumerable, and traceable structured knowledge via hierarchical detection, recognition, and interpreting with strict evidence alignment.
Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.
citing papers explorer
-
MMSearch-R1: Incentivizing LMMs to Search
MMSearch-R1 uses reinforcement learning to train multimodal models for on-demand multi-turn internet search with image and text tools, outperforming same-size RAG baselines and matching larger ones while cutting search calls by over 30%.
-
Logics-Parsing-Omni Technical Report
Omni Parsing framework converts complex multimodal signals into locatable, enumerable, and traceable structured knowledge via hierarchical detection, recognition, and interpreting with strict evidence alignment.
-
Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding
Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.