
2.5 years in class: A multimodal textbook for vision-language pretraining

3 Pith papers cite this work. Polarity classification is still indexing.


citation summary

fields: cs.AI (2), cs.CV (1)

years: 2026 (2), 2025 (1)

verdicts: unverdicted (3)

roles: background (1)

polarities: background (1)

representative citing papers

MMSearch-R1: Incentivizing LMMs to Search

cs.CV · 2025-06-25 · unverdicted · novelty 7.0

MMSearch-R1 uses reinforcement learning to train multimodal models for on-demand multi-turn internet search with image and text tools, outperforming same-size RAG baselines and matching larger ones while cutting search calls by over 30%.

Logics-Parsing-Omni Technical Report

cs.AI · 2026-03-10 · unverdicted · novelty 6.0

The Omni Parsing framework converts complex multimodal signals into locatable, enumerable, and traceable structured knowledge via hierarchical detection, recognition, and interpretation with strict evidence alignment.

citing papers explorer

Showing 3 of 3 citing papers.

  • MMSearch-R1: Incentivizing LMMs to Search · cs.CV · 2025-06-25 · unverdicted · none · ref 68

    MMSearch-R1 uses reinforcement learning to train multimodal models for on-demand multi-turn internet search with image and text tools, outperforming same-size RAG baselines and matching larger ones while cutting search calls by over 30%.

  • Logics-Parsing-Omni Technical Report · cs.AI · 2026-03-10 · unverdicted · none · ref 25

    The Omni Parsing framework converts complex multimodal signals into locatable, enumerable, and traceable structured knowledge via hierarchical detection, recognition, and interpretation with strict evidence alignment.

  • Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding · cs.AI · 2026-05-10 · unverdicted · none · ref 101

    Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.