pith.

Tool reference

SEED-Bench-2-Plus: Benchmarking multimodal large language models with text-rich visual comprehension

Tool reference: 86% of classified Pith citations use this work as a method, library, or software dependency rather than cite it for a substantive claim.

15 Pith papers citing it
Method reference: 86% of classified citations

tools

citation-role summary

dataset: 6, background: 1


fields

cs.CV: 13, cs.CL: 2

representative citing papers

Deep Pre-Alignment for VLMs

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Deep Pre-Alignment uses a small VLM perceiver instead of ViT to pre-align visual features with LLM text space, yielding 1.9-3.0 point gains on multimodal benchmarks and 32.9% less language forgetting.

DeepEyesV2: Toward Agentic Multimodal Model

cs.CV · 2025-11-07 · unverdicted · novelty 6.0

DeepEyesV2 uses a two-stage pipeline, a cold-start stage followed by reinforcement learning, to produce an agentic multimodal model that adaptively invokes tools and outperforms direct RL training on real-world reasoning benchmarks.
