Densefusion-1m: Merging vision experts for comprehensive multimodal perception. Advances in Neural Information Processing Systems, 37:18535–18556

2 Pith papers cite this work. Polarity classification is still indexing.

fields

cs.CV 2

years

2026 2

representative citing papers

Let ViT Speak: Generative Language-Image Pre-training

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

GenLIP pretrains ViTs to generate language tokens from images using a language-modeling objective, without contrastive batches or extra decoders. It matches baselines with less data and improves on OCR after multi-resolution continued pretraining.
