Zamba2-VL is a family of 1.2B–7B hybrid Mamba2-transformer vision-language models that match leading transformer VLMs on image, reasoning, OCR, grounding and counting benchmarks while delivering roughly 10x lower time-to-first-token.
Cobra: Extending 10 mamba to multi-modal large language model for efficient inference.arXiv preprint arXiv:2403.14520, 2024
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
The paper consolidates existing research on Mamba models, their architecture variants, adaptations to different data modalities, and applications across domains.
citing papers explorer
-
Zamba2-VL Technical Report
Zamba2-VL is a family of 1.2B–7B hybrid Mamba2-transformer vision-language models that match leading transformer VLMs on image, reasoning, OCR, grounding and counting benchmarks while delivering roughly 10x lower time-to-first-token.
-
A Survey of Mamba
The paper consolidates existing research on Mamba models, their architecture variants, adaptations to different data modalities, and applications across domains.