TokenPacker: Efficient Visual Projector for Multimodal LLM

Li, W · 2024 · arXiv 2407.02392

5 Pith papers cite this work.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

UIPress: Bringing Optical Token Compression to UI-to-Code Generation

cs.CL · 2026-04-10 · unverdicted · novelty 7.0

UIPress is the first encoder-side learned optical compression method for UI-to-Code generation. It compresses visual tokens to 256, outperforming the uncompressed baseline by 7.5% in CLIP score and the best inference-time compression baseline by 4.6%, while delivering a 9.1x time-to-first-token (TTFT) speedup.

MS-Resampler: Multi-Scope Visual Resampling for Efficient Multimodal LLMs

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

MS-Resampler deploys multiple scope-specific resamplers with explicit spatial priors and adaptive fusion, outperforming single-scope global cross-attention resampling in MLLMs on ten benchmarks at minimal added cost.

PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

PARCEL is a visual tokenization architecture that combines pool-anchored resampling with conditioned elastic queries to improve the performance-efficiency trade-off in LVLMs over prior Matryoshka-style methods.

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

cs.CV · 2024-11-15 · unverdicted · novelty 6.0

LLaVA-CoT adds autonomous multistage reasoning to vision-language models, delivering a 9.4% gain over its base model and outperforming larger models such as Gemini-1.5-pro on reasoning benchmarks, using a 100k-sample annotated dataset and SWIRES test-time scaling.

SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation

cs.RO · 2025-11-10 · unverdicted · novelty 5.0

SlotVLA uses slot attention to model object-relation representations for multitask robotic manipulation, reducing the number of visual tokens while achieving competitive generalization on the new LIBERO+ benchmark.

citing papers explorer

UIPress: Bringing Optical Token Compression to UI-to-Code Generation cs.CL · 2026-04-10 · unverdicted · none · ref 23
MS-Resampler: Multi-Scope Visual Resampling for Efficient Multimodal LLMs cs.CV · 2026-06-30 · unverdicted · none · ref 19
PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding cs.CV · 2026-05-28 · unverdicted · none · ref 64
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step cs.CV · 2024-11-15 · unverdicted · none · ref 32
SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation cs.RO · 2025-11-10 · unverdicted · none · ref 21