TokenPacker: Efficient Visual Projector for Multimodal LLM

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary

verdicts: UNVERDICTED 5

roles: background 1

polarities: background 1

representative citing papers

UIPress: Bringing Optical Token Compression to UI-to-Code Generation

cs.CL · 2026-04-10 · unverdicted · novelty 7.0

UIPress is the first encoder-side learned optical compression method for UI-to-Code. It compresses visual tokens to 256, outperforming the uncompressed baseline by 7.5% in CLIP score and the best inference-time baseline by 4.6%, while delivering a 9.1x speedup in time to first token (TTFT).

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

cs.CV · 2024-11-15 · unverdicted · novelty 6.0

LLaVA-CoT adds autonomous multistage reasoning to vision-language models. Using a 100k annotated dataset and SWIRES test-time scaling, it gains 9.4% over its base model and outperforms larger models such as Gemini-1.5-pro on reasoning benchmarks.
