Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
AdaCodec introduces a predictive visual code that cuts visual token use in video MLLMs by sending full frames only on high predictive cost and otherwise encoding inter-frame changes as P-tokens, yielding better benchmark scores at lower budgets.
citing papers explorer
- MemoryCard: Topic-Aware Multi-Modal Clue Compression for Long-Video Question Answering
  MemoryCard organizes long videos into self-contained topic-aware Memory Cards that improve long-video QA accuracy by up to 21.8% relative under fixed visual-token budgets.
- AdaCodec: A Predictive Visual Code for Video MLLMs
  AdaCodec introduces a predictive visual code that cuts visual token use in video MLLMs by sending full frames only on high predictive cost and otherwise encoding inter-frame changes as P-tokens, yielding better benchmark scores at lower budgets.