Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Khan · 2024 · DOI 10.18653/v1/2024.acl-long.679

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

MemoryCard: Topic-Aware Multi-Modal Clue Compression for Long-Video Question Answering

cs.CV · 2026-06-04 · unverdicted · novelty 6.0

MemoryCard organizes long videos into self-contained, topic-aware Memory Cards that improve long-video QA accuracy by up to 21.8% (relative) under fixed visual-token budgets.

AdaCodec: A Predictive Visual Code for Video MLLMs

cs.CV · 2026-06-01 · unverdicted · novelty 6.0

AdaCodec introduces a predictive visual code that cuts visual-token use in video MLLMs: it sends full frames only when the predictive cost is high and otherwise encodes inter-frame changes as P-tokens, yielding better benchmark scores at lower token budgets.

citing papers explorer

MemoryCard: Topic-Aware Multi-Modal Clue Compression for Long-Video Question Answering · cs.CV · 2026-06-04 · unverdicted · none · ref 54
AdaCodec: A Predictive Visual Code for Video MLLMs · cs.CV · 2026-06-01 · unverdicted · none · ref 2
