TimeChat: A time-sensitive multimodal large language model for long video understanding
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers

- DynaTok: Temporally Adaptive and Positional Bias-Aware Token Compression for Video-LLMs
  DynaTok introduces temporally adaptive budget allocation with EMA memory and spatial selection with memory to compress video tokens, retaining over 95% accuracy at 90% reduction on VideoQA benchmarks.
- MARS: Technical Report for the CASTLE Challenge at EgoVis 2026
  MARS converts long videos to captions and summaries, maintains modality-specific memories, and deploys an agent to select evidence or answer, placing second on the CASTLE Challenge leaderboard.