Vgent: Graph-based retrieval- reasoning-augmented generation for long video understanding

Shen, X · 2025 · arXiv 2510.14032

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents

cs.CV · 2026-03-02 · unverdicted · novelty 7.0

MM-Mem distills video input through a hierarchical memory of sensory buffer, episodic stream, and symbolic schema, optimized by a semantic information bottleneck and SIB-GRPO, to achieve SOTA on long-horizon video benchmarks.

VisReflect: Latent Visual Reflection for Fine-Grained Perception in Long Visual Context

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

VisReflect generates continuous latent visual reflections to emphasize relevant visual features and guide attention in LVLMs, yielding 4.1% gains on image benchmarks and 1.8% on video benchmarks with 44% less inference time than zooming methods.

VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

VideoStir introduces a spatio-temporal graph-based structure and intent-aware retrieval for long-video RAG, achieving competitive performance with SOTA methods via a new IR-600K dataset.

GLANCE: A Global-Local Coordination Multi-Agent Framework for Music-Grounded Non-Linear Video Editing

cs.MA · 2026-04-06 · unverdicted · novelty 6.0

GLANCE introduces a bi-loop multi-agent framework with global-local coordination mechanisms that outperforms baselines by up to 33% on music-grounded nonlinear video editing tasks using a new MVEBench benchmark.

citing papers explorer

Showing 2 of 2 citing papers after filters.

VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG cs.CV · 2026-04-07 · unverdicted · none · ref 3
VideoStir introduces a spatio-temporal graph-based structure and intent-aware retrieval for long-video RAG, achieving competitive performance with SOTA methods via a new IR-600K dataset.
GLANCE: A Global-Local Coordination Multi-Agent Framework for Music-Grounded Non-Linear Video Editing cs.MA · 2026-04-06 · unverdicted · none · ref 37
GLANCE introduces a bi-loop multi-agent framework with global-local coordination mechanisms that outperforms baselines by up to 33% on music-grounded nonlinear video editing tasks using a new MVEBench benchmark.

Vgent: Graph-based retrieval- reasoning-augmented generation for long video understanding

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer