Advances in Neural Information Processing Systems 36, 46212–46244 (2023)
4 Pith papers cite this work.
Citing papers by year: 2026 (4)
citing papers explorer
- Watching Movies Like a Human: Egocentric Emotion Understanding for Embodied Companions
  Creates the first egocentric screen-view movie emotion benchmark and demonstrates that cinematic models drop sharply in Macro-F1 on realistic robot-like viewing conditions while domain-specific training improves robustness.
- GTASA: Ground Truth Annotations for Spatiotemporal Analysis, Evaluation and Training of Video Models
  GTASA supplies annotated multi-actor videos with exact 3D spatial and temporal ground truth that outperforms neural video generators in physical and semantic validity while enabling new probes of video encoders.
- EgoSelf: From Memory to Personalized Egocentric Assistant
  EgoSelf uses graph-based memory of user interactions to derive personalized profiles and predict future behaviors for egocentric assistants.
- Sink-Token-Aware Pruning for Fine-Grained Video Understanding in Efficient Video LLMs