Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Ego4d: Around the world in 3,000 hours of egocentric video , author=

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models

cs.RO · 2026-05-06 · unverdicted · novelty 6.0

A unified comparison of latent action supervision strategies for VLA models reveals task-specific benefits, with image-based approaches aiding reasoning and generalization, action-based aiding motor control, and discrete tokens proving most effective.

citing papers explorer

Showing 2 of 2 citing papers.

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding cs.CV · 2026-05-11 · unverdicted · none · ref 87
EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.
From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models cs.RO · 2026-05-06 · unverdicted · none · ref 23
A unified comparison of latent action supervision strategies for VLA models reveals task-specific benefits, with image-based approaches aiding reasoning and generalization, action-based aiding motor control, and discrete tokens proving most effective.

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

fields

years

verdicts

representative citing papers

citing papers explorer