arxiv: 1804.09626 · v2 · submitted 2018-04-25 · 💻 cs.CV

Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos

keywords: charades-ego, dataset, video, activity, egocentric, first, third-person, annotations

In Actor and Observer, we introduced the Charades-Ego dataset, which links the first- and third-person video understanding domains. In this paper, we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first- and third-person video, making it one of the largest and most diverse egocentric datasets available. Charades-Ego also shares activity classes, scripts, and methodology with the Charades dataset, which consists of an additional 82.3 hours of third-person video with 66,500 activity instances. Charades-Ego has temporal annotations and textual descriptions, making it suitable for egocentric video classification, localization, captioning, and new tasks that use the cross-modal nature of the data.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DenseStep2M: A Scalable, Training-Free Pipeline for Dense Instructional Video Annotation

    cs.CV 2026-04 unverdicted novelty 6.0

    A scalable training-free pipeline using video segmentation, filtering, and off-the-shelf multimodal models creates DenseStep2M, a dataset of 100K videos and 2M detailed instructional steps that improves dense captioni...

  2. Bringing a Personal Point of View: Evaluating Dynamic 3D Gaussian Splatting for Egocentric Scene Reconstruction

    cs.CV 2026-04 conditional novelty 5.0

    Dynamic 3DGS models achieve lower PSNR on egocentric videos than exocentric ones, with the gap arising from static content reconstruction.