Key Frame Extraction with Attention Based Deep Neural Networks

Samed Arslan; Senem Tanberk

arxiv: 2306.13176 · v1 · pith:BY5IJMNOnew · submitted 2023-06-21 · cs.CV · cs.LG · eess.IV

keywords: video, used, deep, frames, method, methods, videos, attention

Automatic keyframe detection is the task of selecting the scenes that best summarize the content of a long video. Such a summary facilitates quick browsing of video content. The extracted frames are used for automated tasks in various industries (e.g., summarizing security footage or detecting the different scenes used in music clips). In addition, processing high-volume video with advanced machine learning methods incurs resource costs; the extracted keyframes can instead serve as input features to those methods and models. In this study, we propose a deep learning-based approach for keyframe detection using a deep autoencoder model with an attention layer. The proposed method first extracts features from the video frames using the encoder part of the autoencoder, then segments the video by applying the k-means clustering algorithm to these features so that similar frames are grouped together. Keyframes are then selected from each cluster by choosing the frames closest to the cluster center. The method was evaluated on the TVSUM video dataset and achieved a classification accuracy of 0.77, indicating a higher success rate than many existing methods. The proposed method offers a promising solution for keyframe extraction in video analysis and can be applied to tasks such as video summarization and video retrieval.
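
The clustering and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the attention autoencoder is omitted, and the per-frame feature vectors are synthetic stand-ins for encoder outputs. The function names (`kmeans`, `select_keyframes`), the deterministic farthest-point initialization, and the use of the standard library only are assumptions made for illustration; a real pipeline would more likely rely on a numerical library.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, k, iterations=50):
    """Plain Lloyd's k-means; returns (centroids, labels).

    Initialization is a deterministic farthest-point heuristic
    (an assumption of this sketch, not taken from the paper).
    """
    centroids = [list(features[0])]
    while len(centroids) < k:
        # Pick the frame whose nearest existing centroid is farthest away.
        farthest = max(
            features,
            key=lambda f: min(squared_distance(f, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(iterations):
        # Assignment step: each frame goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


def select_keyframes(features, k):
    """Return sorted indices of the frame closest to each cluster center."""
    centroids, labels = kmeans(features, k)
    keyframes = []
    for c, center in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            keyframes.append(
                min(members, key=lambda i: squared_distance(features[i], center))
            )
    return sorted(keyframes)


# Synthetic stand-in for encoder features: three "scenes" of five frames each.
offsets = [-0.2, -0.1, 0.0, 0.1, 0.2]
scenes = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
features = [[cx + o, cy] for cx, cy in scenes for o in offsets]

keyframes = select_keyframes(features, k=3)
print(keyframes)  # prints [2, 7, 12]
```

In this toy input the middle frame of each scene sits exactly at its cluster center, so it is the one selected. With real encoder features the number of clusters `k` would have to be chosen separately, and the abstract does not say how.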

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

QCA: Query- and Content-Aware Keyframe Selection for Long Video Understanding
cs.CV · 2026-07 · unverdicted · novelty 5.0

QCA selects compact, query-relevant keyframes from long videos via segment-wise budget allocation and diversity-aware addition, achieving higher accuracy than GPT-4o on LongVideoBench with half the frames.