A Survey on Deep Learning Technique for Video Segmentation

David Crandall; Fatih Porikli; Luc Van Gool; Tianfei Zhou; Wenguan Wang

arxiv: 2107.01153 · v4 · pith:EEL655BDnew · submitted 2021-07-02 · 💻 cs.CV

Tianfei Zhou, Fatih Porikli, David Crandall, Luc Van Gool, Wenguan Wang

classification 💻 cs.CV

keywords video, segmentation, background, datasets, deep, field, further, learning

Video segmentation -- partitioning video frames into multiple segments or objects -- plays a critical role in a broad range of practical applications, from enhancing visual effects in movies, to understanding scenes in autonomous driving, to creating virtual backgrounds in video conferencing. Recently, with the renaissance of connectionism in computer vision, there has been an influx of deep learning-based approaches for video segmentation that have delivered compelling performance. In this survey, we comprehensively review two basic lines of research -- generic object segmentation (of unknown categories) in videos, and video semantic segmentation -- by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. We also offer a detailed overview of representative literature on both methods and datasets. We further benchmark the reviewed methods on several well-known datasets. Finally, we point out open issues in this field and suggest opportunities for further research. We also provide a public website to continuously track developments in this fast-advancing field: https://github.com/tfzhou/VS-Survey.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Zero-Parameter Geometric Gating for Temporally Stable Low-Altitude UAV Video Semantic Segmentation
cs.CV · 2026-06 · unverdicted · novelty 6.0

A RANSAC-based geometric gate routes regions to homography or optical flow warping before SSP fusion, improving mIoU by 4.24-4.91% on synthetic UAVid with only 211K added parameters to frozen backbones.