A Review of Deep Learning for Video Captioning

Abbas Khosravi; Abduallah Mohamed; Daniel McDuff; Erik Cambria; Farhad Pourpanah; Fatih Porikli; Meenakshi Kollati; Mohammad Ghavamzadeh; Moloud Abdar; Shuicheng Yan

arxiv: 2304.11431 · v1 · pith:2IZSRPKYnew · submitted 2023-04-22 · 💻 cs.CV

Moloud Abdar, Meenakshi Kollati, Swaraja Kuraparthi, Farhad Pourpanah, Daniel McDuff, Mohammad Ghavamzadeh, Shuicheng Yan, Abduallah Mohamed, Abbas Khosravi, Erik Cambria, Fatih Porikli

classification 💻 cs.CV

keywords: video, captioning, applications, deep, language, learning, networks, used

Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work in computer vision, natural language processing (NLP), linguistics, and human-computer interaction. In essence, VC involves understanding a video and describing it in language. Captioning is used in a host of applications, from creating more accessible interfaces (e.g., low-vision navigation) to video question answering (V-QA), video retrieval, and content generation. This survey covers deep learning-based VC, including, but not limited to, attention-based architectures, graph networks, reinforcement learning, adversarial networks, and dense video captioning (DVC). We discuss the datasets and evaluation metrics used in the field, as well as the limitations, applications, challenges, and future directions of VC.
