Video description is the generation of natural language descriptions of the actions, events, and objects in a video. Its applications include bridging the gap between language and vision for visually impaired people, automatic title suggestion based on content, content-based video browsing, and video-guided machine translation [86]. Over the past decade, considerable work has been done in this field on approaches and methods for video description, evaluation metrics, and datasets. Analyzing the progress on this task requires a comprehensive survey that covers all phases of video description approaches, with a special focus on recent deep learning approaches. In this work, we present such a survey, covering the phases of video description approaches, the datasets for video description, evaluation metrics, open competitions that motivate research on video description, open challenges in the field, and future research directions. We cover the state-of-the-art approaches proposed for each dataset, together with their pros and cons. The availability of numerous benchmark datasets is a basic need for the growth of this research domain. We further categorize the datasets into two classes: open-domain datasets and domain-specific datasets. From our survey, we observe that work in this field is developing at a fast pace, since the task of video description lies at the intersection of computer vision and natural language processing. Still, work on video description is far from saturation because of several challenges: redundancy among similar frames, which degrades the quality of visual features; the need for datasets with more diverse content; and the need for an effective evaluation metric.