Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content. Several approaches have been developed over the last couple of decades and the current state of the art is represented by methods that rely on modern deep neural network architectures. This work focuses on the recent advances in the area and provides a comprehensive survey of the existing deep-learning-based methods for generic video summarization. After presenting the motivation behind the development of technologies for video summarization, we formulate the video summarization task and discuss the main characteristics of a typical deep-learning-based analysis pipeline. Then, we suggest a taxonomy of the existing algorithms and provide a systematic review of the relevant literature that shows the evolution of the deep-learning-based video summarization technologies and leads to suggestions for future developments. We then report on protocols for the objective evaluation of video summarization algorithms and we compare the performance of several deep-learning-based approaches. Based on the outcomes of these comparisons, as well as some documented considerations about the suitability of evaluation protocols, we indicate potential future research directions.