The large number of videos appearing every day makes it increasingly critical that key information within videos can be extracted and understood in a very short time. Video summarization, the task of finding the smallest subset of frames that still conveys the whole story of a given video, is thus of great significance for improving the efficiency of video understanding. In this paper, we propose a novel Dilated Temporal Relational Generative Adversarial Network (DTR-GAN) to achieve frame-level video summarization. Given a video, it selects a set of key frames that contains the most meaningful and compact information. Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with a three-player loss in an adversarial manner. A new dilated temporal relation (DTR) unit is introduced to enhance the capture of temporal representations. The generator aims to select key frames by using DTR units to effectively exploit global multi-scale temporal context and to complement the commonly used Bi-LSTM. To ensure that the summaries capture enough key video representation from a global perspective, rather than being a trivial randomly shortened sequence, we present a discriminator that learns to enforce both the information completeness and compactness of summaries via a three-player loss. The three-player loss includes the generated summary loss, the random summary loss, and the real summary (ground-truth) loss, which play important roles in regularizing the learned model to obtain useful summaries. Comprehensive experiments on two public datasets, SumMe and TVSum, show the superiority of our DTR-GAN over the state-of-the-art approaches.
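To make the three-player loss more concrete, the following is a minimal sketch of how the three terms might be combined on the discriminator side. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the function names, the use of binary cross-entropy, the equal weighting of the three terms, and the convention that the discriminator outputs a probability that a summary is a genuine (ground-truth) summary of the video.

```python
import math
import random


def bce(score, target):
    """Binary cross-entropy for a single probability score in (0, 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    score = min(max(score, eps), 1.0 - eps)
    return -(target * math.log(score) + (1.0 - target) * math.log(1.0 - score))


def three_player_discriminator_loss(d_real, d_generated, d_random):
    """Hypothetical three-player loss: sum of three cross-entropy terms.

    Assumed labelling (not confirmed by the abstract):
      real (ground-truth) summary -> target 1
      generated summary           -> target 0
      random summary              -> target 0
    """
    real_loss = bce(d_real, 1.0)
    generated_loss = bce(d_generated, 0.0)
    random_loss = bce(d_random, 0.0)
    return real_loss + generated_loss + random_loss


def random_summary(num_frames, budget, rng):
    """Pick `budget` frame indices uniformly at random.

    This stands in for the 'trivial randomly shortened sequence'
    that the random summary loss is meant to penalize.
    """
    return sorted(rng.sample(range(num_frames), budget))


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_summary(num_frames=100, budget=5, rng=rng))
    # A discriminator that separates the three players well gets a lower loss
    # than one that outputs 0.5 for everything.
    print(three_player_discriminator_loss(0.9, 0.1, 0.1))
    print(three_player_discriminator_loss(0.5, 0.5, 0.5))
```

Under these assumptions, the random-summary term gives the discriminator an explicit negative example of a short but uninformative sequence, which is one plausible reading of how the loss discourages the generator from producing merely shortened videos.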