In this work, we introduce a new problem, termed {\em story-preserving long video truncation}, which requires an algorithm to automatically truncate a long-duration video into multiple short, attractive sub-videos, each containing an unbroken story. This differs from traditional video highlight detection or video summarization in that each sub-video is required to preserve a coherent and integral story, a property that is becoming particularly important for resource-productive video-sharing platforms such as YouTube, Facebook, TikTok, and Kwai. To address the problem, we collect and annotate a new large video truncation dataset, named TruNet, which contains 1470 videos with an average of 11 short stories per video. With the new dataset, we further develop and train a neural architecture for video truncation that consists of two components: a Boundary Aware Network (BAN) and a Fast-Forward Long Short-Term Memory (FF-LSTM). We first use the BAN to generate high-quality temporal proposals by jointly considering frame-level attractiveness and boundaryness. We then apply the FF-LSTM, which captures high-order dependencies among a sequence of frames, to decide whether a temporal proposal constitutes a coherent and integral story. We show that our proposed framework outperforms existing approaches on the story-preserving long video truncation problem in both quantitative measures and a user study. The dataset is available for public academic research use at https://ai.baidu.com/broad/download.
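The two-stage pipeline in the abstract (per-frame scoring followed by proposal classification) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the boundaryness threshold, and the variance-based proposal score are all hypothetical stand-ins for the learned BAN and FF-LSTM models.

```python
import numpy as np

def generate_proposals(attractiveness, boundaryness, bnd_thresh=0.5, min_len=2):
    """Stage 1 (BAN-like, hypothetical): turn per-frame attractiveness and
    boundaryness scores into candidate (start, end) segments.

    Frames whose boundaryness exceeds `bnd_thresh` are treated as candidate
    story boundaries; consecutive boundaries delimit a proposal, which is
    kept only if its mean attractiveness exceeds the global mean."""
    boundaries = [0] + [i for i, b in enumerate(boundaryness) if b > bnd_thresh]
    boundaries.append(len(attractiveness))
    global_mean = float(np.mean(attractiveness))
    proposals = []
    for s, e in zip(boundaries[:-1], boundaries[1:]):
        if e - s >= min_len and float(np.mean(attractiveness[s:e])) > global_mean:
            proposals.append((s, e))
    return proposals

def score_proposal(features, proposal):
    """Stage 2 (placeholder for the FF-LSTM): score whether a proposal forms
    a coherent story. A trivial heuristic -- low feature variance inside the
    segment -- stands in for the learned sequence model."""
    s, e = proposal
    return 1.0 / (1.0 + float(np.var(features[s:e])))

# Toy example: 10 frames with one strong story boundary at frame 5.
attr = np.array([0.9, 0.8, 0.9, 0.7, 0.8, 0.2, 0.1, 0.2, 0.1, 0.2])
bnd = np.array([0.0, 0.1, 0.0, 0.1, 0.0, 0.9, 0.1, 0.0, 0.1, 0.0])
props = generate_proposals(attr, bnd)          # keeps the attractive first half
scores = [score_proposal(attr, p) for p in props]
```

In the real system both stages are learned networks; the heuristics above only illustrate the data flow from frame-level scores, to temporal proposals, to a per-proposal story-coherence decision.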