Highlight detection has the potential to significantly ease video browsing, but existing methods often suffer from expensive supervision requirements, where human viewers must manually identify highlights in training videos. We propose a scalable unsupervised solution that exploits video duration as an implicit supervision signal. Our key insight is that video segments from shorter user-generated videos are more likely to be highlights than those from longer videos, since users tend to be more selective about the content when capturing shorter videos. Leveraging this insight, we introduce a novel ranking framework that prefers segments from shorter videos, while properly accounting for the inherent noise in the (unlabeled) training data. We use it to train a highlight detector with 10M hashtagged Instagram videos. In experiments on two challenging public video highlight detection benchmarks, our method substantially improves the state-of-the-art for unsupervised highlight detection.
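As a rough illustration of the idea of ranking segments from shorter videos above those from longer ones, the following minimal sketch computes a pairwise hinge ranking loss. It is written under assumptions not stated in the abstract: the function name `duration_ranking_loss`, the `margin` parameter, and the per-pair `pair_weights` (a simple stand-in for down-weighting noisy pairs) are all hypothetical, and the abstract does not specify the actual loss or how noise is handled.

```python
from typing import Optional, Sequence


def duration_ranking_loss(
    short_scores: Sequence[float],
    long_scores: Sequence[float],
    margin: float = 1.0,
    pair_weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean hinge loss over (short-video, long-video) segment pairs.

    short_scores[i] is the predicted highlight score of a segment taken from
    a shorter video; long_scores[i] is the score of the segment it is paired
    with from a longer video. A pair contributes zero loss once the
    short-video segment outscores the long-video one by at least `margin`.

    pair_weights is a hypothetical noise-handling device: a weight near 0
    marks a pair believed to be unreliable, so it barely affects the loss.
    """
    if len(short_scores) != len(long_scores):
        raise ValueError("short_scores and long_scores must have equal length")
    if pair_weights is None:
        pair_weights = [1.0] * len(short_scores)
    if len(pair_weights) != len(short_scores):
        raise ValueError("pair_weights must have one entry per pair")

    total_weight = sum(pair_weights)
    if total_weight == 0:
        return 0.0  # no trusted pairs, so nothing to penalize

    weighted_sum = 0.0
    for s, l, w in zip(short_scores, long_scores, pair_weights):
        # Hinge term: penalize pairs where s does not exceed l by the margin.
        weighted_sum += w * max(0.0, margin - (s - l))
    return weighted_sum / total_weight


if __name__ == "__main__":
    # First pair is ranked correctly with enough margin; second is inverted.
    print(duration_ranking_loss([2.0, 0.0], [0.0, 1.0]))
    # Treating the second pair as noise (weight 0) removes its penalty.
    print(duration_ranking_loss([2.0, 0.0], [0.0, 1.0], pair_weights=[1.0, 0.0]))
```

In a real training setup the scores would come from a learned scoring model and the loss would be minimized by gradient descent; here the scores are plain numbers so the ranking preference itself is easy to inspect.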