This paper presents a novel task and a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks. Conventional work on temporal video segmentation and action detection focuses on localizing pre-defined action categories and thus does not scale to generic videos. Cognitive science has known since the last century that humans consistently segment videos into meaningful temporal chunks. This segmentation happens naturally, with no pre-defined event categories and without being explicitly asked to do so. Here, we repeat these cognitive experiments on mainstream CV datasets; with a novel annotation guideline that addresses the complexities of taxonomy-free event boundary annotation, we introduce the task of Generic Event Boundary Detection (GEBD) and the new Kinetics-GEBD benchmark. Through experiments and human studies we demonstrate the value of the annotations. We view this as an important stepping stone towards understanding the video as a whole, and believe it has previously been neglected due to a lack of a proper task definition and annotations. Further, inspired by the cognitive finding that humans mark boundaries at points where they are unable to predict the future accurately, we explore unsupervised approaches based on temporal predictability. We identify and extensively explore important design factors for GEBD models on the TAPOS dataset and our Kinetics-GEBD, achieving competitive performance and suggesting directions for future work. We will release our annotations and code at the CVPR'21 LOVEU Challenge: https://sites.google.com/view/loveucvpr21