While classic video anomaly detection (VAD) requires labeled normal videos for training, emerging unsupervised VAD (UVAD) aims to discover anomalies directly from fully unlabeled videos. However, existing UVAD methods still rely on shallow models to perform detection or initialization, and they are evidently inferior to classic VAD methods. This paper proposes a full deep neural network (DNN) based solution that can realize highly effective UVAD. First, we, for the first time, point out that deep reconstruction can be surprisingly effective for UVAD, which inspires us to unveil a property named "normality advantage", i.e., normal events enjoy lower reconstruction loss when a DNN learns to reconstruct unlabeled videos. With this property, we propose Localization based Reconstruction (LBR) as a strong UVAD baseline and a solid foundation of our solution. Second, we propose a novel self-paced refinement (SPR) scheme, which is synthesized into LBR to conduct UVAD. Unlike ordinary self-paced learning, which injects more samples in an easy-to-hard manner, the proposed SPR scheme gradually drops samples so that suspicious anomalies can be removed from the learning process. In this way, SPR consolidates the normality advantage and enables better UVAD in a more proactive way. Finally, we further design a variant solution that explicitly takes motion cues into account. This variant evidently enhances UVAD performance and sometimes even surpasses the best classic VAD methods. Experiments show that our solution not only significantly outperforms existing UVAD methods by a wide margin (5% to 9% AUROC), but also enables UVAD to catch up with the mainstream performance of classic VAD.
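To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of reconstruction-based training with a self-paced refinement step: an autoencoder reconstructs unlabeled, localized patches, and each epoch a shrinking fraction of the lowest-loss samples is kept so that high-loss samples (suspected anomalies) are gradually dropped from the objective. All names (TinyAutoencoder, train_with_self_paced_refinement, the keep-ratio schedule) are hypothetical illustrations under these assumptions.

```python
import torch
import torch.nn as nn


class TinyAutoencoder(nn.Module):
    """Toy convolutional autoencoder standing in for the reconstruction DNN."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


def train_with_self_paced_refinement(patches, epochs=10, keep_start=1.0, keep_end=0.7):
    """Reconstruct unlabeled patches while gradually dropping the highest-loss
    samples (suspected anomalies) from the training objective."""
    model = TinyAutoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        # Linearly shrink the kept fraction: all samples at first, fewer later.
        keep_ratio = keep_start + (keep_end - keep_start) * epoch / max(epochs - 1, 1)
        recon = model(patches)
        # Per-sample reconstruction loss (mean squared error over each patch).
        per_sample = ((recon - patches) ** 2).flatten(1).mean(dim=1)
        k = max(1, int(keep_ratio * per_sample.numel()))
        # Keep only the k lowest-loss samples; high-loss ones are treated as
        # suspicious anomalies and excluded from this epoch's update.
        kept_losses, _ = torch.topk(per_sample, k, largest=False)
        loss = kept_losses.mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # After training, per-sample reconstruction loss serves as the anomaly score
    # (normality advantage: normal samples reconstruct with lower loss).
    with torch.no_grad():
        scores = ((model(patches) - patches) ** 2).flatten(1).mean(dim=1)
    return model, scores


if __name__ == "__main__":
    # Dummy "localized" patches, e.g., crops around detected foreground objects.
    dummy_patches = torch.rand(64, 3, 32, 32)
    _, anomaly_scores = train_with_self_paced_refinement(dummy_patches)
    print(anomaly_scores.shape)  # torch.Size([64])
```

The sketch uses a hypothetical linear keep-ratio schedule purely to illustrate the easy-to-hard-reversed idea of dropping samples; the paper's actual SPR scheme, network architecture, and motion-aware variant are more elaborate.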