Weakly Supervised Video Anomaly Detection (WSVAD) is challenging because the binary anomaly label is given only at the video level, whereas the output requires snippet-level predictions. Multiple Instance Learning (MIL) is therefore the prevailing approach in WSVAD. However, MIL is known to suffer from many false alarms because the snippet-level detector is easily biased towards abnormal snippets with simple context: it is confused by normal snippets that share the same bias, and it misses anomalies that follow a different pattern. To this end, we propose a new MIL framework, Unbiased MIL (UMIL), which learns unbiased anomaly features that improve WSVAD. At each MIL training iteration, we use the current detector to divide the samples into two groups with different context biases: the most confident abnormal/normal snippets and the remaining ambiguous ones. Then, by seeking the features that are invariant across the two sample groups, we can remove the variant context biases. Extensive experiments on the UCF-Crime and TAD benchmarks demonstrate the effectiveness of UMIL. Our code is available at https://github.com/ktr-hubrt/UMIL.
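The grouping step described above can be illustrated with a minimal sketch. It is not taken from the UMIL code base: the function name `split_by_confidence`, the `margin` hyperparameter, and the fixed-threshold rule are all assumptions made for illustration. The abstract states only that the current detector separates the most confident abnormal/normal snippets from the ambiguous rest.

```python
def split_by_confidence(scores, margin=0.2):
    """Split snippet indices into a confident group and an ambiguous group.

    scores: snippet-level anomaly scores in [0, 1] from the current detector.
    margin: hypothetical hyperparameter (an assumption, not from the paper);
            a score within `margin` of 0 or 1 is treated as confident.
    """
    confident, ambiguous = [], []
    for index, score in enumerate(scores):
        # Confident normal (near 0) or confident abnormal (near 1).
        if score <= margin or score >= 1.0 - margin:
            confident.append(index)
        else:
            ambiguous.append(index)
    return confident, ambiguous


# Made-up scores for six snippets of one video.
scores = [0.02, 0.95, 0.48, 0.10, 0.61, 0.88]
confident_idx, ambiguous_idx = split_by_confidence(scores, margin=0.2)
print(confident_idx)  # [0, 1, 3, 5]
print(ambiguous_idx)  # [2, 4]
```

In a training loop, a split of this kind would presumably be recomputed at every iteration, since the detector's scores change as it is updated. How the invariant features are then learned across the two groups is not specified in the abstract and is left out of this sketch.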