This work tackles weakly supervised anomaly detection, in which a predictor learns not only from normal examples but also from a few labeled anomalies made available during training. In particular, we address the localization of anomalous activities within a video stream. This is a very challenging scenario, as training examples come with video-level annotations only, not frame-level ones. Several recent works have proposed regularization terms to address it, e.g. by enforcing sparsity and smoothness constraints on the weakly learned frame-level anomaly scores. In this work, we draw inspiration from recent advances in self-supervised learning and ask the model to yield the same scores for different augmentations of the same video sequence. We show that enforcing such an alignment improves the performance of the model on XD-Violence.
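To make the three regularizers mentioned above concrete, the following is a minimal sketch rather than the authors' implementation. It assumes frame-level anomaly scores are plain lists of floats in [0, 1], that the two augmented views keep frames temporally aligned, and that alignment is measured with a mean squared difference. The abstract does not specify the distance, the weights, or any of the function names used here; all of them are hypothetical.

```python
# Illustrative sketch only: function names, the squared-error alignment
# measure, and the weights are assumptions, not taken from the paper.

def sparsity_penalty(scores):
    """Mean absolute score: encourages few frames to be flagged as anomalous."""
    return sum(abs(s) for s in scores) / len(scores)


def smoothness_penalty(scores):
    """Mean squared difference between consecutive frame scores."""
    diffs = [(b - a) ** 2 for a, b in zip(scores, scores[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


def alignment_penalty(scores_a, scores_b):
    """Mean squared difference between the scores of two augmented views
    of the same video (assumed frame-aligned and of equal length)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both views must have the same number of frames")
    return sum((x - y) ** 2 for x, y in zip(scores_a, scores_b)) / len(scores_a)


def total_regularizer(scores_a, scores_b, w_sparse=1.0, w_smooth=1.0, w_align=1.0):
    """Weighted sum of the three terms; sparsity and smoothness are
    averaged over the two views (a design choice made for this sketch)."""
    sparse = (sparsity_penalty(scores_a) + sparsity_penalty(scores_b)) / 2
    smooth = (smoothness_penalty(scores_a) + smoothness_penalty(scores_b)) / 2
    align = alignment_penalty(scores_a, scores_b)
    return w_sparse * sparse + w_smooth * smooth + w_align * align


if __name__ == "__main__":
    view_a = [0.1, 0.1, 0.8, 0.9, 0.2]  # scores for one augmentation
    view_b = [0.1, 0.2, 0.7, 0.9, 0.2]  # scores for another augmentation
    print(total_regularizer(view_a, view_b))
```

In a real training loop these terms would be added to the weakly supervised video-level objective and computed on differentiable tensors. The alignment term is zero only when both augmented views receive identical frame-level scores, which is the behavior the abstract asks of the model.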