Video anomaly detection has recently been formulated as a multiple instance learning task under weak supervision, in which each video is treated as a bag of snippets and the task is to determine whether the bag contains anomalies. Previous efforts mainly focus on discriminating each snippet in isolation, without modeling the temporal dynamics, i.e., the variation between adjacent snippets. We therefore propose a Discriminative Dynamics Learning (DDL) method with two objective functions: a dynamics ranking loss and a dynamics alignment loss. The former enlarges the gap in score dynamics between positive and negative bags, while the latter temporally aligns the feature dynamics with the score dynamics within each bag. Moreover, we construct a Locality-aware Attention Network (LA-Net) that captures global correlations and re-calibrates the location preference across snippets, followed by a multilayer perceptron with causal convolution that produces the anomaly scores. Experimental results show that our method achieves significant improvements on two challenging benchmarks, UCF-Crime and XD-Violence.
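The abstract does not give the exact form of the two losses, so the following is only a minimal illustrative sketch under assumed definitions, not the DDL formulation. It assumes that score dynamics are the absolute differences between adjacent snippet scores, that feature dynamics are Euclidean distances between adjacent snippet features, that the ranking loss is a hinge on the maximum score dynamics of a positive versus a negative bag (following the usual MIL max-over-bag convention), and that the alignment loss is the mean squared error between the sum-normalized feature and score dynamics. All function names and the `margin` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def score_dynamics(scores: Sequence[float]) -> List[float]:
    # Assumed definition: absolute change between adjacent snippet scores.
    return [abs(b - a) for a, b in zip(scores, scores[1:])]


def feature_dynamics(features: Sequence[Sequence[float]]) -> List[float]:
    # Assumed definition: Euclidean distance between adjacent snippet features.
    return [math.dist(a, b) for a, b in zip(features, features[1:])]


def dynamics_ranking_loss(pos_scores: Sequence[float],
                          neg_scores: Sequence[float],
                          margin: float = 1.0) -> float:
    # Hinge loss pushing the largest score change in the positive (anomalous)
    # bag above the largest score change in the negative (normal) bag.
    pos = max(score_dynamics(pos_scores))
    neg = max(score_dynamics(neg_scores))
    return max(0.0, margin - pos + neg)


def _normalize(values: List[float]) -> List[float]:
    # Scale to sum 1 so feature and score dynamics are comparable;
    # fall back to a uniform distribution when all changes are zero.
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def dynamics_alignment_loss(features: Sequence[Sequence[float]],
                            scores: Sequence[float]) -> float:
    # Mean squared error between normalized feature dynamics and normalized
    # score dynamics within one bag: scores should change where features change.
    f = _normalize(feature_dynamics(features))
    s = _normalize(score_dynamics(scores))
    return sum((a - b) ** 2 for a, b in zip(f, s)) / len(f)


if __name__ == "__main__":
    pos = [0.1, 0.1, 0.9, 0.9]   # abrupt rise: large score dynamics
    neg = [0.1, 0.2, 0.1, 0.2]   # small fluctuations only
    feats = [[0, 0], [0, 0], [3, 4], [3, 4]]  # features change at the same step
    print(dynamics_ranking_loss(pos, neg))
    print(dynamics_alignment_loss(feats, pos))
```

In this sketch the ranking term rewards a sharp change in scores inside anomalous videos rather than a high score alone, and the alignment term penalizes score changes that occur at steps where the features are stable (or vice versa). A real implementation would operate on batched tensors with automatic differentiation; plain Python is used here only to keep the example self-contained.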