Weakly supervised anomaly detection in surveillance videos is a challenging task. Existing methods have limited ability to localize anomalies in long videos; going beyond them, we propose a novel glance and focus network that effectively integrates spatio-temporal information for accurate anomaly detection. In addition, we empirically found that existing approaches that use feature magnitudes to represent the degree of anomaly typically ignore the effects of scene variations, and the resulting inconsistency of feature magnitudes across scenes leads to sub-optimal performance. To address this issue, we propose a Feature Amplification Mechanism and a Magnitude Contrastive Loss to enhance the discriminativeness of feature magnitudes for detecting anomalies. Experimental results on two large-scale benchmarks, UCF-Crime and XD-Violence, show that our method outperforms state-of-the-art approaches.