Video anomaly detection is a challenging task in the computer vision community. Most single-task methods do not account for the independence of distinct spatial and temporal patterns, while two-stream structures lack an exploration of the correlations between them. In this paper, we propose a spatial-temporal memory-augmented two-stream auto-encoder framework that learns appearance normality and motion normality independently and explores their correlations via adversarial learning. Specifically, we first design two proxy tasks to train the two-stream structure to extract appearance and motion features in isolation. The prototypical features are then recorded in the corresponding spatial and temporal memory pools. Finally, the encoding-decoding network performs adversarial learning against a discriminator to explore the correlations between spatial and temporal patterns. Experimental results show that our framework outperforms state-of-the-art methods, achieving AUCs of 98.1% and 89.8% on the UCSD Ped2 and CUHK Avenue datasets, respectively.
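To make the memory-pool step concrete, the sketch below shows one common way such pools are addressed: an encoder feature soft-attends over a bank of prototypical normal patterns and is reconstructed as a weighted combination of them. This is a minimal illustration in the style of memory-augmented auto-encoders; the function name, pool size, and feature dimension are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def memory_read(query, memory):
    """Soft-address a memory pool with a query feature.

    query:  (d,)   encoder feature from one stream (appearance or motion)
    memory: (n, d) pool of prototypical normal patterns
    Returns a (d,) reconstruction as a convex combination of pool items,
    so features far from all stored prototypes reconstruct poorly.
    """
    scores = memory @ query                  # (n,) similarity to each slot
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory                  # (d,) weighted prototype

# Toy usage: a 4-slot pool of 8-dim prototypes and one query feature
# (sizes are arbitrary placeholders).
rng = np.random.default_rng(0)
pool = rng.standard_normal((4, 8))
feat = rng.standard_normal(8)
recon = memory_read(feat, pool)
print(recon.shape)  # (8,)
```

In the framework described above, one such pool would hold spatial (appearance) prototypes and another would hold temporal (motion) prototypes, with each stream reading from its own pool.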