Video anomaly detection (VAD) remains a challenging task in the pattern recognition community due to the ambiguity and diversity of abnormal events. Existing deep learning-based VAD methods usually leverage proxy tasks to learn normal patterns and flag instances that deviate from those patterns as abnormal. However, most of them do not take full advantage of the spatial-temporal correlations among video frames, which are critical for understanding normal patterns. In this paper, we address unsupervised VAD by learning the evolution regularity of appearance and motion over the long and short term, exploiting the spatial-temporal correlations among consecutive frames in normal videos more adequately. Specifically, we propose to utilize the spatiotemporal long short-term memory (ST-LSTM), which extracts and memorizes spatial appearances and temporal variations in a unified memory cell. In addition, inspired by generative adversarial networks, we introduce a discriminator that performs adversarial learning with the ST-LSTM to enhance its learning capability. Experimental results on standard benchmarks demonstrate the effectiveness of spatial-temporal correlations for unsupervised VAD. Our method achieves competitive performance compared with state-of-the-art methods, with AUCs of 96.7%, 87.8%, and 73.1% on UCSD Ped2, CUHK Avenue, and ShanghaiTech, respectively.
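To make the unified memory cell concrete, the following is a minimal sketch of a PredRNN-style ST-LSTM cell in PyTorch. It maintains a temporal memory C (carried across time steps within a layer) and a spatiotemporal memory M (carried across layers within a time step) and fuses them into the hidden state. The class name, kernel size, channel counts, and the simplified output gate are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of an ST-LSTM cell (PredRNN-style), assuming PyTorch.
# Hyperparameters and the simplified output gate are assumptions.
import torch
import torch.nn as nn


class STLSTMCell(nn.Module):
    """One ST-LSTM cell with a unified memory: temporal memory C plus
    spatiotemporal memory M, fused into the hidden state H."""

    def __init__(self, in_channels: int, hidden_channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Gate pre-activations from the input X_t (7 gates total).
        self.conv_x = nn.Conv2d(in_channels, 7 * hidden_channels, kernel_size, padding=pad)
        # Gate pre-activations from the previous hidden state H_{t-1}.
        self.conv_h = nn.Conv2d(hidden_channels, 4 * hidden_channels, kernel_size, padding=pad)
        # Gate pre-activations from the spatiotemporal memory M (from the layer below).
        self.conv_m = nn.Conv2d(hidden_channels, 3 * hidden_channels, kernel_size, padding=pad)
        # 1x1 convolution fusing the two memories into the new hidden state.
        self.conv_out = nn.Conv2d(2 * hidden_channels, hidden_channels, kernel_size=1)

    def forward(self, x, h, c, m):
        xg, xi, xf, xgp, xip, xfp, xo = torch.chunk(self.conv_x(x), 7, dim=1)
        hg, hi, hf, ho = torch.chunk(self.conv_h(h), 4, dim=1)
        mg, mi, mf = torch.chunk(self.conv_m(m), 3, dim=1)

        # Standard temporal memory update, as in a ConvLSTM.
        g = torch.tanh(xg + hg)
        i = torch.sigmoid(xi + hi)
        f = torch.sigmoid(xf + hf)
        c_new = f * c + i * g

        # Spatiotemporal memory update, driven by M from the layer below.
        gp = torch.tanh(xgp + mg)
        ip = torch.sigmoid(xip + mi)
        fp = torch.sigmoid(xfp + mf)
        m_new = fp * m + ip * gp

        # Simplified output gate; hidden state fuses both memories via 1x1 conv.
        o = torch.sigmoid(xo + ho)
        h_new = o * torch.tanh(self.conv_out(torch.cat([c_new, m_new], dim=1)))
        return h_new, c_new, m_new


# Example usage: one step on a single 64x64 RGB frame with zero-initialized states.
cell = STLSTMCell(in_channels=3, hidden_channels=64)
x = torch.randn(1, 3, 64, 64)
h = c = m = torch.zeros(1, 64, 64, 64)
h, c, m = cell(x, h, c, m)
```

In a full model, several such cells would be stacked, with M passed up through the layers at each time step; the predicted frames would then be scored by a discriminator trained adversarially against the ST-LSTM generator, as described in the abstract.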