Video anomaly detection is widely used in applications such as security surveillance, yet it remains very challenging. Most recent video anomaly detection approaches rely on deep reconstruction models, but their performance is often suboptimal because, in practice, the reconstruction errors of normal and abnormal video frames do not differ enough. Meanwhile, frame prediction-based anomaly detection methods have shown promising performance. In this paper, we propose a novel and robust unsupervised video anomaly detection method based on frame prediction, with a design that better matches the characteristics of surveillance videos. The proposed method is equipped with a multi-path ConvGRU-based frame prediction network that better handles semantically informative objects and areas of different scales and captures spatial-temporal dependencies in normal videos. A noise tolerance loss is introduced during training to mitigate the interference caused by background noise. Extensive experiments on the CUHK Avenue, ShanghaiTech Campus, and UCSD Pedestrian datasets show that our proposed method outperforms existing state-of-the-art approaches. Remarkably, our proposed method obtains a frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
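To make the frame-level AUROC metric concrete, the following is a minimal sketch of how a prediction-based detector might be scored. The abstract does not say how prediction errors are turned into anomaly scores, so the PSNR-based, per-video min-max normalised score used here is an assumption borrowed from common practice in frame-prediction work, not the authors' stated procedure. The function names (`psnr`, `anomaly_scores`, `auroc`) and the toy data are hypothetical.

```python
import math


def psnr(pred, target, max_val=1.0, eps=1e-12):
    """Peak signal-to-noise ratio between two equally sized flat pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    # eps keeps the ratio finite when the prediction is pixel-perfect.
    return 10.0 * math.log10(max_val ** 2 / (mse + eps))


def anomaly_scores(psnrs):
    """Assumed scoring rule: min-max normalise PSNR over one video and invert,
    so a poorly predicted frame (low PSNR) gets a score near 1."""
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [0.0] * len(psnrs)
    return [1.0 - (v - lo) / (hi - lo) for v in psnrs]


def auroc(scores, labels):
    """Frame-level AUROC: probability that a random abnormal frame (label 1)
    scores higher than a random normal frame (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal frame")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): five 4-pixel "frames"; frames 2 and 3 are abnormal
# and are predicted badly, the others are predicted well.
targets = [[0.5] * 4 for _ in range(5)]
offsets = [0.01, 0.02, 0.2, 0.3, 0.01]
preds = [[0.5 + d] * 4 for d in offsets]
labels = [0, 0, 1, 1, 0]

toy_scores = anomaly_scores([psnr(p, t) for p, t in zip(preds, targets)])
toy_auroc = auroc(toy_scores, labels)
print(toy_scores, toy_auroc)
```

The pairwise AUROC above is quadratic in the number of frames, which is fine for illustration; an evaluation on full datasets would more likely use a sort-based implementation from a library.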