In contemporary society, surveillance anomaly detection, i.e., spotting anomalous events such as crimes or accidents in surveillance videos, is a critical task. As anomalies occur rarely, most training data consists of unlabeled videos without anomalous events, which makes the task challenging. Most existing methods use an autoencoder (AE) to learn to reconstruct normal videos; they then detect anomalies based on the AE's failure to reconstruct the appearance of abnormal scenes. However, because anomalies are distinguished by appearance as well as motion, many previous approaches have explicitly separated appearance and motion information, for example by using a pre-trained optical flow model. This explicit separation restricts reciprocal representation capabilities between the two types of information. In contrast, we propose an implicit two-path AE (ITAE), a structure in which two encoders implicitly model appearance and motion features, along with a single decoder that combines them to learn normal video patterns. To handle the complex distribution of normal scenes, we propose estimating the density of normal ITAE features with normalizing flow (NF)-based generative models, which learn tractable likelihoods, and identifying anomalies through out-of-distribution detection. NF models improve ITAE performance by learning normality from the implicitly learned features. Finally, we demonstrate the effectiveness of ITAE and its feature distribution modeling on six benchmarks, including databases that contain various anomalies in real-world scenarios.
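The likelihood-based detection step described in the abstract can be illustrated with a minimal sketch. It is not the ITAE or the NF models used in this work: as a stand-in it uses a single element-wise affine flow with a standard normal base, fitted in closed form, and randomly generated vectors in place of ITAE features. The names `AffineFlow` and `anomaly_score`, the 4-dimensional features, and the 99th-percentile threshold are all assumptions made for illustration.

```python
import math
import random
from statistics import fmean, pstdev


class AffineFlow:
    """Toy flow z = (x - shift) / scale with a standard normal base (illustrative only)."""

    def fit(self, features):
        # Closed-form maximum likelihood: per-dimension mean and standard deviation.
        dims = list(zip(*features))
        self.shift = [fmean(d) for d in dims]
        self.scale = [max(pstdev(d), 1e-6) for d in dims]
        return self

    def log_prob(self, x):
        # Change of variables: log p(x) = log N(z; 0, 1) + log |dz/dx|, summed over dimensions.
        total = 0.0
        for xi, m, s in zip(x, self.shift, self.scale):
            z = (xi - m) / s
            total += -0.5 * (z * z + math.log(2 * math.pi)) - math.log(s)
        return total


def anomaly_score(flow, x):
    # Lower likelihood under the normal-feature density means a higher anomaly score.
    return -flow.log_prob(x)


rng = random.Random(0)
# Hypothetical stand-ins for features extracted from normal clips.
normal_features = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(500)]
flow = AffineFlow().fit(normal_features)

# Assumed thresholding rule: 99th percentile of scores on the normal training features.
train_scores = sorted(anomaly_score(flow, f) for f in normal_features)
threshold = train_scores[int(0.99 * len(train_scores))]

typical = [0.0, 0.0, 0.0, 0.0]    # close to the bulk of the normal features
outlier = [6.0, -6.0, 6.0, -6.0]  # far outside the normal feature distribution
is_anomalous = [anomaly_score(flow, f) > threshold for f in (typical, outlier)]
```

Here the anomaly score is the negative log-likelihood under the density fitted to normal features, so a feature vector far from the training distribution exceeds the threshold while a typical one does not. A practical NF model would stack several learned invertible layers, but the scoring rule would keep the same form.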