Abnormal event detection in videos is a challenging problem, partly because abnormal patterns are highly varied and annotations for them are scarce. In this paper, we propose new constrained pretext tasks to learn object-level normality patterns. Our approach consists of learning a mapping between down-scaled visual queries and their corresponding normal appearance and motion characteristics at the original resolution. The proposed tasks are more challenging than the reconstruction and future frame prediction tasks widely used in the literature, since our model learns to jointly predict spatial and temporal features rather than reconstruct them. We believe that more constrained pretext tasks lead to better learning of normality patterns. Experiments on several benchmark datasets demonstrate the effectiveness of our approach in localizing and tracking anomalies: it matches or outperforms the current state of the art on spatio-temporal evaluation metrics.
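To make the pretext task concrete, the following is a minimal sketch of how one training example and an anomaly score could be built for a single object crop. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how motion is represented, so simple frame differencing stands in for it here, average pooling stands in for the down-scaling, and the names `downscale`, `make_training_pair`, and `anomaly_score` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]  # grayscale object crop: rows of pixel intensities


def downscale(crop: Frame, factor: int) -> Frame:
    """Average-pool a crop by `factor` (assumed stand-in for the down-scaling)."""
    h, w = len(crop), len(crop[0])
    if h % factor or w % factor:
        raise ValueError("crop height and width must be divisible by factor")
    pooled = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [crop[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def make_training_pair(prev_crop: Frame, curr_crop: Frame,
                       factor: int = 2) -> Tuple[Frame, Frame, Frame]:
    """Return (query, appearance_target, motion_target) for one object."""
    query = downscale(curr_crop, factor)   # low-resolution visual query
    appearance = curr_crop                 # original-resolution appearance
    # Assumption: motion is approximated by the difference between frames.
    motion = [[c - p for c, p in zip(c_row, p_row)]
              for c_row, p_row in zip(curr_crop, prev_crop)]
    return query, appearance, motion


def anomaly_score(predicted: Frame, target: Frame) -> float:
    """Mean squared error between a prediction and its target."""
    sq = [(p - t) ** 2
          for p_row, t_row in zip(predicted, target)
          for p, t in zip(p_row, t_row)]
    return sum(sq) / len(sq)
```

A model trained only on normal data would take `query` as input and predict both targets. At test time, a large `anomaly_score` on either the appearance or the motion output would flag the object as a candidate anomaly.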