Video anomaly detection has proved to be a challenging task owing to its unsupervised training procedure and the high spatio-temporal complexity of real-world scenarios. In the absence of anomalous training samples, state-of-the-art methods try to extract features that fully capture normal behaviors in both the space and time domains, using approaches such as autoencoders or generative adversarial networks. However, these approaches either ignore the spatio-temporal interactions between objects entirely or model them poorly, relying only on the hierarchical modeling ability of deep networks. To address this issue, we propose a novel yet efficient method named Ano-Graph for learning and modeling the interactions of normal objects. To this end, a Spatio-Temporal Graph (STG) is built in which each node is an object's feature extracted by a real-time off-the-shelf object detector, and edges are formed based on the objects' interactions. A self-supervised learning method is then applied to the STG so that it encapsulates these interactions in a semantic space. Our method is data-efficient, significantly more robust against common real-world variations such as illumination, and surpasses the state of the art by a large margin on the challenging ADOC and Street Scene datasets while staying competitive on Avenue, ShanghaiTech, and UCSD.
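As a rough illustration of the graph construction described above, the following minimal Python sketch builds a spatio-temporal graph from per-frame detections. The abstract does not say how interactions are determined, so the edge rules here are assumptions made only for illustration: a spatial edge links two objects in the same frame whose box centers are close, and a temporal edge links boxes in consecutive frames that overlap strongly. The names `Detection`, `build_stg`, `spatial_radius`, and `temporal_iou` are hypothetical and do not come from Ano-Graph.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Detection:
    """One detected object; `feature` stands in for the detector's feature vector."""
    frame: int
    box: tuple      # (x1, y1, x2, y2) in pixels
    feature: tuple  # placeholder for a real feature embedding


def center(box):
    """Return the (x, y) center of a box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def build_stg(detections, spatial_radius=100.0, temporal_iou=0.3):
    """Build a graph whose nodes are detections and whose edges are typed.

    Assumed edge rules (not taken from the paper):
      - "spatial": same frame, box centers within `spatial_radius` pixels
      - "temporal": consecutive frames, box IoU of at least `temporal_iou`
    """
    nodes = list(detections)
    edges = set()
    for i, j in combinations(range(len(nodes)), 2):
        a, b = nodes[i], nodes[j]
        if a.frame == b.frame:
            (ax, ay), (bx, by) = center(a.box), center(b.box)
            if hypot(ax - bx, ay - by) <= spatial_radius:
                edges.add((i, j, "spatial"))
        elif abs(a.frame - b.frame) == 1 and iou(a.box, b.box) >= temporal_iou:
            edges.add((i, j, "temporal"))
    return nodes, edges


# Example input: three objects in frame 0 and one object in frame 1.
example = [
    Detection(0, (0, 0, 10, 10), (0.1, 0.2)),
    Detection(0, (20, 0, 30, 10), (0.3, 0.4)),
    Detection(1, (1, 0, 11, 10), (0.1, 0.2)),
    Detection(0, (500, 500, 510, 510), (0.9, 0.9)),
]
example_nodes, example_edges = build_stg(example)
```

In this sketch the node features are carried along untouched; a self-supervised model of the kind the abstract mentions would consume `nodes` and `edges` as its input graph.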