Anomaly identification depends heavily on the relationship between an object and its scene: the same object action may be normal in one scene and anomalous in another, and different actions in the same scene may likewise differ in their degree of normality. The object-scene relation therefore plays a crucial role in anomaly detection, yet it has been inadequately explored in previous work. In this paper, we propose a Spatial-Temporal Relation Learning (STRL) framework for video anomaly detection. First, to account for the dynamic characteristics of both objects and scene areas, we construct a Spatio-Temporal Auto-Encoder (STAE) that jointly exploits spatial and temporal evolution patterns for representation learning. For better pattern extraction, the STAE module has two decoding branches: an appearance branch that captures spatial cues by directly predicting the next frame, and a motion branch that models the dynamics via optical flow prediction. Then, to make the object-scene relation concrete, we devise a Relation Learning (RL) module that analyzes and summarizes normal relations using the Knowledge Graph Embedding methodology. Specifically, the plausibility of an object-scene relation is measured by jointly modeling object/scene features and optimizable object-scene relation maps. Extensive experiments on three public datasets show superior performance over state-of-the-art methods, demonstrating the effectiveness of our method.
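To illustrate what measuring the plausibility of an object-scene relation with a knowledge graph embedding might look like, the following is a minimal sketch. The abstract does not say which embedding model the RL module uses, so the TransE-style translation score here is an assumption chosen for illustration, not the authors' formulation. The function names (`plausibility`, `fit_relation`), the feature dimension, and the synthetic features are all hypothetical.

```python
import math
import random


def plausibility(obj_feat, relation, scene_feat):
    """TransE-style score (an assumed stand-in for the paper's scoring function).

    Returns -||obj + relation - scene||_2, so values closer to 0 mean the
    object-scene pair agrees better with the learned relation.
    """
    residual = [o + r - s for o, r, s in zip(obj_feat, relation, scene_feat)]
    return -math.sqrt(sum(x * x for x in residual))


def fit_relation(pairs):
    """Fit one relation vector from normal (object, scene) feature pairs.

    For the squared TransE residual, the minimizer over the relation vector
    is the mean of (scene - object) across the training pairs.
    """
    dim = len(pairs[0][0])
    n = len(pairs)
    return [sum(s[i] - o[i] for o, s in pairs) / n for i in range(dim)]


random.seed(0)
DIM = 8  # hypothetical feature dimension

# Synthetic "normal" data: scene features equal object features plus a fixed
# offset and a little noise. Real features would come from a learned encoder.
offset = [random.gauss(0, 1) for _ in range(DIM)]


def make_normal_pair():
    obj = [random.gauss(0, 1) for _ in range(DIM)]
    scene = [o + d + random.gauss(0, 0.05) for o, d in zip(obj, offset)]
    return obj, scene


normal_pairs = [make_normal_pair() for _ in range(100)]
relation = fit_relation(normal_pairs)

# Score a pair seen during fitting and a deliberately mismatched pair.
obj, scene = normal_pairs[0]
normal_score = plausibility(obj, relation, scene)
shifted_scene = [s + 5.0 for s in scene]  # scene feature far from the normal pattern
anomalous_score = plausibility(obj, relation, shifted_scene)

print("normal pair score:    ", normal_score)
print("mismatched pair score:", anomalous_score)
```

In this toy setting the mismatched pair receives a markedly lower score than the normal pair, which is the behavior an anomaly detector would threshold on. How such a relation score would be combined with the frame and optical flow prediction errors of the two decoding branches is not stated in the abstract and is left out of the sketch.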