In this paper, we introduce a novel task, Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video. Specifically, given an untrimmed video, WSSTAD aims to localize a spatio-temporal tube (i.e., a sequence of bounding boxes at consecutive times) that encloses the abnormal event, with only coarse video-level annotations as supervision during training. To address this challenging task, we propose a dual-branch network that takes as input proposals at multiple granularities in both the spatial and temporal domains. Each branch employs a relationship reasoning module to capture the correlation between tubes/videolets, which provides rich contextual information and complex entity relationships for learning the concept of abnormal behavior. A Mutually-guided Progressive Refinement framework applies dual-path mutual guidance in a recurrent manner, iteratively sharing auxiliary supervision information across branches. The concepts learned by each branch serve as a guide for its counterpart, which progressively refines both that branch and the whole framework. Furthermore, we contribute two datasets, ST-UCF-Crime and STRA, consisting of videos with spatio-temporal anomaly annotations, to serve as benchmarks for WSSTAD. We conduct extensive qualitative and quantitative evaluations to demonstrate the effectiveness of the proposed approach and to analyze the key factors that contribute most to handling this task.