Semi-supervised video anomaly detection (VAD) methods formulate anomaly detection as the detection of deviations from learned normal patterns. Previous works in the field (reconstruction- or prediction-based methods) suffer from two drawbacks: 1) they focus on low-level features and (especially holistic approaches) do not effectively consider object classes; 2) object-centric approaches neglect some context information (such as location). To tackle these challenges, this paper proposes a novel two-stream object-aware VAD method that learns normal appearance and motion patterns through image-translation tasks. The appearance branch translates the input image to a target semantic segmentation map produced by Mask R-CNN, and the motion branch associates each frame with its expected optical-flow magnitude. At inference, any deviation from the expected appearance or motion indicates the degree of potential abnormality. We evaluated the proposed method on the ShanghaiTech, UCSD-Ped1, and UCSD-Ped2 datasets, and the results show competitive performance compared with state-of-the-art works. Most importantly, and as a significant improvement over previous methods, the detections produced by our method are fully explainable, and anomalies are localized accurately within the frames.
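The two-stream scoring described above can be sketched as follows. This is a minimal illustration under assumed conventions: the function names, the per-pixel L1 deviations, and the equal-weight fusion of the appearance and motion errors are hypothetical choices for exposition, not the paper's exact formulation.

```python
import numpy as np

def anomaly_map(pred_seg, target_seg, pred_flow_mag, target_flow_mag,
                w_app=0.5, w_mot=0.5):
    """Per-pixel anomaly score from appearance and motion deviations.

    pred_seg / target_seg: (H, W, C) predicted vs. Mask R-CNN-derived
        semantic segmentation maps (C class channels).
    pred_flow_mag / target_flow_mag: (H, W) predicted vs. observed
        optical-flow magnitudes.
    """
    # Appearance deviation: L1 distance averaged over class channels.
    app_err = np.abs(pred_seg - target_seg).mean(axis=-1)
    # Motion deviation: L1 distance between flow magnitudes.
    mot_err = np.abs(pred_flow_mag - target_flow_mag)
    # Weighted fusion of the two streams (weights are illustrative).
    return w_app * app_err + w_mot * mot_err

def frame_score(score_map):
    # One simple frame-level score: the maximum local deviation,
    # which also localizes the anomaly at the arg-max pixel.
    return float(score_map.max())
```

Because the score map is spatial, thresholding it directly localizes the anomalous region in the frame, which is one way the detections remain explainable.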