In this paper, we propose a weakly supervised deep temporal encoding-decoding solution for anomaly detection in surveillance videos using multiple instance learning. The proposed approach uses both anomalous and normal video clips during the training phase and is developed within the multiple instance learning framework, where we treat a video as a bag and its clips as instances in the bag. Our main contribution is a novel approach to modeling temporal relations between video instances: we treat video instances (clips) as sequential visual data rather than as independent instances, and employ a deep temporal encoding-decoding network designed to capture the spatio-temporal evolution of video instances over time. We also propose a new loss function that is smoother than similar loss functions recently presented in the computer vision literature and therefore enjoys faster convergence and improved tolerance to local minima during training. The proposed temporal encoding-decoding approach with the modified loss is benchmarked against the state of the art in simulation studies. The results show that the proposed method performs similarly to or better than state-of-the-art solutions for anomaly detection in video surveillance applications.
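To make the bag/instance formulation concrete, the following is a minimal sketch of the standard MIL ranking objective that recent video anomaly detection work builds on (a hinge term over the top-scoring instances of an abnormal and a normal bag, plus smoothness and sparsity regularizers). This is an illustrative baseline, not the smoother modified loss proposed in this paper; the function name and the regularization weight `lam` are assumptions for the example.

```python
import numpy as np

def mil_ranking_loss(abnormal_scores, normal_scores, lam=1e-4):
    """Hinge-style MIL ranking loss over two bags of per-clip anomaly scores.

    abnormal_scores, normal_scores: 1-D sequences of scores in [0, 1],
    one per clip (instance), for an abnormal and a normal video (bag).
    The hinge term pushes the top abnormal instance above the top normal
    instance; smoothness and sparsity terms regularize the abnormal bag.
    """
    a = np.asarray(abnormal_scores, dtype=float)
    n = np.asarray(normal_scores, dtype=float)
    hinge = max(0.0, 1.0 - a.max() + n.max())
    smoothness = np.sum(np.diff(a) ** 2)  # adjacent clips should score similarly
    sparsity = np.sum(a)                  # only a few clips should be anomalous
    return hinge + lam * (smoothness + sparsity)
```

Because the loss only touches the maximum score in each bag, supervision stays at the video (bag) level, which is what makes the approach weakly supervised: no clip-level anomaly annotations are required.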