Abnormal event detection in video is a challenging vision problem. Most existing approaches formulate abnormal event detection as an outlier detection task, due to the scarcity of anomalous data during training. Because they lack prior information about abnormal events, these methods are not fully equipped to differentiate between normal and abnormal events. In this work, we formalize abnormal event detection as a one-versus-rest binary classification problem. Our contribution is two-fold. First, we introduce an unsupervised feature learning framework based on object-centric convolutional auto-encoders that encode both motion and appearance information. Second, we propose a supervised classification approach based on clustering the training samples into normality clusters. A one-versus-rest abnormal event classifier is then employed to separate each normality cluster from the rest; when training the classifier for a given cluster, the other clusters act as dummy anomalies. During inference, an object is labeled as abnormal if the highest classification score assigned by the one-versus-rest classifiers is negative. Comprehensive experiments are performed on four benchmarks: Avenue, ShanghaiTech, UCSD and UMN. Our approach outperforms prior methods on all four data sets. On the large-scale ShanghaiTech data set, our method provides an absolute gain of 12.1% in frame-level AUC over the state-of-the-art method [Liu et al., CVPR 2018].
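The training and inference procedure described in the abstract (normality clusters, one-versus-rest classifiers with the other clusters as dummy anomalies, and the "highest score is negative" rule) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: toy 2-D Gaussian blobs stand in for the auto-encoder features, plain k-means forms the normality clusters, and each one-versus-rest classifier is assumed to be a linear hinge-loss (SVM-style) model trained by full-batch subgradient descent. All function names (`kmeans`, `train_linear_svm`, `fit_dummy_anomaly_classifiers`, `abnormality_score`) and hyperparameters are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with farthest-point initialisation; returns one cluster label per sample."""
    # Farthest-point init (an assumption): start from the first sample, then repeatedly
    # add the sample that lies farthest from all centroids chosen so far.
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels


def train_linear_svm(X, y, lam=1e-3, lr=0.1, epochs=500):
    """Full-batch subgradient descent on the L2-regularised hinge loss; y must be in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(X)
        grad_b = -y[active].sum() / len(X)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def fit_dummy_anomaly_classifiers(X, k):
    """Cluster normal training features into k normality clusters, one classifier per cluster.

    Each classifier treats its own cluster as the positive class and the other k-1
    clusters as dummy anomalies (negative class)."""
    labels = kmeans(X, k)
    return [train_linear_svm(X, np.where(labels == j, 1.0, -1.0)) for j in range(k)]


def abnormality_score(X, models):
    """Negated highest one-versus-rest score: positive values mean 'abnormal'."""
    scores = np.stack([X @ w + b for w, b in models], axis=1)
    return -scores.max(axis=1)


# Toy usage: four normality clusters of hypothetical 2-D features.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0], [-5.0, 0.0], [0.0, 5.0], [0.0, -5.0]])
X_train = np.concatenate([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])
models = fit_dummy_anomaly_classifiers(X_train, k=4)

X_test = np.array([[5.0, 0.2], [0.0, 0.0]])  # near a cluster vs. between clusters
flags = abnormality_score(X_test, models) > 0
print(flags)
```

Because the score is the negated maximum, a sample is flagged as abnormal exactly when every one-versus-rest classifier assigns it a negative score, matching the inference rule stated above. The test point at the origin lies midway between pairs of normality clusters that each classifier was trained to reject as dummy anomalies, so every linear score there is negative, while the point near `(5, 0)` is accepted by that cluster's classifier.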