The aim of this work is to detect and automatically generate high-level explanations of anomalous events in video. Understanding the cause of an anomalous event is crucial, as the required response depends on its nature and severity. Recent works typically use object or action classifiers to detect and label anomalous events. However, this constrains detection systems to a finite set of known classes and prevents generalisation to unknown objects or behaviours. Here we show how to robustly detect anomalies without the use of object or action classifiers while still recovering the high-level reason behind the event. We make the following contributions: (1) a method that uses saliency maps to decouple the explanation of anomalous events from object and action classifiers, (2) a novel neural architecture that improves the quality of saliency maps by learning discrete representations of video through future-frame prediction, and (3) a 60\% improvement over state-of-the-art anomaly explanation methods on a subset of the public benchmark X-MAN dataset.
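Contribution (2) centres on learning discrete representations of video by predicting future frames. As a rough illustration of that idea only (not the authors' architecture), the sketch below pairs a small convolutional encoder-decoder with a VQ-VAE-style quantisation bottleneck trained to predict the next frame; every module name, layer size, and hyperparameter here is an illustrative assumption.

```python
# Minimal sketch (assumptions throughout, not the paper's model): a
# vector-quantised bottleneck trained to predict the next video frame.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through gradient."""
    def __init__(self, num_codes: int = 128, dim: int = 64, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):                       # z: (B, C, H, W)
        z_flat = z.permute(0, 2, 3, 1).reshape(-1, z.shape[1])   # (BHW, C)
        # Squared distance to every codebook vector, then nearest index.
        d = (z_flat.pow(2).sum(1, keepdim=True)
             - 2 * z_flat @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))
        idx = d.argmin(dim=1)
        z_q = self.codebook(idx).view(z.shape[0], z.shape[2], z.shape[3], -1)
        z_q = z_q.permute(0, 3, 1, 2)
        # Codebook + commitment losses (standard VQ-VAE objective).
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()            # straight-through estimator
        return z_q, loss

class FramePredictor(nn.Module):
    """Encode past frames, quantise, decode a prediction of the next frame."""
    def __init__(self, in_frames: int = 4, dim: int = 64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3 * in_frames, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 4, stride=2, padding=1))
        self.vq = VectorQuantizer(dim=dim)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(dim, 3, 4, stride=2, padding=1))

    def forward(self, past):                    # past: (B, 3*in_frames, H, W)
        z_q, vq_loss = self.vq(self.enc(past))
        return self.dec(z_q), vq_loss

model = FramePredictor()
past = torch.randn(2, 12, 64, 64)               # 4 stacked RGB frames
target = torch.randn(2, 3, 64, 64)              # the frame to be predicted
pred, vq_loss = model(past)
loss = F.mse_loss(pred, target) + vq_loss       # anomaly score ~ prediction error
```

Under this reading, the per-pixel prediction error, e.g. `(pred - target).pow(2).mean(1)`, yields a saliency map over the frame, which is the kind of classifier-free signal contribution (1) uses to explain the anomaly.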