Video anomaly detection aims to find events in a video that do not conform to expected behavior. Prevalent methods mainly detect anomalies through the error of snippet reconstruction or future-frame prediction. However, this error depends heavily on the local context of the current snippet and does not reflect an understanding of normality. To address this issue, we propose to detect anomalous events not only from the local context, but also from the consistency between the testing event and the knowledge about normality learned from the training data. Concretely, we propose a novel two-stream framework based on context recovery and knowledge retrieval, in which the two streams complement each other. For the context recovery stream, we propose a spatiotemporal U-Net that fully utilizes motion information to predict the future frame. Furthermore, we propose a maximum local error mechanism to alleviate the problem of large recovery errors caused by complex foreground objects. For the knowledge retrieval stream, we propose an improved learnable locality-sensitive hashing method, which optimizes the hash functions via a Siamese network and a mutual difference loss. The knowledge about normality is encoded and stored in hash tables, and the distance between the testing event and the knowledge representation is used to reveal the probability of anomaly. Finally, we fuse the anomaly scores from the two streams to detect anomalies. Extensive experiments demonstrate the effectiveness and complementarity of the two streams, and the proposed two-stream framework achieves state-of-the-art performance on four datasets.
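To make the scoring pipeline more concrete, the following is a minimal sketch of how the two anomaly scores could be computed and fused. It is an illustration under stated assumptions, not the proposed implementation: the abstract does not define the maximum local error, so it is read here as the largest mean squared error over non-overlapping patches; the learned hash functions (Siamese network, mutual difference loss) are replaced by classic random-hyperplane locality-sensitive hashing; and the fusion is assumed to be a weighted sum of min-max normalized scores. All names (`max_local_error`, `RandomHyperplaneLSH`, `fuse`, `patch`, `weight`) are hypothetical.

```python
import numpy as np


def max_local_error(pred, target, patch=4):
    """Largest mean squared error over non-overlapping patch x patch blocks.

    Assumed reading of a "maximum local error": a small anomalous region is
    not averaged away by the rest of the frame.
    """
    err = (np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)) ** 2
    if err.ndim == 3:  # H x W x C: average over channels first
        err = err.mean(axis=2)
    h, w = err.shape
    h, w = h - h % patch, w - w % patch  # crop so the blocks tile evenly
    blocks = err[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return float(blocks.mean(axis=(1, 3)).max())


class RandomHyperplaneLSH:
    """Classic random-hyperplane LSH, standing in for the learned hashing."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.table = {}

    def _key(self, x):
        # One bit per hyperplane: which side of the plane x falls on.
        return tuple((self.planes @ x > 0).tolist())

    def add(self, feats):
        """Store features of normal training events in the hash table."""
        for f in np.asarray(feats, dtype=np.float64):
            self.table.setdefault(self._key(f), []).append(f)

    def distance(self, x):
        """Distance from x to the nearest stored feature in its bucket."""
        x = np.asarray(x, dtype=np.float64)
        bucket = self.table.get(self._key(x))
        if not bucket:  # empty bucket: fall back to every stored feature
            bucket = [f for b in self.table.values() for f in b]
        return float(min(np.linalg.norm(x - f) for f in bucket))


def fuse(recovery_scores, retrieval_scores, weight=0.5):
    """Weighted sum of min-max normalized scores (assumed fusion rule)."""
    def norm(s):
        s = np.asarray(s, dtype=np.float64)
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return weight * norm(recovery_scores) + (1 - weight) * norm(retrieval_scores)
```

In this sketch, a frame whose prediction error is concentrated in one patch receives a high recovery score even when the frame-wide mean error is small, and a test feature far from every stored normal feature receives a high retrieval score; `fuse` then combines the two per-frame score sequences.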