We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the low-resolution anomaly maps of the teachers by jointly applying standard and adversarial distillation, introducing an adversarial discriminator for each teacher to distinguish between target and generated anomaly maps. We conduct experiments on three benchmarks (Avenue, ShanghaiTech, UCSD Ped2), showing that our method is over 7 times faster than the fastest competing method and between 28 and 62 times faster than object-centric models, while obtaining results comparable to those of recent methods. Our evaluation also indicates that our model achieves the best trade-off between speed and accuracy, owing to its previously unheard-of speed of 1480 FPS. In addition, we carry out a comprehensive ablation study to justify our architectural design choices.
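To make the joint objective concrete, the following is a minimal sketch of how a standard distillation term and a per-teacher adversarial term could be combined. The abstract does not specify the loss forms, so the mean squared error, the non-saturating adversarial loss, the weight `adv_weight`, and all function names here are illustrative assumptions rather than the authors' formulation.

```python
import math


def mse(pred, target):
    """Mean squared error between two flat anomaly maps of equal length."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def student_loss(student_maps, teacher_maps, disc_scores, adv_weight=0.5):
    """Joint distillation loss for the student, summed over all teachers.

    student_maps[k]: student's generated anomaly map for teacher k (flat list)
    teacher_maps[k]: teacher k's target anomaly map (flat list)
    disc_scores[k]:  probability in (0, 1] that discriminator k assigns to the
                     student's map being a genuine teacher map
    adv_weight:      hypothetical weight on the adversarial term (assumption)
    """
    total = 0.0
    for pred, target, score in zip(student_maps, teacher_maps, disc_scores):
        standard = mse(pred, target)      # assumed standard distillation term
        adversarial = -math.log(score)    # assumed non-saturating adversarial term
        total += standard + adv_weight * adversarial
    return total


def discriminator_loss(score_on_teacher, score_on_student):
    """Binary cross-entropy for one teacher's discriminator (assumed form).

    Both scores are probabilities strictly between 0 and 1; the discriminator
    is rewarded for scoring teacher maps high and student maps low.
    """
    return -(math.log(score_on_teacher) + math.log(1.0 - score_on_student))
```

In this sketch each teacher contributes its own pair of terms, which mirrors the one-discriminator-per-teacher design described above; in practice the student and the discriminators would be neural networks trained in alternation.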