A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in the literature. Owing to its highly accurate results, the method has attracted the attention of many researchers. In this work, we revisit the SSMTL framework and propose several updates to the original method. First, we study various object detection methods, e.g. methods based on detecting high-motion regions using optical flow or background subtraction, since we believe the currently used pre-trained YOLOv3 is suboptimal, e.g. objects in motion or objects from unknown classes are never detected. Second, we modernize the 3D convolutional backbone by introducing multi-head self-attention modules, inspired by the recent success of vision transformers. To this end, we introduce 2D and 3D convolutional vision transformer (CvT) blocks as alternatives. Third, in our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps through knowledge distillation, solving jigsaw puzzles, estimating body pose through knowledge distillation, predicting masked regions (inpainting), and adversarial learning with pseudo-anomalies. We conduct experiments to assess the performance impact of the introduced changes. Upon finding more promising configurations of the framework, dubbed SSMTL++v1 and SSMTL++v2, we extend our preliminary experiments to more data sets, demonstrating that our performance gains are consistent across all data sets. In most cases, our results on Avenue, ShanghaiTech and UBnormal surpass the previous state of the art.
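To make the idea of detecting high-motion regions concrete, the following is a minimal sketch based on plain frame differencing, a crude stand-in for the background subtraction mentioned above. It is not the detector used in SSMTL++; the function name `motion_bounding_box`, the `Frame` alias, and the threshold value of 25 are assumptions chosen purely for illustration.

```python
from typing import List, Optional, Tuple

# A grayscale frame as rows of pixel intensities (hypothetical representation).
Frame = List[List[int]]


def motion_bounding_box(
    prev: Frame, curr: Frame, threshold: int = 25
) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) enclosing every pixel whose
    intensity changed by more than `threshold` between two frames,
    or None when no pixel changed that much."""
    xs: List[int] = []
    ys: List[int] = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            # Mark the pixel as "moving" if the absolute difference is large.
            if abs(c - p) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


# Usage: two 4x5 frames where a bright blob appears in the second frame.
prev_frame: Frame = [[0] * 5 for _ in range(4)]
curr_frame: Frame = [[0] * 5 for _ in range(4)]
curr_frame[1][2] = 200
curr_frame[2][3] = 200
box = motion_bounding_box(prev_frame, curr_frame)
```

Unlike a detector restricted to a fixed set of classes, a motion-based proposal of this kind does not depend on the object category, which is the motivation stated above for studying such alternatives. A practical system would typically use a learned or statistical background model and connected-component analysis rather than a single box per frame pair.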