This paper investigates the performance of diffusion models for video anomaly detection (VAD) in the most challenging, yet most operationally relevant, scenario: the one in which data annotations are not used. Because abnormal events are sparse, diverse, contextual, and often ambiguous, detecting them precisely is a very ambitious task. To this end, we rely only on the information-rich spatio-temporal data and on the reconstruction power of diffusion models, using a high reconstruction error as the indicator of abnormality. Experiments on two large-scale video anomaly detection datasets demonstrate a consistent improvement of the proposed method over state-of-the-art generative models, and in some cases our method achieves better scores than more complex models. This is the first study to use a diffusion model for VAD and to examine the influence of its parameters, providing guidance for VAD in surveillance scenarios.
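The reconstruction-error criterion described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that reconstructions from some generative model are already available as an array, and the function names `frame_anomaly_scores` and `flag_anomalies`, the min-max normalisation, and the fixed threshold of 0.5 are all hypothetical choices made for illustration.

```python
import numpy as np


def frame_anomaly_scores(frames, reconstructions):
    """Per-frame mean squared reconstruction error, min-max scaled to [0, 1].

    Both inputs are arrays whose first axis indexes time (one entry per frame
    or clip). How the reconstructions are produced is outside this sketch.
    """
    frames = np.asarray(frames, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    # Flatten everything except the time axis, then average the squared error.
    diff = (frames - reconstructions).reshape(len(frames), -1)
    errors = (diff ** 2).mean(axis=1)
    span = errors.max() - errors.min()
    if span == 0:
        # All frames are reconstructed equally well: no frame stands out.
        return np.zeros_like(errors)
    return (errors - errors.min()) / span


def flag_anomalies(scores, threshold=0.5):
    """Mark frames whose normalised error exceeds an assumed threshold."""
    return np.asarray(scores) > threshold


if __name__ == "__main__":
    # Toy data: four 2x2 "frames"; only the third is reconstructed poorly.
    frames = np.zeros((4, 2, 2))
    reconstructions = np.zeros((4, 2, 2))
    reconstructions[2] = 1.0
    scores = frame_anomaly_scores(frames, reconstructions)
    print(scores, flag_anomalies(scores))
```

In practice the threshold would have to be chosen without labels, for example from the distribution of the errors themselves. The constant used here is only a placeholder.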