Video anomaly detection (VAD) aims to detect events that deviate from what is expected. In open-world scenarios, the expected events may change as requirements change. For example, not wearing a mask may be considered abnormal during a flu outbreak but normal otherwise. However, existing methods assume that the definition of anomalies is fixed, and thus are not applicable to the open world. To address this, we propose a novel open-world VAD paradigm with variable definitions, allowing guided detection through user-provided natural language at inference time. This paradigm necessitates establishing a robust mapping from video and textual definition to anomaly scores. Therefore, we propose LaGoVAD (Language-guided Open-world Video Anomaly Detector), a model that dynamically adapts anomaly definitions under weak supervision with two regularization strategies: diversifying the relative durations of anomalies via dynamic video synthesis, and enhancing feature robustness through contrastive learning with negative mining. Training such adaptable models requires diverse anomaly definitions, but existing datasets typically provide labels without semantic descriptions. To bridge this gap, we collect PreVAD (Pre-training Video Anomaly Dataset), the largest and most diverse video anomaly dataset to date, featuring 35,279 annotated videos with multi-level category labels and descriptions that explicitly define anomalies. Zero-shot experiments on seven datasets demonstrate LaGoVAD's state-of-the-art performance. Our dataset and code will be released at https://github.com/Kamino666/LaGoVAD-PreVAD.
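To make the second regularization strategy concrete, the sketch below illustrates a generic InfoNCE-style contrastive loss with hard-negative mining: among the available negatives, only the top-k most anchor-similar (i.e., hardest) ones contribute to the loss. This is a minimal, hypothetical illustration of the general technique, not the authors' actual implementation; all function names and the `temperature`/`top_k` parameters are our own assumptions.

```python
import numpy as np

def contrastive_loss_with_negative_mining(anchor, positive, negatives,
                                          temperature=0.1, top_k=2):
    """InfoNCE-style loss that keeps only the top-k hardest negatives
    (those with the highest cosine similarity to the anchor).
    NOTE: a generic sketch of the technique, not LaGoVAD's exact loss."""
    def l2norm(v):
        return v / (np.linalg.norm(v) + 1e-8)

    a, p = l2norm(anchor), l2norm(positive)
    negs = np.stack([l2norm(n) for n in negatives])

    sim_pos = (a @ p) / temperature          # scaled similarity to the positive
    sim_negs = (negs @ a) / temperature      # scaled similarities to negatives

    # Hard-negative mining: retain only the k most confusing negatives.
    hard = np.sort(sim_negs)[-top_k:]

    # Cross-entropy over [positive, hard negatives] with the positive as target.
    logits = np.concatenate([[sim_pos], hard])
    return float(-sim_pos + np.log(np.sum(np.exp(logits))))
```

Because only the hardest negatives enter the denominator, gradients focus on the most confusable feature pairs; easy, already well-separated negatives are ignored.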