Language-guided Open-world Video Anomaly Detection under Weak Supervision

Video anomaly detection (VAD) aims to detect anomalies that deviate from what is expected. In open-world scenarios, the expected events may change as requirements change. For example, not wearing a mask may be considered abnormal during a flu outbreak but normal otherwise. However, existing methods assume that the definition of anomalies is invariable, and thus are not applicable to the open world. To address this, we propose a novel open-world VAD paradigm with variable definitions, allowing guided detection through user-provided natural language at inference time. This paradigm necessitates establishing a robust mapping from video and textual definition to anomaly scores. Therefore, we propose LaGoVAD (Language-guided Open-world Video Anomaly Detector), a model that dynamically adapts anomaly definitions under weak supervision with two regularization strategies: diversifying the relative durations of anomalies via dynamic video synthesis, and enhancing feature robustness through contrastive learning with negative mining. Training such adaptable models requires diverse anomaly definitions, but existing datasets typically provide labels without semantic descriptions. To bridge this gap, we collect PreVAD (Pre-training Video Anomaly Dataset), the largest and most diverse video anomaly dataset to date, featuring 35,279 annotated videos with multi-level category labels and descriptions that explicitly define anomalies. Zero-shot experiments on seven datasets demonstrate LaGoVAD's SOTA performance. Our dataset and code will be released at https://github.com/Kamino666/LaGoVAD-PreVAD.
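The first regularization strategy, diversifying the relative duration of anomalies through video synthesis, can be illustrated with a minimal sketch. The abstract does not describe the actual procedure, so everything below is an assumption made for illustration: the function name `synthesize_video`, the `target_ratio` parameter, and the approach of padding an anomalous clip with frames from normal clips are hypothetical and are not taken from LaGoVAD's code.

```python
import random


def synthesize_video(anomaly_clip, normal_clips, target_ratio, rng):
    """Pad an anomalous clip with normal frames so that the anomaly covers
    roughly `target_ratio` of the synthetic video (hypothetical helper)."""
    if not 0.0 < target_ratio <= 1.0:
        raise ValueError("target_ratio must be in (0, 1]")
    if not normal_clips or any(len(clip) == 0 for clip in normal_clips):
        raise ValueError("normal_clips must contain non-empty clips")

    # Total length at which len(anomaly_clip) / total is close to target_ratio.
    total_len = max(len(anomaly_clip), round(len(anomaly_clip) / target_ratio))
    n_normal = total_len - len(anomaly_clip)

    # Draw normal frames clip by clip until the padding budget is filled.
    padding = []
    while len(padding) < n_normal:
        clip = rng.choice(normal_clips)
        padding.extend(clip[: n_normal - len(padding)])

    # Place the anomaly at a random offset inside the normal padding.
    offset = rng.randint(0, n_normal)
    frames = padding[:offset] + list(anomaly_clip) + padding[offset:]
    # Frame-level labels are returned for inspection only; a weakly
    # supervised model would train on the video-level label alone.
    labels = [0] * offset + [1] * len(anomaly_clip) + [0] * (n_normal - offset)
    return frames, labels


rng = random.Random(0)
anomaly = ["a"] * 20                      # stand-ins for 20 anomalous frames
normals = [["n"] * 30, ["m"] * 50]        # stand-ins for two normal clips
frames, labels = synthesize_video(anomaly, normals, 0.25, rng)
print(len(frames), sum(labels) / len(labels))
```

Sampling a different `target_ratio` for each training example would expose a model to anomalies that fill anywhere from a small fraction to nearly all of a video, which is the kind of variation the abstract attributes to this strategy.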

