Video Anomaly Detection (VAD) has been extensively studied under the settings of One-Class Classification (OCC) and Weakly-Supervised learning (WS), which however both require laborious human-annotated normal/abnormal labels. In this paper, we study Unsupervised VAD (UVAD) that does not depend on any label by combining OCC and WS into a unified training framework. Specifically, we extend OCC to weighted OCC (wOCC) and propose a wOCC-WS interleaving training module, where the two models automatically generate pseudo-labels for each other. We face two challenges to make the combination effective: (1) Models' performance fluctuates occasionally during the training process due to the inevitable randomness of the pseudo labels. (2) Thresholds are needed to divide pseudo labels, making the training depend on the accuracy of user intervention. For the first problem, we propose to use wOCC requiring soft labels instead of OCC trained with hard zero/one labels, as soft labels exhibit high consistency throughout different training cycles while hard labels are prone to sudden changes. For the second problem, we repeat the interleaving training module multiple times, during which we propose an adaptive thresholding strategy that can progressively refine a rough threshold to a relatively optimal threshold, which reduces the influence of user interaction. A benefit of employing OCC and WS methods to compose a UVAD method is that we can incorporate the most recent OCC or WS model into our framework. Experiments demonstrate the effectiveness of the proposed UVAD framework.
翻译:暂无翻译