Weakly Supervised Video Anomaly Detection (WSVAD) has achieved notable advances, yet existing models remain vulnerable to adversarial attacks, which limits their reliability. Because weak supervision provides only video-level labels while the task requires frame-level predictions, traditional adversarial defenses such as adversarial training are ineffective: adversarial perturbations crafted against video-level labels are typically weak and inadequate. Pseudo-labels generated directly by the model can enable frame-level adversarial training, but they are inherently noisy and significantly degrade performance. We therefore introduce a novel Pseudo-Anomaly Generation method, Spatiotemporal Region Distortion (SRD), which creates synthetic anomalies by applying severe augmentations to localized regions of normal videos while preserving temporal consistency. Combining these precisely annotated synthetic anomalies with the noisy pseudo-labels substantially reduces label noise and enables effective adversarial training. Extensive experiments show that our method significantly improves the robustness of WSVAD models against adversarial attacks, outperforming state-of-the-art methods by an average of 71.0\% in overall AUROC across multiple benchmarks. The code is publicly available at https://github.com/rohban-lab/FrameShield.
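The abstract does not specify which augmentations SRD uses, how regions are chosen, or how pseudo-labels are thresholded. The NumPy sketch below is therefore only a hypothetical illustration of the idea under stated assumptions: one rectangular region kept fixed over one contiguous frame interval (the temporal-consistency part), additive Gaussian noise standing in for the "severe augmentation", and a 0.5 score threshold for pseudo-labels on anomalous videos. The function names `spatiotemporal_region_distortion` and `build_frame_labels` and all parameter values are ours, not taken from the FrameShield repository.

```python
import numpy as np


def spatiotemporal_region_distortion(video, rng, min_len=8, max_frac=0.4, noise_std=0.8):
    """Hypothetical SRD-style pseudo-anomaly generator (illustrative sketch only).

    video: float array of shape (T, H, W, C) with values in [0, 1], assumed normal.
    Returns (distorted_video, frame_labels), where label 1 marks synthetic-anomaly frames.
    """
    T, H, W, C = video.shape
    out = video.copy()
    labels = np.zeros(T, dtype=np.int64)

    # Pick one contiguous temporal segment (at least min_len frames when T allows it).
    seg_len = int(rng.integers(min(min_len, T), T + 1))
    t0 = int(rng.integers(0, T - seg_len + 1))

    # Pick one spatial box that stays fixed across the whole segment.
    h = max(1, int(H * rng.uniform(0.1, max_frac)))
    w = max(1, int(W * rng.uniform(0.1, max_frac)))
    y0 = int(rng.integers(0, H - h + 1))
    x0 = int(rng.integers(0, W - w + 1))

    # Assumed "severe augmentation": one noise pattern, reused on every frame of the
    # segment so the distortion is temporally consistent rather than flickering.
    noise = rng.normal(0.0, noise_std, size=(h, w, C))
    region = out[t0:t0 + seg_len, y0:y0 + h, x0:x0 + w, :]
    out[t0:t0 + seg_len, y0:y0 + h, x0:x0 + w, :] = np.clip(region + noise, 0.0, 1.0)

    # Synthetic anomalies come with exact frame-level labels.
    labels[t0:t0 + seg_len] = 1
    return out, labels


def build_frame_labels(video_label, model_scores, threshold=0.5):
    """Hypothetical pseudo-labeling for real videos.

    Normal videos (video_label == 0) get all-zero frame labels; anomalous videos get
    noisy pseudo-labels by thresholding the model's own frame scores.
    """
    scores = np.asarray(model_scores, dtype=float)
    if video_label == 0:
        return np.zeros(scores.shape[0], dtype=np.int64)
    return (scores >= threshold).astype(np.int64)


rng = np.random.default_rng(0)
normal_clip = np.full((32, 64, 64, 3), 0.5)
fake_clip, fake_labels = spatiotemporal_region_distortion(normal_clip, rng)
real_labels = build_frame_labels(1, [0.1, 0.7, 0.9, 0.2])  # -> [0, 1, 1, 0]
```

In this sketch, exactly labeled clips such as `fake_clip` would be mixed into the same batches as real videos carrying the noisier `build_frame_labels` output, giving frame-level targets against which adversarial perturbations could be crafted; the actual mixing ratio and attack used by the method are not stated in the abstract.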