Automated video surveillance with Large Vision-Language Models is limited by the models' inherent bias towards normality, which often causes them to miss crimes. While Chain-of-Thought reasoning strategies show significant potential for improving performance in language tasks, the lack of inductive anomaly biases in their reasoning further steers the models towards normal interpretations. To address this, we propose Chain-of-Anomaly-Thoughts (CoAT), a multi-agent reasoning framework that introduces inductive criminal bias in the reasoning process through a final, anomaly-focused classification layer. Our method significantly improves Anomaly Detection, boosting F1-score by 11.8 p.p. on challenging low-resolution footage, and improves Anomaly Classification by 3.78 p.p. on high-resolution videos.