Video forgery attacks threaten surveillance systems by replacing captured video with synthesized content, which can be produced with the latest augmented reality and virtual reality technologies. From a machine perception standpoint, visual objects often have RF signatures that are naturally synchronized with them during recording. Compared with video captures, RF signatures are harder to attack because they are concealed and ubiquitous. In this work, we investigate multimodal methods for detecting video forgery attacks that use both the vision and wireless modalities. Because wireless signal-based human perception is sensitive to the environment, we propose a self-supervised training strategy that lets the system work without external annotation and thus adapt to different environments. Our method achieves perfect human detection accuracy and a forgery attack detection accuracy of 94.38%, which is comparable to that of supervised methods.