Realistic fake videos are a potential tool for spreading harmful misinformation, given our growing online presence and information intake. This paper presents a multimodal, learning-based method for distinguishing real videos from fake ones. The method combines information from three modalities: audio, video, and physiology. We investigate two strategies for combining the video and physiology modalities: either augmenting the video with information from the physiology or, as a novel approach, learning the fusion of the two modalities with a proposed Graph Convolutional Network (GCN) architecture. Both strategies rely on a novel method for generating visual representations of physiological signals. Real and fake videos are then distinguished based on the dissimilarity between the audio modality and the modified video modality. The proposed method is evaluated on two benchmark datasets, and the results show a significant increase in detection performance over previous methods.
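The abstract does not specify the encoders, the graph construction, or the dissimilarity measure, so the following is only a minimal, hypothetical Python sketch of how GCN-based fusion and a dissimilarity-based decision could fit together. It assumes a two-node graph (one video node and one physiology node, whose features would come from the visual representation of the physiological signal), a single graph-convolution layer with the symmetric normalization of Kipf and Welling, mean pooling as the readout, Euclidean distance as the dissimilarity, and a standard margin-based contrastive loss for training. All function names, the threshold, and the margin are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (self-loops added)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(normalized_adjacency(adj), feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def mean_pool(feats: Matrix) -> List[float]:
    """Average node embeddings into one fused video embedding (assumed readout)."""
    n = len(feats)
    return [sum(row[j] for row in feats) / n for j in range(len(feats[0]))]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, used here as the (assumed) modality dissimilarity."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(dist: float, is_real: bool, margin: float = 1.0) -> float:
    """Pull real audio/video pairs together; push fake pairs at least `margin` apart."""
    if is_real:
        return dist ** 2
    return max(0.0, margin - dist) ** 2


def classify(audio_emb: Sequence[float], video_emb: Sequence[float], threshold: float) -> str:
    """Label a clip 'fake' when audio and fused video embeddings disagree too much."""
    return "fake" if l2_distance(audio_emb, video_emb) > threshold else "real"


if __name__ == "__main__":
    # Hypothetical two-node graph: node 0 = video features, node 1 = physiology features.
    adj = [[0.0, 1.0], [1.0, 0.0]]
    feats = [[1.0, 0.0], [0.0, 1.0]]
    weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only
    video_emb = mean_pool(gcn_layer(adj, feats, weight))
    print(video_emb)                                        # [0.5, 0.5]
    print(classify([0.5, 0.5], video_emb, threshold=1.0))   # real
    print(classify([2.0, 0.5], video_emb, threshold=1.0))   # fake
```

In this sketch, contrastive training drives the audio and fused video embeddings of real clips toward each other and pushes fake pairs at least `margin` apart, so that a single distance threshold can separate the two classes at test time.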