As content-editing tools mature and artificial intelligence (AI) based algorithms for synthesizing media proliferate, manipulated content is becoming more prevalent across online media. This phenomenon fuels the spread of misinformation, creating a greater need to distinguish between ``real'' and ``manipulated'' content. To this end, we present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated). Many existing deepfake datasets focus exclusively on two types of facial manipulation: swapping in a different subject's face or altering the existing face. VideoSham, on the other hand, contains more diverse, context-rich, human-centric, high-resolution videos manipulated using a combination of 6 different spatial and temporal attacks. Our analysis shows that state-of-the-art manipulation detection algorithms work for only a few specific attacks and do not scale well on VideoSham. We performed a user study on Amazon Mechanical Turk with 1200 participants to understand whether they can differentiate between the real and manipulated videos in VideoSham. Finally, we dig deeper into the strengths and weaknesses of human and state-of-the-art algorithm performance to identify gaps that need to be filled with better AI algorithms. We present the dataset at https://github.com/adobe-research/VideoSham-dataset.