Conventional fake video detection methods output a probability value or a suspected mask over tampered frames. However, such unexplainable results cannot serve as convincing evidence, so it is preferable to trace the source of a fake video. Traditional hashing methods retrieve semantically similar images and cannot discriminate the subtle differences between near-identical images. Specifically, compared with traditional video retrieval, source tracing must find the real source among highly similar candidate videos, which is challenging. We design a novel Hash Triplet Loss to address the fact that videos of people are often very similar: the same scene shot from different angles, or similar scenes featuring the same person. We propose Vision-Transformer-based models, named Video Tracing and Tampering Localization (VTL). In the first stage, we train hash centers with ViTHash (VTL-T). A fake video is then fed to ViTHash, which outputs a hash code; the hash code is used to retrieve the source video from the hash centers. In the second stage, the source video and the fake video are fed to a generator (VTL-L), which masks the suspect regions to provide auxiliary information. Moreover, we constructed two datasets: DFTL and DAVIS2016-TL. Experiments on DFTL clearly show the superiority of our framework in tracing the sources of similar videos. In particular, VTL also achieves performance comparable to state-of-the-art methods on DAVIS2016-TL. Our source code and datasets have been released on GitHub: \url{https://github.com/lajlksdf/vtl}.
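For concreteness, the retrieval step in the first stage reduces to a nearest-hash-center lookup under Hamming distance. The following is a minimal sketch of that lookup, assuming the binary codes are plain PyTorch tensors; the function name `retrieve_source`, the 128-bit code width, and the random stand-in centers are illustrative assumptions, not taken from the paper (there, ViTHash produces the code and the hash centers are learned with the Hash Triplet Loss).

\begin{verbatim}
import torch

def retrieve_source(code: torch.Tensor, hash_centers: torch.Tensor) -> int:
    """Return the index of the hash center nearest to `code`
    in Hamming distance. `code` has shape (B,), `hash_centers`
    has shape (N, B), both holding {0, 1} bits."""
    # Count mismatched bits against every center (Hamming distance).
    dists = (code.unsqueeze(0) != hash_centers).sum(dim=1)
    return int(dists.argmin())

# Hypothetical usage: 4 source videos, 128-bit codes.
hash_centers = torch.randint(0, 2, (4, 128))
code = hash_centers[2].clone()
code[:5] ^= 1  # flip a few bits to mimic a tampered query
assert retrieve_source(code, hash_centers) == 2
\end{verbatim}

Because lookup is a simple argmin over bitwise mismatches, retrieval stays cheap even as the number of traced source videos grows; the hard part, which the Hash Triplet Loss targets, is learning codes whose Hamming gaps separate near-identical videos.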