In recent years, image and video manipulation with DeepFake techniques has become a severe concern for security and society. Many detection models and databases have therefore been proposed to detect DeepFake data reliably. However, there is growing concern that these models and their training databases might be biased and thus cause DeepFake detectors to fail. In this work, we tackle these issues by (a) providing large-scale demographic and non-demographic annotations of 41 different attributes for five popular DeepFake datasets and (b) comprehensively analysing the AI bias of multiple state-of-the-art DeepFake detection models on these databases. The investigation analyses the influence of a large variety of distinctive attributes (from over 65M labels) on the detection performance, including demographic (age, gender, ethnicity) and non-demographic (hair, skin, accessories, etc.) information. The results indicate that the investigated databases lack diversity and, more importantly, show that the utilised DeepFake detection models are strongly biased towards many of the investigated attributes. Moreover, the results show that the models' decision-making might be based on several questionable (biased) assumptions, such as whether a person is smiling or wearing a hat. Depending on the application of such DeepFake detection methods, these biases can lead to generalisability, fairness, and security issues. We hope that the findings of this study and the annotation databases will help to evaluate and mitigate bias in future DeepFake detection techniques. Our annotation datasets are made publicly available.
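As a rough illustration of what measuring the influence of an attribute on detection performance can look like, the sketch below compares a detector's error rate on samples with and without a given binary attribute. It is not the procedure used in this work: the record layout (`label`, `pred`, `attrs`), the function names, and the toy records are all hypothetical and chosen only for the example.

```python
from collections import defaultdict


def error_rate_by_attribute(records, attribute):
    """Return {attribute_value: error_rate} for one binary attribute.

    Each record is a dict with hypothetical keys:
      'label' : 1 for fake, 0 for real (ground truth)
      'pred'  : 1 or 0, the detector's decision
      'attrs' : dict mapping attribute name -> bool
    """
    errors = defaultdict(int)
    counts = defaultdict(int)
    for record in records:
        value = record["attrs"][attribute]
        counts[value] += 1
        errors[value] += int(record["pred"] != record["label"])
    return {value: errors[value] / counts[value] for value in counts}


def error_gap(records, attribute):
    """Error rate with the attribute minus error rate without it.

    Returns None if one of the two groups has no samples.
    """
    rates = error_rate_by_attribute(records, attribute)
    if True not in rates or False not in rates:
        return None
    return rates[True] - rates[False]


# Made-up toy records, purely to exercise the functions (not real results).
toy_records = [
    {"label": 1, "pred": 0, "attrs": {"smiling": True}},
    {"label": 1, "pred": 1, "attrs": {"smiling": True}},
    {"label": 0, "pred": 1, "attrs": {"smiling": True}},
    {"label": 0, "pred": 0, "attrs": {"smiling": True}},
    {"label": 1, "pred": 1, "attrs": {"smiling": False}},
    {"label": 1, "pred": 1, "attrs": {"smiling": False}},
    {"label": 0, "pred": 1, "attrs": {"smiling": False}},
    {"label": 0, "pred": 0, "attrs": {"smiling": False}},
]

rates = error_rate_by_attribute(toy_records, "smiling")
gap = error_gap(toy_records, "smiling")
print(rates, gap)
```

A gap near zero would suggest the attribute has little effect on the detector's errors, while a large gap in either direction would flag the attribute for closer inspection. A real analysis would also need to account for group sizes and for attributes that co-occur.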