This work addresses the problem of anonymizing the identity of faces in a dataset of images, such that the privacy of those depicted is not violated, while at the same time the dataset remains useful for downstream tasks such as training machine learning models. To the best of our knowledge, we are the first to explicitly address this issue and to deal with two major drawbacks of existing state-of-the-art approaches, namely that they (i) require the costly training of additional, purpose-trained neural networks, and/or (ii) fail to retain the facial attributes of the original images in their anonymized counterparts, the preservation of which is of paramount importance for their use in downstream tasks. We accordingly present a task-agnostic anonymization procedure that directly optimizes the images' latent representations in the latent space of a pre-trained GAN. By optimizing the latent codes directly, we ensure both that the identity is at a desired distance from the original (using an identity obfuscation loss) and that the facial attributes are preserved (using a novel feature-matching loss in FaRL's deep feature space). We demonstrate through a series of qualitative and quantitative experiments that our method is capable of anonymizing the identity of the images whilst, crucially, better preserving the facial attributes. We make the code and the pre-trained models publicly available at: https://github.com/chi0tzp/FALCO.
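The two-term objective described above can be illustrated with a toy sketch. It is not the released FALCO implementation, and the abstract does not give the exact form of either loss. The squared-margin identity term, the squared-distance attribute term, the names `margin` and `lam`, the two toy "encoders", the initialisation from a different latent code, and the use of finite-difference gradients in place of automatic differentiation are all assumptions made purely for illustration.

```python
import math

def identity_embedding(w):
    # Hypothetical stand-in for a face-recognition embedding of the image G(w).
    return [w[0], w[1]]

def attribute_features(w):
    # Hypothetical stand-in for deep attribute features (FaRL in the paper).
    return [w[2], w[3]]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def attribute_gap(w, w_orig):
    # Squared distance between attribute features (assumed form).
    f, f0 = attribute_features(w), attribute_features(w_orig)
    return sum((a - b) ** 2 for a, b in zip(f, f0))

def total_loss(w, w_orig, margin, lam):
    # Identity term: drive the similarity to the original towards `margin`.
    sim = cosine(identity_embedding(w), identity_embedding(w_orig))
    id_loss = (sim - margin) ** 2
    # Attribute term: keep attribute features close to the original's.
    return id_loss + lam * attribute_gap(w, w_orig)

def numeric_grad(fn, w, eps=1e-6):
    # Central finite differences; a real implementation would use autograd.
    grad = []
    for i in range(len(w)):
        up = w[:i] + [w[i] + eps] + w[i + 1:]
        down = w[:i] + [w[i] - eps] + w[i + 1:]
        grad.append((fn(up) - fn(down)) / (2 * eps))
    return grad

def anonymize(w_orig, w_init, margin=0.2, lam=1.0, lr=0.1, steps=300):
    # Plain gradient descent on the latent code itself; no network is trained.
    w = list(w_init)
    for _ in range(steps):
        g = numeric_grad(lambda z: total_loss(z, w_orig, margin, lam), w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# Toy latent codes: the original, and a different starting point.
w_orig = [1.0, 0.0, 0.5, -0.5]
w_init = [0.8, 0.6, 0.0, 0.0]

w_anon = anonymize(w_orig, w_init)
initial_similarity = cosine(identity_embedding(w_init), identity_embedding(w_orig))
final_similarity = cosine(identity_embedding(w_anon), identity_embedding(w_orig))
final_attr_gap = attribute_gap(w_anon, w_orig)
print(initial_similarity, final_similarity, final_attr_gap)
```

In this toy the two encoders read disjoint coordinates of the latent code, so the two terms never compete. In a real GAN latent space identity and attributes are entangled, and the weight `lam` would control the trade-off between them.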