With the increasing ubiquity of cameras and smart sensors, humanity is generating data at an exponential rate. Access to this trove of information, which often covers still-underrepresented use cases (e.g., AI in medical settings), could fuel a new generation of deep-learning tools. However, eager data scientists must first provide satisfactory guarantees regarding the privacy of individuals present in these untapped datasets. This is especially important for images or videos depicting faces, as facial biometric information is the target of most identification methods. While a variety of solutions have been proposed to de-identify such images, they often corrupt other non-identifying facial attributes that would be relevant for downstream tasks. In this paper, we propose Disguise, a novel algorithm to seamlessly de-identify facial images while ensuring the usability of the altered data. Unlike prior art, we ground our solution in both the differential-privacy and ensemble-learning research domains. Our method extracts depicted identities and swaps them with fake ones, synthesized via variational mechanisms to maximize obfuscation and non-invertibility, while leveraging the supervision of a mixture of experts to disentangle and preserve other utility attributes. We extensively evaluate our method on multiple datasets, demonstrating a higher de-identification rate and greater consistency than prior art across various downstream tasks.