Data privacy is an increasingly important aspect of many real-world big data analytics tasks. Data sources that contain sensitive information may hold immense potential that could be unlocked through privacy-enhancing transformations, but current methods often fail to produce convincing output. Furthermore, striking the right balance between privacy and utility is a tricky trade-off. In this work, we propose a novel approach to data privatization that involves two steps: the first step removes the sensitive information, and the second step replaces it with an independent random sample. Our method builds on adversarial representation learning, which ensures strong privacy by training the model to fool an increasingly strong adversary. While previous methods aim only at obfuscating the sensitive information, we find that adding new random information in its place strengthens the provided privacy and yields better utility at any given level of privacy. The result is an approach that provides stronger privatization of image data while preserving both the domain and the utility of the inputs, entirely independently of the downstream task.
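The remove-then-replace idea can be illustrated on toy tabular data. The sketch below is a simplified stand-in, not the paper's adversarial image model: it drops a sensitive column and refills it with an independent draw from its empirical marginal, so the output stays in-domain while the per-record link to the original sensitive value is broken. All names (`privatize`, the synthetic columns) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatize(X, sensitive_col, rng):
    """Two-step privatization sketch (illustrative only):
    step 1 removes the sensitive values; step 2 replaces them
    with an independent sample from their empirical marginal."""
    X_priv = X.copy()
    # Steps 1+2 combined: overwrite the sensitive column with a fresh,
    # independent resample, preserving the marginal distribution.
    X_priv[:, sensitive_col] = rng.choice(
        X[:, sensitive_col], size=len(X), replace=True
    )
    return X_priv

# Toy data: column 0 is a sensitive binary attribute,
# column 1 is a utility feature correlated with it.
n = 10_000
s = rng.integers(0, 2, size=n).astype(float)
u = s + rng.normal(0.0, 1.0, size=n)
X = np.column_stack([s, u])

X_priv = privatize(X, sensitive_col=0, rng=rng)

# The marginal of the sensitive column is (approximately) preserved,
# but its correlation with the original values is near zero.
print(abs(X[:, 0].mean() - X_priv[:, 0].mean()))
print(abs(np.corrcoef(X[:, 0], X_priv[:, 0])[0, 1]))
```

In the paper's setting, the "independent sample" is produced by a learned generator so the result remains a realistic image, and the removal step is enforced adversarially rather than by column masking; the statistical goal, however, is the same as in this sketch.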