Differential privacy (DP) has arisen as the gold standard for protecting an individual's privacy in datasets by adding calibrated noise to each data sample. While its application to categorical data is straightforward, its usability in the context of images has been limited. Contrary to categorical data, the meaning of an image is inherent in the spatial correlation of neighboring pixels, which makes the simple application of noise infeasible. Invertible Neural Networks (INNs) have shown excellent generative performance while still providing the ability to quantify the exact likelihood. Their principle is based on transforming a complicated distribution into a simple one, e.g., an image into a spherical Gaussian. We hypothesize that adding noise to the latent space of an INN can enable differentially private image modification: manipulating the latent space yields a modified image while preserving important details. Further, by conditioning the INN on meta-data provided with the dataset, we aim to leave dimensions important for downstream tasks such as classification untouched while altering other parts that potentially contain identifying information. We term our method content-aware differential privacy (CADP). We conduct experiments on publicly available benchmarking datasets as well as dedicated medical ones. In addition, we show the generalizability of our method to categorical data. The source code is publicly available at https://github.com/Cardio-AI/CADP.
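The core idea above (map data to a Gaussian latent space, perturb there, map back, and optionally spare task-relevant dimensions) can be sketched in a few lines. This is a minimal illustration, not the authors' CADP implementation: a fixed random orthogonal matrix stands in for a trained INN, and the `protected` argument is a hypothetical stand-in for the conditioning on meta-data that identifies task-relevant dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained INN: a fixed orthogonal matrix Q is exactly
# invertible (Q.T @ Q = I). A real INN would be a normalizing flow
# trained so that forward(x) is approximately a spherical Gaussian.
d = 16
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def forward(x):
    """Map data to the (approximately Gaussian) latent space."""
    return x @ Q

def inverse(z):
    """Map a latent code back to data space."""
    return z @ Q.T

def private_modify(x, sigma=0.5, protected=None):
    """Add calibrated Gaussian noise in latent space, then invert.

    `protected` lists latent dimensions left untouched, mimicking the
    content-aware idea of sparing dimensions needed downstream.
    """
    z = forward(x)
    noise = rng.normal(0.0, sigma, size=z.shape)
    if protected is not None:
        noise[..., protected] = 0.0  # keep task-relevant dims intact
    return inverse(z + noise)

x = rng.standard_normal((4, d))
x_priv = private_modify(x, sigma=0.5, protected=[0, 1])
```

In data space, `x_priv` differs from `x` everywhere (the rotation mixes latent dimensions), but in latent space the two protected coordinates are identical, which is the property the conditioning is meant to provide.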