We can protect user data privacy via many approaches, such as statistical transformations or generative models. However, each of them has critical drawbacks. On the one hand, creating a transformed data set using conventional techniques is highly time-consuming. On the other hand, recent deep learning-based solutions require not only long training phases but also significant computational resources. In this paper, we propose PrivateSMOTE, a technique designed to protect the cases at highest risk of re-identification with competitive effectiveness while requiring much less time and computational resources. It generates synthetic data via interpolation to obfuscate high-risk cases while minimizing the loss of utility of the original data. Compared with multiple conventional and state-of-the-art privacy-preservation methods on 20 data sets, PrivateSMOTE achieves competitive results in re-identification risk. It also delivers similar or higher predictive performance than the baselines, including generative adversarial networks and variational autoencoders, while reducing their energy consumption and time requirements by factors of at least 9 and 12, respectively.
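The interpolation idea can be illustrated with a minimal SMOTE-style sketch. This is not the authors' PrivateSMOTE implementation. The risk criterion (a record is high-risk if its quasi-identifier combination is unique, i.e. a k-anonymity violation with k = 1), the function names, the parameters `k` and `per_case`, and the policy of replacing each high-risk record with interpolated copies are all hypothetical choices made for illustration. Every attribute is treated as numeric, so a nominal column such as sex becomes fractional in synthetic rows; a practical method would need to handle such attributes separately.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

Record = List[float]


def high_risk_indices(data: Sequence[Record], qi: Sequence[int]) -> List[int]:
    """Return indices of records whose quasi-identifier values are unique.

    Assumption (hypothetical): "high risk" means the quasi-identifier
    combination occurs exactly once, i.e. k-anonymity is violated for k = 1.
    """
    keys = [tuple(row[j] for j in qi) for row in data]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] == 1]


def nearest_neighbors(data: Sequence[Record], i: int, k: int) -> List[int]:
    """Indices of the (up to) k records closest to data[i] by Euclidean distance."""
    dists = sorted(
        (math.dist(data[i], row), j) for j, row in enumerate(data) if j != i
    )
    return [j for _, j in dists[:k]]


def interpolate(x: Record, neighbor: Record, rng: random.Random) -> Record:
    """SMOTE-style interpolation: a random point on the segment from x toward neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]


def obfuscate_high_risk(
    data: Sequence[Record],
    qi: Sequence[int],
    k: int = 3,
    per_case: int = 2,
    seed: int = 0,
) -> List[Record]:
    """Keep low-risk records unchanged and replace each high-risk record with
    `per_case` interpolated copies (a hypothetical replacement policy)."""
    if len(data) < 2:
        raise ValueError("need at least two records to interpolate")
    rng = random.Random(seed)
    risky = set(high_risk_indices(data, qi))
    result: List[Record] = []
    for i, row in enumerate(data):
        if i not in risky:
            result.append(list(row))
            continue
        neighbors = nearest_neighbors(data, i, k)
        for _ in range(per_case):
            # Each synthetic case lies between the risky record and one neighbor.
            result.append(interpolate(row, data[rng.choice(neighbors)], rng))
    return result


if __name__ == "__main__":
    # Columns: age, sex, income (toy values, not data from the paper).
    records = [
        [30.0, 1.0, 50.0],
        [30.0, 1.0, 52.0],
        [40.0, 0.0, 60.0],
        [40.0, 0.0, 62.0],
        [55.0, 1.0, 80.0],  # unique (age, sex) pair -> high risk
    ]
    for r in obfuscate_high_risk(records, qi=[0, 1]):
        print([round(v, 2) for v in r])
```

Because each synthetic record is a convex combination of two real records, its values stay within the observed range of every attribute, which is one intuitive reason why interpolation tends to preserve utility while breaking the one-to-one link to the original high-risk individual.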