The collection and sharing of individuals' data has become commonplace in many industries. Local differential privacy (LDP) is a rigorous approach to preserving data privacy even from the database administrator, unlike the more standard central differential privacy, which assumes a trusted curator. To achieve LDP, one traditionally adds noise directly to each data dimension, but for high-dimensional data the level of noise required for sufficient anonymization all but entirely destroys the data's utility. In this paper, we introduce a novel LDP mechanism that leverages representation learning to overcome the prohibitive noise requirements of direct methods. We demonstrate that, rather than simply estimating aggregate statistics of the privatized data as is the norm in LDP applications, our method enables the training of performant machine learning models. Unique applications of our approach include private novel-class classification and the augmentation of clean datasets with additional privatized features. Methods that rely on central differential privacy are not applicable to such tasks. Our approach achieves significant performance gains on these tasks relative to state-of-the-art LDP benchmarks that add noise to the data directly.
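To illustrate why direct per-dimension noising becomes prohibitive in high dimensions, the following is a minimal sketch (not the paper's proposed mechanism) of the standard Laplace baseline for ε-LDP: the privacy budget ε is split evenly across the d dimensions of each record, so the noise scale on every coordinate grows linearly with d. The function name and the choice of a [0, 1] clipping range are illustrative assumptions.

```python
import numpy as np

def laplace_ldp(x, eps, lo=0.0, hi=1.0):
    """Privatize a single record x under eps-LDP by adding independent
    Laplace noise to every dimension.

    The record is clipped to [lo, hi]^d so that the per-dimension
    sensitivity is (hi - lo). Splitting the budget eps evenly across
    the d dimensions gives each coordinate a Laplace scale of
    d * (hi - lo) / eps, which grows linearly with d -- this is the
    utility collapse for high-dimensional data described above.
    """
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    d = x.size
    scale = d * (hi - lo) / eps  # per-dimension noise scale: O(d)
    return x + np.random.laplace(loc=0.0, scale=scale, size=d)

# Usage: the expected absolute noise per coordinate equals the scale,
# so at eps = 1 a 100-dimensional record in [0, 1]^100 receives noise
# roughly 100x the width of the data range on every coordinate.
np.random.seed(0)
noisy = laplace_ldp(np.full(100, 0.5), eps=1.0)
```

Because the expected error per coordinate is Θ(d/ε), aggregate statistics over many users can still be recovered (the noise averages out across the population), but any single privatized record is essentially unusable on its own, which motivates moving the noise into a learned, lower-dimensional representation instead.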