The collection of individuals' data has become commonplace in many industries. Local differential privacy (LDP) offers a rigorous approach to preserving privacy whereby the individual privatises their data locally, allowing only their perturbed datum to leave their possession. LDP thus provides a provable privacy guarantee to the individual against both adversaries and database administrators. Existing LDP mechanisms have successfully been applied to low-dimensional data, but in high dimensions the privacy-inducing noise largely destroys the utility of the data. In this work, our contributions are two-fold: first, by adapting state-of-the-art techniques from representation learning, we introduce a novel approach to learning LDP mechanisms. These mechanisms add noise to powerful representations on the low-dimensional manifold underlying the data, thereby overcoming the prohibitive noise requirements of LDP in high dimensions. Second, we introduce a novel denoising approach for downstream model learning. The training of performant machine learning models using collected LDP data is a common goal for data collectors, and downstream model performance forms a proxy for the LDP data utility. Our approach significantly outperforms current state-of-the-art LDP mechanisms.
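The dimensionality argument above can be made concrete with a minimal sketch. The snippet below uses the standard Laplace mechanism (not the learned mechanism proposed in this work) to privatise a vector under ε-LDP, and shows why noise that is calibrated to a low-dimensional representation is far smaller than noise calibrated to the raw high-dimensional datum. The function name, the clipping bound, and the dimensions (784 for a flattened image, 4 for a latent code) are illustrative assumptions, not details from the paper.

```python
import numpy as np


def laplace_ldp(z, eps, clip=1.0, rng=None):
    """Privatise a vector z under eps-LDP with the Laplace mechanism.

    Illustrative only: each coordinate is clipped to [-clip, clip],
    so the L1 sensitivity between any two possible inputs is
    2 * clip * d, and the per-coordinate noise scale grows linearly
    with the dimension d.
    """
    rng = rng or np.random.default_rng()
    z = np.clip(np.asarray(z, dtype=float), -clip, clip)
    sensitivity = 2.0 * clip * z.size     # L1 sensitivity of the clipped vector
    scale = sensitivity / eps             # Laplace scale for eps-LDP
    return z + rng.laplace(loc=0.0, scale=scale, size=z.shape)


def noise_scale(d, eps, clip=1.0):
    """Per-coordinate Laplace scale needed for an eps-LDP release in d dims."""
    return 2.0 * clip * d / eps


# Privatising the raw 784-dim datum vs. a hypothetical 4-dim representation:
# the required noise scale is 196x smaller in the latent space.
print(noise_scale(784, eps=1.0))  # 1568.0
print(noise_scale(4, eps=1.0))    # 8.0
```

In this sketch the mechanism itself is fixed; the contribution described in the abstract is to *learn* both the representation and the noise-adding mechanism jointly, rather than applying a hand-designed mechanism to a hand-picked projection.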