Local differential privacy (LDP), a state-of-the-art technique for privacy preservation, has been successfully deployed in several real-world applications. In the future, LDP can be adopted to anonymize richer user data attributes that will be input to more sophisticated machine learning (ML) tasks. However, today's LDP approaches are largely task-agnostic and often lead to sub-optimal performance -- they simply inject noise into all data attributes according to a given privacy budget, regardless of which features are most relevant to the ultimate task. In this paper, we address how to significantly improve the ultimate task performance for multi-dimensional user data by formulating a task-aware privacy preservation problem. The key idea is to use an encoder-decoder framework to learn (and anonymize) a task-relevant latent representation of user data, which yields an analytical, near-optimal solution in the linear setting with mean-squared error (MSE) task loss. We also provide an approximate solution through a learning algorithm for general nonlinear cases. Extensive experiments demonstrate that our task-aware approach significantly improves ultimate task accuracy compared to a standard benchmark LDP approach while guaranteeing the same level of privacy.
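To make the contrast concrete, the sketch below illustrates the two release schemes the abstract describes, in a deliberately simplified form: a task-agnostic Laplace mechanism that perturbs every attribute equally, and a hypothetical linear encoder-decoder release that privatizes only a task-relevant latent code. The function names, the toy encoder/decoder matrices, and the use of the Laplace mechanism are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(x, sensitivity, epsilon):
    """Task-agnostic baseline: perturb every attribute with Laplace
    noise of scale sensitivity/epsilon (illustrative only)."""
    scale = sensitivity / epsilon
    return x + rng.laplace(loc=0.0, scale=scale, size=x.shape)

def task_aware_release(x, W_enc, W_dec, sensitivity, epsilon):
    """Hypothetical linear encoder-decoder release: project the data
    onto a low-dimensional, task-relevant subspace, privatize the
    code, then decode. W_enc and W_dec stand in for matrices that
    would be learned offline to minimize the MSE task loss."""
    z = W_enc @ x                      # task-relevant latent code
    z_priv = laplace_mechanism(z, sensitivity, epsilon)
    return W_dec @ z_priv              # reconstructed, privatized record

# Toy example: one user's 5-dimensional record, 2-dimensional latent code.
x = rng.normal(size=5)
W_enc = rng.normal(size=(2, 5))        # toy encoder (assumed given)
W_dec = np.linalg.pinv(W_enc)          # toy decoder via pseudo-inverse
y_agnostic = laplace_mechanism(x, sensitivity=1.0, epsilon=1.0)
y_aware = task_aware_release(x, W_enc, W_dec, sensitivity=1.0, epsilon=1.0)
print(y_agnostic.shape, y_aware.shape)  # both (5,)
```

The point of the sketch is only dimensional: the task-aware scheme spends the privacy budget on a 2-dimensional code rather than all 5 raw attributes, which is where the accuracy gain in the paper's experiments comes from.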