Local differential privacy (LDP) can be adopted to anonymize richer user data attributes that will be input to sophisticated machine learning (ML) tasks. However, today's LDP approaches are largely task-agnostic and often lead to severe performance loss: they simply inject noise into all data attributes according to a given privacy budget, regardless of which features are most relevant to the ultimate task. In this paper, we address how to significantly improve the ultimate task performance with multi-dimensional user data by considering a task-aware privacy preservation problem. The key idea is to use an encoder-decoder framework to learn (and anonymize) a task-relevant latent representation of user data. We obtain an analytical near-optimal solution for the linear setting with mean-squared error (MSE) task loss, and provide an approximate solution through a gradient-based learning algorithm for general nonlinear cases. Extensive experiments demonstrate that our task-aware approach significantly improves ultimate task accuracy compared to standard benchmark LDP approaches under the same privacy guarantee.
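The task-agnostic baseline criticized above can be sketched with the standard Laplace mechanism, which splits the privacy budget evenly across all attributes regardless of their task relevance. This is a minimal illustration, not the paper's method; the function name, even budget split, and clipping bound `c` are assumptions for the sketch:

```python
import numpy as np

def laplace_ldp(x, epsilon, c=1.0):
    """Task-agnostic epsilon-LDP baseline (illustrative sketch).

    Clips each attribute of a d-dimensional record to [-c, c], then adds
    Laplace noise. Splitting the budget evenly gives each attribute
    epsilon/d, with per-attribute L1 sensitivity 2c, so the noise scale
    is 2*c*d/epsilon -- it grows with dimension d, which is the severe
    utility loss the task-aware approach aims to avoid.
    """
    x = np.clip(np.asarray(x, dtype=float), -c, c)
    d = x.size
    scale = 2.0 * c * d / epsilon
    return x + np.random.laplace(loc=0.0, scale=scale, size=d)

# Perturb one 3-dimensional record under a total budget of epsilon = 1.
noisy = laplace_ldp([0.2, -0.5, 0.9], epsilon=1.0)
```

Note how every attribute receives the same noise scale: a feature irrelevant to the downstream task consumes just as much budget as a critical one, which is precisely the inefficiency a task-aware encoder is meant to remove.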