Mobile apps and location-based services generate large amounts of location data that can benefit research on traffic optimization, context-aware notifications, and public health (e.g., the spread of contagious diseases). To preserve individual privacy, one must first sanitize location data, which is commonly done using the powerful differential privacy (DP) concept. However, existing solutions fall short of properly capturing the density patterns and correlations that are intrinsic to spatial data, and as a result yield poor accuracy. We propose a machine-learning-based approach for answering statistical queries on location data with DP guarantees. We focus on countering the main source of error that plagues existing approaches (namely, uniformity error), and we design a neural database system that models spatial datasets such that important density and correlation features present in the data are preserved, even when DP-compliant noise is added. We employ a set of neural networks that learn from diverse regions of the dataset and at varying granularities, leading to superior accuracy. We also devise a framework for effective system parameter tuning on top of public data, which helps practitioners set important system parameters without having to expend scarce privacy budget. Extensive experimental results on real datasets with heterogeneous characteristics show that our proposed approach significantly outperforms the state of the art.
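To make the notion of uniformity error concrete, the following is a minimal sketch of a grid-histogram baseline of the kind the abstract contrasts against. It is not the neural approach proposed here. It assumes points in the unit square, add/remove-one-point neighboring datasets (so each cell count has sensitivity 1), and the standard Laplace mechanism; the function names, the grid size `k`, and the toy data are all hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_histogram(points, k):
    """Count points of the unit square [0, 1) x [0, 1) in a k-by-k grid."""
    counts = [[0.0] * k for _ in range(k)]
    for x, y in points:
        counts[min(int(x * k), k - 1)][min(int(y * k), k - 1)] += 1
    return counts


def privatize(counts, epsilon, rng):
    """Laplace mechanism. Assumption: adding or removing one point changes
    exactly one cell by 1, so noise of scale 1/epsilon per cell suffices."""
    return [[c + laplace_noise(1.0 / epsilon, rng) for c in row] for row in counts]


def range_count(counts, x0, y0, x1, y1):
    """Estimate the count in [x0, x1) x [y0, y1), assuming points are
    spread uniformly inside each grid cell (the uniformity assumption)."""
    k = len(counts)
    w = 1.0 / k
    total = 0.0
    for i in range(k):
        for j in range(k):
            # Overlap of the query rectangle with cell (i, j) along each axis.
            ox = max(0.0, min(x1, (i + 1) * w) - max(x0, i * w))
            oy = max(0.0, min(y1, (j + 1) * w) - max(y0, j * w))
            total += counts[i][j] * (ox * oy) / (w * w)
    return total


rng = random.Random(0)
# Toy data: 1000 points clustered in the corner [0, 0.1) x [0, 0.1).
points = [(rng.random() * 0.1, rng.random() * 0.1) for _ in range(1000)]
exact = build_histogram(points, k=2)
noisy = privatize(exact, epsilon=1.0, rng=rng)

true_count = sum(1 for x, y in points if x < 0.25 and y < 0.25)  # 1000
uniform_estimate = range_count(exact, 0.0, 0.0, 0.25, 0.25)      # 250.0
noisy_estimate = range_count(noisy, 0.0, 0.0, 0.25, 0.25)
```

In this toy setting the query covers a quarter of one coarse cell, so the uniformity assumption reports 250 points even before any noise is added, while all 1000 points actually fall inside the query. A finer grid shrinks this error but adds noise to more cells per query, which is the kind of trade-off a density-aware model aims to sidestep.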