Several companies (e.g., Meta, Google) have initiated "data-for-good" projects where aggregate location data are first sanitized and released publicly, which is useful for many applications in transportation, public health (e.g., tracking COVID-19 spread), and urban planning. Differential privacy (DP) is the protection model of choice to ensure the privacy of the individuals who generated the raw location data. However, current solutions fail to preserve data utility when each individual contributes multiple location reports (i.e., under user-level privacy). To offset this limitation, public releases by Meta and Google use high privacy budgets (e.g., $\epsilon = 10$--$100$), resulting in poor privacy. We propose a novel approach to release spatio-temporal data privately and accurately. We employ the pattern recognition power of neural networks, specifically variational auto-encoders (VAEs), to reduce the noise introduced by DP mechanisms such that accuracy is increased while the privacy requirement is still satisfied. Our extensive experimental evaluation on real datasets shows the clear superiority of our approach compared to benchmarks.
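To make the setting concrete, the sketch below (not the paper's code; names such as `dp_location_histogram` and `max_reports_per_user` are illustrative assumptions) shows the standard Laplace mechanism applied to a gridded aggregate location histogram under user-level privacy, where each user's contribution is capped so that the L1 sensitivity, and hence the noise scale, stays bounded. Noise of this kind, which grows with the per-user contribution bound, is the accuracy loss that a learned denoiser such as a VAE would subsequently be trained to reduce.

```python
# Minimal sketch (assumed setup, not the paper's implementation): user-level DP
# release of an aggregate location histogram via the Laplace mechanism.
import numpy as np

def dp_location_histogram(reports, grid_size, max_reports_per_user, epsilon, rng=None):
    """Return a noisy (grid_size x grid_size) count histogram of location reports.

    reports: iterable of (user_id, x_cell, y_cell) with cell indices in [0, grid_size).
    Under user-level DP, one user may affect up to `max_reports_per_user` counts,
    so the L1 sensitivity of the histogram is bounded by that value and Laplace
    noise with scale sensitivity / epsilon satisfies epsilon-DP.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Cap each user's contribution so the stated sensitivity bound actually holds.
    kept, per_user = [], {}
    for uid, x, y in reports:
        if per_user.get(uid, 0) < max_reports_per_user:
            per_user[uid] = per_user.get(uid, 0) + 1
            kept.append((x, y))

    hist = np.zeros((grid_size, grid_size))
    for x, y in kept:
        hist[x, y] += 1

    sensitivity = max_reports_per_user  # L1 sensitivity under user-level DP
    return hist + rng.laplace(scale=sensitivity / epsilon, size=hist.shape)

# Example: with a tight budget (epsilon = 1) and multiple reports per user,
# the injected noise dominates sparse cells, which is the utility loss the
# abstract refers to.
reports = [("u1", 3, 7), ("u1", 3, 7), ("u2", 10, 2)]
noisy = dp_location_histogram(reports, grid_size=16,
                              max_reports_per_user=2, epsilon=1.0)
```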