Mobile apps that use location data are pervasive, spanning domains such as transportation, urban planning and healthcare. Important use cases for location data rely on statistical queries, e.g., identifying hotspots where users work and travel. Such queries can be answered efficiently by building histograms. However, precise histograms can expose sensitive details about individual users. Differential privacy (DP) is a mature and widely adopted protection model, but most approaches for DP-compliant histograms work in a data-independent fashion, which leads to poor accuracy. The few data-dependent techniques proposed so far attempt to adjust histogram partitions based on dataset characteristics, but they perform poorly because of the noise that must be added to achieve DP. We identify density homogeneity as a key factor driving the accuracy of DP-compliant histograms, and we build a data structure that splits the space such that data density is homogeneous within each resulting partition. We show through extensive experiments on large-scale real-world data that the proposed approach achieves superior accuracy compared to existing approaches.
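The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea: keep splitting space while density looks non-homogeneous, then release noisy counts per partition. It is a minimal Python sketch and not the authors' method. The quadtree split, the mean-absolute-deviation homogeneity test on Laplace-noised quadrant counts, and the parameters `eps_split`, `eps_count`, `max_depth` and `threshold_factor` are hypothetical choices. Under these assumptions, the total privacy cost is `eps_split + eps_count`.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


@dataclass
class Cell:
    x0: float
    y0: float
    x1: float
    y1: float
    noisy_count: float = 0.0

    def contains(self, p: Point) -> bool:
        # Half-open bounds, so the four quadrants of a cell are disjoint.
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1

    def quadrants(self) -> List["Cell"]:
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return [Cell(self.x0, self.y0, mx, my), Cell(mx, self.y0, self.x1, my),
                Cell(self.x0, my, mx, self.y1), Cell(mx, my, self.x1, self.y1)]

    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def build_homogeneous_partition(points: List[Point], domain: Tuple[float, float, float, float],
                                eps_split: float, eps_count: float, max_depth: int,
                                threshold_factor: float, seed: int = 0) -> List[Cell]:
    """Hypothetical quadtree: split a cell while its noisy quadrant counts look non-homogeneous."""
    if max_depth < 1:
        raise ValueError("max_depth must be >= 1")
    rng = random.Random(seed)
    # Cells tested at one level are disjoint (parallel composition); levels compose sequentially.
    eps_level = eps_split / max_depth
    frontier = [(Cell(*domain), points)]
    leaves = []
    for _ in range(max_depth):
        next_frontier = []
        for cell, pts in frontier:
            quads = cell.quadrants()
            groups = [[p for p in pts if q.contains(p)] for q in quads]
            # One point changes exactly one quadrant count by 1, so the L1 sensitivity is 1.
            noisy = [len(g) + laplace(1.0 / eps_level, rng) for g in groups]
            mean = sum(noisy) / 4
            deviation = sum(abs(n - mean) for n in noisy)
            # Split only if heterogeneity clearly exceeds the noise scale 1/eps_level.
            if deviation > threshold_factor / eps_level:
                next_frontier.extend(zip(quads, groups))
            else:
                leaves.append((cell, pts))
        frontier = next_frontier
    leaves.extend(frontier)  # cells still being split at max_depth become leaves
    # Leaves are disjoint, so releasing all leaf counts costs eps_count in total.
    for cell, pts in leaves:
        cell.noisy_count = len(pts) + laplace(1.0 / eps_count, rng)
    return [cell for cell, _ in leaves]


def range_query(leaves: List[Cell], q: Cell) -> float:
    """Estimate the count in q, assuming uniform density inside each leaf."""
    total = 0.0
    for c in leaves:
        ox = max(0.0, min(c.x1, q.x1) - max(c.x0, q.x0))
        oy = max(0.0, min(c.y1, q.y1) - max(c.y0, q.y0))
        total += c.noisy_count * (ox * oy) / c.area()
    return total
```

Range queries are answered by assuming uniform density inside each leaf. This is where homogeneity pays off: the more uniform a leaf is, the smaller the error of the overlap-fraction estimate for queries that only partially cover it. Python's `random` module is used here only for brevity. It is not an appropriate noise source for a real DP deployment.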