This paper is motivated by the needs of a census bureau that wants to release aggregate socio-economic data about a large population without revealing sensitive information about any individual. The released information can include the number of individuals living alone, the number of cars they own, or their salary brackets. Recent events have highlighted some of the privacy challenges these organizations face. To address them, this paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals. The counts are reported at multiple granularities (e.g., the national, state, and county levels) and must be consistent across all levels. The core of the mechanism is an optimization model that redistributes the noise introduced to achieve differential privacy so that the released counts satisfy the consistency constraints between the hierarchical levels. The key technical contribution of the paper is a proof that this optimization problem can be solved in polynomial time by exploiting the structure of its cost functions. Experimental results on very large, real datasets show that the proposed mechanism improves computational efficiency and accuracy by up to two orders of magnitude relative to other state-of-the-art techniques.
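To make the two ingredients concrete (noise for differential privacy, then post-processing for consistency), here is a minimal sketch for a single parent count (say, a state) and its children (its counties). It is not the paper's optimization model: it swaps in a standard least-squares projection onto the constraint "parent equals the sum of its children", which has a closed form for two levels. The function names `noisy_two_level` and `make_consistent`, the even noise scale, and the example counts are all hypothetical. The sketch also ignores non-negativity and integrality of the released counts.

```python
import math
import random
from typing import List, Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_two_level(parent: int, children: List[int], epsilon: float,
                    rng: random.Random) -> Tuple[float, List[float]]:
    """Hypothetical Laplace mechanism over a parent count and its child counts.

    Adding or removing one individual changes the parent and exactly one
    child by 1, so the L1 sensitivity of the whole vector is 2 and the
    scale 2 / epsilon gives epsilon-differential privacy.
    """
    scale = 2.0 / epsilon
    return (parent + laplace_noise(scale, rng),
            [c + laplace_noise(scale, rng) for c in children])


def make_consistent(parent: float,
                    children: List[float]) -> Tuple[float, List[float]]:
    """Least-squares projection onto {p == sum(c)} (not the paper's model).

    Minimizes (p - parent)^2 + sum_i (c_i - children_i)^2 subject to
    p == sum_i c_i. Solving with a Lagrange multiplier spreads the
    residual parent - sum(children) evenly over all k + 1 counts.
    """
    k = len(children)
    share = (parent - sum(children)) / (k + 1)
    return parent - share, [c + share for c in children]


if __name__ == "__main__":
    rng = random.Random(0)
    true_parent, true_children = 60, [10, 20, 30]  # hypothetical counts
    noisy_p, noisy_c = noisy_two_level(true_parent, true_children, 1.0, rng)
    p, c = make_consistent(noisy_p, noisy_c)
    print(math.isclose(p, sum(c)))  # True: the two levels now agree
```

For example, `make_consistent(62.0, [9.0, 21.0, 31.0])` has residual 1, so every count moves by 0.25 and the result is a parent of 61.75 with children 9.25, 21.25, and 31.25. Because this step only post-processes already-noisy counts, it does not weaken the privacy guarantee. A multi-level hierarchy with non-negativity constraints calls for a more general optimization, which is the setting the abstract describes.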