STatistical Election to Partition Sequentially (STEPS) and Its Application in Differentially Private Release and Analysis of Youth Voter Registration Data

March 18, 2018



Claire McKay Bowen, Fang Liu

From arXiv, 22 pages, 2 figures

Voter data is important in political science research and in applications such as improving youth voter turnout. Privacy protection is imperative for voter data because it often contains sensitive individual information. Differential privacy (DP) formalizes privacy in probabilistic terms and provides a robust concept for privacy protection. DIfferentially Private Data Synthesis (DIPS) techniques produce synthetic data in the DP setting. However, the statistical efficiency of synthetic data generated via DIPS can be low because of the potentially large amount of noise injected to satisfy DP, especially in high-dimensional data. We propose a new DIPS approach, STatistical Election to Partition Sequentially (STEPS), that sequentially partitions data by attributes according to how well they differentiate the variability in the data. Additionally, we propose a metric, SPECKS, that effectively assesses the similarity of synthetic data to the actual data. Applying the STEPS procedure to the 2000-2012 Current Population Survey youth voter data suggests that STEPS is easy to implement and better preserves the original information than some other DIPS approaches, including the Laplace mechanism on the full cross-tabulation of the data and hierarchical histograms generated via random partitioning.

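One of the baselines named in the abstract, the Laplace mechanism applied to the full cross-tabulation, can be illustrated with a minimal sketch. This is not the authors' code and it is not STEPS itself. The toy counts, the function names `laplace_sanitize_counts` and `synthesize`, and the choice `epsilon = 1.0` are all hypothetical. The sketch assumes that neighbouring data sets differ by one record (so the table of counts has L1 sensitivity 1) and that the sample size is public.

```python
import numpy as np


def laplace_sanitize_counts(counts, epsilon, rng):
    """Add independent Laplace(1/epsilon) noise to every cell of a count table."""
    # Assumption: adding or removing one record changes exactly one cell by 1,
    # so the L1 sensitivity of the full cross-tabulation is 1.
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    # Clipping negatives is post-processing and does not weaken the DP guarantee.
    return np.clip(noisy, 0.0, None)


def synthesize(sanitized, n, rng):
    """Draw n synthetic records from the sanitized table; return synthetic counts."""
    total = sanitized.sum()
    if total > 0:
        probs = sanitized.ravel() / total
    else:
        # Fall back to a uniform distribution if every noisy cell was clipped to 0.
        probs = np.full(sanitized.size, 1.0 / sanitized.size)
    cells = rng.choice(sanitized.size, size=n, p=probs)
    return np.bincount(cells, minlength=sanitized.size).reshape(sanitized.shape)


rng = np.random.default_rng(0)
# Hypothetical 2 x 3 cross-tabulation (e.g. registered yes/no by three age bands).
counts = np.array([[120, 340, 95], [60, 210, 175]])
epsilon = 1.0  # hypothetical privacy budget
sanitized = laplace_sanitize_counts(counts, epsilon, rng)
# Assumption: the sample size n is treated as public knowledge.
synthetic = synthesize(sanitized, int(counts.sum()), rng)
print(synthetic)
```

With many attributes the full cross-tabulation has many sparse cells, each receiving noise of the same scale, which is the source of the low statistical efficiency in high-dimensional data that the abstract mentions.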