Voter data are important in political science research and in applications such as improving youth voter turnout. Privacy protection is imperative for voter data because they often contain sensitive individual information. Differential Privacy (DP) formalizes privacy in probabilistic terms and provides a robust concept for privacy protection. DIfferentially Private Data Synthesis (DIPS) techniques produce synthetic data in the DP setting. However, the statistical efficiency of synthetic data generated via DIPS can be low because of the potentially large amount of noise injected to satisfy DP, especially in high-dimensional data, and voter data are often high-dimensional. We propose a new DIPS approach, STatistical Election to Partition Sequentially (STEPS), that sequentially partitions the data by attributes according to their contribution to explaining the variability in the data. Additionally, we develop a metric to assess the similarity of the synthetic data to the actual data. Applying the STEPS procedure to the 2000-2012 Current Population Survey youth voter data suggests that STEPS is easy to implement and preserves the original information better than some other DIPS approaches, including the Laplace mechanism applied to the full cross-tabulation of the data and hierarchical histograms generated via random partitioning.
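To make the baseline mentioned above concrete, the following is a minimal sketch of the Laplace mechanism applied to a full cross-tabulation, followed by sampling synthetic records from the sanitized counts. It is not the STEPS procedure and not the authors' implementation. The function names `laplace_noise` and `dp_synthesize`, the toy attributes, and the default sensitivity of 1 (which assumes neighboring datasets differ by adding or removing one record) are all assumptions made for illustration.

```python
import itertools
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_synthesize(records, domains, epsilon, n_synth, sensitivity=1.0, seed=None):
    """Hypothetical helper: sanitize the full cross-tabulation, then sample.

    records     : list of tuples, one categorical value per attribute
    domains     : list of tuples, the allowed values of each attribute
    epsilon     : privacy budget spent on the whole histogram
    n_synth     : number of synthetic records to draw (treated as public)
    sensitivity : L1 sensitivity of the histogram (assumed 1 for
                  add/remove-one-record neighbors; 2 for replace-one)
    """
    rng = random.Random(seed)
    counts = Counter(records)
    # Every cell of the cross-tabulation gets noise, including empty ones.
    cells = list(itertools.product(*domains))
    scale = sensitivity / epsilon
    noisy = [max(0.0, counts[c] + laplace_noise(scale, rng)) for c in cells]
    if sum(noisy) == 0.0:
        noisy = [1.0] * len(cells)  # fall back to uniform sampling
    # Clamping and sampling are post-processing, so they do not consume budget.
    return rng.choices(cells, weights=noisy, k=n_synth)


if __name__ == "__main__":
    # Toy data with two made-up attributes: age group and turnout.
    data = [("18-24", "voted")] * 30 + [("18-24", "not")] * 10 + [("25-29", "voted")] * 5
    doms = [("18-24", "25-29"), ("voted", "not")]
    synthetic = dp_synthesize(data, doms, epsilon=1.0, n_synth=len(data), seed=0)
    print(Counter(synthetic))
```

The number of cells is the product of the attribute domain sizes, so with many attributes most cells are empty or nearly empty and the added noise can dominate the true counts. This is the loss of statistical efficiency in high-dimensional data that the abstract describes.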