CDC WONDER is a web-based tool for the dissemination of epidemiologic data collected by the National Vital Statistics System. While CDC WONDER has built-in privacy protections, they do not satisfy formal privacy definitions such as differential privacy and are thus susceptible to targeted attacks. Given the importance of making high-quality public health data publicly available while preserving the privacy of the underlying data subjects, we aim to improve the utility of a recently developed approach for generating Poisson-distributed, differentially private synthetic data by using publicly available information to truncate the range of the synthetic data. Specifically, we use county-level population information from the U.S. Census Bureau and national death reports produced by the CDC to inform prior distributions on county-level death rates and to infer reasonable ranges for Poisson-distributed, county-level death counts. In doing so, the requirements for satisfying differential privacy for a given privacy budget can be reduced by several orders of magnitude, leading to substantial improvements in utility. To illustrate our proposed approach, we consider a dataset comprising over 26,000 cancer-related deaths from the Commonwealth of Pennsylvania, belonging to over 47,000 combinations of cause of death and demographic variables such as age, race, sex, and county of residence. We demonstrate the proposed framework's ability to preserve features such as geographic, urban/rural, and racial disparities present in the true data.
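The idea of inferring a reasonable range for a Poisson-distributed county-level count from public information can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: the function name `poisson_upper_bound`, the use of a single national reference rate in place of a prior distribution, and the tail probability of `1e-6` are all hypothetical choices made here.

```python
import math


def poisson_upper_bound(population: int, reference_rate: float,
                        tail_prob: float = 1e-6) -> int:
    """Smallest count u such that P(Y > u) <= tail_prob for
    Y ~ Poisson(population * reference_rate), capped at the population.

    Hypothetical helper: the reference rate stands in for public
    information (for example, a national death rate).
    """
    mean = population * reference_rate
    if mean <= 0:
        return 0
    cdf = 0.0
    for u in range(population + 1):
        # Poisson pmf evaluated in log space to avoid underflow for large means.
        cdf += math.exp(u * math.log(mean) - mean - math.lgamma(u + 1))
        if 1.0 - cdf <= tail_prob:
            return u
    return population


# Illustrative inputs (not from the paper): a stratum of 1,000 people
# and a reference rate of 2 deaths per 1,000.
bound = poisson_upper_bound(population=1000, reference_rate=0.002)
print(f"naive range: [0, 1000]; truncated range: [0, {bound}]")
```

Under these assumptions, the range of counts a synthetic-data mechanism must cover shrinks from the full population size to a small multiple of the expected count, which is the kind of truncation the abstract describes.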