A considerable amount of data of various types was collected during the COVID-19 pandemic, and its analysis and interpretation have been indispensable for curbing the spread of the disease. As the pandemic moves to an endemic state, these data will remain a rich source for studying and understanding the pandemic's impact on various aspects of society. However, naive release and sharing of such information raises serious privacy concerns. In this study, we use three common but distinct data types collected during the pandemic (case surveillance tabular data, case location data, and contact tracing networks) to illustrate how granular information and individual-level pandemic data can be published and shared in a privacy-preserving manner. We leverage and build upon the concept of differential privacy to generate and release privacy-preserving data for each data type. We investigate the inferential utility of the privacy-preserving information through simulation studies at different levels of privacy guarantees and demonstrate the approaches on real-life data. All the approaches employed in the study are straightforward to apply. Our study generates statistical evidence on the practical feasibility of sharing pandemic data with privacy guarantees and on how to balance those guarantees against the statistical utility of the released information.