A large amount of data of various types was collected during the COVID-19 pandemic, and its analysis and interpretation have been indispensable for curbing the spread of the coronavirus. As the pandemic moves to an endemic state, the data collected during the pandemic will continue to be a rich source for further study of the pandemic's impacts on various aspects of our society. On the other hand, naïve release and sharing of this information can raise serious privacy concerns. In this study, we use three common but distinct data types collected during the pandemic (case surveillance tabular data, case location data, and contact tracing networks) to demonstrate how granular or individual-level pandemic data can be published and shared in a privacy-preserving manner. We leverage and build upon the concept of differential privacy to generate and release privacy-preserving data for each data type. All the approaches employed in the study are straightforward to apply. We investigate the inferential utility of the privacy-preserving information through simulation studies at different levels of privacy guarantees, and we demonstrate the application of the approaches with examples and real-life pandemic data. Our study generates statistical evidence on the practical feasibility of sharing pandemic data with privacy guarantees and on how to balance privacy protection against the statistical utility of the released information in this process.
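The abstract does not spell out the specific mechanisms used for each data type. As a rough illustration of what differentially private release of case surveillance tabular data can look like, the sketch below applies the standard Laplace mechanism to a cross-tabulated count table. It assumes each individual falls into exactly one cell, so the L1 sensitivity is 1. The function names `dp_release_counts` and `laplace_noise` and the example counts are hypothetical and are not taken from the study.

```python
import random
from typing import Dict, Optional, Tuple

Cell = Tuple[str, ...]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale  # expovariate takes a rate; its mean is 1 / rate = scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_release_counts(counts: Dict[Cell, int], epsilon: float,
                      seed: Optional[int] = None) -> Dict[Cell, int]:
    """Hypothetical helper: release a count table under epsilon-differential privacy.

    Assumption: each individual contributes to exactly one cell, so adding or
    removing one person changes the table by at most 1 in L1 norm (sensitivity 1).
    `counts` must list every cell of the public domain, including zero cells;
    releasing only the non-empty cells would leak which cells are empty.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = 1.0 / epsilon  # Laplace scale = sensitivity / epsilon
    released: Dict[Cell, int] = {}
    for cell, n in counts.items():
        noisy = n + laplace_noise(scale, rng)
        # Rounding and clipping at zero are post-processing,
        # so they do not weaken the privacy guarantee.
        released[cell] = max(0, round(noisy))
    return released


if __name__ == "__main__":
    # Made-up counts by age group and sex, for illustration only.
    table = {("18-39", "F"): 120, ("18-39", "M"): 95,
             ("40+", "F"): 60, ("40+", "M"): 0}
    print(dp_release_counts(table, epsilon=1.0, seed=7))
```

Smaller values of `epsilon` give a stronger privacy guarantee but larger noise, which is the privacy/utility trade-off the abstract refers to. Rounding and clipping keep the released table looking like ordinary counts without affecting the guarantee.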