Privacy-preserving genomic data sharing is important for increasing the pace of genomic research, and hence for paving the way towards personalized genomic medicine. In this paper, we introduce $(\epsilon, T)$-dependent local differential privacy (LDP) for privacy-preserving sharing of correlated data and propose a genomic data sharing mechanism under this privacy definition. We first show that the original definition of LDP is not suitable for genomic data sharing, and then we propose a new mechanism to share genomic data. The proposed mechanism considers the correlations in the data during sharing, eliminates statistically unlikely data values beforehand, and adjusts the probability distribution of each shared data point accordingly. We show that this prevents an attacker from inferring the correct values of the shared data points by exploiting the correlations in the data. By adjusting the probability distributions of the shared states of each data point, we also improve the utility of the shared data for the data collector. Furthermore, we develop a greedy algorithm that strategically determines the processing order of the shared data points with the aim of maximizing the utility of the shared data. Considering the interdependent privacy risks of sharing genomic data, we also analyze how much information an attacker gains about the genomes of a donor's family members by observing the donor's perturbed data, and we propose a mechanism that selects the donor's privacy budget (i.e., the $\epsilon$ parameter of LDP) while also taking into account the privacy preferences of her family members. Our evaluation on a real-life genomic dataset shows that the proposed mechanism outperforms the randomized response mechanism (a widely used technique to achieve LDP).
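For orientation, the randomized response baseline mentioned above can be sketched as follows. This is a minimal illustration of generic k-ary randomized response, not the proposed $(\epsilon, T)$-dependent mechanism. The encoding of a data point as a value in {0, 1, 2}, the function names, and the `states` parameter are all assumptions made for this example.

```python
import math
import random

# Assumption for illustration: each shared data point takes one of three states.
SNP_STATES = (0, 1, 2)


def rr_probabilities(true_value, epsilon, states=SNP_STATES):
    """Reporting distribution of k-ary randomized response for one data point.

    The true value is reported with probability e^eps / (e^eps + k - 1) and
    every other state with probability 1 / (e^eps + k - 1), so the ratio of
    any two reporting probabilities is at most e^eps (the LDP condition).
    """
    if true_value not in states:
        raise ValueError("true_value must be one of the states")
    k = len(states)
    if k == 1:
        # Only one possible state: it is reported deterministically.
        return {true_value: 1.0}
    e = math.exp(epsilon)
    p_true = e / (e + k - 1)
    p_other = 1.0 / (e + k - 1)
    return {s: (p_true if s == true_value else p_other) for s in states}


def randomized_response(true_value, epsilon, states=SNP_STATES, rng=random):
    """Sample a perturbed value to share in place of true_value."""
    probs = rr_probabilities(true_value, epsilon, states)
    values = list(probs)
    weights = [probs[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]
```

Calling `randomized_response(1, epsilon=1.0)` returns one of the three states, with the true state being the most likely. Because each data point is perturbed independently of the others, this baseline ignores correlations between data points, which is the limitation the abstract attributes to the original LDP definition.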