Large genomic datasets are now created through numerous activities, including recreational genealogical investigations, biomedical research, and clinical care. At the same time, genomic data has become valuable for reuse beyond their initial point of collection, but privacy concerns often hinder access. Over the past several years, Beacon services have emerged to broaden accessibility to such data. These services enable users to query for the presence of a particular minor allele in a private dataset, information that can help care providers determine if genomic variation is spurious or has some known clinical indication. However, various studies have shown that even this limited access model can leak if individuals are members in the underlying dataset. Several approaches for mitigating this vulnerability have been proposed, but they are limited in that they 1) typically rely on heuristics and 2) offer probabilistic privacy guarantees, but neglect utility. In this paper, we present a novel algorithmic framework to ensure privacy in a Beacon service setting with a minimal number of query response flips (e.g., changing a positive response to a negative). Specifically, we represent this problem as combinatorial optimization in both the batch setting (where queries arrive all at once), as well as the online setting (where queries arrive sequentially). The former setting has been the primary focus in prior literature, whereas real Beacons allow sequential queries, motivating the latter investigation. We present principled algorithms in this framework with both privacy and, in some cases, worst-case utility guarantees. Moreover, through an extensive experimental evaluation, we show that the proposed approaches significantly outperform the state of the art in terms of privacy and utility.
翻译:大型基因组数据集现在通过许多活动产生,包括娱乐基因调查、生物医学研究和临床护理。与此同时,基因组数据在最初的收集点之外变得对再利用很有价值,但隐私问题往往阻碍获取。在过去几年里,Beacon服务已经出现,以扩大此类数据的可获取性。这些服务使用户能够查询某个小范围存在于私人数据集中的情况,这些信息可以帮助护理提供者确定基因组变异是否是虚假的或有一些已知的临床迹象。然而,各种研究表明,即使个人是基本数据集的成员,这种有限的获取模式也可能泄漏。提出了减轻这种脆弱性的若干办法,但其中有一些是有限的。在过去几年里,Beacon服务提供了一种稳定性隐私保障,但忽略了效用。在本文中,我们提出了一个新的算法框架以确保Beacon服务中的隐私,其查询次数最少(例如,改变对一个负面的正面反应)。具体地说,如果个人是基本数据集的成员,那么,我们就会把这个问题当作一种可追溯的通用性优化的方法来进行。在最初的顺序调查中,我们先先在头的顺序调查中,然后才开始研究,先是进行。