The advent of online genomic data-sharing services has sought to enhance the accessibility of large genomic datasets by allowing queries about genetic variants, such as summary statistics, aiding care providers in distinguishing between spurious genomic variations and those with clinical significance. However, numerous studies have demonstrated that even sharing summary genomic information exposes individual members of such datasets to a significant privacy risk due to membership inference attacks. While several approaches have emerged that reduce privacy risks by adding noise or reducing the amount of information shared, these typically assume non-adaptive attacks that use likelihood ratio test (LRT) statistics. We propose a Bayesian game-theoretic framework for optimal privacy-utility tradeoff in the sharing of genomic summary statistics. Our first contribution is to prove that a very general Bayesian attacker model that anchors our game-theoretic approach is more powerful than the conventional LRT-based threat models in that it induces worse privacy loss for the defender who is modeled as a von Neumann-Morgenstern (vNM) decision-maker. We show this to be true even when the attacker uses a non-informative subjective prior. Next, we present an analytically tractable approach to compare the Bayesian attacks with arbitrary subjective priors and the Neyman-Pearson optimal LRT attacks under the Gaussian mechanism common in differential privacy frameworks. Finally, we propose an approach for approximating Bayes-Nash equilibria of the game using deep neural network generators to implicitly represent player mixed strategies. Our experiments demonstrate that the proposed game-theoretic framework yields both stronger attacks and stronger defense strategies than the state of the art.
翻译:暂无翻译