An ability to share data, even in aggregated form, is critical to advancing both conventional and data science. However, insofar as such datasets are comprised of individuals, their membership in these datasets is often viewed as sensitive, with membership inference attacks (MIAs) threatening to violate their privacy. We propose a Bayesian game model for privacy-preserving publishing of data-sharing mechanism outputs (for example, summary statistics for sharing genomic data). In this game, the defender minimizes a combination of expected utility and privacy loss, with the latter being maximized by a Bayes-rational attacker. We propose a GAN-style algorithm to approximate a Bayes-Nash equilibrium of this game, and introduce the notions of Bayes-Nash generative privacy (BNGP) and Bayes generative privacy (BGP) risk that aims to optimally balance the defender's privacy and utility in a way that is robust to the attacker's heterogeneous preferences with respect to true and false positives. We demonstrate the properties of composition and post-processing for BGP risk and establish conditions under which BNGP and pure differential privacy (PDP) are equivalent. We apply our method to sharing summary statistics, where MIAs can re-identify individuals even from aggregated data. Theoretical analysis and empirical results demonstrate that our Bayesian game-theoretic method outperforms state-of-the-art approaches for privacy-preserving sharing of summary statistics.
翻译:暂无翻译