Bayesian inference provides a principled framework for learning from complex data and reasoning under uncertainty. It has been widely applied in machine learning tasks such as medical diagnosis, drug design, and policymaking. In these applications, the data can be highly sensitive. Differential privacy (DP) offers data analysis tools with strong worst-case privacy guarantees and has emerged as the leading approach to privacy-preserving data analysis. In this paper, we study Metropolis-Hastings (MH), one of the most fundamental MCMC methods, for large-scale Bayesian inference under differential privacy. While most existing private MCMC algorithms sacrifice accuracy and efficiency to obtain privacy, we provide the first exact and fast DP MH algorithm, using only a minibatch of data in most iterations. We further reveal, for the first time, a three-way trade-off among privacy, scalability (i.e., the batch size), and efficiency (i.e., the convergence rate), theoretically characterizing how privacy affects the utility and computational cost of Bayesian inference. We empirically demonstrate the effectiveness and efficiency of our algorithm in a variety of experiments.
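For context, the standard (non-private, full-data) Metropolis-Hastings loop that the abstract refers to can be sketched as follows. This is a minimal illustration of baseline MH on a toy 1-D Gaussian posterior, not the paper's DP minibatch algorithm; all names (`log_post`, `prop`, the proposal scale) are our own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=1.0, size=500)  # synthetic observations

def log_post(theta):
    # Unnormalized log-posterior: N(theta, 1) likelihood, N(0, 10^2) prior.
    # In the private setting, evaluating this on the full dataset each
    # iteration is exactly the cost (and privacy exposure) to avoid.
    return -0.5 * np.sum((data - theta) ** 2) - theta ** 2 / 200.0

theta = 0.0
samples = []
for _ in range(5000):
    prop = theta + rng.normal(scale=0.2)  # random-walk proposal
    # Accept with probability min(1, pi(prop) / pi(theta)).
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)

est = float(np.mean(samples[1000:]))  # posterior mean after burn-in
```

With a weak prior, the estimated posterior mean should land close to the sample mean of the data; note that each iteration above touches all 500 observations, which is the scalability bottleneck that minibatch MH methods address.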