We provide a general framework for privacy-preserving variational Bayes (VB) for a large class of probabilistic models, called the conjugate exponential (CE) family. Our primary observation is that when models are in the CE family, we can privatise the variational posterior distributions simply by perturbing the expected sufficient statistics of the complete-data likelihood. For widely used non-CE models with binomial likelihoods, we exploit the Pólya-Gamma data augmentation scheme to bring such models into the CE family, so that inference in the modified model resembles the private variational Bayes algorithm as closely as possible. The iterative nature of variational Bayes presents a further challenge, because each additional iteration increases the amount of noise needed. We overcome this by combining: (1) a relaxed notion of differential privacy, called concentrated differential privacy, which provides a tight bound on the privacy cost of multiple VB iterations and thus significantly decreases the amount of additive noise; and (2) the privacy amplification effect of subsampling mini-batches from large-scale data in stochastic learning. We empirically demonstrate the effectiveness of our method in CE and non-CE models, including latent Dirichlet allocation, Bayesian logistic regression, and sigmoid belief networks, evaluated on real-world datasets.
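To make the central idea concrete, the following is a minimal sketch of perturbing averaged expected sufficient statistics with Gaussian noise and tracking the cumulative privacy cost across iterations. It is an illustration under stated assumptions, not the implementation evaluated here: the function names, the norm-clipping step, the replace-one sensitivity bound, and the use of zero-concentrated differential privacy (zCDP) accounting as the concentrated-privacy variant are all choices made for this example. Privacy amplification by subsampling is omitted.

```python
import numpy as np


def perturb_expected_suff_stats(per_datum_stats, clip_norm, noise_std, rng):
    """Hypothetical helper: average per-datum expected sufficient statistics
    and add Gaussian noise.

    Assumptions (not taken from the abstract): each row is clipped to L2 norm
    <= clip_norm, and neighbouring datasets differ by replacing one datum.
    Returns the noisy mean and the zCDP parameter rho of this single release.
    """
    stats = np.asarray(per_datum_stats, dtype=float)
    n = stats.shape[0]

    # Clip each per-datum vector so that its L2 norm is at most clip_norm.
    norms = np.linalg.norm(stats, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    mean_stats = (stats * scale).mean(axis=0)

    # Replacing one datum moves the mean by at most 2 * clip_norm / n in L2.
    sensitivity = 2.0 * clip_norm / n

    noisy = mean_stats + rng.normal(0.0, noise_std, size=mean_stats.shape)

    # Gaussian mechanism: rho = sensitivity^2 / (2 * noise_std^2) under zCDP.
    rho = sensitivity ** 2 / (2.0 * noise_std ** 2)
    return noisy, rho


def zcdp_to_epsilon(rho_total, delta):
    """Convert a total zCDP cost into the epsilon of (epsilon, delta)-DP."""
    return rho_total + 2.0 * np.sqrt(rho_total * np.log(1.0 / delta))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_stats = rng.normal(size=(1000, 3))  # stand-in for E[s(x_n, z_n)]
    noisy, rho = perturb_expected_suff_stats(
        fake_stats, clip_norm=1.0, noise_std=0.01, rng=rng
    )
    iterations = 50  # zCDP costs add up linearly over VB iterations
    print(noisy, zcdp_to_epsilon(iterations * rho, delta=1e-5))
```

Under zCDP the per-iteration costs simply sum, which is what makes this style of accounting convenient for an iterative algorithm such as VB. The noisy statistics would then replace the exact ones in the usual CE-family variational update.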