Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov Chain Monte Carlo (MCMC) algorithms for Bayesian inference that scale to large datasets, allowing one to sample from the posterior distribution of the parameters of a statistical model given the input data and a prior distribution over the model parameters. However, these algorithms do not apply to the decentralized learning setting, in which a network of agents works collaboratively to learn the parameters of a statistical model without sharing their individual data, due to privacy reasons or communication constraints. We study two algorithms, Decentralized SGLD (DE-SGLD) and Decentralized SGHMC (DE-SGHMC), which adapt SGLD and SGHMC to enable scalable Bayesian inference on large datasets in the decentralized setting. We show that when the posterior distribution is strongly log-concave and smooth, the iterates of these algorithms converge linearly to a neighborhood of the target distribution in the 2-Wasserstein distance, provided their parameters are selected appropriately. We illustrate the efficiency of our algorithms on decentralized Bayesian linear regression and Bayesian logistic regression problems.
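To make the decentralized setting concrete, the following is a minimal sketch of a DE-SGLD-style update for decentralized Bayesian linear regression: each agent mixes its iterate with its neighbors' iterates through a doubly stochastic matrix, takes a stochastic gradient step on its local negative log-posterior, and injects Gaussian noise. The ring topology, mixing matrix `W`, step size `eta`, batch size, and synthetic data below are illustrative assumptions, not the paper's exact experimental setup or tuned parameters.

```python
import numpy as np

# Sketch of a decentralized Langevin update (DE-SGLD-style), under assumed settings.
rng = np.random.default_rng(0)

n_agents, d, n_local = 4, 3, 50      # agents, parameter dimension, samples per agent
eta, n_iters, batch = 1e-3, 5000, 10  # step size, iterations, mini-batch size (assumptions)

# Synthetic local data shards for Bayesian linear regression (illustrative only).
x_true = rng.normal(size=d)
A = [rng.normal(size=(n_local, d)) for _ in range(n_agents)]
b = [Ai @ x_true + 0.5 * rng.normal(size=n_local) for Ai in A]

# Doubly stochastic mixing matrix for a 4-agent ring (an assumed topology).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

def stoch_grad_i(i, x):
    """Mini-batch gradient of agent i's local negative log-posterior:
    Gaussian likelihood (noise variance 0.25) plus its share of a standard
    Gaussian prior, rescaled to be unbiased for the full local gradient."""
    idx = rng.choice(n_local, size=batch, replace=False)
    Ab, bb = A[i][idx], b[i][idx]
    return (n_local / batch) * Ab.T @ (Ab @ x - bb) / 0.25 + x / n_agents

X = rng.normal(size=(n_agents, d))    # one iterate per agent
for _ in range(n_iters):
    X_mixed = W @ X                   # gossip averaging with neighbors
    grads = np.array([stoch_grad_i(i, X[i]) for i in range(n_agents)])
    noise = rng.normal(size=(n_agents, d))
    X = X_mixed - eta * grads + np.sqrt(2.0 * eta) * noise  # Langevin noise injection

print("posterior-mean estimate (averaged over agents):", X.mean(axis=0))
```

In this sketch each agent communicates only its current iterate, never its local data, which reflects the privacy and communication constraints described in the abstract; the averaging step couples the agents so their iterates track a common posterior rather than agent-specific ones.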