Stochastic gradient MCMC methods, such as stochastic gradient Langevin dynamics (SGLD), employ fast but noisy gradient estimates to enable large-scale posterior sampling. Although SGLD extends easily to distributed settings, it suffers from two issues when applied to federated non-IID data. First, the variance of these gradient estimates increases significantly. Second, delaying communication causes the Markov chains to diverge from the true posterior, even for very simple models. To alleviate both problems, we propose conducive gradients, a simple mechanism that combines local likelihood approximations to correct gradient updates. Notably, conducive gradients are easy to compute, and since the approximations are calculated only once, they incur negligible overhead. We apply conducive gradients to distributed stochastic gradient Langevin dynamics (DSGLD) and call the resulting method federated stochastic gradient Langevin dynamics (FSGLD). We demonstrate that our approach can handle delayed communication rounds, converging to the target posterior in cases where DSGLD fails. We also show that FSGLD outperforms DSGLD on non-IID federated data in experiments on metric learning and neural networks.
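As a minimal sketch of the mechanism summarized above, the Python function below shows one SGLD-style client update augmented by a correction term built from once-computed local likelihood surrogates. The function names and the exact form of the correction (written here as `global_surrogate_grad` minus `local_surrogate_grad`, standing in for the combined approximations of the other clients' likelihoods) are illustrative assumptions, not the paper's definitive formulation.

```python
import numpy as np

def sgld_step_with_conducive_gradient(theta, minibatch, grad_log_prior,
                                      grad_log_lik, local_surrogate_grad,
                                      global_surrogate_grad, n_local,
                                      step_size, rng):
    """One SGLD-style client update corrected by a conducive-gradient term.

    Hypothetical sketch: `local_surrogate_grad` and `global_surrogate_grad`
    are gradients of once-computed local likelihood approximations; the
    correction used in FSGLD may differ in its exact form.
    """
    batch_size = len(minibatch)
    # Unbiased stochastic estimate of the client's local log-likelihood gradient.
    stoch_grad = (n_local / batch_size) * sum(grad_log_lik(theta, x) for x in minibatch)
    # Conducive correction: gradient of the combined surrogates minus the
    # client's own surrogate, compensating for data held by other clients.
    conducive = global_surrogate_grad(theta) - local_surrogate_grad(theta)
    drift = grad_log_prior(theta) + stoch_grad + conducive
    # Langevin noise with variance equal to the step size.
    noise = rng.normal(scale=np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * drift + noise
```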