Replica exchange stochastic gradient Langevin dynamics (reSGLD) has shown promise in accelerating convergence in non-convex learning; however, an excessively large correction for avoiding bias from the noisy energy estimators has limited the potential of the acceleration. To address this issue, we study variance reduction for the noisy energy estimators, which promotes much more effective swaps. Theoretically, we provide a non-asymptotic analysis of the exponential acceleration for the underlying continuous-time Markov jump process; moreover, we consider a generalized Girsanov theorem which includes the change of Poisson measure to overcome the crude discretization based on Gr\"{o}nwall's inequality and yields a much tighter error in the 2-Wasserstein ($\mathcal{W}_2$) distance. Numerically, we conduct extensive experiments and obtain state-of-the-art results in optimization and uncertainty estimation on synthetic experiments and image data.
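To make the mechanism concrete, the sketch below illustrates the two ingredients the abstract describes: a control-variate (SVRG-style) energy estimator whose variance is anchored at a periodically refreshed snapshot, and a swap step whose acceptance rate carries a variance correction so that smaller estimator variance permits more frequent swaps. The toy 1-D Gaussian-location model, the names `u_i`, `vr_energy`, and the fixed `sigma2_hat` are illustrative assumptions for this sketch, not the paper's exact setup.

```python
# A minimal sketch of variance-reduced replica exchange SGLD (reSGLD),
# assuming a toy 1-D Gaussian-location energy; hyperparameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 32                       # dataset size, minibatch size
data = rng.normal(1.0, 1.0, size=N)   # toy observations x_i ~ N(theta*, 1)

def u_i(theta, x):                    # per-datum energy: -log p(x_i | theta)
    return 0.5 * (x - theta) ** 2

def grad_u(theta, batch):             # stochastic gradient of the full energy
    return (N / len(batch)) * np.sum(theta - batch)

def energy(theta, batch):             # naive minibatch energy estimator
    return (N / len(batch)) * np.sum(u_i(theta, batch))

def vr_energy(theta, theta_hat, full_e_hat, batch):
    # Control-variate estimator: only the difference term is noisy; the
    # anchor's full-batch energy full_e_hat is computed once per refresh.
    diff = u_i(theta, batch) - u_i(theta_hat, batch)
    return (N / len(batch)) * np.sum(diff) + full_e_hat

eta, tau1, tau2 = 1e-4, 1.0, 10.0     # step size, low/high temperatures
theta = np.array([0.0, 0.0])          # low- and high-temperature replicas
theta_hat = theta.copy()              # control-variate anchors
full_e_hat = np.array([energy(t, data) for t in theta])

for step in range(2000):
    batch = data[rng.choice(N, n, replace=False)]
    for k, tau in enumerate((tau1, tau2)):    # SGLD move for each replica
        noise = np.sqrt(2 * eta * tau) * rng.normal()
        theta[k] = theta[k] - eta * grad_u(theta[k], batch) + noise

    # Variance-reduced energies drive the swap; the correction term shrinks
    # with the estimator variance, so better estimators mean more swaps.
    e = [vr_energy(theta[k], theta_hat[k], full_e_hat[k], batch)
         for k in range(2)]
    dtau = 1.0 / tau1 - 1.0 / tau2
    sigma2_hat = 1.0                          # in practice, a running variance estimate
    log_s = dtau * (e[1] - e[0] - dtau * sigma2_hat)
    if np.log(rng.uniform()) < min(0.0, log_s):
        theta[0], theta[1] = theta[1], theta[0]

    if step % 500 == 0:                       # refresh the control-variate anchors
        theta_hat = theta.copy()
        full_e_hat = np.array([energy(t, data) for t in theta])

print("low-temperature replica estimate:", theta[0])
```

The design point the sketch is meant to show: with the naive estimator `energy`, the swap correction `dtau * sigma2_hat` can be so large that swaps almost never fire; replacing it with `vr_energy` leaves the dynamics unchanged but lets the low-variance estimator drive far more effective swaps.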