We show that the gradient estimates used in training Deep Gaussian Processes (DGPs) with importance-weighted variational inference are susceptible to signal-to-noise ratio (SNR) issues. Specifically, we show both theoretically and via an extensive empirical evaluation that the SNR of the gradient estimates for the latent variables' variational parameters decreases as the number of importance samples increases. As a result, these gradient estimates degrade to pure noise if the number of importance samples is too large. To address this pathology, we show how doubly reparameterized gradient (DREG) estimators, originally proposed for training variational autoencoders, can be adapted to the DGP setting, and that the resulting estimators completely remedy the SNR issue, thereby providing more reliable training. Finally, we demonstrate that our fix can lead to consistent improvements in the predictive performance of DGP models.
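To make the failure mode concrete, the following is a minimal toy sketch, not taken from the paper: it assumes PyTorch, a one-dimensional model p(z) = N(0, 1) with likelihood p(x | z) = N(z, 1), a Gaussian variational distribution q(z) = N(mu, sigma^2), and arbitrary illustrative values for x and mu. It empirically estimates the SNR (mean magnitude divided by standard deviation) of the reparameterized importance-weighted gradient with respect to mu as the number of importance samples K grows.

```python
import math
import torch

torch.manual_seed(0)

# Hypothetical toy setup (not from the paper): p(z) = N(0, 1),
# p(x | z) = N(z, 1), variational q(z) = N(mu, sigma^2), one observation.
x = torch.tensor(1.5)
mu = torch.tensor(0.4, requires_grad=True)  # variational mean, away from optimum
sigma = torch.tensor(1.0)                   # variational std, held fixed

def iwae_snr(K, n_repeats=1000):
    """SNR = |mean| / std of the K-sample reparameterized IWAE gradient wrt mu."""
    grads = []
    for _ in range(n_repeats):
        eps = torch.randn(K)
        z = mu + sigma * eps                    # reparameterized samples from q
        log_p = -0.5 * z**2 - 0.5 * (x - z)**2  # log p(z) + log p(x|z), up to consts
        log_q = -0.5 * ((z - mu) / sigma)**2    # log q(z), up to consts
        log_w = log_p - log_q                   # log importance weights
        bound = torch.logsumexp(log_w, 0) - math.log(K)  # K-sample IW bound
        g, = torch.autograd.grad(bound, mu)
        grads.append(g.item())
    g = torch.tensor(grads)
    return (g.mean().abs() / g.std()).item()

for K in (1, 10, 100, 1000):
    print(f"K={K:4d}  SNR ~ {iwae_snr(K):.3f}")
```

On a typical run the printed SNR shrinks as K grows, consistent with the decay the abstract describes (an O(1/sqrt(K)) rate in the analysis of Rainforth et al., 2018).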
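For contrast, the same toy setup can be used to sketch the doubly reparameterized (DREG) estimator the abstract refers to. The surrogate below (gradient-stopped squared normalized weights multiplied by the log-weights, with the variational parameter detached inside log q) is one standard way to implement DREG via automatic differentiation; it reuses x, mu, and sigma from the sketch above and is an illustration, not the paper's DGP implementation.

```python
def dreg_snr(K, n_repeats=1000):
    """SNR of the doubly reparameterized (DREG) gradient estimate wrt mu."""
    grads = []
    for _ in range(n_repeats):
        eps = torch.randn(K)
        z = mu + sigma * eps
        log_p = -0.5 * z**2 - 0.5 * (x - z)**2
        # DREG trick: detach the variational parameter inside log q so that
        # mu reaches the loss only through the reparameterized samples z.
        log_q = -0.5 * ((z - mu.detach()) / sigma)**2
        log_w = log_p - log_q
        w_sq = torch.softmax(log_w, 0).detach()**2  # squared normalized weights
        surrogate = (w_sq * log_w).sum()            # grad equals the DREG estimator
        g, = torch.autograd.grad(surrogate, mu)
        grads.append(g.item())
    g = torch.tensor(grads)
    return (g.mean().abs() / g.std()).item()

for K in (1, 10, 100, 1000):
    print(f"K={K:4d}  SNR ~ {dreg_snr(K):.3f}")
```

Under this sketch the SNR no longer collapses as K grows, mirroring the remedy the abstract claims for the DGP setting.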