Stein variational gradient descent (SVGD) and its variants have shown promising success in approximate inference for complex distributions. However, their empirical performance depends crucially on the choice of kernel. Unfortunately, the RBF kernel with the median heuristic, a common choice in previous approaches, has been shown to be sub-optimal. Inspired by the paradigm of multiple kernel learning, our solution to this issue is to use a combination of multiple kernels to approximate the optimal kernel, rather than a single kernel that may limit performance and flexibility. To this end, we extend the Kernelized Stein Discrepancy (KSD) to a multiple-kernel view, called the Multiple Kernelized Stein Discrepancy (MKSD). We then leverage MKSD to construct a general algorithm based on SVGD, which we call Multiple Kernel SVGD (MK-SVGD). Moreover, our method automatically assigns a weight to each kernel without introducing additional parameters. The proposed method not only removes the dependence on an optimal kernel choice but also maintains computational efficiency. Experiments on various tasks and models demonstrate the effectiveness of our method.
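To make the idea concrete, the sketch below illustrates an SVGD-style particle update driven by a weighted combination of RBF kernels. The specific weighting rule (weights proportional to a per-kernel discrepancy proxy, here the norm of each kernel's update direction) is an illustrative assumption and not necessarily the exact MK-SVGD weighting described in the paper; the bandwidths and helper names are hypothetical.

```python
# A minimal sketch of a multiple-kernel SVGD-style update, assuming a weighted
# mixture of RBF kernels. The weighting rule used here is an illustrative
# assumption, not the paper's exact MK-SVGD scheme.
import numpy as np

def rbf_kernel(X, bandwidth):
    """RBF kernel matrix K and grad of K(x_i, x_j) w.r.t. x_i."""
    diff = X[:, None, :] - X[None, :, :]            # (n, n, d)
    sq_dist = np.sum(diff ** 2, axis=-1)            # (n, n)
    K = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    grad_K = -diff / (bandwidth ** 2) * K[:, :, None]  # (n, n, d)
    return K, grad_K

def mk_svgd_step(X, score_fn, bandwidths, step_size=1e-2):
    """One particle update using a weighted combination of RBF kernels.

    score_fn(X) returns grad_x log p(x) for each particle, shape (n, d).
    Kernel weights come from a simple discrepancy proxy (assumption).
    """
    n, d = X.shape
    scores = score_fn(X)                            # (n, d)

    phis, proxy = [], []
    for h in bandwidths:
        K, grad_K = rbf_kernel(X, h)
        # Standard SVGD direction for this single kernel:
        # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i) ]
        phi = (K @ scores + grad_K.sum(axis=0)) / n
        phis.append(phi)
        # Crude per-kernel discrepancy proxy: magnitude of the update direction.
        proxy.append(np.linalg.norm(phi))

    w = np.array(proxy)
    w = w / w.sum()                                 # normalized kernel weights
    phi_mix = sum(wi * pi for wi, pi in zip(w, phis))
    return X + step_size * phi_mix, w

# Usage: particles drifting toward a standard Gaussian target.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2)) * 3.0
    score = lambda X: -X                            # grad log N(0, I)
    for _ in range(200):
        X, w = mk_svgd_step(X, score, bandwidths=[0.5, 1.0, 2.0])
    print("kernel weights:", w, "sample mean:", X.mean(axis=0))
```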