Ensembles of deep neural networks have achieved great success recently, but they do not offer a proper Bayesian justification. Moreover, while they allow for averaging of predictions over several hypotheses, they do not provide any guarantees for their diversity, leading to redundant solutions in function space. In contrast, particle-based inference methods, such as Stein variational gradient descent (SVGD), offer a Bayesian framework, but rely on the choice of a kernel to measure the similarity between ensemble members. In this work, we study different SVGD methods operating in weight space, in function space, and in a hybrid setting. We compare the SVGD approaches to other ensembling-based methods in terms of their theoretical properties and assess their empirical performance on synthetic and real-world tasks. We find that SVGD using functional and hybrid kernels can overcome the limitations of deep ensembles: it improves functional diversity and uncertainty estimation and approaches the true Bayesian posterior more closely. Moreover, we show that using stochastic SVGD updates, as opposed to the standard deterministic ones, can further improve performance.
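To make the kernel-based update concrete, the following is a minimal sketch of one weight-space SVGD step, assuming an RBF kernel with a median-heuristic bandwidth and a user-supplied log-posterior gradient. The names (`svgd_step`, `grad_log_p`) and the specific kernel choice are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of a deterministic weight-space SVGD update, assuming an
# RBF kernel with the median heuristic. All names here are illustrative.
import numpy as np

def rbf_kernel(particles):
    """RBF kernel matrix and its gradients w.r.t. the first argument."""
    diffs = particles[:, None, :] - particles[None, :, :]   # (n, n, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)                  # (n, n)
    # Median heuristic for the bandwidth, a common default for SVGD.
    h = np.median(sq_dists) / max(np.log(len(particles) + 1), 1e-8)
    K = np.exp(-sq_dists / (h + 1e-8))                      # K[j, i] = k(theta_j, theta_i)
    # grad_K[j, i] = d k(theta_j, theta_i) / d theta_j
    grad_K = -2.0 / (h + 1e-8) * diffs * K[:, :, None]
    return K, grad_K

def svgd_step(particles, grad_log_p, step_size=1e-2):
    """One SVGD update: a driving term plus a repulsive term."""
    n = particles.shape[0]
    K, grad_K = rbf_kernel(particles)
    # Driving term: kernel-weighted posterior gradients pull particles
    # toward high-density regions of the posterior.
    drive = K @ grad_log_p(particles)
    # Repulsive term: kernel gradients push particles apart, enforcing
    # the diversity that plain deep ensembles do not guarantee.
    repulse = grad_K.sum(axis=0)
    return particles + step_size * (drive + repulse) / n

# Toy usage: particles approximating a standard Gaussian posterior,
# for which grad log N(0, I) at theta is simply -theta.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(size=(20, 2))            # 20 particles in 2-D weight space
    for _ in range(500):
        theta = svgd_step(theta, lambda t: -t)
```

In the functional and hybrid variants studied in the paper, the kernel would instead be evaluated on network predictions over a batch of inputs (or on a combination of weights and predictions), and the stochastic variant replaces this deterministic update with a noisy one.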