Deep ensembles have recently gained popularity in the deep learning community for their conceptual simplicity and efficiency. However, maintaining functional diversity between ensemble members that are independently trained with gradient descent is challenging. This can lead to pathologies when adding more ensemble members, such as a saturation of the ensemble performance, which converges to the performance of a single model. Moreover, this not only affects the quality of the ensemble's predictions, but even more so its uncertainty estimates, and thus its performance on out-of-distribution data. We hypothesize that this limitation can be overcome by discouraging different ensemble members from collapsing to the same function. To this end, we introduce a kernelized repulsive term in the update rule of the deep ensembles. We show that this simple modification not only enforces and maintains diversity among the members but, even more importantly, transforms maximum a posteriori inference into proper Bayesian inference. Namely, we show that the training dynamics of our proposed repulsive ensembles follow a Wasserstein gradient flow of the KL divergence with the true posterior. We study repulsive terms in weight and function space and empirically compare their performance to standard ensembles and Bayesian baselines on synthetic and real-world prediction tasks.
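To make the kernelized repulsive update concrete, the following is a minimal sketch in weight space, assuming an RBF kernel and a kernel-density-style repulsion term; the function names (`rbf_kernel`, `repulsive_update`), the step size, and the lengthscale are illustrative choices for this sketch, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))
    diff = x - y
    return np.exp(-np.dot(diff, diff) / (2.0 * lengthscale**2))

def rbf_kernel_grad_x(x, y, lengthscale=1.0):
    # Gradient of the RBF kernel with respect to its first argument x.
    return -rbf_kernel(x, y, lengthscale) * (x - y) / lengthscale**2

def repulsive_update(particles, grad_log_post, step_size=1e-2, lengthscale=1.0):
    # One repulsive step for an ensemble of weight vectors ("particles").
    # Each member follows the gradient of the log posterior (the usual
    # MAP / deep-ensemble direction) minus a kernelized repulsion term
    # that pushes members apart, discouraging collapse to one function:
    #   theta_i += eps * ( grad log p(theta_i | D)
    #                      - sum_j grad_i k(theta_i, theta_j)
    #                        / sum_j k(theta_i, theta_j) )
    n = len(particles)
    updated = []
    for i in range(n):
        drive = grad_log_post(particles[i])
        num = sum(rbf_kernel_grad_x(particles[i], particles[j], lengthscale)
                  for j in range(n))
        den = sum(rbf_kernel(particles[i], particles[j], lengthscale)
                  for j in range(n))
        updated.append(particles[i] + step_size * (drive - num / den))
    return updated

# Toy usage: ten two-dimensional "weight vectors" targeting a standard
# Gaussian posterior, whose score is grad log N(0, I) = -theta.
rng = np.random.default_rng(0)
particles = [rng.standard_normal(2) for _ in range(10)]
for _ in range(500):
    particles = repulsive_update(particles, lambda theta: -theta)
```

With the repulsion term dropped, each particle performs plain gradient ascent on the log posterior, recovering independent MAP training in which all members can converge to the same mode; the kernel term is what turns the ensemble update into an approximation of the Wasserstein gradient flow described above.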