Deep ensembles have recently gained popularity in the deep learning community for their conceptual simplicity and efficiency. However, maintaining functional diversity between ensemble members that are independently trained with gradient descent is challenging. This can lead to pathologies when adding more ensemble members, such as saturation of the ensemble performance, which converges to the performance of a single model. Moreover, this affects not only the quality of the ensemble's predictions, but even more so its uncertainty estimates, and thus its performance on out-of-distribution data. We hypothesize that this limitation can be overcome by discouraging different ensemble members from collapsing to the same function. To this end, we introduce a kernelized repulsive term in the update rule of the deep ensembles. We show that this simple modification not only enforces and maintains diversity among the members but, even more importantly, transforms maximum a posteriori inference into proper Bayesian inference. Namely, we show that the training dynamics of our proposed repulsive ensembles follow a Wasserstein gradient flow of the KL divergence with the true posterior. We study repulsive terms in weight and function space and empirically compare their performance to standard ensembles and Bayesian baselines on synthetic and real-world prediction tasks.
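To make the idea of a kernelized repulsive term concrete, the following is a minimal sketch of one way such a term can enter the ensemble update in weight space, assuming an RBF kernel over the flattened weights of the ensemble members; the kernel choice, normalization, and step-size schedule are illustrative assumptions and may differ from the exact formulation in the paper.

```python
import numpy as np

def rbf_kernel(theta, lengthscale=1.0):
    """Pairwise RBF kernel matrix over ensemble members and its gradient
    with respect to the first argument.

    theta : (n, d) array, one row per ensemble member's flattened weights
    """
    diffs = theta[:, None, :] - theta[None, :, :]            # (n, n, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)                   # (n, n)
    K = np.exp(-sq_dists / (2.0 * lengthscale ** 2))         # (n, n)
    # d/d theta_i of k(theta_i, theta_j)
    grad_K = -diffs / lengthscale ** 2 * K[:, :, None]        # (n, n, d)
    return K, grad_K

def repulsive_update(theta, grad_log_post, lengthscale=1.0, step_size=1e-2):
    """One kernelized, repulsive gradient step for an ensemble of particles.

    theta         : (n, d) array of ensemble member weights
    grad_log_post : (n, d) array, gradient of the log-posterior at each member
                    (i.e. the usual MAP gradient each member would follow alone)
    """
    K, grad_K = rbf_kernel(theta, lengthscale)
    # Repulsive force: a normalized sum of kernel gradients that pushes
    # members away from each other, preventing collapse to the same mode.
    repulsion = grad_K.sum(axis=1) / K.sum(axis=1, keepdims=True)
    return theta + step_size * (grad_log_post - repulsion)
```

With `repulsion` set to zero, each member simply performs gradient ascent on the log-posterior, recovering a standard deep ensemble; the added term is what keeps the members functionally distinct as more of them are trained.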