Ensembles of independently trained neural networks are a state-of-the-art approach to estimating predictive uncertainty in Deep Learning, and can be interpreted as an approximation of the posterior distribution via a mixture of delta functions. The training of ensembles relies on the non-convexity of the loss landscape and the random initialization of the individual members, which leaves the resulting posterior approximation uncontrolled. This paper proposes a novel and principled method to tackle this limitation by minimizing an $f$-divergence between the true posterior and a kernel density estimator (KDE) in a function space. We analyze this objective from a combinatorial point of view and show that it is submodular with respect to mixture components for any $f$. Subsequently, we consider the problem of greedy ensemble construction. From the marginal gain of the negative $f$-divergence, which quantifies the improvement in the posterior approximation obtained by adding a new component to the KDE, we derive a novel diversity term for ensemble methods. The performance of our approach is demonstrated on computer vision out-of-distribution detection benchmarks across a range of architectures trained on multiple datasets. The source code of our method is publicly available at https://github.com/Oulu-IMEDS/greedy_ensembles_training.
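The greedy construction mentioned above follows the classical greedy rule for maximizing a submodular set function: at each step, add the component with the largest marginal gain. The sketch below is a minimal, generic illustration of that rule in Python; the `objective` callable stands in for an estimate of the negative $f$-divergence of the KDE approximation and is an assumption for illustration only, not the paper's actual implementation (see the repository linked above for the real method).

```python
def greedy_selection(num_candidates, objective, k):
    """Generic greedy maximization of a (submodular) set function.

    `objective` is assumed to map a list of candidate indices to a scalar,
    e.g. an estimate of the negative f-divergence between the KDE built
    from the selected ensemble members and the target posterior.
    This is a hypothetical stand-in, not the paper's exact objective.
    """
    selected = []
    remaining = set(range(num_candidates))
    for _ in range(k):
        base = objective(selected)
        # Marginal gain of adding each remaining candidate to the current set.
        gains = {i: objective(selected + [i]) - base for i in remaining}
        best = max(gains, key=gains.get)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage with a coverage-style (submodular) score over scalar "predictions":
# the objective counts how many distinct regions the selected members cover,
# so greedy selection naturally favors diverse components.
cands = [0.10, 0.15, 0.90, 0.50, 0.55]
coverage = lambda S: len({round(cands[i], 1) for i in S})
print(greedy_selection(len(cands), coverage, k=3))
```

For submodular objectives, this greedy procedure enjoys the standard $(1 - 1/e)$ approximation guarantee, which is what motivates analyzing the posterior-approximation objective from a combinatorial point of view.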