With a better understanding of the loss surfaces for multilayer networks, we can build more robust and accurate training procedures. Recently it was discovered that independently trained SGD solutions can be connected along one-dimensional paths of near-constant training loss. In this paper, we show that there are mode-connecting simplicial complexes that form multi-dimensional manifolds of low loss, connecting many independently trained models. Inspired by this discovery, we show how to efficiently build simplicial complexes for fast ensembling, outperforming independently trained deep ensembles in accuracy, calibration, and robustness to dataset shift. Notably, our approach only requires a few training epochs to discover a low-loss simplex, starting from a pre-trained solution. Code is available at https://github.com/g-benton/loss-surface-simplexes.