Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods for finding diverse paths require multiple training runs. In contrast, we aim to leverage both properties (1) and (2) with a single method and in a single training run. At a computational cost similar to that of training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.
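To make the idea concrete, below is a minimal sketch of learning a one-dimensional subspace (a line) of networks, assuming a PyTorch-style setup. The class names, layer sizes, and training loop are illustrative assumptions, not the paper's implementation: each layer holds two endpoint weight sets, a random interpolation coefficient is sampled per batch so the whole line is trained, and at test time the midpoint or an average of predictions at several points can be used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearSubspaceLayer(nn.Module):
    """A linear layer whose weights are parameterized as a line between
    two endpoints; sampling alpha in [0, 1] picks a point on that line."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w0 = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.w1 = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.b0 = nn.Parameter(torch.zeros(out_features))
        self.b1 = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, alpha):
        # Convex combination of the two endpoint parameter sets.
        w = (1 - alpha) * self.w0 + alpha * self.w1
        b = (1 - alpha) * self.b0 + alpha * self.b1
        return F.linear(x, w, b)

class SubspaceNet(nn.Module):
    """Toy two-layer network whose parameters live on a learned line."""
    def __init__(self):
        super().__init__()
        self.fc1 = LinearSubspaceLayer(784, 256)
        self.fc2 = LinearSubspaceLayer(256, 10)

    def forward(self, x, alpha):
        x = torch.relu(self.fc1(x, alpha))
        return self.fc2(x, alpha)

# One illustrative training step: sample a random alpha per batch so that
# every point on the line is encouraged to achieve low loss.
model = SubspaceNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
alpha = torch.rand(()).item()          # random point on the line
loss = F.cross_entropy(model(x, alpha), y)
loss.backward()
opt.step()

# At test time: alpha = 0.5 gives the subspace midpoint (a single model),
# while averaging predictions at several alphas gives a cheap ensemble.
```

Curves and simplexes follow the same pattern with more endpoints and a barycentric sampling of the interpolation coefficients; the key design choice is that only one set of gradients is computed per batch, keeping the cost close to training a single model.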