Studying neural network loss landscapes provides insights into the nature of the underlying optimization problems. Unfortunately, loss landscapes are notoriously difficult to visualize in a human-comprehensible fashion. One common way to address this problem is to plot linear slices of the landscape, for example from the initial state of the network to the final state after optimization. On the basis of this analysis, prior work has drawn broader conclusions about the difficulty of the optimization problem. In this paper, we put inferences of this kind to the test, systematically evaluating how linear interpolation and final performance vary when altering the data, choice of initialization, and other optimizer and architecture design choices. Further, we use linear interpolation to study the role played by individual layers and substructures of the network. We find that certain layers are more sensitive to the choice of initialization, but that the shape of the linear path is not indicative of the changes in test accuracy of the model. Our results cast doubt on the broader intuition that the presence or absence of barriers when interpolating necessarily relates to the success of optimization.
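The linear slices described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's experimental code: it evaluates a loss function along the straight line between an "initial" and a "final" parameter vector, using a toy quadratic loss as a stand-in for a real network. The function name `interpolate_losses` and the toy setup are assumptions for illustration.

```python
import numpy as np

def interpolate_losses(theta_init, theta_final, loss_fn, num_alphas=25):
    """Evaluate loss_fn along the straight line between two parameter vectors.

    alpha = 0 corresponds to the initial point, alpha = 1 to the final
    (trained) point. Returns (alphas, losses).
    """
    alphas = np.linspace(0.0, 1.0, num_alphas)
    losses = [loss_fn((1 - a) * theta_init + a * theta_final) for a in alphas]
    return alphas, np.array(losses)

# Toy stand-in for a network loss: a quadratic bowl centered at the optimum.
theta_init = np.array([3.0, -2.0])
theta_final = np.array([0.0, 0.0])
quadratic_loss = lambda theta: float(np.sum(theta ** 2))

alphas, losses = interpolate_losses(theta_init, theta_final, quadratic_loss)
# For this convex toy loss the path decreases monotonically, i.e. no barrier;
# for a real network the same plot may or may not show barriers.
```

A "barrier" on such a path would appear as a point where the loss rises above both endpoints; the paper's question is whether the presence or absence of such barriers says anything about how hard the optimization actually was.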