Studying neural network loss landscapes provides insights into the nature of the underlying optimization problems. Unfortunately, loss landscapes are notoriously difficult to visualize in a human-comprehensible fashion. One common way to address this problem is to plot linear slices of the landscape, for example from the initial state of the network to the final state after optimization. On the basis of this analysis, prior work has drawn broader conclusions about the difficulty of the optimization problem. In this paper, we put inferences of this kind to the test, systematically evaluating how linear interpolation and final performance vary when altering the data, choice of initialization, and other optimizer and architecture design choices. Further, we use linear interpolation to study the role played by individual layers and substructures of the network. We find that certain layers are more sensitive to the choice of initialization and optimizer hyperparameter settings, and we exploit these observations to design custom optimization schemes. However, our results cast doubt on the broader intuition that the presence or absence of barriers when interpolating necessarily relates to the success of optimization.
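The linear-slice visualization described above amounts to evaluating the loss along the straight line θ(α) = (1 − α)·θ_init + α·θ_final between the initial and final parameter vectors. A minimal sketch of this procedure, using a toy quadratic loss in place of an actual neural network (the function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def interpolate_losses(theta_init, theta_final, loss_fn, num_points=11):
    """Evaluate loss_fn along the straight line between two parameter
    vectors: theta(alpha) = (1 - alpha) * theta_init + alpha * theta_final."""
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = np.array([
        loss_fn((1.0 - a) * theta_init + a * theta_final) for a in alphas
    ])
    return alphas, losses

# Toy stand-in for a trained network: a quadratic loss minimized at [1, 2].
loss = lambda th: float(np.sum((th - np.array([1.0, 2.0])) ** 2))
alphas, losses = interpolate_losses(np.zeros(2), np.array([1.0, 2.0]), loss)
# For this convex toy loss the path shows no barrier: loss falls
# monotonically from the initial point to the optimum.
```

For a real network one would flatten the model's parameters into a single vector at initialization and after training, then re-load each interpolated vector into the model and evaluate the training loss on a batch; a barrier appears as a bump in `losses` between the endpoints.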