Monotonic linear interpolation (MLI) - the phenomenon that, along the line connecting a random initialization to the minimizer it converges to, the loss and accuracy vary monotonically - is commonly observed when training neural networks. This phenomenon may seem to suggest that optimizing neural networks is easy. In this paper, we show that the MLI property is not necessarily related to the hardness of the optimization problem, and that empirical observations of MLI in deep neural networks depend heavily on biases. In particular, we show that linearly interpolating the weights and the biases influences the final output in very different ways, and that when different classes have different last-layer biases in a deep network, both the loss and accuracy interpolation curves exhibit a long plateau, which existing theories of MLI cannot explain. We also show, using a simple model, how the last-layer biases can differ across classes even on a perfectly balanced dataset. Empirically, we demonstrate that similar intuitions hold for practical networks and realistic datasets.
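The interpolation path described above can be sketched in a few lines. Below is a minimal, hypothetical illustration using a toy logistic-regression model (not the paper's actual experimental setup): starting from a random initialization `w0`, we train to a solution `w_star`, then evaluate the loss at points `(1 - alpha) * w0 + alpha * w_star` along the connecting segment. All data, names, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy, linearly separable binary-classification data (illustrative only).
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)

def loss(w):
    """Mean logistic loss of a linear classifier with weights w."""
    z = X @ w
    # Equivalent to -mean(y*log(p) + (1-y)*log(1-p)) with p = sigmoid(z).
    return np.mean(np.log1p(np.exp(-z)) + (1 - y) * z)

# "Train" by gradient descent from a random initialization w0 to w_star.
w0 = rng.normal(size=5)
w = w0.copy()
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.5 * X.T @ (p - y) / len(y)
w_star = w

# Evaluate the loss along the line segment connecting w0 and w_star.
alphas = np.linspace(0.0, 1.0, 21)
path = [loss((1 - a) * w0 + a * w_star) for a in alphas]

# Training should have reduced the loss between the two endpoints;
# MLI further asks whether the loss is monotone along the whole path.
monotone = all(b <= a + 1e-9 for a, b in zip(path, path[1:]))
print(path[0] > path[-1], monotone)
```

Note that for this convex toy loss, monotonicity along the segment is plausible but not guaranteed in general; the paper's point is precisely that last-layer biases can break such monotone behavior in deep networks.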