We systematize the approach to the investigation of deep neural network landscapes by basing it on the geometry of the space of implemented functions rather than the space of parameters. Grouping classifiers into equivalence classes, we develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology. On this space, we explore the error landscape rather than the loss. This lets us derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them. Using different optimization algorithms that sample minimizers with different flatness, we study the mode connectivity and other characteristics. Testing a variety of state-of-the-art architectures and benchmark datasets, we confirm the correlation between flatness and generalization performance; we further show that in function space flatter minima are closer to each other and that the barriers along the geodesics connecting them are small. We also find that minimizers found by variants of gradient descent can be connected by zero-error paths with a single bend. We observe similar qualitative results in neural networks with binary weights and activations, providing one of the first results concerning the connectivity in this setting. Our results hinge on symmetry removal, and are in remarkable agreement with the rich phenomenology described by some recent analytical studies performed on simple shallow models.
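To make the two ingredients mentioned above concrete, the following is a minimal illustrative sketch (not the authors' implementation) for a toy fully connected ReLU network stored as a list of weight matrices of shape (n_out, n_in), with biases omitted: the per-unit scale symmetry is removed by normalizing each hidden unit's incoming weight vector to unit norm, which places the parameters on a product of spheres, and points along geodesics between two such normalized networks are obtained by per-unit great-circle interpolation. The function names (`normalize_units`, `slerp`, `geodesic_interpolate`) are hypothetical and chosen only for this example.

```python
# Minimal sketch, assuming a ReLU network given as a list of weight matrices
# W[l] of shape (n_out, n_in); biases and permutation alignment are omitted.
import numpy as np

def normalize_units(weights):
    """Remove the per-unit scale symmetry of ReLU networks: rescale each hidden
    unit's incoming weight vector to unit norm and push the scale into the next
    layer (using positive homogeneity of ReLU). The normalized parameters live
    on a product of unit spheres."""
    W = [w.copy() for w in weights]
    for l in range(len(W) - 1):                                # last layer keeps its scale
        norms = np.linalg.norm(W[l], axis=1, keepdims=True)    # one norm per hidden unit
        W[l] /= norms                                          # incoming vectors -> unit sphere
        W[l + 1] *= norms.T                                    # compensate in outgoing weights
    return W

def slerp(u, v, t):
    """Geodesic (great-circle) interpolation between two unit vectors."""
    cos = np.clip(np.dot(u, v), -1.0, 1.0)
    omega = np.arccos(cos)
    if omega < 1e-8:                                           # nearly identical directions
        return (1 - t) * u + t * v
    return (np.sin((1 - t) * omega) * u + np.sin(t * omega) * v) / np.sin(omega)

def geodesic_interpolate(Wa, Wb, t):
    """Point at fraction t along the geodesic between two normalized networks:
    each hidden unit's incoming vector follows its own great circle, while the
    last (unnormalized) layer is interpolated linearly."""
    out = []
    for l, (wa, wb) in enumerate(zip(Wa, Wb)):
        if l < len(Wa) - 1:
            out.append(np.stack([slerp(a, b, t) for a, b in zip(wa, wb)]))
        else:
            out.append((1 - t) * wa + t * wb)
    return out
```

Evaluating the error of `geodesic_interpolate(Wa, Wb, t)` for t in [0, 1] would give an (illustrative) error profile along the geodesic connecting two minimizers in the symmetry-reduced space.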