Analyzing geometric properties of high-dimensional loss functions, such as local curvature and the existence of other optima around a certain point in loss space, can help provide a better understanding of the interplay between neural network structure, implementation attributes, and learning performance. In this work, we combine concepts from high-dimensional probability and differential geometry to study how curvature properties in lower-dimensional loss representations depend on those in the original loss space. We show that saddle points in the original space are rarely correctly identified as such in lower-dimensional representations if random projections are used. In such projections, the expected curvature in a lower-dimensional representation is proportional to the mean curvature in the original loss space. Hence, the mean curvature in the original loss space determines if saddle points appear, on average, as either minima, maxima, or almost flat regions. We use the connection between expected curvature and mean curvature (i.e., the normalized Hessian trace) to estimate the trace of Hessians without calculating the Hessian or Hessian-vector products as in Hutchinson's method. Because random projections are not able to correctly identify saddle information, we propose to study projections along Hessian directions that are associated with the largest and smallest principal curvatures. We connect our findings to the ongoing debate on loss landscape flatness and generalizability. Finally, we illustrate our method in numerical experiments on different image classifiers with up to about $7\times 10^6$ parameters.
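The trace-estimation idea summarized above — that the expected curvature along a random direction is proportional to the mean curvature, i.e., $\mathbb{E}[v^\top H v] = \operatorname{tr}(H)/n$ for a uniform unit vector $v$ — can be sketched with finite differences of the loss alone, with no Hessian or Hessian-vector products. The sketch below is illustrative, not the paper's implementation; the function name and parameters are our own.

```python
import numpy as np

def estimate_hessian_trace(loss, theta, num_dirs=200, eps=1e-3, rng=None):
    """Estimate tr(H) at theta from loss values only.

    For v uniform on the unit sphere, E[v^T H v] = tr(H)/n, so n times
    the mean second-order directional curvature estimates the trace.
    """
    rng = np.random.default_rng(rng)
    n = theta.size
    curvatures = []
    for _ in range(num_dirs):
        v = rng.standard_normal(n)
        v /= np.linalg.norm(v)  # uniform direction on the unit sphere
        # central second difference of the loss along direction v
        c = (loss(theta + eps * v) - 2.0 * loss(theta)
             + loss(theta - eps * v)) / eps**2
        curvatures.append(c)
    return n * np.mean(curvatures)

# Sanity check on a quadratic with known Hessian diag(1, 2, 3), trace 6:
H = np.diag([1.0, 2.0, 3.0])
quad_loss = lambda x: 0.5 * x @ H @ x
print(estimate_hessian_trace(quad_loss, np.zeros(3), num_dirs=2000, rng=0))
# close to the true trace, 6
```

Unlike Hutchinson's method, which averages $v^\top H v$ computed via Hessian-vector products, this variant only evaluates the loss, at the cost of finite-difference error controlled by `eps`.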