Taxonomizing local versus global structure in neural network loss landscapes

Viewing neural network models in terms of their loss landscapes has a long history in the statistical mechanics approach to learning, and in recent years it has received attention within machine learning proper. Among other things, local metrics (such as the smoothness of the loss landscape) have been shown to correlate with global properties of the model (such as good generalization). Here, we perform a detailed empirical analysis of the loss landscape structure of thousands of neural network models, systematically varying learning tasks, model architectures, and/or quantity/quality of data. By considering a range of metrics that attempt to capture different aspects of the loss landscape, we demonstrate that the best test accuracy is obtained when: the loss landscape is globally well connected; ensembles of trained models are more similar to each other; and models converge to locally smooth regions. We also show that globally poorly connected landscapes can arise when models are small or when they are trained on lower-quality data; and that, if the loss landscape is globally poorly connected, then training to zero loss can actually lead to worse test accuracy. Based on these results, we develop a simple one-dimensional model with load-like and temperature-like parameters, we introduce the notion of an \emph{effective loss landscape} depending on these parameters, and we interpret our results in terms of a \emph{rugged convexity} of the loss landscape. When viewed through this lens, our detailed empirical results shed light on several topics of recent interest: phases of learning (and consequent double descent behavior), fundamental versus incidental determinants of good generalization, the role of load-like and temperature-like parameters in the learning process, different influences on the loss landscape from model and data, and the relationships between local and global metrics.

