Training artificial neural networks requires the optimization of highly non-convex loss functions. Over the years, the scientific community has developed an extensive set of tools and architectures that render this optimization task tractable, along with a general intuition for choosing hyperparameters that help models reach minima that generalize well to unseen data. However, differences in trainability between architectures and tasks, as well as the gap in generalization ability between networks, remain largely unexplained. Visualization tools have played a key role in uncovering geometric characteristics of the loss landscape of artificial neural networks (ANNs) and how these characteristics affect trainability and generalization. However, most visualization methods proposed so far are limited: they are linear in nature and capture features in only a small number of dimensions. We propose using PHATE, a modern dimensionality reduction method that represents the state of the art in capturing both the global and local structure of high-dimensional data, to visualize the loss landscape during and after training. Our visualizations reveal differences in training trajectories and generalization capabilities when comparing optimization methods, initializations, architectures, and datasets. Given this success, we anticipate that this method will be used to make informed choices about these aspects of neural network design.
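The following is a minimal sketch of the pipeline described above, assuming the open-source `phate` Python package and a toy PyTorch model; the model, data, and hyperparameters here are illustrative placeholders rather than the paper's exact experimental setup. It records a flattened snapshot of all weights at each optimization step and then embeds the resulting trajectory of high-dimensional weight vectors into two dimensions with PHATE.

```python
import numpy as np
import torch
import torch.nn as nn
import phate  # pip install phate

# Toy network, optimizer, and synthetic data (illustrative placeholders).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
X = torch.randn(256, 10)
y = torch.randint(0, 2, (256,))

# Record a flattened snapshot of all parameters at every training step.
snapshots = []
for step in range(200):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
    snapshots.append(
        torch.cat([p.detach().flatten() for p in model.parameters()]).numpy()
    )

# Embed the weight-space trajectory into 2-D with PHATE, which aims to
# preserve both local and global structure of the high-dimensional path.
embedding = phate.PHATE(n_components=2, knn=5).fit_transform(np.stack(snapshots))
print(embedding.shape)  # (200, 2): one 2-D point per training step
```

Each row of `embedding` corresponds to one training step, so plotting the rows in order traces the optimization trajectory through the loss landscape; repeating the procedure for different optimizers, initializations, or architectures yields the kinds of comparisons described above.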