The distinct behaviors exhibited by neural networks under varying initialization scales remain an enigma in deep learning research. In this paper, building on the earlier work of Luo et al.~\cite{luo2021phase}, we present a phase diagram of initial condensation for two-layer neural networks. Condensation is a phenomenon in which the weight vectors of a neural network concentrate on isolated orientations during training; it is a feature of the non-linear learning process that endows neural networks with better generalization ability. Our phase diagram provides a comprehensive picture of the dynamical regimes of neural networks and their dependence on the choice of initialization-related hyperparameters. Furthermore, we demonstrate in detail the underlying mechanisms by which small initialization leads to condensation at the initial training stage.
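To make the notion concrete, consider a two-layer network $f_\theta(x) = \sum_{k=1}^{m} a_k \sigma(w_k^\top x)$ with $m$ hidden neurons and activation $\sigma$; this generic form and the diagnostic below are an illustrative sketch, not necessarily the exact formulation of Luo et al.~\cite{luo2021phase}. Condensation can then be quantified through the pairwise cosine similarities of the input weight vectors,
\[
C_{jk} = \frac{w_j^\top w_k}{\lVert w_j \rVert\, \lVert w_k \rVert},
\]
where the weights are said to condense when $\lvert C_{jk} \rvert \approx 1$ for most pairs $(j,k)$, i.e., the $m$ weight vectors collapse onto a few isolated orientations.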