In this paper, we focus on a typical two-phase phenomenon in the learning of multi-layer perceptrons (MLPs), and we aim to explain why feature diversity decreases in the first phase. Specifically, it has been observed that, during the training of MLPs, the training loss does not decrease significantly until the second phase begins. Accordingly, we investigate why the diversity of features over different samples keeps decreasing throughout the first phase, which hurts the optimization of MLPs. We explain this phenomenon in terms of the learning dynamics of MLPs. Furthermore, we theoretically explain why four typical operations can alleviate the decrease in feature diversity.
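To make the two-phase picture concrete, the following minimal sketch (our own illustration, not the paper's experimental setup) trains a toy MLP and logs the training loss alongside an assumed proxy for feature diversity, namely the mean pairwise cosine distance between penultimate-layer features over the samples. The architecture, synthetic data, and diversity metric are all illustrative assumptions.

```python
# A minimal sketch, assuming a toy MLP, synthetic data, and mean pairwise
# cosine distance as a stand-in for the paper's notion of feature diversity.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy binary-classification data (assumed; any dataset would do).
X = torch.randn(512, 20)
y = (X[:, 0] > 0).long()

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),   # features taken after this block
    nn.Linear(64, 2),
)
feature_extractor = model[:4]       # up to the penultimate activations
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def feature_diversity(feats: torch.Tensor) -> float:
    """Mean pairwise cosine distance across samples (high = diverse, 0 = collapsed)."""
    f = nn.functional.normalize(feats, dim=1)
    sim = f @ f.t()                                       # pairwise cosine similarities
    n = sim.shape[0]
    mean_off_diag = (sim.sum() - sim.diag().sum()) / (n * (n - 1))
    return (1.0 - mean_off_diag).item()

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    if step % 50 == 0:
        with torch.no_grad():
            div = feature_diversity(feature_extractor(X))
        # In the two-phase picture, the loss stays nearly flat while diversity
        # drops during the first phase; the loss then falls in the second phase.
        print(f"step {step:4d}  loss {loss.item():.4f}  diversity {div:.4f}")
```

Under this setup, one would look for a first stretch of training where the printed loss barely changes while the diversity value shrinks, followed by a phase in which the loss drops; the precise curves depend on the assumed initialization, width, and learning rate.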