Recently, various normalization layers have been proposed to stabilize the training of deep neural networks. Among them, group normalization generalizes layer normalization and instance normalization by allowing a degree of freedom in the number of groups it uses. However, determining the optimal number of groups requires trial-and-error hyperparameter tuning, and such experiments are time-consuming. In this study, we discuss a reasonable method for setting the number of groups. First, we find that the number of groups influences the gradient behavior of the group normalization layer. Based on this observation, we derive an ideal number of groups, which calibrates the gradient scale to facilitate gradient descent optimization. The proposed number of groups is theoretically grounded, architecture-aware, and provides a proper value in a layer-wise manner for all layers. The proposed method exhibited improved performance over existing methods across numerous neural network architectures, tasks, and datasets.
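As a minimal illustrative sketch (not part of the paper, using PyTorch's standard GroupNorm module), the abstract's claim that group normalization subsumes layer and instance normalization can be seen by varying the number of groups: one group normalizes over all channels per sample (layer normalization), while one group per channel normalizes each channel separately (instance normalization).

```python
# Sketch: the number of groups interpolates between layer and instance normalization.
import torch
import torch.nn as nn

x = torch.randn(8, 32, 16, 16)  # (batch, channels, height, width)

layer_like = nn.GroupNorm(num_groups=1, num_channels=32)      # 1 group  -> layer norm
instance_like = nn.GroupNorm(num_groups=32, num_channels=32)  # C groups -> instance norm
grouped = nn.GroupNorm(num_groups=8, num_channels=32)         # intermediate setting

for norm in (layer_like, grouped, instance_like):
    y = norm(x)
    print(norm.num_groups, y.shape, y.mean().item(), y.std().item())
```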