L2 regularization for the weights of neural networks is widely used as a standard training trick. However, L2 regularization for gamma, a trainable parameter of batch normalization, remains largely undiscussed and is applied differently depending on the library and the practitioner. In this paper, we study whether L2 regularization for gamma is valid. To explore this question, we consider two approaches: 1) variance control, which makes the residual network behave like an identity mapping, and 2) stable optimization through an improved effective learning rate. Through these two analyses, we specify for which gammas L2 regularization is desirable and for which it is undesirable, and we propose four guidelines for managing them. In several experiments, we observed increases and decreases in performance caused by applying L2 regularization to the four categories of gamma, consistent with our four guidelines. The proposed guidelines were validated across various tasks and architectures, including variants of residual networks and transformers.
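To make concrete the library-dependent conventions the abstract refers to, below is a minimal PyTorch sketch (not taken from the paper) of one common convention: excluding all batch-normalization gamma/beta parameters from weight decay via optimizer parameter groups. The helper name `split_decay_param_groups` is hypothetical; the paper's guidelines instead distinguish which gammas should and should not be decayed.

```python
import torch
import torch.nn as nn

def split_decay_param_groups(model, weight_decay=1e-4):
    """Split parameters into those that receive L2 regularization
    (weight decay) and those that do not. Here all BatchNorm
    parameters (gamma = `weight`, beta = `bias`) are excluded,
    which is one of the conventions seen in practice."""
    decay, no_decay = [], []
    for module in model.modules():
        for _, param in module.named_parameters(recurse=False):
            if isinstance(module, nn.BatchNorm2d):
                no_decay.append(param)  # gamma / beta: no weight decay
            else:
                decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# Usage: the alternative convention simply decays every parameter,
# including gamma; the paper studies when each choice helps or hurts.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
optimizer = torch.optim.SGD(split_decay_param_groups(model), lr=0.1, momentum=0.9)
```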