Convolutional neural network training can suffer from diverse issues such as exploding or vanishing gradients, scaling-based weight-space symmetry, and covariate shift. To address these issues, researchers have developed weight regularization and activation normalization methods. In this work we propose a weight soft-regularization method based on the Oblique manifold. The proposed method uses a loss function that pushes each weight vector toward unit norm, i.e., the weight matrix is smoothly steered toward the so-called Oblique manifold. We evaluate our method on the popular CIFAR-10, CIFAR-100, and ImageNet 2012 datasets using two state-of-the-art architectures, namely ResNet and Wide ResNet. Our method introduces negligible computational overhead, and the results show that it is competitive with the state of the art and in some cases superior to it. Additionally, the results are less sensitive to hyperparameter settings such as batch size and regularization factor.
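To make the idea concrete, here is a minimal sketch of such a soft penalty, assuming it takes a quadratic form that penalizes each filter's deviation from unit L2 norm; the paper's exact loss, as well as the `oblique_penalty` and `reg_factor` names below, are illustrative assumptions rather than the authors' implementation.

```python
import torch

def oblique_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of each weight vector's L2 norm from 1.

    For a conv layer, the kernel is reshaped to (out_channels, -1) so
    each row is one filter's flattened weight vector; rows with unit
    norm lie exactly on the Oblique manifold.
    """
    w = weight.reshape(weight.shape[0], -1)
    row_norms = w.norm(dim=1)
    return ((row_norms - 1.0) ** 2).sum()

# Hypothetical usage in a training step: scale the penalty by a
# regularization factor and add it to the task loss.
reg_factor = 1e-4  # assumed hyperparameter, tuned per experiment
model = torch.nn.Conv2d(3, 16, kernel_size=3)
task_loss = torch.tensor(0.0)  # placeholder for e.g. cross-entropy
penalty = sum(oblique_penalty(p) for p in model.parameters() if p.dim() > 1)
loss = task_loss + reg_factor * penalty
loss.backward()
```

Because the penalty is added to the loss rather than enforced as a hard constraint, gradient descent can trade it off against the task objective, which is what makes the regularization "soft".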