Modern convolutional neural networks (CNNs) are built from many structurally identical convolution blocks, and recursive sharing of parameters across these blocks has therefore been proposed to reduce the number of parameters. However, naive parameter sharing poses several challenges, such as limited representational power and the vanishing/exploding gradients problem for recursively shared parameters. In this paper, we present a recursive convolution block design and training method in which a recursively shareable part, a filter basis, is separated and learned while effectively avoiding the vanishing/exploding gradients problem during training. We show that this unwieldy vanishing/exploding gradients problem can be controlled by enforcing the elements of the filter basis to be orthonormal, and we empirically demonstrate that the proposed orthogonality regularization improves the flow of gradients during training. Experimental results on image classification and object detection show that our approach, unlike previous parameter-sharing approaches, does not trade performance for parameter savings and consistently outperforms overparameterized counterpart networks. This superior performance demonstrates that the proposed recursive convolution block design and the orthogonality regularization not only prevent performance degradation, but also consistently improve representation capability even when a significant fraction of the parameters is recursively shared.
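To make the idea concrete, below is a minimal PyTorch sketch of a recursive block with a shared filter basis and an orthogonality penalty on its elements. All names (RecursiveBasisBlock, num_basis, num_recursions) and shapes are illustrative assumptions, not the paper's exact architecture; the penalty simply pushes the Gram matrix of the flattened basis filters toward the identity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecursiveBasisBlock(nn.Module):
    """Illustrative recursive block: a shared 3x3 filter basis is reused at
    every recursion step; only small per-step coefficients are step-specific."""

    def __init__(self, channels, num_basis, num_recursions):
        super().__init__()
        # Shared basis filters, reused across all recursion steps.
        self.basis = nn.Parameter(torch.randn(num_basis, channels, 3, 3) * 0.02)
        # Per-step 1x1 coefficients that mix basis responses back to `channels`.
        self.coeffs = nn.Parameter(torch.randn(num_recursions, channels, num_basis) * 0.02)
        self.num_recursions = num_recursions

    def forward(self, x):
        for t in range(self.num_recursions):
            # Project the input onto the shared basis ...
            y = F.conv2d(x, self.basis, padding=1)
            # ... then recombine with the step-specific coefficients (1x1 conv).
            y = F.conv2d(y, self.coeffs[t].unsqueeze(-1).unsqueeze(-1))
            x = F.relu(x + y)  # residual path helps gradients flow through recursions
        return x

    def orthogonality_penalty(self):
        # Encourage the flattened basis filters to be orthonormal:
        # || B B^T - I ||_F^2, added to the task loss with a small weight.
        b = self.basis.flatten(1)                       # [num_basis, channels * 9]
        gram = b @ b.t()                                # [num_basis, num_basis]
        eye = torch.eye(gram.size(0), device=gram.device)
        return ((gram - eye) ** 2).sum()
```

In training, the penalty would be added to the task loss, e.g. `loss = criterion(output, target) + lam * block.orthogonality_penalty()` with a small weight `lam`; this is the generic way such a regularizer is applied and may differ from the paper's exact formulation.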