Our work addresses two important issues with recurrent neural networks: (1) they are over-parameterized, and (2) the recurrence matrix is ill-conditioned. The former increases the sample complexity of learning and the training time. The latter causes the vanishing and exploding gradient problem. We present a flexible recurrent neural network model called Kronecker Recurrent Units (KRU). KRU achieves parameter efficiency in RNNs through a Kronecker-factored recurrent matrix. It overcomes the ill-conditioning of the recurrent matrix by enforcing soft unitary constraints on the factors. Thanks to the small dimensionality of the factors, maintaining these constraints is computationally efficient. Our experimental results on seven standard datasets reveal that KRU can reduce the number of parameters in the recurrent weight matrix by three orders of magnitude compared to existing recurrent models, without compromising statistical performance. In particular, these results show that while there are advantages to having a high-dimensional recurrent space, the capacity of the recurrent part of the model can be dramatically reduced.
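To make the two ideas in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of how a Kronecker-factored recurrent matrix and a soft unitary penalty on its factors might look; the factor sizes, penalty weight, and function names are illustrative assumptions.

```python
# Sketch: a recurrent matrix built as a Kronecker product of two small factors,
# plus a soft unitary penalty applied to each factor. Sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((16, 16)) / 4.0   # small factor, 256 parameters
B = rng.standard_normal((16, 16)) / 4.0   # small factor, 256 parameters

# Kronecker-factored recurrent matrix: 2 * 256 = 512 parameters parameterize
# a 256 x 256 matrix (65,536 entries if stored densely).
W = np.kron(A, B)                         # shape (256, 256)

# Soft unitary penalty on a small factor, e.g. lam * ||F^T F - I||_F^2.
# Because the factors are tiny, evaluating this is cheap at every update.
def soft_unitary_penalty(F, lam=1e-3):
    gram = F.T @ F
    return lam * np.sum((gram - np.eye(F.shape[0])) ** 2)

penalty = soft_unitary_penalty(A) + soft_unitary_penalty(B)
print(W.shape, penalty)
```

In a training loop, this penalty would be added to the task loss so that gradient descent keeps the factors close to unitary, which is one way to mitigate vanishing and exploding gradients without hard orthogonality constraints.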