In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-$k$ updates -- or their rank-$k$ approximation -- that maintains performance at a nearly optimal training runtime. We introduce two variants of this method, named Direct (projUNN-D) and Tangent (projUNN-T) projected Unitary Neural Networks, that can parameterize full $N$-dimensional unitary or orthogonal matrices with a training runtime scaling as $O(kN^2)$. Our method either projects low-rank gradients onto the closest unitary matrix (projUNN-D) or transports unitary matrices in the direction of the low-rank gradient (projUNN-T). Even in the fastest setting ($k=1$), projUNN is able to train a model's unitary parameters to reach performance comparable to baseline implementations. In recurrent neural network settings, projUNN closely matches or exceeds benchmarked results from prior unitary neural networks. Finally, we preliminarily explore projUNN in training orthogonal convolutional neural networks, which currently do not outperform state-of-the-art models but can potentially enhance stability and robustness at large depth.
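To make the projection idea concrete, below is a minimal NumPy sketch of a projUNN-D-style step: approximate the gradient by its best rank-$k$ factor, apply it, and project the result back onto the closest unitary (here orthogonal) matrix via a polar decomposition. The helper names, learning rate, and toy data are illustrative assumptions, not taken from the paper, and the full SVDs used here cost $O(N^3)$; the paper's $O(kN^2)$ scaling comes from low-rank update formulas that this sketch omits for clarity.

```python
import numpy as np

def rank_k_approx(G, k):
    """Best rank-k approximation of the gradient G via truncated SVD."""
    U_g, s, Vh = np.linalg.svd(G, full_matrices=False)
    return U_g[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

def projunn_d_step(U, G, lr=0.05, k=1):
    """One projUNN-D-style update (naive sketch): apply a rank-k gradient,
    then project back onto the closest unitary/orthogonal matrix."""
    A = U - lr * rank_k_approx(G, k)
    # Polar decomposition: the closest unitary to A (in Frobenius norm)
    # is W @ Vh, where A = W diag(s) Vh is the SVD of A.
    W, _, Vh = np.linalg.svd(A, full_matrices=False)
    return W @ Vh

# Toy usage: a random orthogonal matrix stays orthogonal under noisy "gradients".
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((8, 8)))
G = rng.standard_normal((8, 8))            # stand-in for a loss gradient dL/dU
U = projunn_d_step(U, G, lr=0.05, k=1)
print(np.allclose(U @ U.T, np.eye(8)))     # True: the update preserves orthogonality
```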