Many types of neural network layers rely on matrix properties such as invertibility or orthogonality. Retaining such properties during optimization with gradient-based stochastic optimizers is a challenging task, which is usually addressed either by reparameterizing the affected parameters or by optimizing directly on the manifold. This work presents a novel approach for training invertible linear layers. In lieu of optimizing the network parameters directly, we train rank-one perturbations and add them to the actual weight matrices only infrequently. This P$^4$Inv update makes it possible to keep track of inverses and determinants without ever computing them explicitly. We show how such invertible blocks improve the mixing, and thus the mode separation, of the resulting normalizing flows. Furthermore, we outline how the P$^4$ concept can be used to retain properties other than invertibility.
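The claim that inverses and determinants can be tracked without ever computing them explicitly rests on two standard rank-one identities: the Sherman-Morrison formula, $(W + uv^\top)^{-1} = W^{-1} - \frac{W^{-1} u v^\top W^{-1}}{1 + v^\top W^{-1} u}$, and the matrix determinant lemma, $\det(W + uv^\top) = (1 + v^\top W^{-1} u)\det(W)$. The minimal NumPy sketch below illustrates such an update; the function name `rank_one_update`, the tracked triple $(W, W^{-1}, \log|\det W|)$, and the singularity threshold are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def rank_one_update(W, W_inv, log_det, u, v, eps=1e-8):
    """Apply the rank-one perturbation W <- W + u v^T while updating the
    tracked inverse and log-determinant in O(n^2) instead of O(n^3).

    Inverse update:      Sherman-Morrison formula.
    Determinant update:  matrix determinant lemma,
                         det(W + u v^T) = (1 + v^T W^{-1} u) det(W).
    """
    Winv_u = W_inv @ u                # W^{-1} u, shape (n,)
    denom = 1.0 + v @ Winv_u          # 1 + v^T W^{-1} u (scalar)
    if abs(denom) < eps:              # update would make W (nearly) singular
        raise ValueError("rank-one update rejected: result would be singular")
    W_new = W + np.outer(u, v)
    W_inv_new = W_inv - np.outer(Winv_u, v @ W_inv) / denom
    log_det_new = log_det + np.log(abs(denom))
    return W_new, W_inv_new, log_det_new

# Usage: keep W, its inverse, and log|det W| in sync across infrequent merges.
rng = np.random.default_rng(0)
n = 4
W, W_inv, log_det = np.eye(n), np.eye(n), 0.0
u, v = rng.normal(size=n), rng.normal(size=n)
W, W_inv, log_det = rank_one_update(W, W_inv, log_det, u, v)
assert np.allclose(W @ W_inv, np.eye(n), atol=1e-10)
assert np.isclose(log_det, np.log(abs(np.linalg.det(W))))
```

Tracking $\log|\det W|$ rather than $\det W$ itself matches what normalizing flows need for the change-of-variables formula and avoids numerical overflow; an update is rejected when $1 + v^\top W^{-1} u$ is close to zero, since the perturbed matrix would then be (nearly) singular.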