Many types of neural network layers rely on matrix properties such as invertibility or orthogonality. Retaining such properties during optimization with gradient-based stochastic optimizers is a challenging task, which is usually addressed either by reparameterizing the affected parameters or by optimizing directly on the manifold. In contrast, this work presents a novel, general approach to preserving matrix properties via parameterized perturbations. Instead of optimizing the network parameters directly, the introduced P$^{4}$ update optimizes perturbations and merges them into the actual parameters only infrequently, in a way that preserves the desired property. As a demonstration, we use this concept to preserve the invertibility of linear layers during training. The resulting P$^{4}$Inv update keeps track of inverses and determinants through rank-one updates, without ever computing them explicitly. We show how such invertible blocks improve the mixing of coupling layers and thus the mode separation of the resulting normalizing flows.
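The rank-one bookkeeping mentioned above can be made concrete via the Sherman-Morrison formula and the matrix determinant lemma. The sketch below is an illustration under our own assumptions, not the paper's implementation; the function name `rank_one_merge` and its interface are hypothetical. It maintains a weight matrix together with its cached inverse and log-determinant while a rank-one perturbation $u v^\top$ is merged in, and rejects merges that would break invertibility.

```python
import numpy as np

def rank_one_merge(W, W_inv, log_det, u, v):
    """Merge the rank-one perturbation u v^T into W while updating the
    cached inverse (Sherman-Morrison) and log-determinant (matrix
    determinant lemma). Hypothetical helper, not the paper's API."""
    # Matrix determinant lemma: det(W + u v^T) = (1 + v^T W^{-1} u) det(W)
    gamma = 1.0 + v @ W_inv @ u
    if abs(gamma) < 1e-8:
        # The perturbation would make W numerically singular; reject the
        # merge so that invertibility is preserved.
        return W, W_inv, log_det, False
    W_new = W + np.outer(u, v)
    # Sherman-Morrison: (W + u v^T)^{-1} = W^{-1} - (W^{-1} u v^T W^{-1}) / gamma
    W_inv_u = W_inv @ u
    v_W_inv = v @ W_inv
    W_inv_new = W_inv - np.outer(W_inv_u, v_W_inv) / gamma
    log_det_new = log_det + np.log(abs(gamma))
    return W_new, W_inv_new, log_det_new, True

# Usage: start from the identity, merge a small random rank-one perturbation,
# and verify the cached inverse and log-determinant against direct computation.
d = 4
W, W_inv, log_det = np.eye(d), np.eye(d), 0.0
rng = np.random.default_rng(0)
u, v = 0.1 * rng.standard_normal(d), 0.1 * rng.standard_normal(d)
W, W_inv, log_det, ok = rank_one_merge(W, W_inv, log_det, u, v)
assert ok and np.allclose(W @ W_inv, np.eye(d), atol=1e-10)
assert np.isclose(log_det, np.linalg.slogdet(W)[1])
```

Because both update formulas cost $O(d^2)$ per merge rather than the $O(d^3)$ of a fresh inversion or determinant, inverses and determinants never need to be computed from scratch, matching the claim in the abstract.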