Inserting an SVD meta-layer into neural networks tends to make the covariance matrix ill-conditioned, which can harm training stability and generalization. In this paper, we systematically study how to improve covariance conditioning by enforcing orthogonality on the Pre-SVD layer. We first investigate existing orthogonal treatments of the weights and find that, although they improve the conditioning, they hurt performance. To avoid this side effect, we propose the Nearest Orthogonal Gradient (NOG) and the Optimal Learning Rate (OLR). We validate the effectiveness of our methods in two applications: decorrelated Batch Normalization (BN) and Global Covariance Pooling (GCP). Extensive experiments on visual recognition demonstrate that our methods simultaneously improve covariance conditioning and generalization, and that combining them with orthogonal weight treatments further boosts performance. Moreover, experiments on various benchmarks show that our orthogonality techniques benefit generative models through better latent disentanglement. Code is available at: \href{https://github.com/KingJamesSong/OrthoImproveCond}{https://github.com/KingJamesSong/OrthoImproveCond}.
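As a concrete illustration, below is a minimal PyTorch sketch of the idea behind NOG, under the assumption that it replaces the Pre-SVD layer's gradient $\mathbf{G}$ with its nearest orthogonal matrix in the Frobenius norm, i.e., the polar factor $\mathbf{U}\mathbf{V}^{T}$ of the SVD $\mathbf{G}=\mathbf{U}\mathbf{S}\mathbf{V}^{T}$ (the orthogonal Procrustes solution); the function name and usage pattern are illustrative and not the released API.
\begin{verbatim}
import torch

def nearest_orthogonal_gradient(grad: torch.Tensor) -> torch.Tensor:
    # Nearest (semi-)orthogonal matrix to `grad` in Frobenius norm:
    # for G = U S V^T, the orthogonal Procrustes solution is U V^T.
    u, _, vh = torch.linalg.svd(grad, full_matrices=False)
    return u @ vh

# Hypothetical usage: orthogonalize the Pre-SVD layer's gradient
# before the optimizer step.
# with torch.no_grad():
#     w = pre_svd_layer.weight
#     w.grad.copy_(nearest_orthogonal_gradient(w.grad))
\end{verbatim}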