Inserting an SVD meta-layer into neural networks is prone to making the covariance ill-conditioned, which could harm the model's training stability and generalization ability. In this paper, we systematically study how to improve the covariance conditioning by enforcing orthogonality to the Pre-SVD layer. We first investigate existing orthogonal treatments of the weights; these techniques improve the conditioning but hurt the performance. To avoid this side effect, we propose the Nearest Orthogonal Gradient (NOG) and the Optimal Learning Rate (OLR). The effectiveness of our methods is validated in two applications: decorrelated Batch Normalization (BN) and Global Covariance Pooling (GCP). Extensive experiments on visual recognition demonstrate that our methods can simultaneously improve the covariance conditioning and the generalization. Moreover, combining them with orthogonal weight treatments can further boost the performance.
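To make the NOG idea concrete, below is a minimal sketch, assuming a PyTorch setting. It relies only on the standard polar-decomposition fact that, for a gradient G = U S V^T, the orthogonal (or semi-orthogonal, for rectangular G) matrix nearest to G in the Frobenius norm is U V^T. The function name and the usage line are our own illustrative assumptions, not the paper's reference implementation.

```python
import torch

def nearest_orthogonal_gradient(grad: torch.Tensor) -> torch.Tensor:
    """Project a weight gradient onto its nearest orthogonal matrix.

    For grad = U S V^T (SVD), the closest orthogonal matrix in the
    Frobenius norm is U V^T, i.e., the SVD with singular values set to 1.
    """
    U, _, Vh = torch.linalg.svd(grad, full_matrices=False)
    return U @ Vh

# Illustrative usage (hypothetical names): replace the Pre-SVD layer's
# weight gradient with its orthogonal projection before the optimizer step.
# pre_svd_layer.weight.grad = nearest_orthogonal_gradient(
#     pre_svd_layer.weight.grad)
# optimizer.step()
```

A semi-orthogonal gradient keeps all update directions equally weighted, which is one plausible reading of why such a projection could help the conditioning of the Pre-SVD layer's covariance without constraining the weights themselves.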