Bilinear pooling has achieved great success in fine-grained visual recognition (FGVC). Recent methods have shown that matrix power normalization can stabilize the second-order information in bilinear features, but some problems, e.g., redundant information and over-fitting, remain to be resolved. In this paper, we propose an efficient Multi-Objective Matrix Normalization (MOMN) method that simultaneously normalizes a bilinear representation with respect to three objectives: square root, low rank, and sparsity. These three regularizers not only stabilize the second-order information, but also compact the bilinear features and promote model generalization. In MOMN, a core challenge is how to jointly optimize three non-smooth regularizers with different convexity properties. To this end, MOMN first formulates them as an augmented Lagrangian with approximated regularizer constraints. Then, auxiliary variables are introduced to relax the different constraints, which allows each regularizer to be solved alternately. Finally, several updating strategies based on gradient descent are designed to obtain consistent convergence and an efficient implementation. Consequently, MOMN is implemented with only matrix multiplication, which is well suited to GPU acceleration, and the normalized bilinear features are stabilized and discriminative. Experiments on five public FGVC benchmarks demonstrate that the proposed MOMN is superior to existing normalization-based methods in terms of both accuracy and efficiency. The code is available at https://github.com/mboboGO/MOMN.
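To make the "only matrix multiplication" point concrete, the following is a minimal sketch of square-root normalization of a bilinear feature using the standard coupled Newton-Schulz iteration. This is not the MOMN update rule, which also handles the low-rank and sparsity objectives; it only illustrates how one of the three objectives can be approximated without an eigendecomposition. The function names `bilinear_pool` and `sqrt_newton_schulz`, the iteration count, and the assumption that the pooled matrix is symmetric positive definite are all illustrative choices, not taken from the paper.

```python
import numpy as np


def bilinear_pool(features):
    """Pool local features of shape (n_locations, channels) into a
    (channels, channels) second-order (bilinear) matrix."""
    n = features.shape[0]
    return features.T @ features / n


def sqrt_newton_schulz(A, num_iters=20):
    """Approximate the matrix square root of a symmetric positive
    definite matrix A using only matrix multiplications.

    Assumption: A is positive definite. Dividing by the Frobenius norm
    puts all eigenvalues in (0, 1], which the iteration needs in order
    to converge.
    """
    d = A.shape[0]
    norm = np.linalg.norm(A)      # Frobenius norm of A
    Y = A / norm                  # converges to (A / norm)^{1/2}
    Z = np.eye(d)                 # converges to (A / norm)^{-1/2}
    three_eye = 3.0 * np.eye(d)
    for _ in range(num_iters):
        T = 0.5 * (three_eye - Z @ Y)
        Y = Y @ T
        Z = T @ Z
    return Y * np.sqrt(norm)      # undo the pre-normalization


# Usage: 64 spatial locations with 8 channels (synthetic data).
rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 8))
A = bilinear_pool(feats)
S = sqrt_newton_schulz(A)
```

Because every step is a dense matrix product, the loop maps directly onto batched GPU matrix multiplication, which is the efficiency property the abstract highlights.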