This paper studies large-scale optimization problems on Riemannian manifolds whose objective function is a finite sum of negative log-probability losses. Such problems arise in various machine learning and signal processing applications. By introducing the notion of the Fisher information matrix in the manifold setting, we propose a novel Riemannian natural gradient method, which can be viewed as a natural extension of the natural gradient method from the Euclidean setting to the manifold setting. We establish the almost-sure global convergence of our proposed method under standard assumptions. Moreover, we show that if the loss function satisfies certain convexity and smoothness conditions and the input-output map satisfies a Riemannian Jacobian stability condition, then our proposed method enjoys a local linear rate of convergence, which improves to a quadratic rate when the Riemannian Jacobian of the input-output map is additionally Lipschitz continuous. We then prove that the Riemannian Jacobian stability condition is satisfied, with high probability, by a two-layer fully connected neural network with batch normalization, provided that the width of the network is sufficiently large. This demonstrates the practical relevance of our convergence rate result. Numerical experiments on applications arising from machine learning demonstrate the advantages of the proposed method over state-of-the-art ones.
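As an illustrative sketch only (the specific operators below are assumptions for exposition, not the paper's stated algorithm), a Riemannian natural gradient update consistent with the description above could take the form
\[
x_{k+1} \;=\; R_{x_k}\!\Big(-\eta_k\,\big(F(x_k) + \lambda_k\,\mathrm{Id}\big)^{-1}\,\operatorname{grad} f(x_k)\Big),
\]
where \(\operatorname{grad} f(x_k)\) denotes the Riemannian gradient of the finite-sum objective, \(F(x_k)\) is a Fisher information matrix acting on the tangent space at \(x_k\), \(R_{x_k}\) is a retraction onto the manifold, and \(\eta_k, \lambda_k > 0\) are a step size and a damping parameter; the damping term and the particular choice of retraction are illustrative placeholders.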