We propose an efficient numerical method for computing natural gradient descent directions with respect to a generic metric in the state space. Our technique relies on representing the natural gradient direction as the solution to a standard least-squares problem. Hence, instead of calculating, storing, or inverting the information matrix directly, we apply efficient methods from numerical linear algebra to solve this least-squares problem. We treat both scenarios where the derivative of the state variable with respect to the parameter is either explicitly known or implicitly given through constraints. We apply the QR decomposition to solve the least-squares problem in the former case and utilize the adjoint-state method to compute the natural gradient descent direction in the latter case. As a result, we can reliably compute several natural gradient descent directions, including the Wasserstein natural gradient, for large-scale parameter spaces with thousands of dimensions, which was previously believed to be out of reach. Finally, our numerical results shed light on the qualitative differences between the standard gradient descent method and various natural gradient descent methods based on different metric spaces in large-scale nonconvex optimization problems.
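To illustrate the explicit-derivative case, the sketch below shows how a natural gradient direction can be obtained from a QR-based least-squares solve rather than by forming and inverting the information matrix. This is a minimal illustration, not the paper's implementation: it assumes a Euclidean metric in the state space, so the information matrix is `J.T @ J` with `J` the Jacobian of the state with respect to the parameters, and the natural gradient direction `d` solves the normal equations `(J.T @ J) d = J.T b`, equivalently the least-squares problem `min_d ||J d - b||`, where `b` is the state-space gradient. The function name and variable names are hypothetical.

```python
import numpy as np

def natural_gradient_qr(J, state_grad):
    """Natural gradient direction d solving (J^T J) d = J^T state_grad,
    i.e. the least-squares problem min_d ||J d - state_grad||_2,
    computed via a thin QR factorization of J instead of forming J^T J.

    J          : (m, n) Jacobian of the state w.r.t. the parameters
    state_grad : (m,)   gradient of the objective in the state space
    """
    Q, R = np.linalg.qr(J)                      # thin QR: J = Q R
    return np.linalg.solve(R, Q.T @ state_grad)  # solve R d = Q^T b

# Toy consistency check against the explicit normal-equations solve
rng = np.random.default_rng(0)
J = rng.standard_normal((50, 8))
b = rng.standard_normal(50)
d_qr = natural_gradient_qr(J, b)
d_direct = np.linalg.solve(J.T @ J, J.T @ b)
print(np.allclose(d_qr, d_direct))  # True
```

Avoiding the explicit product `J.T @ J` is the point of the QR route: the normal equations square the condition number of `J`, while the QR factorization works with `J` directly.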