This report studies the fitting of the Hessian or its inverse for stochastic optimization using a Hessian fitting criterion from the preconditioned stochastic gradient descent (PSGD) method, which is intimately related to many commonly used second-order and adaptive gradient optimizers, e.g., BFGS, the Gauss-Newton algorithm, natural gradient descent, AdaGrad, etc. Our analyses reveal the efficiency and reliability differences among a wide range of preconditioner fitting methods: from closed-form to iterative solutions, using either Hessian-vector products or stochastic gradients only, and with the Hessian fitted in the Euclidean space, on the manifold of symmetric positive definite (SPD) matrices, or on a variety of Lie groups. The most intriguing discovery is that the Hessian fitting problem itself, viewed as an optimization problem, is strongly convex under mild conditions on certain general Lie groups. This discovery turns Hessian fitting into a well-behaved Lie group optimization problem and facilitates the design of highly efficient and elegant sparse Lie group preconditioner fitting methods for large-scale stochastic optimization.
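For reference, one common statement of the PSGD Hessian fitting criterion reads as follows; this is a sketch using the notation typical of the PSGD literature (a random probe vector $v$, the Hessian-vector product $h = Hv$, and the SPD preconditioner $P$), none of which is defined in this abstract itself:
$$c(P) \;=\; \mathbb{E}_v\!\left[\, h^\top P\, h \;+\; v^\top P^{-1} v \,\right],$$
whose minimizer over SPD matrices satisfies $P\,\mathbb{E}[h h^\top]\,P = \mathbb{E}[v v^\top]$, so that with $\mathbb{E}[v v^\top]=I$ and a nonsingular symmetric $H$, the fitted preconditioner is $P = |H|^{-1}$.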