We consider an extension of the Newton-MR algorithm for nonconvex unconstrained optimization to the settings where Hessian information is approximated. Under a particular noise model on the Hessian matrix, we investigate the iteration and operation complexities of this variant to achieve appropriate sub-optimality criteria in several nonconvex settings. We do this by first considering functions that satisfy the (generalized) Polyak-\L ojasiewicz condition, a special sub-class of nonconvex functions. We show that, under certain conditions, our algorithm achieves global linear convergence rate. We then consider more general nonconvex settings where the rate to obtain first order sub-optimality is shown to be sub-linear. In all these settings, we show that our algorithm converges regardless of the degree of approximation of the Hessian as well as the accuracy of the solution to the sub-problem. Finally, we compare the performance of our algorithm with several alternatives on a few machine learning problems.
翻译:暂无翻译