We study a variant of Newton's method for empirical risk minimization, where at each iteration of the optimization algorithm, we replace the gradient and Hessian of the objective function with robust estimators taken from the existing literature on robust mean estimation for multivariate data. After proving a general theorem about the convergence of successive iterates to a small ball around the population-level minimizer, we study consequences of our theory in generalized linear models, when data are generated from Huber's epsilon-contamination model and/or heavy-tailed distributions. We also propose an algorithm for obtaining robust Newton directions based on the conjugate gradient method, which may be more appropriate for high-dimensional settings, and provide conjectures about the convergence of the resulting algorithm. Compared to the robust gradient descent algorithm proposed by Prasad et al. (2020), our algorithm enjoys the faster rates of convergence for successive iterates often achieved by second-order methods for convex problems, i.e., quadratic convergence in a neighborhood of the optimum, with a step size that may be chosen adaptively via backtracking line search.
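As a rough illustration of the iteration described above, the sketch below implements a robustified Newton update with a backtracking line search. It uses coordinate-wise median-of-means as a simple stand-in robust mean estimator; the estimators analyzed in the paper come from the robust mean estimation literature and are generally more sophisticated. All names here (`robust_newton`, `median_of_means`, `grad_i`, `hess_i`) are hypothetical, not taken from the paper.

```python
import numpy as np

def median_of_means(samples, k=10, rng=None):
    """Coordinate-wise median-of-means over k blocks.

    A simple stand-in robust mean estimator (not the paper's choice).
    `samples` has shape (n, d) for per-sample gradients or (n, d, d)
    for per-sample Hessians.
    """
    rng = np.random.default_rng() if rng is None else rng
    blocks = np.array_split(rng.permutation(samples.shape[0]), k)
    block_means = np.stack([samples[b].mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)

def robust_newton(theta0, data, loss, grad_i, hess_i,
                  n_iter=50, alpha=0.25, beta=0.5):
    """Newton iteration with robustified gradient/Hessian estimates and
    a backtracking (Armijo) line search. Assumes the robust Hessian
    estimate is positive definite, so the Newton step is a descent
    direction."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_iter):
        # Per-sample gradients (n, d) and Hessians (n, d, d).
        G = np.stack([grad_i(theta, z) for z in data])
        H = np.stack([hess_i(theta, z) for z in data])
        g = median_of_means(G)       # robust gradient estimate
        Hbar = median_of_means(H)    # robust Hessian estimate
        direction = -np.linalg.solve(Hbar, g)  # Newton direction
        # Backtracking line search on the empirical loss.
        t, f0, slope = 1.0, loss(theta), g @ direction
        while loss(theta + t * direction) > f0 + alpha * t * slope:
            t *= beta
        theta = theta + t * direction
    return theta
```

For high-dimensional settings, the explicit solve of the d-by-d linear system could be replaced by a conjugate gradient routine applied to the robust Hessian estimate, in the spirit of the CG-based variant mentioned in the abstract.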