In second-order optimization, computing the Hessian matrix of the objective function at every iteration can be a major bottleneck. Randomized sketching has emerged as a powerful technique for constructing estimates of the Hessian which can be used to perform approximate Newton steps. This involves multiplication by a random sketching matrix, which introduces a trade-off between the computational cost of sketching and the convergence rate of the optimization algorithm. A theoretically desirable but practically much too expensive choice is to use a dense Gaussian sketching matrix, which produces unbiased estimates of the exact Newton step and which offers strong problem-independent convergence guarantees. We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching, without substantially affecting its convergence properties. This approach, called Newton-LESS, is based on a recently introduced sketching technique: LEverage Score Sparsified (LESS) embeddings. We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings, not just up to constant factors but even down to lower order terms, for a large class of optimization tasks. In particular, this leads to a new state-of-the-art convergence result for an iterative least squares solver. Finally, we extend LESS embeddings to include uniformly sparsified random sign matrices which can be implemented efficiently and which perform well in numerical experiments.
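To make the sketched Newton step concrete, the following is a minimal illustrative sketch (not the paper's implementation) of the uniformly sparsified random sign variant mentioned above, applied to an iterative least squares solver. The helper name `uniform_sparse_sign_sketch`, the sketch size `m`, and the per-row sparsity `s` are illustrative assumptions; full LESS embeddings additionally use leverage-score-based sparsification, which is omitted here for brevity.

```python
import numpy as np

def uniform_sparse_sign_sketch(m, n, s, rng):
    """Sketching matrix S (m x n): each row has s nonzero entries at uniformly
    random columns, with random signs, scaled so that E[S^T S] = I_n."""
    S = np.zeros((m, n))
    scale = np.sqrt(n / (m * s))
    for i in range(m):
        cols = rng.choice(n, size=s, replace=False)
        S[i, cols] = scale * rng.choice([-1.0, 1.0], size=s)
    return S

# Hypothetical least-squares instance: minimize 0.5 * ||A x - b||^2
rng = np.random.default_rng(0)
n, d = 2000, 50
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

x = np.zeros(d)
m, s = 4 * d, 8  # sketch size and nonzeros per row (illustrative values)
for t in range(10):
    grad = A.T @ (A @ x - b)              # exact gradient
    S = uniform_sparse_sign_sketch(m, n, s, rng)
    SA = S @ A                            # sketch the Hessian square root A
    H_hat = SA.T @ SA                     # approximate Hessian (d x d)
    x = x - np.linalg.solve(H_hat, grad)  # approximate Newton step

print("residual norm:", np.linalg.norm(A @ x - b))
```

Each iteration costs roughly O(s·n·d) for sketching plus O(m·d²) for forming the approximate Hessian, compared with O(n·d²) for a dense Gaussian sketch, which is the cost reduction the abstract refers to.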