Diagonal preconditioning has long been a staple technique in optimization and machine learning. It often reduces the condition number of the design or Hessian matrix to which it is applied, thereby speeding up convergence. However, rigorous analyses of how much various diagonal preconditioning procedures improve the condition number of the preconditioned matrix, and of how that improvement translates into faster optimization, are rare. In this paper, we first use random matrix theory to analyze a popular diagonal preconditioning technique based on column standard deviations and its effect on the condition number. We then identify a class of design matrices whose condition numbers this procedure reduces significantly. Next, we study the problem of optimal diagonal preconditioning, i.e., finding the diagonal scaling that minimizes the condition number of a given full-rank matrix, and provide a bisection algorithm and a potential reduction algorithm, both with $O(\log(\frac{1}{\epsilon}))$ iteration complexity; each iteration consists of an SDP feasibility problem for the former and a Newton update along the Nesterov-Todd direction for the latter. Finally, we extend the optimal diagonal preconditioning algorithm to an adaptive setting and compare its empirical performance at reducing the condition number and speeding up convergence on regression and classification problems with that of batch normalization, an adaptive preconditioning technique that is essential in training machine learning models.
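To make the column-standard-deviation procedure concrete, here is a minimal NumPy sketch (an illustration under assumed dimensions and column scales, not the implementation used in the paper): it rescales each column of a synthetic design matrix by its sample standard deviation and reports the condition number before and after preconditioning.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 50  # assumed sample size and dimension, chosen only for illustration

# Build a design matrix whose columns have widely varying scales,
# which tends to produce a large condition number.
scales = rng.uniform(0.01, 100.0, size=p)
X = rng.standard_normal((n, p)) * scales

# Diagonal preconditioner D = diag(1 / column standard deviation);
# the preconditioned design matrix is X @ D.
D = np.diag(1.0 / X.std(axis=0))
X_pre = X @ D

print(f"cond(X)  = {np.linalg.cond(X):.3e}")
print(f"cond(XD) = {np.linalg.cond(X_pre):.3e}")
```

On matrices of this kind, with heterogeneous column scales but otherwise well-behaved columns, the preconditioned condition number is typically orders of magnitude smaller. As for why a bisection over the target condition number $\kappa$ reduces optimal diagonal preconditioning to SDP feasibility checks, a standard reformulation (stated here for $M \succ 0$, e.g., $M = A^\top A$ with $A$ full-rank) gives, for diagonal $D \succ 0$,
$$
D \preceq M \preceq \kappa D
\quad\Longleftrightarrow\quad
I \preceq D^{-1/2} M D^{-1/2} \preceq \kappa I,
$$
so $\kappa(D^{-1/2} M D^{-1/2}) \le \kappa$ holds for some diagonal $D \succ 0$ exactly when the semidefinite feasibility problem on the left has a solution: the two inequalities bound the smallest eigenvalue below by one and the largest above by $\kappa$, and any $D$ achieving condition number at most $\kappa$ can be rescaled so that the smallest eigenvalue equals one.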