Using only a single floating-point number in the line search technique for Newton's method might be inadequate. A column vector of the same size as the gradient might serve better than a single float, accelerating each gradient element at its own rate. Moreover, a square matrix of the same order as the Hessian matrix might help to correct the Hessian itself. Chiang applied something between a column vector and a square matrix, namely a diagonal matrix, to accelerate the gradient, and further proposed a faster gradient variant called the quadratic gradient. In this paper, we present a new way to build the quadratic gradient, yielding a new version of it. This new quadratic gradient does not satisfy the convergence conditions of the fixed-Hessian Newton's method; nevertheless, experimental results show that it sometimes converges faster than the original version. Chiang also speculated that there might be a relation between the Hessian matrix and the learning rate of first-order gradient descent. We prove that the floating-point number $\frac{1}{\epsilon + \max \{ |\lambda_i| \}}$ can serve as a good learning rate for gradient methods, where $\epsilon$ is a small constant to avoid division by zero and the $\lambda_i$ are the eigenvalues of the Hessian matrix.
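To make the proposed learning rate concrete, the following is a minimal sketch in Python/NumPy, under the assumption of a symmetric Hessian; the helper names `suggested_learning_rate` and `gradient_descent` are hypothetical and only illustrate a plain first-order step with step size $\frac{1}{\epsilon + \max \{ |\lambda_i| \}}$, not the full method of the paper.

```python
import numpy as np

def suggested_learning_rate(hessian, eps=1e-8):
    """Learning rate 1 / (eps + max |lambda_i|), where the lambda_i are the
    eigenvalues of the (symmetric) Hessian and eps avoids division by zero."""
    eigvals = np.linalg.eigvalsh(hessian)
    return 1.0 / (eps + np.max(np.abs(eigvals)))

def gradient_descent(grad_fn, hess_fn, x0, iters=100, eps=1e-8):
    """Plain first-order gradient descent whose step size is set from the
    Hessian spectrum at the current iterate (hypothetical helper)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        lr = suggested_learning_rate(hess_fn(x), eps)
        x = x - lr * grad_fn(x)
    return x

# Toy usage: minimize f(x) = 0.5 x^T A x - b^T x, whose Hessian is the fixed matrix A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
hess = lambda x: A
x_min = gradient_descent(grad, hess, np.zeros(2))   # approaches the solution of A x = b
```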