We consider standard gradient descent, gradient flow and conjugate gradients as iterative algorithms for minimising a penalised ridge criterion in linear regression. While it is well known that conjugate gradients exhibit fast numerical convergence, the statistical properties of their iterates are more difficult to assess due to inherent non-linearities and dependencies. On the other hand, standard gradient flow is a linear method with well-known regularising properties when stopped early. By an explicit non-standard error decomposition we are able to bound the prediction error for conjugate gradient iterates by a corresponding prediction error of gradient flow at transformed iteration indices. This way, the risk along the entire regularisation path of conjugate gradient iterations can be compared to that for regularisation paths of standard linear methods like gradient flow and ridge regression. In particular, the oracle conjugate gradient iterate shares the optimality properties of the gradient flow and ridge regression oracles up to a constant factor. Numerical examples show the similarity of the regularisation paths in practice.
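The comparison of regularisation paths described above can be illustrated numerically. The following is a minimal sketch, not the authors' experiments: it sets up a synthetic linear regression, computes the gradient flow path in closed form via an eigendecomposition of the Gram matrix, runs plain conjugate gradient iterations on the normal equations, and reports the best prediction error attained along each path. All names, the synthetic data, and the grid of flow times are illustrative assumptions.

```python
import numpy as np

# Synthetic linear regression (illustrative assumption, not the paper's setup).
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

A = X.T @ X          # Gram matrix of the least-squares problem
b = X.T @ y

# Gradient flow on the least-squares objective:
#   d beta / dt = b - A beta,  beta(0) = 0,
# solved in closed form via A = V diag(lam) V^T:
#   beta(t) = V diag((1 - exp(-lam * t)) / lam) V^T b.
lam, V = np.linalg.eigh(A)
bb = V.T @ b

def grad_flow(t):
    # Limit (1 - exp(-lam t)) / lam -> t handles (near-)zero eigenvalues.
    scale = np.where(lam > 1e-12,
                     (1.0 - np.exp(-lam * t)) / np.maximum(lam, 1e-12),
                     t)
    return V @ (scale * bb)

def cg_path(A, b, n_iter):
    """Standard conjugate gradients on A beta = b from 0; returns all iterates."""
    beta = np.zeros_like(b)
    r = b - A @ beta
    d = r.copy()
    path = [beta.copy()]
    for _ in range(n_iter):
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)
        beta = beta + alpha * d
        r_new = r - alpha * Ad
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
        path.append(beta.copy())
    return path

def pred_err(beta):
    # In-sample prediction error against the true coefficients.
    return np.mean((X @ (beta - beta_true)) ** 2)

cg_errs = [pred_err(beta) for beta in cg_path(A, b, 20)]
gf_errs = [pred_err(grad_flow(t)) for t in np.logspace(-4, 1, 50)]

print("oracle CG prediction error:           ", min(cg_errs))
print("oracle gradient-flow prediction error:", min(gf_errs))
```

In runs of this kind, the minimal prediction error along the conjugate gradient path and along the gradient flow path are typically of the same order, consistent with the oracle comparison stated in the abstract; the sketch only visualises that claim and carries no theoretical weight.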