Deep unfolding is a promising deep-learning technique in which the network architecture is obtained by expanding the recursive structure of an existing iterative algorithm. Although convergence acceleration is a remarkable advantage of deep unfolding, its theoretical basis is not yet understood. The first half of this study presents a theoretical analysis of convergence acceleration in deep-unfolded gradient descent (DUGD), whose trainable parameters are the step sizes. We propose a plausible interpretation of the learned step-size parameters in DUGD by introducing the principle of Chebyshev steps, which are derived from Chebyshev polynomials. Using Chebyshev steps in gradient descent (GD) allows us to bound the spectral radius of the matrix that governs the convergence speed of GD, leading to a tight upper bound on the convergence rate. The convergence rate of GD with Chebyshev steps is shown to be asymptotically optimal, even though GD has no momentum terms. We also show numerically that Chebyshev steps explain the learned step-size parameters in DUGD well. In the second half of the study, Chebyshev-periodical successive over-relaxation (Chebyshev-PSOR) is proposed for accelerating linear and nonlinear fixed-point iterations. Theoretical analysis and numerical experiments indicate that Chebyshev-PSOR converges significantly faster in various examples, such as the Jacobi method and proximal gradient methods.
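As a rough illustration of the first half, the following sketch compares GD with a constant step size against GD with Chebyshev steps on a quadratic objective f(x) = x^T A x / 2 - b^T x. It assumes the Chebyshev steps are the reciprocals of the T Chebyshev nodes mapped onto the eigenvalue interval [lam_min, lam_max] of A; the abstract does not state the formula, so treat this as an assumption. The function names, the random test matrix, and the choice T = 8 are hypothetical and serve only as an example.

```python
import numpy as np


def chebyshev_steps(lam_min, lam_max, T):
    """Assumed form: reciprocals of the T Chebyshev nodes on [lam_min, lam_max]."""
    k = np.arange(T)
    center = (lam_max + lam_min) / 2.0
    radius = (lam_max - lam_min) / 2.0
    nodes = center + radius * np.cos((2 * k + 1) * np.pi / (2 * T))
    return 1.0 / nodes


def gradient_descent(A, b, x0, steps):
    """Run one GD iteration per entry of `steps` on f(x) = x^T A x / 2 - b^T x."""
    x = x0.copy()
    for gamma in steps:
        x = x - gamma * (A @ x - b)  # the gradient of f is A x - b
    return x


def contraction_factor(eigvals, steps):
    """Spectral radius of prod_k (I - gamma_k A), evaluated on the eigenvalues of A."""
    p = np.ones_like(eigvals)
    for gamma in steps:
        p = p * (1.0 - gamma * eigvals)
    return np.max(np.abs(p))


# Hypothetical test problem: a random symmetric positive-definite matrix.
rng = np.random.default_rng(0)
n, T = 50, 8
H = rng.standard_normal((2 * n, n))
A = H.T @ H / (2 * n)
b = rng.standard_normal(n)

eigvals = np.linalg.eigvalsh(A)  # sorted in ascending order
lam_min, lam_max = eigvals[0], eigvals[-1]

cheb = chebyshev_steps(lam_min, lam_max, T)
const = np.full(T, 2.0 / (lam_min + lam_max))  # best constant step for a quadratic

rho_cheb = contraction_factor(eigvals, cheb)
rho_const = contraction_factor(eigvals, const)

x_star = np.linalg.solve(A, b)
x0 = np.zeros(n)
err_cheb = np.linalg.norm(gradient_descent(A, b, x0, cheb) - x_star)
err_const = np.linalg.norm(gradient_descent(A, b, x0, const) - x_star)

print(f"contraction over {T} steps: Chebyshev {rho_cheb:.3e}, constant {rho_const:.3e}")
print(f"error after {T} steps:      Chebyshev {err_cheb:.3e}, constant {err_const:.3e}")
```

The comparison uses the spectral radius of the product of the per-step matrices I - gamma_k A, the quantity the abstract says governs the convergence speed. The sketch keeps T small because applying many Chebyshev steps in this naive order can be numerically sensitive.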