Nesterov's accelerated gradient (AG) is a popular technique for optimizing objective functions composed of two parts: a convex loss and a penalty function. While AG methods perform well for convex penalties, such as the LASSO, convergence issues may arise when they are applied to nonconvex penalties, such as SCAD. A recent proposal generalizes Nesterov's AG method to the nonconvex setting, but the resulting algorithm requires several hyperparameters to be specified for practical use. Aside from some general conditions, there is no explicit rule for selecting these hyperparameters, nor guidance on how different choices affect the convergence of the algorithm. In this article, we propose a hyperparameter setting based on the complexity upper bound to accelerate convergence, and we apply this nonconvex AG algorithm to high-dimensional linear and logistic sparse learning problems. We further establish the rate of convergence and present a simple and useful bound that characterizes our proposed optimal damping sequence. Simulation studies show that convergence can be made, on average, considerably faster than that of the conventional proximal gradient algorithm. Our experiments also show that the proposed method generally outperforms current state-of-the-art methods in terms of signal recovery.
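To fix ideas, the sketch below illustrates the classical accelerated proximal gradient (FISTA-style) scheme for the convex LASSO-penalized least-squares problem referenced above; it is a generic illustration, not the paper's nonconvex AG algorithm, and the step size, momentum sequence, and `lam` value are standard textbook choices rather than the damping sequence proposed in the article.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of the l1 (LASSO) penalty: sign(z) * max(|z| - tau, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def accelerated_proximal_gradient(X, y, lam, n_iter=500):
    """FISTA-style accelerated proximal gradient for
    minimize 0.5 * ||y - X @ beta||^2 + lam * ||beta||_1.

    The step size 1/L uses L = ||X||_2^2, the Lipschitz constant of the
    gradient of the smooth loss; t_k is the classical Nesterov momentum
    sequence (not the paper's proposed damping sequence).
    """
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part
    beta = np.zeros(p)
    z = beta.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y)           # gradient at the extrapolated point
        beta_new = soft_threshold(z - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        z = beta_new + ((t - 1.0) / t_new) * (beta_new - beta)  # Nesterov extrapolation
        beta, t = beta_new, t_new
    return beta

# Hypothetical usage on synthetic sparse data
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 200))
beta_true = np.zeros(200)
beta_true[:5] = 3.0
y = X @ beta_true + 0.1 * rng.standard_normal(100)
beta_hat = accelerated_proximal_gradient(X, y, lam=2.0)
```

Replacing the soft-thresholding step with the proximal operator of a nonconvex penalty such as SCAD is where the convergence issues and hyperparameter choices discussed in the article arise.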