协调减少DC最小化的源码方法 (Coordinate Descent Methods for DC Minimization)

Difference-of-Convex (DC) minimization, referring to the problem of minimizing the difference of two convex functions, has been found rich applications in statistical learning and studied extensively for decades. However, existing methods are primarily based on multi-stage convex relaxation, only leading to weak optimality of critical points. This paper proposes a coordinate descent method for minimizing DC functions based on sequential nonconvex approximation. Our approach iteratively solves a nonconvex one-dimensional subproblem globally, and it is guaranteed to converge to a coordinate-wise stationary point. We prove that this new optimality condition is always stronger than the critical point condition and the directional point condition when the objective function is weakly convex. For comparisons, we also include a naive variant of coordinate descent methods based on sequential convex approximation in our study. When the objective function satisfies an additional regularity condition called \emph{sharpness}, coordinate descent methods with an appropriate initialization converge \emph{linearly} to the optimal solution set. Also, for many applications of interest, we show that the nonconvex one-dimensional subproblem can be computed exactly and efficiently using a breakpoint searching method. We present some discussions and extensions of our proposed method. Finally, we have conducted extensive experiments on several statistical learning tasks to show the superiority of our approach. Keywords: Coordinate Descent, DC Minimization, DC Programming, Difference-of-Convex Programs, Nonconvex Optimization, Sparse Optimization, Binary Optimization.

翻译：Convex (DC) 最小化差异, 指将两个 convex 函数的区别最小化的问题, 在统计学习中发现有丰富的应用,并进行了数十年的广泛研究。但是, 现有方法主要基于多阶段 convex 放松, 只导致临界点的最佳性弱。本文提出一个协调的下降方法, 以连续的非 convex 近似为基础, 将DC 函数最小化。我们的方法反复解决一个非 convex 单维次问题, 并且保证会与一个协调的固定点汇合。此外, 我们证明, 在目标函数弱化时, 新的最佳性条件总是比关键点条件和方向性点条件强。为了比较, 我们还在研究中包含一个基于顺序的 convex 近似的基点的协调性下降方法。当目标函数满足一个额外的常规性条件, 称为 \emph{sharrp}, 将下降方法与一个适当的初始化方法相协调。另外, 对于许多应用来说, 我们展示了非convex 的底平级方案的升级方法。

相关内容

坐标下降

关注 0

坐标下降法（coordinate descent）是一种非梯度优化算法。算法在每次迭代中，在当前点处沿一个坐标方向进行一维搜索以求得一个函数的局部极小值。在整个过程中循环使用不同的坐标方向。对于不可拆分的函数而言，算法可能无法在较小的迭代步数中求得最优解。为了加速收敛，可以采用一个适当的坐标系，例如通过主成分分析获得一个坐标间尽可能不相互关联的新坐标系.

《算法凸几何》简明书，Algorithmic Convex Geometry，50页pdf

专知会员服务

42+阅读 · 2021年4月2日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

58+阅读 · 2020年11月21日

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

专知会员服务

33+阅读 · 2020年8月14日

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

专知会员服务

20+阅读 · 2020年6月23日