Difference-of-Convex (DC) minimization, referring to the problem of minimizing the difference of two convex functions, has found rich applications in statistical learning and has been studied extensively for decades. However, existing methods are primarily based on multi-stage convex relaxation, which only guarantees the weak optimality of critical points. This paper proposes a coordinate descent method for minimizing DC functions based on sequential nonconvex approximation. Our approach iteratively solves a nonconvex one-dimensional subproblem globally, and it is guaranteed to converge to a coordinate-wise stationary point. We prove that this new optimality condition is always stronger than the critical point condition and the directional point condition when the objective function is weakly convex. For comparison, we also include in our study a naive variant of coordinate descent based on sequential convex approximation. When the objective function satisfies an additional regularity condition called \textit{sharpness}, coordinate descent methods with an appropriate initialization converge \textit{linearly} to the optimal solution set. Moreover, for many applications of interest, we show that the nonconvex one-dimensional subproblem can be solved exactly and efficiently using a breakpoint searching method. Finally, we conduct extensive experiments on several statistical learning tasks to demonstrate the superiority of our approach.
Keywords: Coordinate Descent, DC Minimization, DC Programming, Difference-of-Convex Programs, Nonconvex Optimization, Sparse Optimization, Binary Optimization.
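To make the idea concrete, the following is a minimal sketch, not the paper's algorithm, of coordinate descent with an exactly solved nonconvex one-dimensional subproblem. It uses capped-\ell_1 regularized least squares as one illustrative DC instance; the per-coordinate subproblem is piecewise quadratic, so its global minimizer can be found by enumerating the closed-form minimizer of each piece together with the breakpoints. All names (cd_capped_l1, solve_1d) and parameter choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative sketch (not the authors' exact method): coordinate descent for
#   min_x  0.5 * ||A x - b||^2 + lam * sum_i min(|x_i|, theta)
# The 1D subproblem in each coordinate is nonconvex but piecewise quadratic,
# so it is solved *globally* by checking a finite set of candidates.

def solve_1d(a, c, lam, theta):
    """Globally minimize 0.5*a*t^2 - c*t + lam*min(|t|, theta) over t (a > 0)."""
    cands = [0.0, theta, -theta]                       # breakpoints
    # Piece |t| <= theta: quadratic + lam*|t| -> soft-thresholding, then clip.
    t1 = np.sign(c) * max(abs(c) - lam, 0.0) / a
    cands.append(float(np.clip(t1, -theta, theta)))
    # Pieces t >= theta and t <= -theta: plain quadratic, project onto the piece.
    cands.append(max(c / a, theta))
    cands.append(min(c / a, -theta))
    obj = lambda t: 0.5 * a * t * t - c * t + lam * min(abs(t), theta)
    return min(cands, key=obj)

def cd_capped_l1(A, b, lam=0.1, theta=1.0, n_iters=100):
    m, n = A.shape
    x = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)          # a_i = ||A_i||^2 for each column
    r = b - A @ x                          # current residual
    for _ in range(n_iters):
        for i in range(n):
            r += A[:, i] * x[i]            # remove coordinate i from the residual
            c = A[:, i] @ r
            x[i] = solve_1d(col_sq[i], c, lam, theta)
            r -= A[:, i] * x[i]            # add the updated coordinate back
    return x
```

The candidate-enumeration step plays the role of the breakpoint search mentioned in the abstract: because each piece of the one-dimensional objective is convex with a closed-form minimizer, comparing finitely many candidates yields the global solution of the nonconvex subproblem, rather than a convex surrogate of it.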