In recent years it has been proved that simple modifications of the classical Frank-Wolfe algorithm (a.k.a. the conditional gradient algorithm) for smooth convex minimization over convex and compact polytopes converge with a linear rate, assuming the objective function has the quadratic growth property. However, the rate of these methods depends explicitly on the dimension of the problem, which cannot explain their empirical success on large-scale problems. In this paper we first demonstrate that, already for very simple problems, and even when the optimal solution lies on a low-dimensional face of the polytope, such dependence on the dimension cannot be avoided in the worst case. We then revisit the addition of a strict complementarity assumption, already considered in Wolfe's classical book \cite{Wolfe1970}, and prove that under this condition, the Frank-Wolfe method with away-steps and line-search converges linearly with a rate that depends explicitly only on the dimension of the optimal face. We motivate strict complementarity by proving that it implies that the sparsity of optimal solutions is robust to noise.
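To make the algorithmic setup concrete, the following is a minimal sketch (not the paper's exact method or analysis) of the Frank-Wolfe method with away-steps and exact line-search, applied to a hypothetical least-squares objective $f(x)=\tfrac{1}{2}\|Ax-b\|^2$ over the probability simplex; the problem data $A$, $b$, the dimensions, and the tolerances are all illustrative assumptions.
\begin{verbatim}
import numpy as np

# Illustrative problem data (assumptions, not from the paper):
# f(x) = 0.5*||Ax - b||^2 minimized over the probability simplex.
rng = np.random.default_rng(0)
n, d = 30, 100
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad(x):
    return A.T @ (A @ x - b)

x = np.zeros(d)
x[0] = 1.0                 # start at a vertex of the simplex
active = {0: 1.0}          # active vertices with barycentric weights

for t in range(1000):
    g = grad(x)
    s = int(np.argmin(g))                  # FW vertex: argmin_i <g, e_i>
    a = max(active, key=lambda i: g[i])    # away vertex from active set

    d_fw = -x.copy(); d_fw[s] += 1.0       # toward direction e_s - x
    d_aw = x.copy();  d_aw[a] -= 1.0       # away direction x - e_a

    if -(g @ d_fw) < 1e-10:                # FW duality gap ~ 0: done
        break

    if g @ d_fw <= g @ d_aw:               # toward (FW) step
        direction, gamma_max, is_away = d_fw, 1.0, False
    else:                                  # away step
        w = active[a]
        direction, gamma_max, is_away = d_aw, w / (1.0 - w), True

    # Exact line-search for the quadratic objective, clipped to feasibility.
    Ad = A @ direction
    denom = Ad @ Ad
    gamma = gamma_max if denom == 0.0 else \
        min(gamma_max, -(g @ direction) / denom)

    x = x + gamma * direction

    # Maintain the convex decomposition of x over simplex vertices.
    if is_away:
        for i in list(active):
            active[i] *= (1.0 + gamma)
        active[a] -= gamma
    else:
        for i in list(active):
            active[i] *= (1.0 - gamma)
        active[s] = active.get(s, 0.0) + gamma
    # Drop vertices whose weight vanished (a "drop step").
    active = {i: w for i, w in active.items() if w > 1e-12}

print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
print("support size:", int(np.sum(x > 1e-8)))
\end{verbatim}
The away-step is what enables linear convergence here: a plain Frank-Wolfe step can only move toward vertices, whereas the away direction lets the iterate shrink the weight of (and eventually drop) spurious vertices in its support, which is also why the support size reported above tends to stay small.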