The Frank-Wolfe algorithm has seen a resurgence in popularity due to its ability to efficiently solve constrained optimization problems in machine learning and high-dimensional statistics. As such, there is much interest in establishing when the algorithm may possess a "linear" $O(\log(1/\epsilon))$ dimension-free iteration complexity comparable to that of projected gradient descent. In this paper, we provide a general technique for establishing domain-specific and easy-to-estimate lower bounds for Frank-Wolfe and its variants using the metric entropy of the domain. Most notably, we show that a dimension-free linear upper bound must fail not only in the worst case, but in the \emph{average case}: for a Gaussian or spherical random polytope in $\mathbb{R}^d$ with $\mathrm{poly}(d)$ vertices, Frank-Wolfe requires up to $\tilde\Omega(d)$ iterations to achieve an $O(1/d)$ error bound, with high probability. We also establish this phenomenon for the nuclear norm ball. The link with metric entropy also has interesting positive implications for conditional gradient algorithms in statistics, such as gradient boosting and matching pursuit. In particular, we show that it is possible to extract fast-decaying upper bounds on the excess risk directly from an analysis of the underlying optimization procedure.
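For readers unfamiliar with the method, the iteration discussed above can be sketched in a few lines. The following is a minimal illustrative example, not code from the paper: it runs Frank-Wolfe on a smooth convex quadratic over the probability simplex, a polytope whose vertices are the standard basis vectors, so the linear minimization oracle reduces to an argmin over gradient coordinates. The problem instance, function names, and step-size schedule are our own choices for illustration.

```python
import numpy as np

# Hypothetical toy instance (not from the paper): minimize
# f(x) = 0.5 * ||A x - b||^2 over the probability simplex in R^d.
# The simplex vertices are the basis vectors e_i, so the linear
# minimization oracle (LMO) just selects the coordinate with the
# smallest gradient entry -- no projection is ever needed.

def frank_wolfe(A, b, iters=200):
    d = A.shape[1]
    x = np.ones(d) / d                   # start at the simplex barycenter
    for t in range(iters):
        grad = A.T @ (A @ x - b)         # gradient of f at x
        s = np.zeros(d)
        s[np.argmin(grad)] = 1.0         # LMO: best simplex vertex
        gamma = 2.0 / (t + 2)            # standard open-loop step size
        x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
x_true = np.array([0.5, 0.5, 0.0, 0.0, 0.0])
b = A @ x_true                           # planted feasible solution
x_hat = frank_wolfe(A, b)
print(0.5 * np.linalg.norm(A @ x_hat - b) ** 2)  # objective value at the iterate
```

The classical analysis gives a sublinear $O(1/t)$ rate for this open-loop step size; the paper's lower bounds concern when this can, or provably cannot, be improved to a dimension-free linear rate.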