Many fundamental problems in machine learning can be formulated as the convex program \[ \min_{\theta\in \mathbb{R}^d}\ \sum_{i=1}^{n}f_{i}(\theta), \] where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $\theta$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $\widetilde{O}(\sum_{i=1}^n d_i \log (1/\epsilon))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting-plane method, which requires $O(nd \log (1/\epsilon))$ gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by Axiotis et al. (ICML 2021). Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.
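To make the setting concrete, the following is a minimal sketch (not the paper's algorithm) of the decomposable objective and the stochastic-subgradient baseline mentioned above: each $f_i$ touches only the coordinates in `f_supports[i]`, and one term is sampled per iteration. All names (`stochastic_subgradient`, `f_supports`, `subgrads`) and the toy example are illustrative assumptions, and the step-size schedule is a generic choice whose guarantees are exactly the kind that depend on uniformity (condition number) across the $f_i$'s.

```python
import numpy as np

def stochastic_subgradient(f_supports, subgrads, d, steps=10_000, lr=1e-2, seed=0):
    """Baseline sketch for minimizing sum_i f_i(theta).

    f_supports[i]: coordinate indices that f_i depends on (length d_i).
    subgrads[i]:   callable mapping theta[support] -> a subgradient of f_i
                   with respect to those coordinates only.
    """
    rng = np.random.default_rng(seed)
    n = len(f_supports)
    theta = np.zeros(d)
    for t in range(steps):
        i = rng.integers(n)                        # sample one f_i per iteration
        idx = f_supports[i]
        g = subgrads[i](theta[idx])                # subgradient on the d_i active coordinates
        # Scale by n so the sampled term is an unbiased estimate of the full sum's subgradient.
        theta[idx] -= (lr / np.sqrt(t + 1)) * n * g
    return theta

# Toy instance: f_i(theta) = |theta[i] - 1| + |theta[(i+1) % d]|, each on two coordinates.
d = 5
supports = [np.array([i, (i + 1) % d]) for i in range(d)]
subgrads = [lambda x: np.array([np.sign(x[0] - 1.0), np.sign(x[1])]) for _ in range(d)]
print(np.round(stochastic_subgradient(supports, subgrads, d), 2))
```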