Several recent empirical studies demonstrate that important machine learning tasks, e.g., training deep neural networks, exhibit low-rank structure, where the loss function varies significantly in only a few directions of the input space. In this paper, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (GD). Our proposed \emph{Low-Rank Gradient Descent} (LRGD) algorithm finds an $\epsilon$-approximate stationary point of a $p$-dimensional function by first identifying $r \leq p$ significant directions, and then estimating the true $p$-dimensional gradient at every iteration by computing directional derivatives only along those $r$ directions. We establish that the ``directional oracle complexities'' of LRGD for strongly convex and non-convex objective functions are $\mathcal{O}(r \log(1/\epsilon) + rp)$ and $\mathcal{O}(r/\epsilon^2 + rp)$, respectively. When $r \ll p$, these complexities are smaller than the known complexities of $\mathcal{O}(p \log(1/\epsilon))$ and $\mathcal{O}(p/\epsilon^2)$ of GD in the strongly convex and non-convex settings, respectively. Thus, LRGD significantly reduces the computational cost of gradient-based methods for sufficiently low-rank functions. In the course of our analysis, we also formally define and characterize the classes of exactly and approximately low-rank functions.
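To make the gradient-estimation step concrete, the following display sketches the per-iteration update that the description above implies once the significant directions have been identified; the orthonormal directions $u_1, \dots, u_r$, the iterate $x_k$, and the step size $\eta$ are notational assumptions introduced here for illustration rather than symbols fixed in the abstract:
\[
\widehat{\nabla} f(x_k) \;=\; \sum_{i=1}^{r} \big(\nabla_{u_i} f(x_k)\big)\, u_i,
\qquad
x_{k+1} \;=\; x_k - \eta\, \widehat{\nabla} f(x_k),
\]
where $\nabla_{u_i} f(x_k)$ denotes the directional derivative of $f$ at $x_k$ along $u_i$. Each iteration therefore issues only $r$ directional-oracle queries instead of the $p$ queries needed to form the full gradient, which is the source of the complexity gains stated above when $r \ll p$.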