We introduce data structures for solving robust regression through stochastic gradient descent (SGD) by sampling gradients with probability proportional to their norm, i.e., importance sampling. Although SGD is widely used for large-scale machine learning, it is known to suffer slow convergence due to the high variance induced by uniform sampling. On the other hand, importance sampling can significantly decrease the variance, but it is usually difficult to implement because computing the sampling probabilities requires additional passes over the data, in which case standard gradient descent (GD) could be used instead. In this paper, we introduce an algorithm that approximately samples $T$ gradients of dimension $d$ from nearly the optimal importance sampling distribution for a robust regression problem over $n$ rows. Thus our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and making only a single pass over the data. Our techniques also extend to performing importance sampling for second-order optimization.
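To make the sampling scheme concrete, here is a minimal sketch of SGD with importance sampling proportional to per-row gradient norms, applied to a robust (Huber) regression objective. This is an illustration only: unlike the paper's data structures, it recomputes all $n$ gradient norms at every step (so it needs full access to the data, not a single pass), and the function names `huber_grad` and `importance_sgd` are our own. The key point it shows is the reweighting by $1/(n p_i)$, which keeps the sampled gradient an unbiased estimate of the full gradient.

```python
import numpy as np

def huber_grad(a_i, b_i, x, delta=1.0):
    """Gradient of the Huber loss at a single row (a_i, b_i)."""
    r = a_i @ x - b_i
    scale = r if abs(r) <= delta else delta * np.sign(r)
    return scale * a_i

def importance_sgd(A, b, T=1000, eta=0.05, delta=1.0, seed=0):
    """SGD where row i is sampled with probability p_i ~ ||grad_i||.

    Each update is rescaled by 1 / (n * p_i), so its expectation equals
    the full-batch gradient, i.e., the estimator stays unbiased while
    the sampling distribution reduces its variance.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(T):
        # Naive O(nd) pass per step; the paper's contribution is avoiding
        # exactly this recomputation via sublinear-space data structures.
        grads = np.array([huber_grad(A[i], b[i], x, delta) for i in range(n)])
        norms = np.linalg.norm(grads, axis=1)
        total = norms.sum()
        if total == 0:  # exact fit: every gradient vanishes
            break
        p = norms / total
        i = rng.choice(n, p=p)
        x -= eta * grads[i] / (n * p[i])
    return x
```

Sampling proportional to gradient norms is the variance-minimizing choice among unbiased single-sample estimators of the full gradient, which is why the abstract calls it "nearly the optimal importance sampling distribution."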