In this work, we consider learning sparse models in large-scale settings, where the number of samples and the feature dimension can grow as large as millions or even billions. Two immediate issues arise in such a challenging scenario: (i) computational cost and (ii) memory overhead. In particular, the memory issue precludes a large body of prior algorithms that rely on batch optimization techniques. To remedy this problem, we propose to learn sparse models such as Lasso in an online manner, where in each iteration only one randomly chosen sample is revealed to update a sparse iterate. Thereby, the memory cost is independent of the sample size, and the gradient evaluation for a single sample is efficient. Perhaps surprisingly, we find that with the same regularization parameter, the sparsity promoted by batch methods is not preserved in the online setting. We analyze this interesting phenomenon and present several effective variants, including mini-batch methods and a hard-thresholding-based stochastic gradient algorithm. Extensive experiments on a public dataset support our findings and algorithms.
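To make the online scheme described above concrete, the following is a minimal sketch (not the exact algorithm analyzed in this work) of a one-sample-per-iteration update for the Lasso objective (1/2n)||y - Xw||^2 + lam*||w||_1, using a stochastic proximal gradient step with soft thresholding, plus an optional hard-thresholding variant that keeps only the k largest-magnitude coordinates. All names and parameter values (lam, k, step, n_iters) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def soft_threshold(w, tau):
    # Proximal operator of tau * ||w||_1 (soft thresholding).
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def hard_threshold(w, k):
    # Keep only the k largest-magnitude coordinates; zero out the rest.
    out = w.copy()
    out[np.argsort(np.abs(w))[:-k]] = 0.0
    return out

def online_lasso(X, y, lam=0.1, k=None, n_iters=10000, step=0.01, seed=0):
    """Stochastic (one-sample) proximal gradient for the Lasso objective.
    If k is given, a hard-thresholding step replaces the soft-thresholding
    proximal step. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)                      # reveal one random sample
        grad = (X[i] @ w - y[i]) * X[i]          # gradient of the single-sample loss
        w = w - step * grad
        if k is None:
            w = soft_threshold(w, step * lam)    # Lasso prox (soft thresholding)
        else:
            w = hard_threshold(w, k)             # hard-thresholding variant
    return w

if __name__ == "__main__":
    # Minimal usage on synthetic data (purely illustrative).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 50))
    w_true = np.zeros(50)
    w_true[:5] = 1.0
    y = X @ w_true + 0.01 * rng.standard_normal(1000)
    w_hat = online_lasso(X, y, lam=0.1)
    print("nonzero coordinates:", np.count_nonzero(np.abs(w_hat) > 1e-3))
```

Note that the per-iteration cost and memory footprint of this sketch depend only on the feature dimension, not on the sample size, which is the property the online approach is meant to exploit.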