Distributed optimization is widely used as one of the most efficient approaches for training models on massive samples. However, large-scale learning problems with both massive samples and high-dimensional features are ubiquitous in the era of big data. Safe screening is a popular technique for speeding up high-dimensional models by discarding inactive features, i.e., those with zero coefficients. Nevertheless, existing safe screening methods are limited to the sequential setting. In this paper, we propose a new distributed dynamic safe screening (DDSS) method for sparsity-regularized models and apply it to shared-memory and distributed-memory architectures respectively; by simultaneously exploiting the sparsity of both the model and the dataset, it achieves significant speedup without any loss of accuracy. To the best of our knowledge, this is the first work on distributed dynamic safe screening. Theoretically, we prove that the proposed method achieves a linear convergence rate with lower overall complexity and, almost surely, eliminates almost all inactive features within a finite number of iterations. Finally, extensive experimental results on benchmark datasets confirm the superiority of the proposed method.
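To make the screening idea concrete, below is a minimal sequential sketch of a standard gap-safe screening rule for the Lasso, min_w 0.5||y - Xw||^2 + lam*||w||_1. This illustrates the general technique only; it is not the proposed DDSS method, and the function name and interface are purely illustrative.

```python
import numpy as np

def gap_safe_screen(X, y, w, lam):
    """Boolean mask of features that a gap-safe rule can provably discard
    for the Lasso: min_w 0.5*||y - Xw||^2 + lam*||w||_1.
    Illustrative sequential sketch only, not the paper's DDSS method."""
    r = y - X @ w                              # primal residual at the current iterate
    # Scale the residual so that |x_j^T theta| <= 1 holds, giving a feasible dual point.
    theta = r / max(lam, np.max(np.abs(X.T @ r)))
    # Duality gap between the primal and dual Lasso objectives.
    primal = 0.5 * (r @ r) + lam * np.sum(np.abs(w))
    dual = 0.5 * (y @ y) - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    gap = max(primal - dual, 0.0)
    radius = np.sqrt(2.0 * gap) / lam          # radius of the safe sphere around theta
    # Feature j is provably inactive if |x_j^T theta| + ||x_j||_2 * radius < 1.
    scores = np.abs(X.T @ theta) + np.linalg.norm(X, axis=0) * radius
    return scores < 1.0                        # True => coefficient is certainly zero
```

Because the rule depends on the current iterate w, it can be re-applied dynamically as optimization proceeds: as the duality gap shrinks, the safe sphere contracts and more inactive features are discarded, which is the behavior the convergence analysis above formalizes.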