Distributed statistical learning is a common strategy for handling massive data: the learning task is divided among multiple local machines and the results are aggregated afterward. However, most existing work considers the case where the samples are divided. In this work, we propose a new algorithm, DDAC-SpAM, which instead divides the features under a high-dimensional sparse additive model. The algorithm consists of three steps: divide, decorrelate, and conquer. We show that after the decorrelation step, every local estimator can consistently recover the sparsity pattern of each additive component without imposing strict constraints on the correlation structure among the variables. Theoretical analysis of the aggregated estimator and empirical results on synthetic and real data demonstrate that DDAC-SpAM is effective and competitive for fitting sparse additive models.
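The divide/decorrelate/conquer workflow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: linear components stand in for the paper's additive components, ZCA whitening stands in for its decorrelation operator, and all function and variable names are hypothetical.

```python
import numpy as np

def ddac_spam_sketch(X, y, n_machines, lam=0.1):
    """Illustrative divide/decorrelate/conquer pipeline (a sketch,
    not the DDAC-SpAM estimator itself)."""
    n, p = X.shape
    coefs = np.zeros(p)
    # Divide: partition the *features* (not the samples) across machines.
    for idx in np.array_split(np.arange(p), n_machines):
        Xb = X[:, idx]
        # Decorrelate: ZCA-whiten the local block so its columns are
        # approximately uncorrelated before the local sparse fit.
        evals, evecs = np.linalg.eigh(Xb.T @ Xb / n)
        W = evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, 1e-10))) @ evecs.T
        Xb_dec = Xb @ W
        # Conquer: each machine runs a cheap sparse fit; here,
        # soft-thresholded marginal coefficients play that role.
        beta = Xb_dec.T @ y / n
        coefs[idx] = np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)
    # Aggregate: concatenating the local estimates gives the global
    # coefficient vector; its nonzero pattern is the selected support.
    return coefs, coefs != 0.0
```

Because only the features are split, each machine holds all n samples for its own block, and the final estimate is assembled by simple concatenation rather than averaging.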