Distributed sparse learning for high-dimensional parameters has attracted vast attention due to its wide applications in prediction and classification across diverse fields of machine learning. Existing distributed sparse regression methods usually ensemble the local results produced by distributed machines by simple averaging, which enjoys a low communication cost but is statistically inefficient. To address this problem, we propose a new Weighted AVerage Estimate (WAVE) for high-dimensional regressions. WAVE is the solution to a weighted least-squares loss with an adaptive $L_1$ penalty, in which the $L_1$ penalty controls the sparsity and the weight promotes statistical efficiency. It not only achieves a balance between statistical and communication efficiency, but also attains a faster convergence rate than the average estimate at a very low communication cost, requiring each local machine to deliver merely two vectors to the master. The consistency of parameter estimation and model selection is also established, which guarantees the safety of using WAVE in distributed systems. The consistency further provides a way to conduct hypothesis testing on the parameters. Moreover, WAVE is robust to heterogeneous distributed samples with varying means and covariances across machines, as verified by its asymptotic normality under such conditions; other competitors, however, do not enjoy this property. The effectiveness of WAVE is further illustrated by extensive numerical studies and real data analyses.
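To make the construction concrete, the following is a minimal sketch of a weighted-average estimator of this kind, not necessarily the exact objective used in the paper: it combines a preliminary average estimate $\bar{\beta}$, a weight matrix $W$ formed from information communicated by the local machines, and adaptive penalty weights $w_j$ with a tuning parameter $\lambda$,
$$
\hat{\beta}_{\mathrm{WAVE}} \;=\; \arg\min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\,(\beta - \bar{\beta})^{\top} W\,(\beta - \bar{\beta}) \;+\; \lambda \sum_{j=1}^{p} w_j\,|\beta_j|,
$$
where each local machine would only need to send two vectors to the master (for instance, its local estimate and a score-type vector) to form $\bar{\beta}$ and $W$, consistent with the communication cost described above. The symbols $\bar{\beta}$, $W$, $w_j$, and $\lambda$ are illustrative assumptions and are not defined in the abstract.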