In multiple domains, statistical tasks are performed in distributed settings, with data split among several end machines that are connected to a fusion center. In various applications, the end machines have limited bandwidth and power, and thus a tight communication budget. In this work we focus on distributed learning of a sparse linear regression model under severe communication constraints. We propose several two-round distributed schemes whose communication per machine is sublinear in the data dimension. In our schemes, individual machines compute debiased lasso estimators but send only very few values to the fusion center. On the theoretical front, we analyze one of these schemes and prove that, with high probability, it achieves exact support recovery at low signal-to-noise ratios where individual machines fail to recover the support. We show in simulations that our scheme performs as well as, and in some cases better than, more communication-intensive approaches.
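To make the building block concrete, the following is a minimal sketch of a debiased lasso estimator of the kind each machine could compute locally. It is illustrative only: the lasso is solved by plain coordinate descent, and the debiasing matrix is taken to be the inverse sample covariance (which assumes more samples than features per machine); the paper's actual schemes, solvers, and choice of debiasing matrix may differ. All function names here are ours.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent for the objective
    (1/2n)||y - X b||^2 + lam * ||b||_1. Illustrative, not the paper's solver."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # per-column squared norms
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            # soft-thresholding update
            beta[j] = np.sign(rho) * max(abs(rho) - n * lam, 0.0) / col_sq[j]
    return beta

def debiased_lasso(X, y, lam):
    """One-step debiasing: beta_d = beta_lasso + M X^T (y - X beta_lasso) / n.
    Here M is the inverse sample covariance, assuming n > p; in the
    high-dimensional regime M would instead be an estimated precision matrix."""
    n, p = X.shape
    beta = lasso_cd(X, y, lam)
    M = np.linalg.inv(X.T @ X / n)
    return beta + M @ X.T @ (y - X @ beta) / n
```

In a two-round scheme of the kind the abstract describes, each machine would run such a computation on its local data and then transmit only a few values per coordinate (far fewer than the dimension) to the fusion center, which aggregates them to recover the support.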